Accessing Nextcloud’s web UI, using http, seems reasonably fast but accessing the webdav interface is painfully slow. Like, it doesn’t even seem to work at all slow.
What’s up with that?
Let’s embark in a journey on how to profile Nextcloud.
Did anyone stumbled onto this issue already?
Searching for slow Nextcloud webdav brought up this post. It details what the OP tried to fix their issue but with a not satisfying conclusion. Anyway, they provide very useful curl commands to test timing:
Http timing:
time curl \
-H 'Cookie:$SESSION' \
'https://$MYINSTANCE/apps/dashboard/#/'
$MYINSTANCE
would for example be nextcloud.domain.com
, which I’m going to use from now on.
Webdav timing:
time curl \
-X 'PROPFIND' \
-H 'Depth: 1' \
-u '$USER:$PASSWORD' \
'https://$MYINSTANCE/remote.php/dav/files/$USER/'
I advice instead to leave out the $PASSWORD
as curl will ask for it. Otherwise, the password will
appear at least in your shell history.
Try to replicate
I tweaked the above commands by adding the -I
flag so I could see the headers but not the
response:
time curl -I ...
Http timing is not bad:
$
real 0m0.464s
user 0m0.086s
sys 0m0.012s
Webdav timing is abysmal:
real 0m26.931s
user 0m0.095s
sys 0m0.011s
Is the issue in the Nextcloud code or elsewhere?
I need to know first if the issue lies in the Nextcloud php code, in the phpfpm server serving the php code or in the nginx proxy that’s serving Nextcloud publicly.
To know that, I enabled the access log in nginx with the following snippet. The convoluted string actually produces a valid json format which makes it super easy to investigate.
"stderr warn";
services.nginx.logError = ''
services.nginx.appendHttpConfig = log_format apm
'{'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"time_local":"$time_local",'
'"request":"$request",'
'"request_length":"$request_length",'
'"server_name":"$server_name",'
'"status":"$status",'
'"bytes_sent":"$bytes_sent",'
'"body_bytes_sent":"$body_bytes_sent",'
'"referrer":"$http_referrer",'
'"user_agent":"$http_user_agent",'
'"gzip_ration":"$gzip_ratio",'
'"post":"$request_body",'
'"upstream_addr":"$upstream_addr",'
'"upstream_status":"$upstream_status",'
'"request_time":"$request_time",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_connect_time":"$upstream_connect_time",'
'"upstream_header_time":"$upstream_header_time"'
'}';
access_log syslog:server=unix:/dev/log apm;
'';
Take a look in the nginx
manual to see what the
upstream_*
variables mean.
This is what I see for the curl
request above. I formatted the json output for easier reading.
Sep 17 21:08:45 myserver nginx[287808]: myserver nginx:
{
"remote_addr": "$ADDR",
"remote_user": "$USER",
"time_local": "17/Sep/2023:21:08:45 +0000",
"request": "PROPFIND https://nextcloud.domain.com/remote.php/dav/files/$USER/ HTTP/2.0",
"request_length": "185",
"server_name": "$INSTANCE",
"status": "207",
"bytes_sent": "1401",
"body_bytes_sent": "633",
"referrer": "-",
"user_agent": "curl/8.2.1",
"gzip_ration": "-",
"post": "-",
"upstream_addr": "unix:/run/phpfpm/nextcloud.sock",
"upstream_status": "207",
"request_time": "26.058",
"upstream_response_time": "26.057",
"upstream_connect_time": "0.000",
"upstream_header_time": "26.057"
}
This indicates the problem lies in the Nextcloud code as the time spent in the “upstream” part is the same as the total request time.
Let’s profile Nextcloud
We need to enable the xdebug extension and add some lines to the php.ini config. In NixOS, that’s done quite easily by adding:
{
services.nextcloud = # Disable the minifier and outputs some additional
# debug information.
extraOptions = {
"debug" = true;
"filelocking.debug" = true;
};
# Enable profiling xdebug mode and save file only
# if the trigger_value is given in the request.
phpOptions = {
"xdebug.mode" = "profile";
"xdebug.trigger_value" = "debug_me";
"xdebug.output_dir" = "/var/log/xdebug";
"xdebug.start_with_request" = "trigger";
};
# Adds xdebug extension
phpExtraExtensions = all: [ all.xdebug ];
};
# Create output folder with correct permission.
-nextcloud.preStart = ''
systemd.services.phpfpm mkdir -p /var/log/xdebug
chown -R nextcloud: /var/log/xdebug
'';
This will create a cachegrind.out.XXXXXX
file under the /var/log/xdebug
directory for each
request having the cookie XDEBUG_PROFILE=debug_me
set.
You could leave out the start_with_request
option but I wouldn’t advice doing that as normal usage
of the Nextcloud instance will also produce a lot of cachegrind.out.XXXXXX
files which are just
noise.
Anyway, the full curl
request to time webdav becomes:
time curl -I -s \
-X 'PROPFIND' \
-H 'Depth: 1' \
-u '$USER' \
--cookie 'XDEBUG_PROFILE=debug_me' \
'https://$MYINSTANCE/remote.php/dav/files/$USER/'
This is the full output of the response:
Enter host password for user '$USER':
HTTP/2 207
server: nginx
date: Sun, 17 Sep 2023 21:08:45 GMT
content-type: application/xml; charset=utf-8
x-xdebug-profile-filename: /var/log/xdebug/cachegrind.out.552470
set-cookie: oc_sessionPassphrase=XYZ; path=/; secure; HttpOnly; SameSite=Lax
content-security-policy: default-src 'none';
expires: Thu, 19 Nov 1981 08:52:00 GMT
cache-control: no-store, no-cache, must-revalidate
pragma: no-cache
set-cookie: occXYZ=XYZ; path=/; secure; HttpOnly; SameSite=Lax
set-cookie: cookie_test=test; expires=Sun, 17 Sep 2023 22:08:44 GMT; Max-Age=3600
vary: Brief,Prefer
dav: 1, 3, extended-mkcol, access-control, calendarserver-principal-property-search, nextcloud-checksum-update, nc-calendar-search, nc-enable-birthday-calendar
x-request-id: 9Yjd8BK6bGf92s7qXpWW
x-debug-token: 9Yjd8BK6bGf92s7qXpWW
strict-transport-security: max-age=31536000; includeSubDomains
real 0m28.410s
user 0m0.050s
sys 0m0.007s
Again that abysmal execution time. Here, we’re interested in the x-xdebug-profile-filename
response header which shows us the corresponding profiling file.
I will open that file in KCacheGrind. I downloaded that file on my laptop then started
KCacheGrind with nix run nixpkgs#kcachegrind
.
Here is what I saw, and my jaw dropped (open image in a new tab to be able to zoom):
I’m not a KCacheGrind expert but let me give you a primer. Each line shows:
- a function call in the
Function
column, - how many times that function got called in the
Called
column, - how long was spent somewhere inside that function and any of its descendants in the
Incl
column, - and how long was spent somewhere inside that function and none of its descendants in the
Self
column.
Here we can see the overwhelming majority of the time is spent inside 1 call to the php::usleep
function which is called once by the OC\Security\Bruteforce\Throttler->getDelay
function.
All the slowness was thus due to Nextcloud’s bruteforce avoidance feature. Somehow it thought someone was trying to bruteforce access to the server. But why did it activate at all as I’m the only user?
The issue was seemingly unrelated
Looking back in the logs I could see that the remote_addr
of the request was pointing to my router
instance and not to the user’s real IP address.
Thanks to @bb77 over in the Nextcloud forum, I learned my issue was due to NAT hairpinning. This allows, I quote wikipedia, “a machine on the LAN is able to access another machine on the LAN via the external IP address of the LAN/router”.
This is all great but this means the Nextcloud server was seeing all internal requests as coming from the same IP address, the one of my router, leading to more bruteforce mitigation than necessary.
I circumvented by using split DNS by adding nextcloud.domain.com
to my internal dns server with:
{
services.dnsmasq = extraConfig = ''
address=/$MYISNTANCE/192.168.1.10
'';
Where 192.168.1.10
is the internal IP of the server hosting Nextcloud.
Now, the webdav request finishes in 8.188s
and the remote_addr
shows correctly the one from my
laptop. It’s still slow and more profiling showed the issue lies now in ldap calls. But I’ll leave
that for a later blog post.