[plug] NFSv4 issues

Benjamin zorlin at gmail.com
Tue Feb 22 09:54:39 AWST 2022


As the adage goes, "It's not DNS, there's no way it's DNS..."

Easy enough to test: swap out the hostname for the direct IP and see if you
still have the issue :)
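
If you want a quick look at name resolution itself on an affected client, here is a minimal sketch, assuming Python 3 is available there. The function name time_lookups and the injectable resolver argument are made up for illustration; "srv" is the server name from the log and 2049 is the standard NFS port. It only shows whether lookups through the system resolver are slow or failing, so it complements the hostname-for-IP swap rather than replacing it.

```python
import socket
import time


def time_lookups(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Resolve hostname several times.

    Returns a list of (elapsed_seconds, outcome) tuples, where outcome is
    a sorted list of addresses, or the socket.gaierror if the lookup failed.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # 2049 is the standard NFS port; the lookup uses the system resolver.
            infos = resolver(hostname, 2049, proto=socket.IPPROTO_TCP)
            outcome = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            outcome = exc
        results.append((time.monotonic() - start, outcome))
    return results
```

Calling time_lookups("srv") while a stall is in progress and again when things are healthy gives two sets of timings to compare. A lookup that takes seconds, or returns a different address between runs, would be worth chasing.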

On Tue, Feb 22, 2022 at 9:39 AM Bill Kenworthy <billk at iinet.net.au> wrote:

> A long shot ... DNS issues? I've seen a similar access pattern on a
> master/slave disk when one went flaky.
>
> BillK
>
>
> On 22 February 2022 9:19:53 am AWST, Brad Campbell <brad at fnarfbargle.com>
> wrote:
>>
>> G'day all,
>>
>> I have a relatively simple client/server system here with a central
>> server that exports a pile of stuff using NFSv4. No authentication.
>>
>> This has been flawless since I upgraded from NFSv3 to NFSv4 8-10 years ago.
>> After a "recentish" kernel update on the server, I've started to get
>> intermittent and hard to reproduce timeouts on the clients. No problem
>> on the server and all other clients remain responsive.
>>
>> [ 2901.432422] nfs: server srv not responding, still trying
>> [ 2901.432423] nfs: server srv not responding, still trying
>> [ 2901.592410] nfs: server srv not responding, still trying
>> [ 2901.952426] nfs: server srv not responding, still trying
>> [ 2902.392426] nfs: server srv not responding, still trying
>> [ 2902.392432] nfs: server srv not responding, still trying
>> [ 2903.402412] nfs: server srv not responding, still trying
>> [ 2903.622411] nfs: server srv not responding, still trying
>> [ 2903.892413] nfs: server srv not responding, still trying
>> [ 2931.012132] nfs: server srv OK
>> [ 2931.012147] nfs: server srv OK
>> [ 2931.012220] nfs: server srv OK
>> [ 2931.012237] nfs: server srv OK
>> [ 2931.012243] nfs: server srv OK
>> [ 2931.012255] nfs: server srv OK
>> [ 2931.012285] nfs: server srv OK
>> [ 2931.012889] nfs: server srv OK
>> [ 2931.036638] nfs: server srv OK
>> [ 3129.162392] nfs: server srv not responding, still trying
>> [ 3129.162399] nfs: server srv not responding, still trying
>> [ 3129.702387] nfs: server srv not responding, still trying
>> [ 3130.262377] nfs: server srv not responding, still trying
>> [ 3130.412397] nfs: server srv not responding, still trying
>> [ 3130.482477] nfs: server srv not responding, still trying
>> [ 3130.912386] nfs: server srv not responding, still trying
>> [ 3130.912392] nfs: server srv not responding, still trying
>> [ 3131.412397] nfs: server srv not responding, still trying
>> [ 3131.912392] nfs: server srv not responding, still trying
>> [ 3157.574579] nfs: server srv OK
>> [ 3157.574654] nfs: server srv OK
>> [ 3157.574658] nfs: server srv OK
>> [ 3157.575214] nfs: server srv OK
>> [ 3157.575487] nfs: server srv OK
>> [ 3157.575496] nfs: server srv OK
>> [ 3157.575501] nfs: server srv OK
>> [ 3157.575977] nfs: server srv OK
>> [ 3157.631782] nfs: server srv OK
>> [ 3157.652340] nfs: server srv OK
>> [ 3176.012394] rpc_check_timeout: 1 callbacks suppressed
>> [ 3176.012407] nfs: server srv not responding, still trying
>> [ 3176.922393] nfs: server srv not responding, still trying
>> [ 3177.992389] nfs: server srv not responding, still trying
>> [ 3177.992393] nfs: server srv not responding, still trying
>> [ 3178.052380] nfs: server srv not responding, still trying
>> [ 3178.422382] nfs: server srv not responding, still trying
>> [ 3179.202386] nfs: server srv not responding, still trying
>> [ 3182.622375] nfs: server srv not responding, still trying
>> [ 3183.812376] nfs: server srv not responding, still trying
>> [ 3188.052371] nfs: server srv not responding, still trying
>> [ 3204.945036] call_decode: 1 callbacks suppressed
>> [ 3204.945051] nfs: server srv OK
>> [ 3204.945063] nfs: server srv OK
>> [ 3204.945176] nfs: server srv OK
>> [ 3204.945208] nfs: server srv OK
>> [ 3204.945224] nfs: server srv OK
>> [ 3204.945229] nfs: server srv OK
>> [ 3204.946453] nfs: server srv OK
>> [ 3205.035067] nfs: server srv OK
>> [ 3205.041453] nfs: server srv OK
>> [ 3205.048524] nfs: server srv OK
>>
>> I do see this on the server when it happens :
>> [285997.760395] rpc-srv/tcp: nfsd: sent 509476 when sending 524392 bytes - shutting down socket
>> [286884.809688] rpc-srv/tcp: nfsd: sent 131768 when sending 266344 bytes - shutting down socket
>>
>> So I know it's likely to be a network issue of some kind, but as it
>> happens on a VM on the same server it's not NIC related. There are no
>> firewall rules involved.
>>
>> This happens on every client I have tried:
>> - A kvm VM on the server.
>> - My desktop
>> - My laptop
>> - A raspberry pi v4
>>
>> The fault manifests with that particular client freezing all NFS I/O for
>> ~10 minutes (the log example above was after mounting with -o timeo=10).
>>
>> These all run different kernels. I think it started sometime after kernel
>> 5.10.44 on the server, but I had a lot going on and my notes are "sparse".
>>
>> Much reading intimates that a request from the client gets lost, and
>> things lock up until the client hits the timeout value and re-sends the
>> request. That is backed up by changing the mount timeout value.
>>
>> My real problem is to reproduce it I need to move a significant amount
>> of traffic over the NFS connection, and that makes a packet trace using
>> tcpdump "a bit noisy".
>>
>> If it were udp then I could understand a request going astray, but as
>> it's tcp I can only think it requires a connection drop/reconnect to do
>> that and I've not been able to capture one yet in a usable packet trace.
>>
>> I'm building a test box to attempt to replicate it in a way I can
>> bisect, but as it can take an hour or two to manifest that's going to be
>> a very slow burn if I can reproduce it on the test hardware.
>>
>> Unfortunately the server is a production machine, so I'm looking for
>> ideas on how one might debug it. Yes, I've searched, but it's not common
>> and there are potentially many causes. Any NFS gurus here?
>>
>> Regards,
>> --
>> An expert is a person who has found out by his own painful
>> experience all the mistakes that one can make in a very
>> narrow field. - Niels Bohr
>> ------------------------------
>> PLUG discussion list: plug at plug.org.au
>> http://lists.plug.org.au/mailman/listinfo/plug
>> Committee e-mail: committee at plug.org.au
>> PLUG Membership: http://www.plug.org.au/membership
>>
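
To put numbers on the stalls in the client log quoted above, here is a rough sketch, assuming Python 3 and kernel log lines in the same format as the ones Brad posted. The name stall_windows is invented for illustration. It measures the time from the first "not responding" message to the first "OK" that follows it, which for the quoted log works out to roughly 28 to 30 seconds per stall.

```python
import re

# Matches kernel log lines such as
# "[ 2901.432422] nfs: server srv not responding, still trying"
# and tolerates leading quote markers like ">> ".
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nfs: server \S+ (not responding|OK)")


def stall_windows(lines):
    """Return (start, end, duration) for each not-responding -> OK transition."""
    windows = []
    stall_start = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # e.g. "callbacks suppressed" lines
        timestamp = float(match.group(1))
        if match.group(2) == "not responding":
            if stall_start is None:
                stall_start = timestamp  # first complaint opens the window
        elif stall_start is not None:
            windows.append((stall_start, timestamp, timestamp - stall_start))
            stall_start = None  # first OK closes it; later OKs are ignored
    return windows


# A few lines copied from the quoted client log.
SAMPLE = [
    "[ 2901.432422] nfs: server srv not responding, still trying",
    "[ 2901.592410] nfs: server srv not responding, still trying",
    "[ 2931.012132] nfs: server srv OK",
    "[ 2931.012147] nfs: server srv OK",
]

for start, end, duration in stall_windows(SAMPLE):
    print(f"{start:.3f} -> {end:.3f}: stalled for {duration:.1f}s")
```

Feeding it the full dmesg output from each client, and lining the windows up against the "shutting down socket" timestamps on the server, might show whether every stall corresponds to a dropped server-side socket. The client and server timestamps are seconds since each machine booted, so they need converting to wall-clock time before comparing.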