Tuning NFS on Solaris 10, x4500 Thumper (timeouts)

forrie · Mar 1, 2011

My environment here consists primarily of RHEL NFS clients and a Sun Thumper (x4500). I'm trying to determine how to handle these recurring errors:

Code:

Mar  1 17:47:54 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain.com not responding, still trying
Mar  1 17:48:02 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain OK
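
To get a feel for how long each stall lasts, the two kernel messages can be paired up and the gap measured. This is only a rough sketch: the function name `outage_durations` is made up for illustration, and the year is an assumption because classic syslog timestamps do not carry one.

```python
from datetime import datetime

def outage_durations(lines, year=2011):
    """Pair each 'not responding' line with the next 'OK' line; return gaps in seconds."""
    durations = []
    started = None
    for line in lines:
        # Classic syslog timestamps fill the first 15 characters and omit the year,
        # so the year is supplied by the caller (an assumption, not from the log).
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "not responding" in line:
            if started is None:
                started = stamp
        elif line.rstrip().endswith(" OK") and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations

log = [
    "Mar  1 17:47:54 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain.com not responding, still trying",
    "Mar  1 17:48:02 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain OK",
]
durations = outage_durations(log)
print(durations)
```

For the two lines above the gap is 8 seconds. Run over a full day of client logs, the same pairing would show whether the stalls are short blips or longer outages.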

I've read that this is a common issue.

My ncsize = 70554 and maxusers = 1022. v.v_proc is 16362.

I tuned the following in /etc/default/nfs:

Code:

NFSD_LISTEN_BACKLOG=32
NFSD_PROTOCOL=ALL
NFSD_SERVERS=1024
LOCKD_LISTEN_BACKLOG=256
LOCKD_SERVERS=128
LOCKD_RETRANSMIT_TIMEOUT=5
GRACE_PERIOD=90
NFS_SERVER_VERSMAX=3
NFS_CLIENT_VERSMAX=3

There seems to be plenty of bandwidth on the server itself, though the client side is reporting a lot of RPC retransmissions:

Code:

# nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
358112858   1179312    0
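
To put the retransmission count in proportion, it helps to express it as a share of total calls. A minimal sketch, assuming the header and value lines are copied from the `nfsstat -rc` output above; the helper name `parse_counters` is hypothetical.

```python
def parse_counters(header, values):
    """Zip a whitespace-separated header line with its line of integer counters."""
    return dict(zip(header.split(), (int(v) for v in values.split())))

# Header and values as printed by `nfsstat -rc` on the client.
stats = parse_counters("calls      retrans    authrefrsh",
                       "358112858   1179312    0")

# Share of RPC calls that had to be retransmitted, in percent.
retrans_pct = 100.0 * stats["retrans"] / stats["calls"]
print(f"retransmission rate: {retrans_pct:.3f}%")
```

With these counters the rate comes out at roughly 0.33% of calls. The counters are cumulative since boot, so comparing two samples taken a few minutes apart would say more about what is happening right now.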

On the Sun/Solaris 10 server I see a lot of badcalls:

Code:

Server nfs:
calls     badcalls  
860862909 513567 
        
Version 3: (12014350 calls)
null        getattr     setattr     lookup      access      readlink    
64 0%       1960117 16% 535881 4%   304907 2%   380380 3%   324 0%      
read        write       create      mkdir       symlink     mknod       
4890148 40% 3468354 28% 151040 1%   1826 0%     0 0%        0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
70748 0%    2037 0%     1167 0%     0 0%        2406 0%     59759 0%    
fsstat      fsinfo      pathconf    commit      
140738 1%   117 0%      9 0%        44328 0%
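
The same kind of arithmetic can be applied to the server counters. This is a sketch only, with the numbers typed in by hand from the output above; it works out the badcalls share and how much of the version 3 load is plain read/write traffic.

```python
# Counters copied from the server-side nfsstat output above.
server_calls = 860862909
server_badcalls = 513567

# Version 3 totals and the two dominant operations.
v3_total = 12014350
v3_ops = {"read": 4890148, "write": 3468354}

# Share of all server calls that were rejected as bad, in percent.
badcall_pct = 100.0 * server_badcalls / server_calls

# Share of NFSv3 calls that are reads or writes, in percent.
io_pct = 100.0 * (v3_ops["read"] + v3_ops["write"]) / v3_total

print(f"badcalls: {badcall_pct:.4f}% of calls")
print(f"read+write: {io_pct:.1f}% of v3 calls")
```

That gives roughly 0.06% badcalls, with reads and writes making up about 70% of the version 3 calls, so the workload is dominated by data transfer rather than metadata operations.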

Now, we have typically been using UDP, which I understand has its drawbacks. To experiment, I took one of our application servers and changed its NFS mount options to "tcp,intr,rw,bg,rsize=32768,wsize=32768". I'm still seeing the errors, but not as frequently.
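
One way to confirm which transport each mount actually ended up with is to read the options back from /proc/mounts on the Linux client. The sketch below parses text in that format; the function name `nfs_mount_protocols`, the export paths, and the mount points in the sample are all hypothetical.

```python
def nfs_mount_protocols(mounts_text):
    """Map each NFS mount point to its transport, given /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        proto = "unknown"
        for opt in fields[3].split(","):
            if opt.startswith("proto="):
                proto = opt.split("=", 1)[1]
            elif opt in ("tcp", "udp"):
                proto = opt
        result[fields[1]] = proto
    return result

# Hypothetical sample; on a real client, pass open("/proc/mounts").read() instead.
sample = (
    "de-prod-nas:/export/data /data nfs rw,vers=3,rsize=32768,wsize=32768,hard,intr,proto=tcp 0 0\n"
    "de-prod-nas:/export/scratch /scratch nfs rw,vers=3,rsize=8192,wsize=8192,hard,proto=udp 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)
protocols = nfs_mount_protocols(sample)
print(protocols)
```

Any mount still reporting udp after the change would point to a remount that never happened or an fstab entry that was missed.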

"prtconf" reports:

Memory size: 65536 Megabytes

But that doesn't seem right; I'm sure this system was shipped with more RAM than that.

I'm trying to determine where the problem is -- or if it's a combination of tuning that's needed on both client and server.

I would be grateful for any advice and pointers. This Thumper is a NAS and this is its primary function.

If more info is needed, I will gladly provide it.

Thanks!

forrie · Mar 1, 2011

A couple more bits of info:

Code:

# vmstat -s | grep cache
70347622717 total name lookups (cache hits 100%)

I don't appear to be near my configured thread limits:

Code:

# prstat -c -p `pgrep nfsd`
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11
Total: 1 processes, 11 lwps, load averages: 0.18, 0.20, 0.18
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11
Total: 1 processes, 11 lwps, load averages: 0.18, 0.20, 0.18
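
For comparison against the NFSD_SERVERS setting, the LWP count can be pulled out of the PROCESS/NLWP column. A small sketch, assuming the prstat line format shown above; the helper name `nlwp` is made up, and the limit is the NFSD_SERVERS value from /etc/default/nfs quoted earlier.

```python
def nlwp(prstat_line):
    """Return the LWP count from the trailing PROCESS/NLWP column of a prstat line."""
    process_column = prstat_line.split()[-1]        # e.g. "nfsd/11"
    return int(process_column.rsplit("/", 1)[1])

line = " 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11"
threads = nlwp(line)

nfsd_servers = 1024  # NFSD_SERVERS from /etc/default/nfs
usage_pct = 100.0 * threads / nfsd_servers
print(f"{threads} of {nfsd_servers} threads ({usage_pct:.1f}%)")
```

At 11 LWPs that is about 1% of the configured maximum, at least at the moment this sample was taken.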

forrie · Mar 2, 2011

More info. Our /etc/sysconfig/nfs (RHEL) shows:

Code:

RPCNFSDCOUNT=8

I think this probably needs to be increased, and probably other variables in there as well, since some of our app servers are pretty busy.

solaries · Mar 21, 2011

Is the df -k command hanging on the client machine? Could you send the /var/adm/messages log for this NFS server?
