Tuning NFS on Solaris 10, x4500 Thumper (timeouts)

Status
Not open for further replies.

forrie

MIS
Mar 6, 2009
91
US
My environment here consists primarily of RHEL NFS clients and a Sun Thumper (x4500). I'm trying to determine how to handle these recurring errors:

Code:
Mar  1 17:47:54 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain.com not responding, still trying
Mar  1 17:48:02 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain OK
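To put a number on how long each stall lasts, the two kernel messages can be paired up. This is only a rough sketch: it assumes classic syslog timestamps in the first 15 characters of each line, it assumes the year 2009 (syslog omits it; the value is taken from the post date), and it does not try to tell different servers apart. The names `stamp` and `stall_durations` are made up for the illustration.

```python
from datetime import datetime

LOG = """\
Mar  1 17:47:54 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain.com not responding, still trying
Mar  1 17:48:02 de-prod-app5 kernel: nfs: server de-prod-nas.de-gb.domain OK
"""

ASSUMED_YEAR = 2009  # assumption: syslog lines carry no year


def stamp(line):
    # The first 15 characters of a classic syslog line hold "Mon dd HH:MM:SS".
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def stall_durations(log_text):
    """Pair each 'not responding' line with the next 'OK' line; return seconds."""
    durations, started = [], None
    for line in log_text.splitlines():
        if "not responding" in line:
            if started is None:  # keep the first report of a stall
                started = stamp(line)
        elif line.rstrip().endswith(" OK") and started is not None:
            durations.append((stamp(line) - started).total_seconds())
            started = None
    return durations


stalls = stall_durations(LOG)
print(stalls)  # [8.0] for the two lines above
```

Run against a full /var/log/messages, the same loop would show whether the stalls are consistently a few seconds or occasionally much longer.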

I've read that this is a common issue.

My ncsize = 70554 and maxusers = 1022. v.v_proc is 16362.

I tuned the following in /etc/default/nfs:

Code:
NFSD_LISTEN_BACKLOG=32
NFSD_PROTOCOL=ALL
NFSD_SERVERS=1024
LOCKD_LISTEN_BACKLOG=256
LOCKD_SERVERS=128
LOCKD_RETRANSMIT_TIMEOUT=5
GRACE_PERIOD=90
NFS_SERVER_VERSMAX=3
NFS_CLIENT_VERSMAX=3

There seems to be plenty of bandwidth on the server itself, though the client side is reporting a lot of RPC retransmissions:

Code:
# nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
358112858   1179312    0
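The raw counters are easier to judge as a ratio. A minimal sketch, assuming the `nfsstat -rc` layout shown above (a header row starting with `calls`, followed by one row of values); `parse_rpc_stats` is a hypothetical helper, not part of any NFS tool:

```python
def parse_rpc_stats(text):
    """Return a dict mapping each column name to its integer counter."""
    rows = [ln.split() for ln in text.strip().splitlines()]
    for header, values in zip(rows, rows[1:]):
        if header and header[0] == "calls":
            return dict(zip(header, map(int, values)))
    raise ValueError("no 'calls' header found")


SAMPLE = """\
Client rpc stats:
calls      retrans    authrefrsh
358112858   1179312    0
"""

stats = parse_rpc_stats(SAMPLE)
retrans_pct = 100.0 * stats["retrans"] / stats["calls"]
print(f"retransmitted: {retrans_pct:.2f}% of calls")
```

For the figures above this works out to roughly 0.33% of calls being retransmitted. Whether that is acceptable depends on the workload; the counters are also cumulative, so comparing two samples taken some minutes apart says more than the lifetime total.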

On the Sun/Solaris 10 server I see a lot of badcalls:

Code:
Server nfs:
calls     badcalls  
860862909 513567 
        
Version 3: (12014350 calls)
null        getattr     setattr     lookup      access      readlink    
64 0%       1960117 16% 535881 4%   304907 2%   380380 3%   324 0%      
read        write       create      mkdir       symlink     mknod       
4890148 40% 3468354 28% 151040 1%   1826 0%     0 0%        0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
70748 0%    2037 0%     1167 0%     0 0%        2406 0%     59759 0%    
fsstat      fsinfo      pathconf    commit      
140738 1%   117 0%      9 0%        44328 0%
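As a quick sanity check on the server numbers, here is the same kind of arithmetic using only the counters quoted above. The variable names are made up for the illustration, and the output shown does not say which protocol versions account for the calls outside Version 3.

```python
# Counters copied from the "Server nfs" output above.
calls, badcalls = 860862909, 513567
v3_calls = 12014350

bad_pct = 100.0 * badcalls / calls    # share of all calls rejected as bad
v3_share = 100.0 * v3_calls / calls   # share of all calls listed under Version 3

print(f"badcalls: {bad_pct:.3f}% of all server calls")
print(f"Version 3: {v3_share:.1f}% of all server calls")
```

So badcalls come to about 0.06% of all calls, and the Version 3 table covers only about 1.4% of the total.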

Now, we have typically been using UDP, which I understand has its drawbacks. To experiment, I took one of our application servers and changed its NFS mount options to "tcp,intr,rw,bg,rsize=32768,wsize=32768". I'm still seeing the errors, but not as frequently.

"prtconf" reports:

Memory size: 65536 Megabytes

But that doesn't seem right; I'm sure this system was shipped with more RAM than that.

I'm trying to determine where the problem is -- or if it's a combination of tuning that's needed on both client and server.

I would be grateful for any advice and pointers. This Thumper is a NAS and this is its primary function.

If more info is needed, I will gladly provide it.


Thanks!
 
A couple more bits of info:

Code:
# vmstat -s | grep cache
70347622717 total name lookups (cache hits 100%)

I don't appear to be near my configured thread limits:

Code:
# prstat -c -p `pgrep nfsd`
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11
Total: 1 processes, 11 lwps, load averages: 0.18, 0.20, 0.18
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11
Total: 1 processes, 11 lwps, load averages: 0.18, 0.20, 0.18
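To make the comparison with the configured limit explicit, the NLWP figure from prstat can be set against NFSD_SERVERS. A small sketch under some assumptions: the settings text is an excerpt of the /etc/default/nfs values quoted earlier, the prstat line is copied from above, and `parse_settings` and `nlwp` are hypothetical helpers. Not every LWP in nfsd is necessarily a service thread, so treat the result as an upper bound on threads in use.

```python
import re

NFS_DEFAULTS = """\
NFSD_LISTEN_BACKLOG=32
NFSD_SERVERS=1024
"""
PRSTAT_LINE = " 10472 daemon   2504K 1660K sleep   60  -20   0:00:15 0.0% nfsd/11"


def parse_settings(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def nlwp(prstat_line):
    """Extract the LWP count from the PROCESS/NLWP column (e.g. 'nfsd/11')."""
    match = re.search(r"/(\d+)\s*$", prstat_line)
    if match is None:
        raise ValueError("no PROCESS/NLWP field found")
    return int(match.group(1))


limit = int(parse_settings(NFS_DEFAULTS)["NFSD_SERVERS"])
active = nlwp(PRSTAT_LINE)
print(f"{active} of {limit} nfsd threads ({100.0 * active / limit:.1f}%)")
```

With 11 LWPs against a limit of 1024, the daemon is at about 1% of its configured ceiling at the moment of the sample, which fits the low load averages in the same output.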
 
More info. Our /etc/sysconfig/nfs (RHEL) shows:

Code:
RPCNFSDCOUNT=8

I think this probably needs to be increased. There are probably other variables in there that need tuning as well, since some of our app servers are pretty busy.



 
Is the df -k command hanging on the client machine? Could you post the /var/adm/messages log from this NFS server?

 