Backups are hanging every day on the same savegroup

brtsmpsn · Jul 30, 2002

Hi all,

We are running Legato Networker 6.01 on Solaris 8 on a Sun E3000 machine and have a mix of NT and Unix Clients.

Here is the description of the problem.

We have a savegroup (22 clients, NT + Unix) that starts at 10:30 p.m. and never completes; this prevents the rest of the scheduled backups from completing as well.

Running ps -ef | grep nsr the next morning shows that all the clients in that savegroup are still queued for backup.

Networker seems to freeze: I cannot get the console (i.e. nwadmin) to start, and I cannot even run simple commands like nsrjb.

The only way around it is to run kill -9 pid (on a client's nsrexec or on savegrp). After running this kill on a number of PIDs, Networker seems to unfreeze and I have my nwadmin console again; the backups resume, though unsatisfactorily.

The only other option, as suggested by our 110K/year support vendor, is to run nsr_shutdown and manually restart Networker. This screws up the backups for the day, and we have tried it twice already, with the same result on the next day's backup as well.

Strangely enough the daemon.log and the messages file have last entries at 10:30 p.m. and no more. So there are no error messages to debug this problem.
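Since the logs stop at 10:30 p.m., one way to keep some evidence is to save the process list before killing anything. The following is only a minimal sketch: the function name nsr_procs and the snapshot path in the comment are illustrative assumptions, not part of Networker.

```shell
#!/bin/sh
# Sketch: keep only NetWorker-related lines from `ps -ef` output on stdin.
# The pattern [n]sr matches the text "nsr" but not the literal string
# "[n]sr", so the grep command line itself is not listed among the results.
nsr_procs() {
    grep '[n]sr'
}

# Possible use on the backup server before any kill -9
# (the snapshot file name is an assumption):
#   ps -ef | nsr_procs > /var/tmp/nsr_procs.`date +%Y%m%d%H%M`
```

Comparing snapshots from several mornings may show whether the same client or process is always the oldest one in the list.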

If anyone has come across this problem and solved it, can you please tell me what I need to do to fix this?

Truly yours,

A desperate Sysadmin.

ashingal · Jul 30, 2002

Have you tried to change the timing of this Savegroup?

Unix System Administrator
GE Medical Systems

ulsj · Jul 30, 2002

Hi!

I have seen similar problems on NT platforms, where nsr.res had been corrupted. The console freezes and everything takes forever.
If you can, try renaming nsr.res and starting Networker again.
You will then have default settings for Networker. Then add some clients and try a backup. If this works, you may have a bad nsr.res file.

/Ulf

jgarmer · Aug 2, 2002

Try upgrading to 6.11 first.
Then fix your indexes with nsrck -L6.
Then make sure no nwadmin session is sitting with an error message up, waiting for someone to hit the OK button.
Also verify that none of your clients is set to auto-negotiate.

joe

JimTaylor · Aug 2, 2002

What is the last-touch time of /nsr/mm/nsrim.prv? Run ls -la on the file. Is the time near the start of the group that fails? It could be that the Legato Networker "nsrck -MX" resource hog is sucking up all resources and not allowing your backups to complete. I found this process to be a server stopper. It kicks off literally thousands of processes by trying to check all your indexes at one time.

6.1.2 experience and needed workaround
---------------------------------------
I noticed with an upgrade to 6.1.2 that nsrd could be a resource hog and at times would bog the system down to no end. I found the specific issue to be the automated running of "nsrck -MX" by Legato Networker. Specifically, I determined that if "nsrck -MX" is running and backups or other processes are started, the system will grind to a halt. The solution is to restrict the running of "nsrck -MX" to times when the system is idle.

From what I have been able to determine, "nsrck -MX" runs after savesets if the last touch of /nsr/mm/nsrim.prv was more than 24 hours ago. Ours happened to be running just after our first backups heading into the evening and could impact all the backups that followed. Now we don't allow it to run during full backups on the weekends, and we have it run after all backups on weekdays. To accomplish this, the following script runs twice on Saturdays, once at 7am and again at 9am.

#!/bin/sh
#
# nsrck_skip.sh
#
# This script touches the /nsr/mm/nsrim.prv file.
#
# This script is run on weekends to avoid running nsrck -MX.
#
##############################################################################
#*******************************************************************************
# Modification History:
# Date     By                  Description
# =======  ==================  =================================================
# 01Dec01  Jim Taylor          Initial script.
#*******************************************************************************

# Refresh the timestamp so nsrd sees a file that is less than 24 hours old.
touch /nsr/mm/nsrim.prv
echo "done"

# End of script
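The script above prints "done" even if the touch fails, for example because of a permissions problem. A minimal sketch of a variant that reports failure is shown here, together with a crontab line for the Saturday 7am and 9am runs described earlier. The function name refresh_stamp and the install path /usr/local/bin/nsrck_skip.sh are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical variant of nsrck_skip.sh: same touch, but a failure is
# reported on stderr and through the exit status instead of printing "done".
refresh_stamp() {
    if touch "$1"; then
        echo "done: refreshed $1"
    else
        echo "failed: could not touch $1" >&2
        return 1
    fi
}

# In the installed script the last line would be:
#   refresh_stamp /nsr/mm/nsrim.prv
#
# Example crontab entry (install path is an assumption),
# Saturdays at 07:00 and 09:00:
#   0 7,9 * * 6 /usr/local/bin/nsrck_skip.sh
```

With cron, anything written to stderr is normally mailed to the crontab owner, so a failed touch would not go unnoticed.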

My support vendor, Datalink, provided the following info and indicated that the "nsrck -MX" process is not a bug and needs to run periodically:

Solution Title: Purpose of nsrck -MX
Solution ID: legato8902

Here is the solution:
The "M" means it's running in Master mode. In other words, it's been called by nsrd or another NetWorker daemon and it logs information into the daemon.log.

The "X" is equivalent to -L3. It cross-checks the index entries with the media database and compresses and deletes redundant records, thus keeping the index sizes down.
Moreover, the nsrim.prv file present under /nsr/mm directory is used to determine if an index check is done or not. nsrd checks this file occasionally to see if it is over 24 hrs old. If so, a check is performed.
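The 24-hour rule described above can be approximated from the shell to see whether an index check is likely to be due before a savegroup starts. This is only a sketch: the function name stamp_is_stale is made up for illustration, and it mirrors the vendor's description rather than nsrd's actual logic, which is not documented in this thread.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the given file was last modified 24 hours ago
# or more. find's "-mtime +0" selects files whose age, counted in whole
# 24-hour periods, is greater than zero.
# A missing file is reported as "not stale"; how nsrd treats a missing
# nsrim.prv is not covered here.
stamp_is_stale() {
    stale=`find "$1" -mtime +0 -print 2>/dev/null`
    [ -n "$stale" ]
}

# Possible use on the backup server:
#   if stamp_is_stale /nsr/mm/nsrim.prv; then
#       echo "nsrim.prv is over 24 hours old; an index check may start soon"
#   fi
```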

Barks · Aug 15, 2002

Desperate Sysadmin,
Is it possible you have a stale NFS mount on one of the clients that is in the savegrp?
Kent
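A stale NFS mount tends to make ordinary commands such as ls or df block indefinitely, so checking for one by hand can hang the terminal too. A minimal sketch of a probe with a time limit follows; the function name run_with_limit and the example mount point are assumptions, and the loop polls once per second rather than relying on a timeout utility.

```shell
#!/bin/sh
# Sketch: run a command in the background and give up after LIMIT seconds.
# Usage: run_with_limit LIMIT command [args...]
# Returns 0 if the command finished in time (whatever its own exit status),
# 1 if it was still running when the limit was reached.
run_with_limit() {
    limit=$1
    shift
    "$@" > /dev/null 2>&1 &
    child=$!
    waited=0
    while kill -0 "$child" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # A process blocked on a hard NFS mount may not die even with
            # kill -9 until the server answers; the probe still reports 1.
            kill -9 "$child" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=`expr $waited + 1`
    done
    return 0
}

# Possible use on a client (the mount point is an assumption):
#   run_with_limit 10 ls -d /mnt/data || echo "/mnt/data did not answer"
```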

brtsmpsn · Aug 20, 2002

I have changed the timing by touching the file /nsr/mm/nsrim.prv so that nsrck -MX does not interfere with my backup schedule. I have also moved around the schedules of different savesets. I have been observing my backups for the past 3 weeks and they have not failed yet.

Appreciate all your suggestions, and thanks especially to Jim Taylor for taking the time to give such a detailed explanation.

Two cheers to the forum and Tek-Tips.
