Networker automatic and manual cloning failing

Status
Not open for further replies.

ee01akk

MIS
Dec 3, 2008
19
0
0
GB
Hi,


Our Networker Server (NetWorker Management Console version 3.5.2.Build.477 based on NetWorker version 7.5.2.Build.477) is fibre attached through a SAN switch to a Clariion CX3-40f and a Quantum Scalar i2000 tape library. All backups are directed through the Networker Server (storage nodes have been disabled for now) directly to the EDL, and then cloned off to tape.


We have been experiencing a problem with scheduled and manual cloning. All backups to the EDL complete successfully; however, the daily (incremental) and weekly (full) clones of the savesets created do not all complete. Further investigation shows that the clones of the daily/weekly backups of the backup server (i.e. just the Networker server itself) seem to complete successfully, but almost all of the other client backups (50+) fail to clone to tape. Each day different client clones work and others fail, and there is no pattern to the failures.


There have been no changes on the server that we are aware of, aside from an upgrade of Networker (as suggested by EMC support), which has not resolved the problem (we recently upgraded from 7.4.4 to 7.5.2.2). Before the problem appeared we had to delete and rebuild one of the EDL jukeboxes due to a drive ordering issue; it was rebuilt successfully with no issues. Manual cloning via the GUI (NMC> Media> Savesets, where the savesets created in last night's backups show as "browsable" or "recoverable") also fails. When we run the clone I can see the EDL volume being mounted into a drive, along with a physical tape in the Scalar i2k. Both tapes sit there for about 2-3 minutes (this is correct according to the jukebox timers), then unload, and the operation times out/fails. A clone of a saveset from the command line shows this in more detail:


E:\Program Files\Legato\nsr\bin>nsrclone -vvv -b "Weekly Clone" -S 4276408269
6215:nsrclone: Cloning the following save sets (ssid/cloneid):
b9ba1066-00000006-fee4cfcd-4be4cfcd-12030000-0ac90808/1273286605
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
6217:nsrclone: ...from storage node: ukhubedl01.plan-int.org
39078:nsrclone: RPC error: Server can't decode arguments

5777:nsrclone: Cannot open nsrclone session with plukhubmgt07.plan-int.org
6218:nsrclone: Cannot open nsrclone session with plukhubmgt07.plan-int.org. Error is 'Server can't decode arguments'

5882:nsrclone: Failed to clone any save sets

E:\Program Files\Legato\nsr\bin>


However, when the tapes are pre-mounted first and the above command is run again (or if I simply run the same command again after the operation has failed the first time), it works without any problem:


E:\Program Files\Legato\nsr\bin>nsrclone -vvv -b "Weekly Clone" -S 4276408269
6215:nsrclone: Cloning the following save sets (ssid/cloneid):
b9ba1066-00000006-fee4cfcd-4be4cfcd-12030000-0ac90808/1273286605
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
5884:nsrclone: Successfully cloned all requested save sets
5886:nsrclone: Clones were written to the following volume(s):
000152


The status under Media> Savesets then changes to "browsable has clones". I have tested this for a number of different savesets for different clients, all with the same results.
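
As a stopgap, the "run it a second time" behaviour described above could be scripted. The following is only a minimal Python sketch, not a fix for the underlying problem. It reuses the exact nsrclone options from the transcript; the function name `clone_with_retry`, the retry count and the wait time are hypothetical, and it assumes that success can be recognised by the "Successfully cloned all requested save sets" message rather than by the exit code, whose semantics are not shown in this thread.

```python
import subprocess
import time


def clone_with_retry(ssid, pool, attempts=2, wait_seconds=30,
                     run=subprocess.run, sleep=time.sleep):
    """Run nsrclone for one save set and retry if it does not report success.

    Returns the 1-based number of the attempt that succeeded, or 0 if
    every attempt failed. `run` and `sleep` are injectable for testing.
    """
    # Same options as in the transcript: -vvv -b <pool> -S <ssid>
    cmd = ["nsrclone", "-vvv", "-b", pool, "-S", str(ssid)]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        # Assumption: the messages may arrive on stdout or stderr, so check both.
        output = (result.stdout or "") + (result.stderr or "")
        if "Successfully cloned all requested save sets" in output:
            return attempt
        if attempt < attempts:
            # Give the jukebox time before the next try.
            sleep(wait_seconds)
    return 0
```

For example, `clone_with_retry(4276408269, "Weekly Clone")` would issue the same command as above, up to twice.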


Analysis:

- testing seems to indicate that:

- if the tapes required for reading or writing are not ready, Networker gives the error:

39078:nsrclone: RPC error: Server can't decode arguments

- if the tapes are preloaded and ready, then the clone completes without problems


- setting the idle device timeout to 0 and increasing the load sleep to 2 minutes has no impact on the problem (we have tried various other load/unload/idle timeout sleep parameters, to no effect).

- we have checked the hosts file on the Networker Server: it contains entries for the Networker server itself and for the EDL, and conversely the EDL hosts file (/etc/hosts) contains an entry for the Networker server.
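
To check the "no pattern" observation across many clone runs, the nsrclone output could be classified automatically. This is a rough Python sketch that relies only on the message numbers visible in the transcripts above (5884 for success, 39078 for the RPC error, 5882 for "Failed to clone any save sets"). The function names are hypothetical, and it is an assumption that these numbers mean the same thing in other NetWorker versions.

```python
import re

# Lines look like "39078:nsrclone: RPC error: Server can't decode arguments"
LINE_RE = re.compile(r"^(\d+):nsrclone:\s*(.*)$")


def parse_nsrclone_output(text):
    """Return a list of (message_id, message) tuples from nsrclone output."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


def classify(text):
    """Label one nsrclone run, using the message ids seen in this thread."""
    ids = {message_id for message_id, _ in parse_nsrclone_output(text)}
    if 5884 in ids:
        return "success"
    if 39078 in ids:
        return "rpc-error"
    if 5882 in ids:
        return "failed"
    return "unknown"
```

Running `classify` over the saved output of each nightly clone would give a per-saveset tally of RPC errors versus other failures.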

We already had this problem on Networker server version 7.4.4. We have tried various things, such as checking the consistency of the Networker catalog:


nsrim -X

nsrck -m

nsrck -L6


We have also stopped the services, renamed the "tmp" folder and restarted them, then performed more integrity checks, and we still have the same issue. We have checked that all the flags are OK for the savesets using "mminfo -avq -S". EMC suggested a corruption of the binaries; these have now all been replaced by the upgrade to 7.5 SP2, and we still have the same problem.


Can you suggest a cause for this problem? I suspect a timeout issue, although I would have expected corrupt binaries to be the cause, and replacing them has not helped.

We have tried running a manual clone with debug level 9 and have seen the following errors:

39078:nsrclone: RPC error: RPC server is unavailable (severity 5, number 10)

39078:nsrclone: RPC error: Server can't decode arguments (severity 5, number 11)


5777:nsrclone: Cannot open nsrclone session with plukhubmgt07
6218:nsrclone: Cannot open nsrclone session with plukhubmgt07. Error is 'Server can't decode arguments'

gen_clone_result_cur_sn: ENTER
5882:nsrclone: Failed to clone any save sets
gen_clone_result_cur_sn: EXIT
alldone(): ENTER
session id: 1, User defined message not understood.
Message type: 170, request type: 0
session id: 1, User defined message not understood.
Message type: 170, request type: 0
session id: 1, User defined message not understood.
Message type: 170, request type: 0
JOBATTR_SUCCESS_SS:
JOBATTR_FAILED_SS:
failed savesets: 4192810157;
JOBATTR_CLONED_VOLUMES:
JOBATTR_COMPLETION_STATUS:
JOBATTR_COMPLETION_SEVERITY:
alldone(): EXIT
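
The "failed savesets:" line in the debug trace above could be harvested to build a list of SSIDs to re-run. A small Python sketch follows. The function name is hypothetical, and since the trace only shows a single SSID followed by a semicolon, it is an assumption that several SSIDs would be separated by semicolons on the same line.

```python
import re

FAILED_RE = re.compile(r"^\s*failed savesets:\s*(.*)$")


def failed_savesets(debug_text):
    """Collect the SSIDs listed on 'failed savesets:' lines of a debug trace."""
    ssids = []
    for line in debug_text.splitlines():
        match = FAILED_RE.match(line)
        if not match:
            continue
        # Assumption: entries are separated by ';' as in "4192810157;"
        for token in match.group(1).split(";"):
            token = token.strip()
            if token.isdigit():
                ssids.append(int(token))
    return ssids
```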


Thanks.
 
It just looks like the media is loaded but, for whatever reason, not mounted.

It would be worth checking which media is not being mounted. I suppose the problem is the source media in the EDL, because otherwise NW would just ask for (another) media from the clone pool. It could be that there is a defect in the information interchange.
 