nwrecover hangs when recovering a large # of files.

Status
Not open for further replies.

nsew45 (MIS, US), Sep 25, 2001
I have a problem with recovery hanging when trying to recover a large number of files. I successfully backed up 12 directories with several hundred thousand files. When I went to recover a single directory with 9.7 GB and 214K files, nwrecover just hangs when I "Start the restore". I am running Solaris 8 at the current recommended patch level and Legato 6.1 Build 186. It looks as if the process that reads the index just runs forever. I have run nsrck -L6 on all my indexes. I have a case open with Legato; they don't have an answer yet.

Anybody else with the same problem? Fix?
 
Do you see the server waiting for a tape even after it's loaded? And is the device status "Ready for reading, Idle"?
If so, this is bug LGTpa28133, and it has a fix via Legato. If you only have Sun support, you may be out of luck.
 
I have support from Legato.

No, it is not waiting for a tape. I went to the backup server and did a truss on the nsr process that seemed to be hanging. It appeared to be reading the index or extracting the list of files to restore. I let it run for an hour and it never completed. I have done a nsrck -L7 on the index for the clients multiple times. I can drill down and recover a single file or two and everything works fine. If I select the entire directory structure with a lot of files and subdirectories, it just hangs.
The saveset was run from the server. I can do a save set recovery and all works fine because it is not going through the index.
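For reference, the kind of tracing described above might look like this on Solaris (the PID shown is hypothetical; substitute the PID of the daemon that appears hung):

```shell
# Find the NetWorker daemons on the backup server
ps -ef | grep -i nsr | grep -v grep

# Attach to the suspect process and count system calls
# instead of printing each one (-c)
truss -c -p 12345

# Or watch the calls live to see whether it is still
# reading the index files or has stopped making progress
truss -p 12345 2>&1 | head -100
```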

I'm stumped.
 
Hmmm...
Try doing an nsrck -L6 rather than a -L7; this will cross-check the indexes rather than recover them off tape.
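For anyone following along, the two nsrck check levels being discussed differ roughly like this (a sketch; the client name is hypothetical):

```shell
# -L6: cross-check and repair the client file index in place
# (works against the on-disk index only, no tape involved)
nsrck -L6 clientname

# -L7: recover the client file index from the backup media
# (reads from tape, so it is much slower)
nsrck -L7 clientname
```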

Also, how big is the index for this client? And how many files are in the /nsr/index/<host>/db6 directory? I have a server that has over 200,000 savesets in its index, and it can take a while to recover a subset of files.

What type of hardware is your server, and what # of processors and RAM?

When it's reading the indexes, what is the I/O wait on that disk, and what is the CPU utilization of the nsrindexd process that is serving the request?

Are you swapping? It could be as simple as not having enough RAM.

Try touching /nsr/debug/noimmediate and restarting NetWorker. This disables the use of shared memory.
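On Solaris, the checks and the shared-memory workaround suggested above might look something like this (a sketch; the restart commands assume the stock NetWorker init script):

```shell
# Per-disk I/O activity and wait/service times, sampled every 5 seconds
iostat -xn 5

# Per-process CPU usage; look for nsrindexd near the top
prstat -c 5

# Paging activity; a high scan rate (sr column) suggests memory pressure
vmstat 5

# Disable NetWorker's use of shared memory, then restart the daemons
touch /nsr/debug/noimmediate
nsr_shutdown
/etc/init.d/networker start
```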

 
I have done the L6. No change.

488 files in the db6 dir.

My Legato backup server is a Sun E220R with 2 GB RAM and two SPARC II 450s. I am running Solaris 8 with the latest recommended patches. The nsr process is only using about 400 MB RAM. I have 1.6 GB free. There is no I/O wait on the disk. There is some page-in swapping when it originally starts, but it goes to 0 after a few seconds. CPU utilization is maybe 25-30% but tapers off to less than 10% after a few seconds.

I have lots of available disk and swap. Neither server nor client is breathing hard. Server and clients are connected by a Gigabit Ethernet segment that runs only between the one server and six clients. No bottleneck there.

I have sent all my indexes to Legato. They have not found anything wrong with them.

????
 
After six weeks of working with support, I just conducted a test running nwrecover on moe-g. I selected /u01 (231,409 files, 9,932 MB), /u02 (136,127 files, 7,687 MB), and /u03 (134,922 files, 6,497 MB). Performance was acceptable: the system created the worklist and began to read from the tape in less than 3 minutes.

It would appear that the application of patches 31270 and 31689 on the server and all clients, and/or the repair of the mmvolume6 index, resolved the issue.
 