Large DB and striping question

Status
Not open for further replies.

Rakerr (Technical User, US)
Oct 13, 2004
Has anyone had any problems running multiple streams (4) on large SQL DBs?

I have a 4.7 TB SQL DB that I have yet to get to stream using Legato... I used SQL native backup and it worked fine.

I am dying here....

Help me, Obi-Wan, you are my only hope. :)

FRB
 
This looks weird, as NW more or less just asks the DB via the standard interface to "send data". NW is not generating it.

It is important to look at the scenario: running 4 streams in parallel, each to a DIFFERENT device, requires pretty good hardware (depending on your drive speed, of course). Obviously, all devices must be kept in streaming mode for optimal throughput.

However, if all streams go to only one device, it is most likely not a hardware problem. As the streams will be interleaved, there should be no problem maintaining the speed.

I do not know, but are you sure that native SQL supports a backup to multiple devices? If not, then you are in fact comparing apples with pears.
 
I have a StorageTek L40 library with 4 LTO Ultrium-2 drives.
Each drive goes to its own SCSI channel on the server (Unisys ES7000 64-bit).
NW backed up 4.5 TB of the 4.7 TB, then failed at the end of the job.
I was averaging about 40-50 MB/s on each drive, and it took NW 12 hours to back up the 4.5 TB (granted, server load has a lot to do with it).
SQL native took 6 hours to back up the same data with 4 streams (the speed was probably due to user load again).

Since NW uses the Microsoft API just like SQL does, why would it be different?

System landscape:
Unisys ES7000 64-bit (clustered)
Windows Server 2003 Datacenter
SQL Server Enterprise 64-bit, fully service-packed

I can back up small DBs on the same system with 4 streams, no problem. It is just my production DB that fails.

I was wondering if there are some issues with the VDI coding on the NW side as it pertains to large DBs.

Thanks for the reply :)

Frank



 
With respect to the failing backups, I do not expect a general problem. However, this needs to be carefully investigated via the log files.

Some numbers are strange:

4.5 TB = 4608 GB = 4718592 MB

4718592 MB / (4 x 45 MB/s) = 26214 s
26214 s / 3600 s/h = 7.3 h

If your numbers are correct, the rest of the 12 hours (about 40%) must have been spent on repositioning and/or error correction.
This is weird, especially as NW does not use specific device drivers.
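
The estimate above can be reproduced with a short Python sketch. It only restates the arithmetic from this thread (4.5 TB, 4 drives, roughly 45 MB/s per drive, 12 hours observed); the function names are made up for illustration, and it assumes binary units (1 TB = 1024 x 1024 MB) as in the calculation above.

```python
MB_PER_TB = 1024 * 1024  # binary units, matching 4.5 TB = 4718592 MB above


def ideal_hours(size_tb, streams, mb_per_s_per_stream):
    """Hours needed if every stream sustains the given rate the whole time."""
    total_mb = size_tb * MB_PER_TB
    seconds = total_mb / (streams * mb_per_s_per_stream)
    return seconds / 3600


def effective_rate_mb_s(size_tb, hours):
    """Aggregate MB/s actually achieved over the whole run."""
    return size_tb * MB_PER_TB / (hours * 3600)


ideal = ideal_hours(4.5, streams=4, mb_per_s_per_stream=45)
observed = 12.0                      # hours reported for the NW run
overhead = 1 - ideal / observed      # share of time not explained by streaming
aggregate = effective_rate_mb_s(4.5, observed)

print(f"ideal time:      {ideal:.1f} h")
print(f"unexplained:     {overhead:.0%} of the run")
print(f"effective rate:  {aggregate:.0f} MB/s total, {aggregate / 4:.0f} MB/s per drive")
```

Read the other way round, 4.5 TB in 12 hours works out to roughly 27 MB/s per drive on average, well below the 40-50 MB/s seen at the console, which is the same gap expressed as a rate.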

If you have that much free space on disk somewhere (or just a fraction of it), I would love to see how fast NW operates when backing up to a file type device. This would also show whether the API is capable of delivering the speed.
 
Granted, the average speed was only at the times that I checked it... I did not sit in front of the console for 12 hrs watching it...

I did see some 20 MB/s rates also...

We are using a Hitachi 9900 with 10 TB of space.

I did see bursts of 90-100 MB/s using SQL native.
I think 80 MB/s was the highest I noticed with NW.

Thanks,

Frank
 
How are your savesets defined in the client definition, and are you sure you are getting at least 4 streams running at once for this client (look in the monitor window of NetWorker Administrator), all going to the same device?
Also, you say it failed after 4.5 TB. Define "failed", please. Did it run out of time in the backup window, or did you get an error message? If so, what?
 
Yes I am sure it is writing to all 4 drives at once.

Here is an excerpt from the daemon.log:

10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000011) 1201 GB
10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000033) 1195 GB
10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000012) 1248 GB
10/03/04 18:55:25 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000010) 1145 GB
10/03/04 18:55:35 nsrd: erpdb1:index:erpsql2 saving to pool 'Default' (000011)

The only anomaly I can see is below:

10/03/04 06:55:49 nsrd: media notice: LTO Ultrium-2 tape 000017 on \\.\Tape3 is full
10/03/04 06:55:49 nsrd: media notice: LTO Ultrium-2 tape 000017 used 45 GB of 200 GB capacity
10/03/04 06:59:01 nsrd: media info: verification of volume "000017", volid 3680487797 succeeded.
10/03/04 06:59:47 nsrd: write completion notice: Writing to volume 000017 complete

The only "abort" save set is the set included on this tape.
But from what support has told me, if NW sees bad media then it will attempt to finish the current write, mark the tape full, and then move on to new media, as it did above.

Thanks for the input

10/03/04 18:55:42 nsrd: erpdb1:index:erpsql2 done saving to pool 'Default' (000011) 4 KB
10/03/04 18:55:51 nsrd: erpdb1:bootstrap saving to pool 'Default' (000011)
10/03/04 18:55:51 nsrmmdbd: media db is saving its data. This may take a while.
10/03/04 18:55:51 nsrmmdbd: media db is open for business.
10/03/04 18:55:53 nsrd: erpdb1:bootstrap done saving to pool 'Default' (000011) 229 KB
10/03/04 18:55:59 nsrd: savegroup info: Added 'erpdb1' to the group 'Default' for bootstrap backup.
10/03/04 18:55:59 nsrd: savegroup alert: Default completed, total 2 client(s), 0 Hostname(s) Unresolved, 1 Failed, 1 Succeeded. (erpsql2 Failed)
* erpsql2:MSSQL:pRD erpsql2 is a SQL 2000 Virtual Server
10/03/04 18:55:59 nsrd: runq: NSR group Default exited with return code 1.
10/03/04 18:55:59 nsrd: write completion notice: Writing to volume 000012 complete
10/03/04 18:55:59 nsrd: write completion notice: Writing to volume 000033 complete
10/03/04 18:56:00 nsrd: write completion notice: Writing to volume 000010 complete
10/03/04 18:56:31 nsrd: write completion notice: Writing to volume 000011 complete
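
As a cross-check on the excerpt above, the sizes reported in the "done saving" lines can be totalled with a few lines of Python. This is only a sketch: the regular expression is an assumption based on the four lines quoted in this post, not on any documented daemon.log format, and the variable names are illustrative.

```python
import re

# The four "done saving" lines for the SQL saveset, copied from the excerpt above.
LOG = """\
10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000011) 1201 GB
10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000033) 1195 GB
10/03/04 18:55:24 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000012) 1248 GB
10/03/04 18:55:25 nsrd: erpsql2:MSSQL:pRD done saving to pool 'Default' (000010) 1145 GB
"""

# Assumed line shape: "<saveset> done saving to pool '<pool>' (<volume>) <size> GB"
DONE = re.compile(r"nsrd: (\S+) done saving to pool '[^']+' \((\d+)\) (\d+) GB")

per_volume = {}
for line in LOG.splitlines():
    match = DONE.search(line)
    if match:
        _saveset, volume, size_gb = match.groups()
        per_volume[volume] = per_volume.get(volume, 0) + int(size_gb)

total_gb = sum(per_volume.values())
print(per_volume)
print(f"total: {total_gb} GB = {total_gb / 1024:.2f} TB")
```

The four streams add up to 4789 GB, or about 4.68 TB, which is close to the 4.7 TB size of the database quoted earlier in the thread.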
 
NW will in fact mark the tape full if it cannot recover after 20 (the default) CONSECUTIVE write attempts. As this may occur on every block, a lot of "shoe-shining" could take place on that media until the worst case is detected.

From here it is not really easy to determine the problem, as it need not necessarily be the media. It would be a good idea to find out about the error details. If NW does not provide them, you may check whether you can get them from the jukebox's internal statistics. Most manufacturers supply such a feature.
 
Rakerr, can you please help me find a UnixWare SK Kit (7.1.2 or 7.1.3) for the ES7000? This is the driver kit needed to make an ES7000 run with SCO UnixWare 7.1.x. Thank you.
 