200gb data xfer to array via LAN not working-help!!

Status
Not open for further replies.

krawz187

MIS
Mar 19, 2002
27
US
Hello,

At my office we are working to restore nearly 200 GB of data that was lost in a RAID array failure on our F50. According to IBM, 3 of the disk drives in our array failed within less than two weeks of each other. We have been working on the problem for several months because of budget constraints and technical issues.

First we tried to 'tar' up the data to DLT media at a remote office on the west coast, ship the tapes here, and extract them to the array. The extraction went fine at first, but eventually it stopped with an error. I don't remember the error message now, but I could not resume the extraction from any of the tapes, or even overwrite what was already there.

We then decided to buy a 250 GB IDE drive, copy the data to it at our remote site, ship it here, and restore the data from a PC to the array over the LAN via an NFS mount or an FTP session. We tried NFS on the first attempt and FTP the second day. Both sessions started out fine (about 2000 KB/s), but when left overnight they had slowed to a crawl by the time we checked the next morning. Because the BIOS was detecting the drive as 157 GB (even though Windows saw it as 250 GB), we added a UDMA controller to the PC and tried the FTP transfer again. Same problem: it slowed to a crawl by morning, having transferred only 90 GB or so of the data.
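A little arithmetic helps frame the overnight results. The sketch below assumes "2000 KB/s" means 2000 KiB/s (kilobytes, not kilobits) and treats "200 GB" as 200 GiB; both are readings of the post, not measured figures. Under those assumptions, even a transfer that never slowed down would need about 29 hours for the full data set, and a 12-hour night would move only about 82 GiB.

```python
# Back-of-the-envelope transfer-time estimate.
# Assumption: the reported rate is 2000 KiB/s and the data set is 200 GiB;
# both are interpretations of the figures in the post.

KIB = 1024
GIB = 1024 ** 3


def hours_to_copy(total_bytes: int, rate_bytes_per_s: float) -> float:
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


def gib_moved(rate_bytes_per_s: float, hours: float) -> float:
    """GiB transferred in the given number of hours at a constant rate."""
    return rate_bytes_per_s * hours * 3600 / GIB


rate = 2000 * KIB  # initial FTP/NFS rate reported in the thread
print(f"200 GiB at 2000 KiB/s: {hours_to_copy(200 * GIB, rate):.1f} h")
print(f"Moved in a 12 h night: {gib_moved(rate, 12):.1f} GiB")
```

So a single overnight run could not have finished the copy even at full speed; the slowdown is a separate problem on top of that.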

We then decided to send the 250gb IDE drive and the RAID array to our HQ office for some of the senior AIX guys to look at and try the process from a newer PC. They hooked up a PC with the 250gb drive, and started to FTP the data over to the array. Unfortunately the results were the same, and the data is still not restored to the array.

I'm wondering whether the DLT failure and the network transfer failure are related, and whether both failed after roughly the same amount of data had been transferred. I'm not sure where to go from here other than talking to IBM directly, which is probably already in the works. Has anyone else run into anything like this?

We are using JFS, and the F50 is running AIX 4.3.3 (I can find out the maintenance level if needed).

Thanks for any help! :)
 
From this story, it's possible that the problem is on the destination disk you are copying to.
You did not mention which type of disk it is, but I would check errpt, grepping for errors on this destination AIX disk.
Also, I'd fsck the filesystems on that disk that you are copying to.
Also, once the copy slows down like that, use topas to find the bottleneck.
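The errpt check suggested above amounts to filtering the error log summary by resource name. A minimal Python sketch, assuming errpt's default summary columns (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION); the sample rows come from the error report posted in this thread, and on a real system you would feed in `errpt` output and pass your destination disk's resource name instead.

```python
# Sketch of "grep errpt for the destination disk": keep only summary rows
# whose RESOURCE_NAME is in the given set.
# Assumption: input uses the default errpt summary layout; the resource
# names you pass in are placeholders for your own devices.

def rows_for_resources(errpt_text: str, resources: set[str]) -> list[list[str]]:
    rows = []
    for line in errpt_text.splitlines():
        fields = line.split(None, 5)  # at most 6 fields; description keeps its spaces
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip header, separator and blank lines
        if fields[4] in resources:
            rows.append(fields)
    return rows


sample = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E18E984F   1213094904 P S SRC            SOFTWARE PROGRAM ERROR
8AC43378   1209123004 T H scraid0        ADAPTER ERROR
"""

for row in rows_for_resources(sample, {"scraid0"}):
    print(row[1], row[4], row[5])
```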

Long live king Moshiach !
 
Thank you for your response! I'm not sure what you mean by the type of disk. I'm pretty new to AIX/RISC systems.

It is an external RAID array of SCSI drives attached to an F50. Is there any other information that would help narrow the problem down?

I've asked one of our techs to try the data transfer again so I can run the "topas" command and check for a bottleneck. Hopefully that will shed some light.

We completely remade the filesystem yesterday and tried another overnight copy, but got the same result. I'm guessing remaking the filesystem covers the "fsck" base, right?

Here is the error report:
------------------------------------------------------------
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E18E984F   1213094904 P S SRC            SOFTWARE PROGRAM ERROR
2BFA76F6   1213092404 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1213094804 T O errdemon       ERROR LOGGING TURNED ON
192AC071   1213092204 T O errdemon       ERROR LOGGING TURNED OFF
E18E984F   1209173804 P S SRC            SOFTWARE PROGRAM ERROR
2BFA76F6   1209170404 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1209173704 T O errdemon       ERROR LOGGING TURNED ON
192AC071   1209170204 T O errdemon       ERROR LOGGING TURNED OFF
8AC43378   1209123004 T H scraid0        ADAPTER ERROR
------------------------------------------------------------
On 12/10 we removed the SCSI card and sent it and the array to our HQ to try the process there. I'm not sure what the errors on 12/9 are related to; the only thing we did that day was try to copy the data directly to the array over a crossover cable from the PC to the second NIC on the F50, which failed.
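errpt's TIMESTAMP column is normally written as mmddHHMMyy, so the dates in the report can be decoded and lined up with the events described. A small Python sketch, assuming that format, using the rows from the report in this post:

```python
from datetime import datetime

# (identifier, timestamp, resource, description) copied from the errpt output above.
ROWS = [
    ("E18E984F", "1213094904", "SRC", "SOFTWARE PROGRAM ERROR"),
    ("2BFA76F6", "1213092404", "SYSPROC", "SYSTEM SHUTDOWN BY USER"),
    ("9DBCFDEE", "1213094804", "errdemon", "ERROR LOGGING TURNED ON"),
    ("192AC071", "1213092204", "errdemon", "ERROR LOGGING TURNED OFF"),
    ("E18E984F", "1209173804", "SRC", "SOFTWARE PROGRAM ERROR"),
    ("2BFA76F6", "1209170404", "SYSPROC", "SYSTEM SHUTDOWN BY USER"),
    ("9DBCFDEE", "1209173704", "errdemon", "ERROR LOGGING TURNED ON"),
    ("192AC071", "1209170204", "errdemon", "ERROR LOGGING TURNED OFF"),
    ("8AC43378", "1209123004", "scraid0", "ADAPTER ERROR"),
]


def decode_errpt_timestamp(ts: str) -> datetime:
    """Parse an errpt summary timestamp, assumed to be mmddHHMMyy."""
    return datetime.strptime(ts, "%m%d%H%M%y")


# Print the log oldest-first with readable dates.
for _ident, ts, resource, desc in sorted(ROWS, key=lambda r: decode_errpt_timestamp(r[1])):
    print(decode_errpt_timestamp(ts).strftime("%Y-%m-%d %H:%M"), resource, desc)
```

Read this way, the scraid0 ADAPTER ERROR is the earliest entry, stamped 2004-12-09 12:30, the same day as the crossover-cable attempt; the log alone does not show whether the two are connected.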

Thanks again.
 
Hi, I just want to throw out some ideas you may want to check.

After or during your restore, check the status of your filesystems. Are /, /var, or /tmp full? Did the filesystem where you dumped your data get populated at all? What percentage of inodes is used and free? Which log does your data filesystem use, and how large is it? You may also want to use iostat to watch your I/O activity during the data transfer.
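The space and inode questions above can be answered from a script as well as with df. A minimal sketch using Python's os.statvfs (available on Unix systems only); "/data" is a hypothetical stand-in for the filesystem receiving the restore, and the block percentage is computed as used/total, which may differ slightly from what df reports.

```python
import os


def fs_usage(path: str) -> dict[str, float]:
    """Percent of blocks and inodes in use on the filesystem holding `path`."""
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    used_inodes = st.f_files - st.f_ffree
    return {
        # Guard against filesystems that report zero totals.
        "pct_blocks_used": 100.0 * used_blocks / st.f_blocks if st.f_blocks else 0.0,
        "pct_inodes_used": 100.0 * used_inodes / st.f_files if st.f_files else 0.0,
        # Space still available to unprivileged users, in MiB.
        "free_mib": st.f_bavail * st.f_frsize / 1024 ** 2,
    }


if __name__ == "__main__":
    # "/", "/var" and "/tmp" come from the reply; "/data" is hypothetical.
    for mount in ("/", "/var", "/tmp", "/data"):
        if not os.path.exists(mount):
            continue
        u = fs_usage(mount)
        print(f"{mount:6} blocks {u['pct_blocks_used']:5.1f}% used, "
              f"inodes {u['pct_inodes_used']:5.1f}% used, "
              f"{u['free_mib']:.0f} MiB free")
```

Running it periodically during the transfer, alongside iostat, would show whether a filesystem fills up at the point the copy slows down.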



 
You might want to try the DLT again using either "backup" or "pax"; tar can't handle files over 2 GB.

Or you could install the 250 GB drive in a Linux machine to see whether the slowdown is a Windows problem. You don't even have to install Linux; just use a live CD such as Knoppix.
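Before retrying with tar, it may be worth checking whether the data set contains any files at or above the 2 GB size mentioned above. A minimal Python sketch; the 2 GiB threshold and the "/mnt/restore_src" path are assumptions to adjust for your own system.

```python
import os

TWO_GIB = 2 * 1024 ** 3  # the 2 GB tar limit mentioned above, read as 2 GiB


def large_files(root: str, limit: int = TWO_GIB) -> list[tuple[str, int]]:
    """Return (path, size) for files of `limit` bytes or more, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if size >= limit:
                hits.append((path, size))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # "/mnt/restore_src" is a hypothetical mount point for the source data.
    for path, size in large_files("/mnt/restore_src"):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```

If this finds nothing, the tar size limit is unlikely to explain the DLT failure.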



Rod Knowlton
IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
Thanks for the good tips guys, I'm going to check them all out and see how it goes.
 
On the off chance anyone is still following this thread:

Apparently there was an NFS or filesystem setting causing the issue, but I couldn't get a more detailed answer from our internal AIX guru.
 