Tek-Tips is the largest IT community on the Internet today!

Data Error (Cyclic Redundancy Check) 2

Status
Not open for further replies.

axb123

MIS
Oct 4, 2004
NL
I've seen many posts about this error message, but none of the suggestions seem to work.

Every day the backup fails with the Data Error (Cyclic Redundancy Check).

This problem started about 2 months ago and HP quickly shipped a new backup unit. This fixed the problem until this week, when the problem returned. This time, however, I do not see a problem with the drive: I ran the HP utils and the drive looks fine.

I then proceeded to follow the steps outlined in Veritas support doc (192216), but no luck: the problem continues to exist. I've tried about everything to get this error resolved: cleaned the drive several times, new drivers / firmware, a different SCSI port, new tapes. The list is endless.

Does anybody out there have an idea what could possibly solve this error?
 
axb123 -

Just out of curiosity, what version of Veritas and media are you using?

- alex
 
I just received this message on my backup from last night. This is the Veritas support doc I used (266260). Of course I have to wait to see if it works or not.

Good luck.
 
The Veritas version that I'm using is: Backup Exec for Windows version 9.1

And the type of backup unit / tapes is:
HP Autoloader 1/8 with Ultrium 200/400 GB tapes, running the latest drivers and firmware.

 
If you check the Windows System Log and it shows any errors with Event ID 7, 9, 11 or 15, then it is a hardware error. Most probably it is generating Event ID 7.

Replace the tape drive again.
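
For anyone who wants to run that check against an exported log rather than scrolling through Event Viewer, here is a minimal sketch. It assumes the System Log has been exported to CSV with `EventID`, `Source` and `TimeGenerated` columns; those column names and the sample rows are made up for illustration and will differ depending on how you export.

```python
import csv
import io

# Event IDs that the post above associates with hardware faults.
HARDWARE_EVENT_IDS = {7, 9, 11, 15}


def hardware_events(csv_text):
    """Return the rows whose EventID is in HARDWARE_EVENT_IDS.

    Assumes a CSV export with an 'EventID' column (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["EventID"]) in HARDWARE_EVENT_IDS]


# Invented sample data, only to show the shape of the input.
sample = """EventID,Source,TimeGenerated
7,example-source-a,2004-10-05 02:14
6005,example-source-b,2004-10-05 07:00
11,example-source-c,2004-10-05 02:15
"""

for row in hardware_events(sample):
    print(row["EventID"], row["Source"], row["TimeGenerated"])
```

Lining up the timestamps of any matches with the time the backup job failed is the useful part; a match on its own only tells you the ID was logged.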
 
When was the tape drive last cleaned? Cleaning often makes this error go away.
 
Is this happening consistently on a certain tape?

CRC errors are usually caused by either:

1. Drive fault (usually damaged/dirty/worn heads)
2. Media fault

If you have already replaced the drive, I would check your media.
 
axb123, is this job a remote job? I would bet good money it is. How many servers are being backed up in this one job? Is it possible to break them up? Does the CRC error point to the same drive or does it "roam" between drives?
 
It is indeed a remote job; we are using one server to back up 3 servers. All servers have the backup agent installed.

I've kept a record of each failure to verify whether the backup fails on the same file, drive or server.
However, there does not appear to be a pattern in where the job fails.
The job crashes whilst it's working on different servers, drives and files.

Needless to say, it's all very frustrating. And for some strange reason the backup we made Monday night went through without problems. Then last night (Tuesday) the backup didn't get further than 24 GB before crashing again (a full backup would be 420 GB).
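
A failure record like the one described above can be checked for patterns with a few lines of Python. This is only a sketch: the server, drive and file names are hypothetical placeholders, and the 75% threshold for calling something a "pattern" is an arbitrary assumption, not anything from Veritas or HP.

```python
from collections import Counter

# Hypothetical failure log: (server, drive, file) where each job failed.
failures = [
    ("server1", "C:", "fileA"),
    ("server2", "D:", "fileB"),
    ("server3", "C:", "fileC"),
    ("server1", "E:", "fileD"),
]


def tally(records, field_index):
    """Count how often each value of one field appears in the records."""
    return Counter(record[field_index] for record in records)


def has_pattern(records, field_index, threshold=0.75):
    """True if one value accounts for at least `threshold` of all failures."""
    counts = tally(records, field_index)
    if not counts:
        return False
    _, top_count = counts.most_common(1)[0]
    return top_count / len(records) >= threshold


for name, index in (("server", 0), ("drive", 1), ("file", 2)):
    print(name, dict(tally(failures, index)), has_pattern(failures, index))
```

With failures spread evenly like this, no field crosses the threshold, which matches the "no pattern" observation and points away from a single bad file or volume.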
 
Rols is correct.

CRC errors are usually caused by either:

1. Drive fault (usually damaged/dirty/worn heads)
2. Media fault (faulty tape(s) )

Tapes wear. Don't think you can re-use them over and over again. So do tape drive heads (wear).

Skew happens! ;o)

(skew being the tape being dragged across the R/W head differently during a read operation than it was during a write operation, thus causing read errors, i.e. CRCs)

Don't necessarily expect tapes that were created with the old drive to read accurately with the new drive either. Test your backups from the old tape drive with the new tape drive!!!



Cheers!

Chuck
 
It is a common response that CRCs or Semaphore Timeouts are hardware related, but that is far from always the case. Don't get me wrong, I've seen a drive replacement correct the problem, but only about 1 out of 10 times. You can replace a drive, media, or the entire library and still see these errors. The biggest response you'll get is that it has been working all this time, so it has to be a hardware failure. I've seen the most wacked-out configuration work for a year and then finally blow up. You fix what should have been configured properly a year ago and all is well. Any type of small change can finally expose the true problem. Is this the case here? Could be. I just wanted to bring this up because so many people insist on hardware-related errors that they end up taking months to get to the real issue.
Media age is a good point. Heck, even the brand of media can introduce errors. I've seen people search the web for the cheapest media they can find to put in a $100,000 tape library. They are backing up critical data to the cheapest media they can find. Yes, quality of media, or lack of it, varies across media vendors. A good example is a company's media that starts with an "I". Although there is a "standard" for the way media is produced, the case itself is not part of that standard. This case is known to be troublesome for pickers in tape libraries. It also fails to make contact with the sensors in some libraries, giving the library the illusion that the slot is empty.
OK, back to the issue.
If you have not done so already, break the job into three jobs, one for each server. You can set the remote agents into a debug mode; the steps can be found on Veritas' site. Run SGMON.EXE on the Master during the backups.
A few things to check and note. Are you teaming NICs? This has been known to cause issues with Veritas. If so, un-team the NICs for at least a short testing period. Are the NICs set to Auto Negotiate, or are they hard set? If they are set to Auto Negotiate, hard set them to the highest throughput. Make sure your switches are hard set as well. Make sure to check this on the Master and the three servers causing the problem. Make sure your drivers are up to date for your NICs, SCSI/HBA card(s) and your tape devices.
The data is written to the tape in chunks. This is done in a burst. The tape will contain many chunks of the data being backed up. This is where the chunk size option comes in. You know, that one drop-down box that lets you choose chunk size. It usually defaults to 64, and that is accepted industry wide. Now, if for any reason a chunk is skipped or corrupted during the write, you can see a multitude of issues. The header information on the tape will be inconsistent with the data that is actually written to the tape. One way to avoid this in the future is to do a quick erase on the media you are rotating in for that backup cycle. I would do this no matter what. Keep an eye on how long you have had media. People forget how valuable their data is until it is called for a restore.
Check your System event viewer and Application event viewer as well around the time of failure. You can use the event IDs to search Veritas' website as well. Good chance you will see event ID 7, 9, 11 in the System event viewer. In the Application event viewer you will get a five-digit number, I think. This can be used to search their site for answers. I like to start a spreadsheet and put down the event ID and then the description. Even though at first you see a lot of different IDs that seem unrelated, it will all come together to point you in a good direction for resolution.
CRC is a way the data on the tape can be checked fairly quickly against what was backed up. Now here's where network or SCSI timeouts can come about (SCSI timeouts are not always a hardware failure) to cause CRCs. If at any time during a Cyclic Redundancy Check "communication" is lost between the source data and the data on the media, you will get a CRC. This is due to the fact that the CRC failed to confirm the written data on the tape was correct and consistent with the data being backed up.
Oh man, sorry to ramble. It's late and I ramble when I'm tired. Just to shorten it up... There are a few things that can be done to test the functionality of a backup before getting hardware replaced. The steps can be performed a lot faster than waiting for hardware, replacement, backups to run, and then getting the failure email we all do not like to see.
Can't spell worth beans, so forgive me in advance. Grammar is not a specialty either.
Again, sorry to ramble.
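
To make the CRC idea from the post above concrete, here is a small sketch of how a checksum exposes a block that no longer matches what was written. It uses Python's `zlib.crc32` purely as a stand-in; the actual check used by the tape drive and by Backup Exec is not described in this thread and is likely different.

```python
import zlib


def crc_of(block: bytes) -> int:
    """Compute a 32-bit CRC for a block of data."""
    return zlib.crc32(block) & 0xFFFFFFFF


def verify(block: bytes, expected_crc: int) -> bool:
    """True if the block still matches the CRC recorded at write time."""
    return crc_of(block) == expected_crc


# CRC recorded when the (made-up) block was written.
original = b"backup data block" * 4
stored_crc = crc_of(original)

# Simulate one bit flipping somewhere between the source and the media.
damaged = bytearray(original)
damaged[10] ^= 0x01
damaged = bytes(damaged)

print(verify(original, stored_crc))  # True
print(verify(damaged, stored_crc))   # False
```

The check only says that the data changed somewhere along the way. It cannot say whether the cause was the tape, the heads, the SCSI bus or the network, which is why the error alone does not prove a hardware fault.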
 
Suggestion---DO NOT CLEAN A DRIVE every time you have a problem. The newer drives and cleaning tapes actually have an abrasive that can prematurely wear out a drive. ALWAYS check your media FIRST. I have had this problem and 99 out of 100 times it was a bad tape...not a bad drive. A bad drive usually (but not always) gives SCSI errors. I have also had Brand New (Supposedly New) media give me problems. I didn't believe it until I finally broke down and opened another package of NEW media. The second package worked where the first did not.
 
sniderweb,

First of all... THANKS !
When I first read your message I thought... that's a bit strange, however the comment about the cleaning tape having an abrasive made sense.

To be sure, I phoned HP tech support to ask for their comments on this, and they confirmed your comments about the cleaning cycle.

They suggest you only use a cleaning tape when the drive asks for it, whereas we always worked under the assumption that it would be better to prevent the drive becoming dirty and needing a cleaning tape.

In hindsight it always makes sense..

We have replaced the backup unit and the tapes and won't clean it now until it prompts me to. ;-)



 

My CRC problem persists after replacing the drive 3 times, changing the tape media, and breaking up the backup jobs. Any more advice, anyone, please ...
 
Have you tried replacing the unit, tapes, et cetera all at the same time?

In our case we had one item that wasn't functioning correctly, but we couldn't identify which one, so the only solution was to replace the whole backup solution in one go.

 