Meridian Mail backups failing

Status
Not open for further replies.

glarkin

IS-IT--Management
Feb 26, 2002
175
US
I have Mermail 13.14.2 on an Option 81c with Symposium 4.2 and Web client 4.5 and apologize for the long post.

During our nightly backup last night (22:15 hours) Meridian Mail generated a lot of errors in the SEER log:

65-99 (Debugger traceback output). A bunch of these saying different things:

"DBTRACE 6 Task error 30:30 in task 267B1687, task name VS"
"DBTRACE 6 executing VOLINIT line: unknown in unit VS"
"DBTRACE 6 called from -> VS line: 0"
"DBTRACE 6 Initiated by task Monitor line: unknown"
"DBTRACE 6 locale: 36"
"DBTRACE 6 stack pointer: 001C122E steap pointer: 001C000"
"DBTRACE 6 initial space: 5120 SR: 0000 PC: 002D2832"
"DBTRACE 6 D0 = 007E975A D1 = 00000006 D2 = 0018240C D3 = 04000000"
"DBTRACE 6 D4 = 00840000 D5 = 0064028C D6 = 0000FC00 D7 = 00000018"
"DBTRACE 6 A0 = 001C1344 A1 = 00000000 A2 = 00840000 A3 = 002CACAA"
"DBTRACE 6 A4 = 001C13A4 A5 = 001C1398 A6 = 00183F8A"

11-1 (Failed intertask communication) "VS901 Error finding server"

Then at 02:00 hours there are a lot more of these 65-99 errors mentioning:

"DBTRACE 7 Run error 4: value range error in task 507B266a, task name VMSTASK"
"DBTRACE 7 Executing VHSETVOL line: 0 in unit VHPROCS"
"DBTRACE 7 called from -> LOGOFF line: unknown in unit VMCOMM2"
"DBTRACE 7 called from -> CLOSEUPS line: 0 in unit VMSU"
"DBTRACE 7 called from -> DISCALL line: 0 in unit VMSU"
"DBTRACE 7 called from -> VMSTASK line: 0 in unit VMSU"
"DBTRACE 7 Initiated by task VSS line: unknown"
"DBTRACE 7 Initiated by task MONITOR line: unknown"
"DBTRACE 7 locale: 29"
"DBTRACE 7 stack pointer", etc, just like above.
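
For anyone sifting through many 65-99 entries, the traceback lines quoted above can be condensed into one summary per error. The following is only a rough sketch: the regular expressions are inferred from the sample lines in this post, not from any Nortel documentation, and `summarize_traces` and `SAMPLE` are hypothetical names made up for illustration.

```python
import re

# Patterns inferred from the lines quoted in this thread only (assumption).
ERROR_RE = re.compile(
    r"DBTRACE \d+ (Task|Run) error (.+?) in task ([0-9A-Fa-f]+), task name (\w+)"
)
FRAME_RE = re.compile(
    r"DBTRACE \d+ (?:[Ee]xecuting|called from ->) (\w+) line: \w+(?: in unit (\w+))?"
)

# A few of the lines from the SEER log above, copied as posted.
SAMPLE = [
    '"DBTRACE 6 Task error 30:30 in task 267B1687, task name VS"',
    '"DBTRACE 6 executing VOLINIT line: unknown in unit VS"',
    '"DBTRACE 6 called from -> VS line: 0"',
    '"DBTRACE 7 Run error 4: value range error in task 507B266a, task name VMSTASK"',
    '"DBTRACE 7 Executing VHSETVOL line: 0 in unit VHPROCS"',
    '"DBTRACE 7 called from -> LOGOFF line: unknown in unit VMCOMM2"',
]


def summarize_traces(lines):
    """Group DBTRACE lines into one record per error, with its call frames."""
    traces = []
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            # An error line starts a new record.
            traces.append({
                "kind": m.group(1),
                "detail": m.group(2),
                "task_id": m.group(3),
                "task_name": m.group(4),
                "frames": [],
            })
            continue
        m = FRAME_RE.search(line)
        if m and traces:
            # (procedure, unit) pair; unit is None when the line omits it.
            traces[-1]["frames"].append((m.group(1), m.group(2)))
    return traces


if __name__ == "__main__":
    for trace in summarize_traces(SAMPLE):
        chain = " <- ".join(proc for proc, _unit in trace["frames"])
        print(trace["kind"], "error", trace["detail"], "in", trace["task_name"], ":", chain)
```

Register dumps, locale lines and "Initiated by" lines are deliberately ignored here; the aim is only to see at a glance which task failed and in which procedure.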

I manually back up all volumes (VS1, VS2 and VS202, voice and data) and ALL selective Messages, PDLs and Services every day, in addition to a scheduled nightly backup, and have never had one fail like this before. I tried running the nightly backup manually and got an error when it tried to back up volume VS202T. I then did a manual backup of ALL selective Messages, PDLs and Services and it completed, but SEER showed error 15-40 (Backup Error, Lookup return code in NTP SEERs for VS202T. 1101) and then 76-12 about not being able to read the FID for mailbox SCCS8888.

I've also noticed that two DSP ports on each node are showing ACTIVE all the time on a Saturday (we're closed). Is that normal?

I'm opening a ticket to have our vendor out Monday morning, but I'm wondering if anyone can say whether this is a critical issue leading to catastrophic failure, or something a reboot might cure. As far as I can tell, Mermail works fine for voice messages and Symposium GIVE IVR announcements. Sorry for such a long post and thanks for any input you can provide.

Greg

 
back up everything you can. i wouldn't worry about the system data. if you can't back up vol 202, try moving boxes to another vol; tools will speed that up. it doesn't sound like a tape problem, inasmuch as it fails on only one volume. i would suspect a small area of corruption on the hard drive, which would explain one mem address failing. migrating data off that area may clear the problem long enough to replace the hd and do a restore. to change the drive, i would do a new install from the original install tape, then do a restore from a partial backup, one without system data. a backup tape with the system data on it formats the hd; if the system data has a problem and will not load, you will not be able to use the backup tape at all.

john poole
bellsouth business
columbia,sc
 
John gave you really great advice...
One question I would ask is, why do you do daily backups on voice mail? If it is a provision of your SLA (Service Level Agreement) contract, then you should always seek to meet that requirement. Either way, you might want to discuss this with your manager. I would advise that you run weekly backups if your SLA allows that interval, but I would run two tape backups, and verify that each backup completed without errors.

While your problem sounds more like a HDD problem, should you encounter a failed backup that points to a tape drive or tape media problem, I recommend the following protocol:

After you run a tape backup, you should always verify that the backup completed without errors. If your backup has errors on the first tape, DON'T INSERT THE 2ND TAPE. (You now know you still have one good previous backup.) Inspect the tape that failed and look for any apparent defects, e.g. a twisted tape or a damaged case. If no damage is evident, insert your cleaning tape, clean the tape drive and retry the backup on the same tape that failed. If it fails again, mark this backup tape as defective, get a new tape and try again. If the backup fails again, clean the drive again and retry your new tape. If you are not able to perform a backup at this point, you should look at replacing the tape drive.
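
The retry sequence above can be written down as a small decision procedure, which may help if it ends up in a runbook. This is only a sketch of the order of steps: `run_backup` and `tape_looks_damaged` are hypothetical callbacks standing in for what the operator does by hand, and skipping straight to a new tape when the original is visibly damaged is an assumption, since the post does not spell out that branch.

```python
def troubleshoot_failed_backup(run_backup, tape_looks_damaged):
    """Walk the retry sequence after a first failed tape backup.

    run_backup(tape) -> bool      hypothetical: True if the backup completes cleanly
    tape_looks_damaged() -> bool  hypothetical: result of the visual inspection
    Returns (verdict, actions), where actions lists the steps taken in order.
    """
    # The known-good second tape stays out of the drive throughout.
    actions = ["keep second tape out of the drive"]

    if not tape_looks_damaged():
        actions.append("clean drive")
        actions.append("retry original tape")
        if run_backup("original"):
            return "ok", actions

    # Either the tape was visibly damaged (assumed branch) or the retry failed.
    actions.append("mark original tape defective")
    actions.append("try new tape")
    if run_backup("new"):
        return "ok", actions

    actions.append("clean drive")
    actions.append("retry new tape")
    if run_backup("new"):
        return "ok", actions

    return "replace tape drive", actions


if __name__ == "__main__":
    # Example: every attempt fails and the tape shows no visible damage.
    verdict, steps = troubleshoot_failed_backup(lambda tape: False, lambda: False)
    print(verdict)
    for step in steps:
        print(" -", step)
```

Each `run_backup` call is meant to include checking that the backup really completed without errors, not just reading the status screen.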
 
Thanks very much for your advice. I've been dealing with this issue all weekend, poring over the SEER logs and documenting everything I can in Mermail, and tomorrow Nortel will take a stab at it. We're only seeing errors when backups are run; otherwise the system is running and functioning normally. The remote engineer found some stale mailbox warnings in the SEER log and said to delete those mailboxes, run an audit from the TOOLS menu and see if that worked. I deleted what I could and ran the audit, but it failed each time with a task error. If we were having hard drive problems, wouldn't I see SEER 66 messages? How can I move users from one volume to another? I didn't see an option for it in the TOOLS menu and the user manual doesn't cover it.

We do full backups every day because we thought that would give us the greatest protection against failure, but that turns out not to be true, I guess. From the SEER log it seems things started to go bad around June 4th, and I have good backups from the 2nd and 3rd that I could hopefully restore from if need be. I just wonder why the backup status screen said the backups completed successfully when they didn't. We're going to do as you both suggested and revise our backup strategy to prevent this kind of exposure in the future.

What kind of backup can I do to get the mailboxes, voice menus, recorded announcements, VSDNs, TOD controllers, and through-dial information? I tried doing bulk provisioning to tape and although the backup status said it completed, there were some errors in the SEER. I had tried backing up user info, so maybe that was what caused the problem.

Thanks again for your advice...I truly appreciate it.


Greg
 
a hd problem would give multiple seer codes only if the os could see the problem. a seer is not carved in stone; sometimes you don't get one, or you get one for a problem with a loop that you don't have. switch codes are correct about 90 percent of the time, mail around 80. you have done almost everything nortel can do unless you get to level 3 or 4. btw your local support was correct: i delete stale mb's every 30 days, looking for boxes stale past 45 days. i do that for customer service; people who leave msgs and don't get a reply are usually not your best repeat customers. i also watch the seer for non-users forwarded to mail and remove fna on the set. users that forward to mail and don't have a box sorta make me mad. of course i work for the phone company and they tend to live on the far side of their voice mail.

john poole
bellsouth business
columbia,sc
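
The stale-mailbox sweep described in the post above (check every 30 days, flag boxes unused for more than 45) boils down to a simple filter. A minimal sketch, assuming the "days since last logon" figures have been copied out of the mailbox reports by hand; it does not talk to Meridian Mail, and the mailbox numbers and the `stale_mailboxes` name are made up for illustration.

```python
STALE_AFTER_DAYS = 45  # threshold mentioned in the post above


def stale_mailboxes(days_since_logon, threshold=STALE_AFTER_DAYS):
    """Return the mailboxes whose last logon is older than the threshold.

    days_since_logon: dict mapping mailbox number (str) to days since last logon.
    """
    return sorted(
        box for box, days in days_since_logon.items() if days > threshold
    )


if __name__ == "__main__":
    # Hypothetical figures, not taken from the system in this thread.
    report = {"4001": 3, "4002": 46, "4003": 1200, "4004": 45}
    for box in stale_mailboxes(report):
        print("review mailbox", box)
```

A box at exactly 45 days is left alone here, on the reading that "stale past 45" means strictly more than 45.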
 
Sounds like you may have some bad blocks on the hard drive that need to be reassigned before you can get a completed backup.
 
that was my original thought, i've had that trouble before. the hd didn't give me seers, but the backup crashed. i would move users off of that vol, attempt a backup and replace the drive.

john poole
bellsouth business
columbia,sc
 
We appear to be back to normal now and are able to complete backups.

Our vendor's engineers dialed into our Meridian Mail and said they found no problems with our disk. They had us clean the tape drive and then ran another backup, which completed. I checked the SEER and verified that all volumes had been backed up successfully, even VS202, which blew up the backups yesterday. They claim it was a dirty tape drive causing our problems, but I don't think I buy that explanation. I could complete a backup of the system on VS1 to tape just fine yesterday (no SEER errors at all), but if I included VS202 the backup would halt with the DBTRACE messages.

I had deleted a bunch of stale mailboxes on Sunday (some had gone 1200+ days without being checked) and the rest I had logged into to reset the "Number of days since last logon" counter to 0. I wonder if that had more to do with correcting our problems than cleaning the tape drive? In any case, I'm going to heed the advice given here and incorporate daily selective backups into my routine to give us a better restore position. Thanks again for the great replies!

Greg
 