Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

A scare this past week...What do you do? 1

Status
Not open for further replies.

bessebo

IS-IT--Management
Jan 19, 2001
766
US
We have SQL2000 for our ERP database and I perform full backups every night and transaction log backups during the day every 15 minutes. Every Sunday I perform a database integrity check of all of our databases and after that I perform a rebuilding of indexes on all of our tables. I noticed this past week that our database integrity job failed and it never has before. The error was 2 consistency errors with a specific table in our ERP database. We think this issue stemmed from a hardware issue on the previous Thursday. So users had been transacting for two full days after we noticed this issue.

On Monday morning we placed a call to Microsoft and they said that in the worst case (if I cannot repair the database) situation you can restore from the previous full backup (previous Wed night) and roll forward the transaction log backups. Well I could not do that because we perform full backups every night and the transaction log backups are overwritten if the full backup runs clean.

There is no way in hell we could've re-keyed 2 days of transactions. As it turned out, we figured the thing to do would be to restore the last backup (Sunday night's) to our test environment and try to run the repair at least in 'test' first. So I restored the database and figured I would at least run an integrity check in there just to make sure I get the same error message as I got in production about the consistency errors. When I ran the DBCC command, the database was fine (no errors). So then I got to thinking that maybe the rebuild of the indexes that occurred after the integrity check might have corrected the problem? ANy help on this would be appreciated.

Another question I have is how often do you run a database integrity check on your databases? Maybe we should do this each night? I don't want to experience this again ever!

Regards,
Bessebo
 
I run my db check nightly. I run it nightly just for the reason you described above.

Does the problem still exists in you prod database. You test database was corrected by the database restore.

Does the problem still exists???
If so
Try this.
Run a full backup of the database with the error. Restore that database to your test server then run CHECKDB. The problem should be gone.
The consistency error should be corrected in the restore.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
I am sorry but I forgot to mention that after I noticed that the problem was gone in the Test environment I then ran the database integrity check on our production database and it was CLEAN. So, this tells me, that the restore did not resolve the problem in our production environment because I never did a restore there...So the only thing that occurred on the production database was the rebuilding of indexes. Or maybe there are processes running in the background to correct consistency issues?
 
There wouldn't be any background processes that would correct it unless you ran it. It may have been corrected by the re-index.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Yeah, I was actually trying to test that out last night but the re-index ended up eating up all of our space in our Test Room. I am restoring our test room now and will just rebuild the index of the one table that had the consistency error.

Ok so this begs another question from me. Is your database integrity check part of your backup script. In other words, if that fails you probably don't want to perform a full backup because that problem would be in the full backup (our problem is that once the full backup is clean the transaction log is cleared out). You would have to restore the full backup from the night before and then roll forward to the problem and then re-key any info that was entered after the problem.

I will add a nightly database integrity check but not sure how much that will help other than telling me there is an issue earlier..

By the way, I think you were the one who responded to my torn page in the msdb database and suggested to stop sqlagent, restore from an older backup of msdb, then start sqlagent. I tried that this morning but when I right-click on the msdb database to click on restore database it tells me that I am disconnected from the server and enterprise manager just hangs. So I think I just have to perform the restore within query analyzer and I should be OK...You see the problem is I get the torn page error when I click on a database in Enterprise Manager. So that error is then preventing me from restoring in the gui environment. Thanks for all of your sage advice. I really appreciate it. My administration duties are typically relegated to problem scenarios...

Regards,
Bessebo
 
Is your database integrity check part of your backup script. In other words, if that fails you probably don't want to perform a full backup because that problem would be in the full backup (our problem is that once the full backup is clean the transaction log is cleared out). You would have to restore the full backup from the night before and then roll forward to the problem and then re-key any info that was entered after the problem.
No, It is a separate job. No matter what you want to get the database backed up. Also, you can correct some errors with that backup. For example a torn page can be corrected with that backup.

To fix your msdb problem run your restore command from QA.

Good luck,

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Quote (I don't know how to get that cool window to display:

"Also, you can correct some errors with that backup. For example a torn page can be corrected with that backup."


How do you correct those errors? Use the Repair_with_data_loss qualifier with DBCC CHECKDB? I'm interested in how you would correct this? Let's say that you come in, in the morning and the integrity job failed. You also had a full backup from the night before that includes those errors. Do you still have a complete transaction log backup that you can roll forward or are you just confident you can correct those errors without having to roll forward?

Thanks,
Bessebo


 
to format your posts look at the process TGML link above the Submit Post button.
Here is an example of quote.
[ignore]
your quote here
[/ignore]

You can repair database problems with checkdb.

BOL said:
REPAIR_ALLOW_DATA_LOSS | REPAIR_FAST | REPAIR_REBUILD
Specifies that DBCC CHECKDB repair the found errors. The specified database must be in single-user mode to use one of the following repair options.

REPAIR_ALLOW_DATA_LOSS
Tries to repair all reported errors. These repairs can cause some data loss.

REPAIR_FAST
Maintains syntax for backward compatibility only. No repair actions are performed.

REPAIR_REBUILD
Performs both minor, quick repairs, such as repairing extra keys in nonclustered indexes, and time-consuming repairs such as rebuilding indexes. These repairs can be performed without risk of data loss.

Important:
Use the REPAIR options only as a last resort. To repair errors, we recommend restoring from a backup. Repair operations do not consider any of the constraints that may exist on or between tables. If the specified table is involved in one or more constraints, we recommend running DBCC CHECKCONSTRAINTS after a repair operation. If you must use REPAIR, run DBCC CHECKDB without a repair option to find the repair level to use. If you use the REPAIR_ALLOW_DATA_LOSS level, we recommend that you back up the database before you run DBCC CHECKDB with this option.



- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
I take it you are fairly confident that the DBCC CheckDB repair options work? Have you ever tried to use them? If they do not work then you cannot get back to before the issue unless you restore an old full backup and roll forward the transaction log backups since that backup. Would you still have the transaction log backups from the previous day even if you have had a successful full backup? We would not but I'm thinking that maybe we should start keeping them...
 
>>I take it you are fairly confident that the DBCC CheckDB repair options work?

I have used it in the past but if I can I will use the restore method first.

I try to keep 3 days worth of logs. Part of my log backup job is to copy the log .bak file off of the primary server on to a server that the DBA group owns.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Yes, that is the thing to do. I am going to change our current procedure so that I copy off the days transaction log backups off to tape before I run my full backup.

Do you write each of your transaction logs off to tape or just append to a file on disk? At the end of each night how do you clear out your transaction log backups? Just wondering what others do..

Thanks,
Bessebo

 
I use backup devices. This is what I do for level 1 servers. (level 1 servers are considered mission critical.)

12 AM Full data backup with init (overwrites previous backup)

1 AM backup tLog with init. (Overwrites previous backups)
Copy device to a folder on the DBA server named current logs.
The .bak devices in that folder were backed up to tape at 12 AM. We keep 3 days worth of logs on tape.

1:30 AM and every 30 min. Backup TLog with noinit. This appends my tlog to the backup device. Then I copy to DBA server overwriting the previous .bak device.

So I have a copy of the database and Tlog backups on each local server and a copy of the tlogs on another server. We backup to tape at 3 AM every night.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
That sounds about right. Due to the issues that occurred this past week our backup strategy has changed. I also use devices but they just point to disk files on our database server. It will be like this:

-Starting at 6:00AM we take transaction log backups every 15 minutes with NOINIT so they just append during the day

-Full Backups run every night at 11:30 PM

-At 2:30AM both our full backups and the full transaction log backups from the day are written to tape

-At 5:00AM I perform a Backup transaction log with INIT to overwrite previous transaction log backups

--At 6:00AM it starts all over again.


The change I made is that I wasn't keeping the transaction log backups from the full day if the full backup was clean (previously I was just INIT-ing it if the full backup was clean). Now I will be writing both off to tape whether or not the full backup is successful. I like this change as it will allow us to recover to a point in time should there be a database integrity issue. I also added a nightly database integrity job to run, instead of weekly.

The only exposure we may have is that during the day the transaction log backups are simply written to disk on the database server. If we had a catastrophic event we would lose them. That begs the question, do you write your transaction log backups to tape during the day or do you run the risk of losing them as I do?

You have helped me immensely with my backup strategy. You go through a problem like this and you learn from it.

Regards,
Bessebo
 
>>You go through a problem like this and you learn from it.

Yes you do. I think most DBA's at one point or another have been burned by a lost database backup.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
That begs the question, do you write your transaction log backups to tape during the day or do you run the risk of losing them as I do?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top