Hi,
I have a job that processes a file of SQL statements. The latest file was about 16M in size and contained 60K SQL statements. Recently, this job has been failing, seemingly at random (we had zero failures last week) then three this Tuesday, issuing the following message.
Once this message is issued, the table ZY8K contains duplicate records which need to be removed to allow the job to be rerun.
The message from the SQL logs is
Our client is running Microsoft SQL Server 2000 - 8.00.534 (Intel X86) Nov 19 2001 13:23:50 Copyright (c) 1988-2000 Microsoft Corporation Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4), and yes we are suggesting that they upgrade from SP2 to SP4.
One thing I noticed is that recently, among others, a job TLIST.EXE was running which was grabbing 25% CPU and had been running for a number of days and the server as a whole was consistenly running at 60 - 80% CPU. A Recent reboot has removed the TLIST and reduced CPU to under 10%.
Unfortunately, I do not have the SQL skills or Access to our clients server to analyse any SQL logs, the one above contains the 3 errors associated with the job failures and 1 message every 15 mins sayins the log has been backed up.
I have been suggesting that because of the high cpu usage on the server and possible disk contention etc, SQL hasn't been able to keep up with the speed at which the job has been submitting SQL commands and the transaction queue (is there such a thing?) has filled up, causing the issuing job to fail. Why this would mess up the indexes and/or data I'm not sure.
Is my thinking reasonable, or am I talking complete rubbish and does anyone know of any other explainations given the error message.
Thanks in advance.
Graham
I have a job that processes a file of SQL statements. The latest file was about 16M in size and contained 60K SQL statements. Recently, this job has been failing, seemingly at random (we had zero failures last week) then three this Tuesday, issuing the following message.
Code:
*GE01BQL-BBAD0001-------------------------------------------------------------------------------------------------------
*GE01BQL-BBAD0002-Program identification: BQL/4.200/2001-05-07-14.56.36/U/
*GE01BQL-BBAD0003-Start of processing - timestamp at start: 2005-09-06-09.27.26
*GE01BQL-BBAD0015-Access error (relational table): S1 /EXECUTE /10/PK/000000000000021/S1000/Warning: Fata
l error 644 occurred at Sep 6 2005/ 9:32AM
*GE01BQL-00000000- DELETE FROM HRUKPARI.ZY8K WHERE NUDOSS = +712434825
*GE01BQL-BBAD0011-*** BQL: abnormal end - timestamp at end : 0001-01-01-00.00.00 **** return code 99 ******
*GE01BSI-BBAD0015-Access error (relational table): TD12/8F/KD/SELECT/00000000001000008S01 ZO Communication link
Once this message is issued, the table ZY8K contains duplicate records which need to be removed to allow the job to be rerun.
The message from the SQL logs is
Code:
2005-09-06 09:32:30.23 spid60 Could not find the index entry for RID '16602e95930200' in index page (1:175169
2005-09-06 09:32:30.23 spid60 Error: 644, Severity: 21, State: 5
Our client is running Microsoft SQL Server 2000 - 8.00.534 (Intel X86) Nov 19 2001 13:23:50 Copyright (c) 1988-2000 Microsoft Corporation Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4), and yes we are suggesting that they upgrade from SP2 to SP4.
One thing I noticed is that recently, among others, a job TLIST.EXE was running which was grabbing 25% CPU and had been running for a number of days and the server as a whole was consistenly running at 60 - 80% CPU. A Recent reboot has removed the TLIST and reduced CPU to under 10%.
Unfortunately, I do not have the SQL skills or Access to our clients server to analyse any SQL logs, the one above contains the 3 errors associated with the job failures and 1 message every 15 mins sayins the log has been backed up.
I have been suggesting that because of the high cpu usage on the server and possible disk contention etc, SQL hasn't been able to keep up with the speed at which the job has been submitting SQL commands and the transaction queue (is there such a thing?) has filled up, causing the issuing job to fail. Why this would mess up the indexes and/or data I'm not sure.
Is my thinking reasonable, or am I talking complete rubbish and does anyone know of any other explainations given the error message.
Thanks in advance.
Graham