644 Access Error

ghalewood · Sep 8, 2005

Hi,

I have a job that processes a file of SQL statements. The latest file was about 16M in size and contained 60K SQL statements. Recently, this job has been failing, seemingly at random (we had zero failures last week) then three this Tuesday, issuing the following message.

Code:

*GE01BQL-BBAD0001-------------------------------------------------------------------------------------------------------
*GE01BQL-BBAD0002-Program identification: BQL/4.200/2001-05-07-14.56.36/U/                                              
*GE01BQL-BBAD0003-Start of processing  - timestamp at start: 2005-09-06-09.27.26                                        
*GE01BQL-BBAD0015-Access error (relational table): S1             /EXECUTE    /10/PK/000000000000021/S1000/Warning: Fata
                  l error 644 occurred at Sep  6 2005/ 9:32AM                                                           
*GE01BQL-00000000- DELETE FROM HRUKPARI.ZY8K         WHERE NUDOSS = +712434825                                          
*GE01BQL-BBAD0011-*** BQL: abnormal end - timestamp at end  : 0001-01-01-00.00.00  **** return code 99 ******           
*GE01BSI-BBAD0015-Access error (relational table): TD12/8F/KD/SELECT/00000000001000008S01 ZO Communication link

Once this message is issued, the table ZY8K contains duplicate records which need to be removed to allow the job to be rerun.

The message from the SQL logs is

Code:

2005-09-06 09:32:30.23 	spid60	Could not find the index entry for RID '16602e95930200' in index page (1:175169
2005-09-06 09:32:30.23 	spid60	Error: 644, Severity: 21, State: 5

Our client is running Microsoft SQL Server 2000 - 8.00.534 (Intel X86) Nov 19 2001 13:23:50 Copyright (c) 1988-2000 Microsoft Corporation Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4), and yes we are suggesting that they upgrade from SP2 to SP4.

One thing I noticed is that recently, among others, a job TLIST.EXE was running which was grabbing 25% CPU and had been running for a number of days and the server as a whole was consistenly running at 60 - 80% CPU. A Recent reboot has removed the TLIST and reduced CPU to under 10%.

Unfortunately, I do not have the SQL skills or Access to our clients server to analyse any SQL logs, the one above contains the 3 errors associated with the job failures and 1 message every 15 mins sayins the log has been backed up.

I have been suggesting that because of the high cpu usage on the server and possible disk contention etc, SQL hasn't been able to keep up with the speed at which the job has been submitting SQL commands and the transaction queue (is there such a thing?) has filled up, causing the issuing job to fail. Why this would mess up the indexes and/or data I'm not sure.

Is my thinking reasonable, or am I talking complete rubbish and does anyone know of any other explainations given the error message.

Thanks in advance.

Graham

mutley1 · Sep 8, 2005

Graham,

I'd remove all your duplicates etc. to start with so you know it is 'clean'-ish, back it up and try the following from Microsoft.

http://support.microsoft.com/kb/822747

Might be worth running a reindex as well and see if that resolves it afterwards, but would read the MS one first & see if that works.

Code:

 DBCC DBREINDEX (ZY8K, '', 100)

Cheers,

M.

vongrunt · Sep 8, 2005

Error #644 is often related to uncommitted reads (isolation level 0, NOLOCK hint) during INSERTs/UPDATEs.

And with not up-to-date version of SQL Server, there are other weird possibilities - some of which happen exclusively on heap tables (no clustered index). Check MSDN for that error.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]

mutley1 · Sep 8, 2005

Good spot VG - sorry, missed the service pack not being there. Prob be a better start point!

BR,

M.

ghalewood · Sep 8, 2005

Hi M,

Thanks for that. I had seen that problem and it is one of the reasons that we have suggested to our client they update to SP4.

I have worked on this system for 6 months and have never seen this problem until recently. The week before last it was failing multiple times a day, no problems at all last week, then 3 failures on Tuesday, yesterday fine. We have applied no system changes recently to account for possible errors. The only difference that I can see is that when I started checking the server usage after the first failure, it was way up around 80% whereas from Tuesday evening it was and still is around 10%.

The way we found to correct the problem was to drop the table, remove the duplicate data, reload, then rerun the job.

Also, after the very first failure I ran a simple query against the table "SELECT * FROM ZY8C WHERE NUDOSS = 12345" and I was shown
NUDOSS OTHER DATA
12345 xxxxxxxxxxx
12345 xxxxxxxxxxx
45678 xxxxxxxxxxx
12345 xxxxxxxxxxx

so it was if the idex was pointing at the wrong table row.

We had occasion to run the job again without correcting the table data and this error was produced, so it seems like the first failure is messing up the indexes and/or data.

Code:

*GE01BQL-BBAD0001-------------------------------------------------------------------------------------------------------
*GE01BQL-BBAD0002-Program identification: BQL/4.200/2001-05-07-14.56.36/U/                                              
*GE01BQL-BBAD0003-Start of processing  - timestamp at start: 2005-08-22-15.15.50                                        
*GE01BQL-BBAD0015-Access error (relational table): S1             /EXECUTE    /10/PK/000000000002601/23000/Cannot insert
                   duplicate key row in object 'ZY8C'/with unique index 'X1                                             
*GE01BQL-00000000- INSERT INTO HRUKPARI.ZY8C        (NUDOSS,NULIGN,SOCDOS,FILLE1,PERTRT,FILLE2,NUM                      
*GE01BQL-00000000-TRT,NUMBUL,NUMCUM,CUMANN,CUMPER,DATCUM,MONCUM,IDCY00) VALUES ( +782649715, +0000                      
*GE01BQL-00000000-15, 'OSD', '20', '0505', ' ', '0', '01', '900', '04', '01', '1753-01-01', +00000                      
*GE01BQL-00000000-009859.7900, '   ')                                                                                   
*GE01BQL-BBAD0011-*** BQL: abnormal end - timestamp at end  : 2005-08-22-15.15.51  **** return code 99 ******

The input data was exactly the same but the job worked. To provide additional evidence to our client, I need to be fairly sure that server stress and only being at SP2 is what caused the problem. If what I have said about SQL not being able to keep up with the BQL job is reasonable, I am happy to go with it as an explaination. However, if what I have said is not possible or reasonable, I need to provide them another reason.

Sorry for rambling

Graham

mutley1 · Sep 8, 2005

As VG has said, firstly suggest they apply the latest SP as a start point. It's not really a case of the SQL not being able to keep up with the job. a lot would depend on the table size, rows involved, the actions the job is performing, server spec (esp CPU / memory if thats where the problem is) and also, if the index is knackered then it's going to make it a little more difficult for the statements to make sense of whats in there when inserting.

No worries on rambling - some things are hard to describe!!

BR,

M.

vongrunt · Sep 8, 2005

Considering the fact server is SP2, one of following scenarios is very likely:

- mass concurrent INSERTs/UPDATEs on heap table (without clustered index). See KB827178.
- weirdo with Unicode column, as mutley1 pointed out. See KB822747.

Both problems were solved in SP4.

Dirty reads are another posibility (see KB328736), though they often fired all kinds of dreaded 6xx errors.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

644 Access Error

ghalewood

Programmer

mutley1

MIS

vongrunt

Programmer

mutley1

MIS

ghalewood

Programmer

mutley1

MIS

vongrunt

Programmer

Similar threads

Part and Inventory Search

Sponsor