Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Problem with use of "is null"

Status
Not open for further replies.

Moptop

Programmer
Sep 24, 2003
35
0
0
EU
Hello
I have been getting some random errors when building our data warehouse.
Generally, the sequence that we add a new field to the data warehouse is:
1. Add a new column
2. Build an index on the column
3. Set column to 'Yes' based on certain conditions
4. Set all values with NULL to 'No'

The specific table that I am dealing with has roughly 2.5 million records.
What happens is that the "step 4" listed above sets all values with NULL to No and quite randomly some "Yes" values back to "No".

I am certain that there is no fault in the code as various members of our development team have checked the code. I have added the code to the bottom of the note.

I have tried the code on 3 servers with different configurations and all have the same issue. The current server that I am looking at has SQL2000sp3.

Has anyone else experienced anything like this.

Incidentally, the problem appears to go away if I force option (Maxdop 1).
Adding/removing clustered indexes makes no difference.
One of our DBAs thinks that there may be a fault with the index collation - but I don't know.

Any advice would be welcomed.

In the example below, the second update should update 2172560 records.

We are able to re-write each statement where the problem exhibits itself - but it takes a long time to find, check and test and we can only spare two developers to recode.

Thanks

Mark Elvin

------------------------------------------
ALTER TABLE dbo.imp_tblPerson_1 Add [GeneralRetailContactFlag] varchar(3) NULL
go
create index I_tblPersonGeneralRetailContactFlag on dbo.imp_tblPerson_1 (GeneralRetailContactFlag)
go

update dbo.imp_tblPerson_1
set [GeneralRetailContactFlag] = 'No'
from dbo.imp_tblPerson_1 p
where p.Age > 0 and p.Age < 16
--(284374 row(s) affected)

select GeneralRetailContactFlag, count(*) from dbo.imp_tblPerson_1 group by GeneralRetailContactFlag
-- GeneralRetailContactFlag
-- ------------------------ -----------
-- NULL 2172560
-- No 284374


update dbo.imp_tblPerson_1 with (tablock)
set [GeneralRetailContactFlag] = 'Yes'
where [GeneralRetailContactFlag] is null
--(2350671 row(s) affected)

select GeneralRetailContactFlag, count(*) from dbo.imp_tblPerson_1 group by GeneralRetailContactFlag
-- GeneralRetailContactFlag
-- ------------------------ -----------
-- No 106263
-- Yes 2350671




 
Okay, I'm confused. You say this is your criteria:
3. Set column to 'Yes' based on certain conditions
4. Set all values with NULL to 'No'

But your code shows you first set values to NO on this criteria: where p.Age > 0 and p.Age < 16. That is NOT setting NULL to NO.

Then you set your NULL values to YES with this...update dbo.imp_tblPerson_1 with (tablock)
set [GeneralRetailContactFlag] = 'Yes'
where [GeneralRetailContactFlag] is null

So, what are you really trying to do? Set NULL to NO? Or set NULL to YES?

-SQLBill

Posting advice: FAQ481-4875
 
I assume you expect the GeneralRetailContactFlag to be either yes or no depending on the age.

You should be able to do all this in one query.

update dbo.imp_tblPerson_1
set [GeneralRetailContactFlag] = Case When Age Between 0 And 16 Then 'No' Else 'yes' End
from dbo.imp_tblPerson_1 p

Without knowing the design of your database, I can't tell, but you may have a problem if the age is NULL. With my query, I think NULL ages will cause GeneralRetailContactFlag to be 'Yes'.


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Hi
Thanks both for coming back to me.
The code above was merely an example to prove that the problem exists.
In the example, there are 2172560 records with null in the field before the update. However, the update in this case updated 2350671 records. Sometimes, it updates a different number of records.

George - thanks for the advice. We have a workaround where we set the fields to blank straight after the fields are added. This looks like it works and there is a minimum amount of code to test and regression test. Generally, the fixes are fairly straight forward, however because of the high number of fields in the database, the amount of work to change, test, etc will keep us busy for a couple of months.

Bill - Yes you are correct, I made a mistake in my definition of the problem. The sequence of events listed was for a different field update. However, the problem remains the same.

I am very concerned that there is a fundamental problem in SQL server somewhere that we have unearthed. We are in the process of contacting Microsoft now, but from experience, I know that there are some extremely knowledgable people on this forum.

Thank you
Mark Elvin
 
My $.02, VAT included

Do it with one query as George explained (you are updating all records anyway, right?).

And... create index after UPDATEs. No mass nonclustered index updates (= much faster), no SQL2kSP3 going bonkers with parallelism - and this index is already useless for optimization of these two (one?) UPDATEs.

And #2... general usefulness of this index is doubtful since there are only two distinct values, especially if (when) percentage of 'No' values exceeds 20%.

And #3... what is wrong with tinyint instead of varchar(3)? Less storage space, no collation things, same functionality. IMHO varchar(3) is somewhat high price for making column value human-readable.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Hi
Thanks for your response. What do you mean by "no SQL2kSP3 going bonkers with parallelism". As you can see, when I force MAXDOP to 1, the query works fine. On another separate issue, we have been receiving deadlocks with Business Objects queries and we have had to set MAXDOP to 1 to stop the issue.
Is there a known problem with SP3?
Mark
 
> Is there a known problem with SP3?

As far as I remember, at least 3 or 4 known problems involve(d) parallelism and updates of nonclustered indexes... quite possible.

What I'm also trying to say: if you create this index after UPDATEs, this problem will very likely not manifest. And total exec time will become much shorter.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Hello
I can recreate the problem with this script. It takes a little while to run, but it is interesting.
Mark
------
set nocount on

declare @t table (i int)
print 'populate temp table with 3 million records'
declare @count int
set @count = 0
while @count < 3000000
begin
if @count % 100000 = 0 print @count
insert into @t (i) values (@count)
set @count = @count + 1
end

print 'create a test table'
select i, replicate('Z', 1000) f
into test
from @t

--delete from @t


set nocount off

print 'get total records (3 million)'
exec('select count(*) from test')

print 'add a new column and index'
exec('alter table test Add [v] varchar(3) NULL')
exec('create index i1 on test (v)')

print 'update every 2nd row to ''Yes'' (must be 1.5 million)'
exec('update test
set v = ''Yes''
where i % 2 = 1')

print 'count records (must be 1.5 million ''Yes'' records + 1.5 million NULLS)'
exec('select v, count(*) from test group by v')


print 'update remaining rows to ''No'' (must be 1.5 million)'
exec('update test
set v = ''No''
where v is null')

print 'count records (must be 1.5 million ''Yes'' records + 1.5 million NULLS)'
exec('select v, count(*) from test group by v')


exec('drop table test')

print 'done'

------
My result is
count records (must be 1.5 million 'Yes' records + 1.5 million NULLS)
v
---- -----------
No 2958877
Yes 41123

done
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top