Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Null values vs. No value at all 3

Status
Not open for further replies.

Rexolio

Technical User
Aug 29, 2001
230
I was told my someone recently that if there is no value to insert into a sql field, you should NOT let it be NULL, rather just insert a "" (blank) because SQL processes for NULL values last.

Is this true, or am I not making any sense? :)

[bugeyed]
rexolio@yahoo.com
"I'm not dumb. I just have a command of thoroughly useless information." - Calvin, of Calvin and Hobbes
 
I don't know about processing NULLs last, but there are other issues to consider. NULL means the data is unknown. Let's say you have a table with one column and three rows:

tbl_Numbers
col_Numbers
1
NULL
2

SELECT COUNT(*) FROM tbl_Numbers
returns
3
SELECT COUNT(*) FROM tbl_Numbers WHERE col_Numbers = 1
returns
1
SELECT COUNT(*) FROM tbl_Numbers WHERE col_Numbers <> 1
returns
1

Why doesn't the last one return 2? Because SQL Server 'sees' the NULL as unknown, so it doesn't know if it is equal to one or is not equal to one.

-SQLBill
 
This is a question of semantics and theoretical issues. SQL theory advises the use of nulls because of true set theory results. For instance, if an employee does not request Medical Insurance and you run a report against Medical Insurance, if the field was set to null, then SQL will not report it (unless directed to). If set to spaces, SQL will report that the employee has "blank" medical insurance and you will need to filter out the blanks for the report. It is even worse for a numeric field. Now you must use zero, negative numbers, or some extremely large value to represent your "missing" or Null data. Hope this helps.

Sometimes the grass is greener on the other side because there is more manure there - original.
 
Hmmm, okay. Thanks for the info. So, in the event that there is a table the collects info from a form on a web site, and the user leaves one of the fields blank, would it be best to convert it to a NULL or just leave it "blank"?

[bugeyed]
rexolio@yahoo.com
"I'm not dumb. I just have a command of thoroughly useless information." - Calvin, of Calvin and Hobbes
 
Processes NULLs last? I hadn't heard that but, in any event, why would it matter? It must process them (first, last or whatever) before it returns a result. Further, the SQL optimizer is a very sophisticated gadget and I would be wary of generalizations about it "always" working in a particular way unless there is concrete documentation to that effect.
 
That would depend on what you intend to do with the field. If the user skipped the field, I would make it null. If the user entered a space, and the application let the user enter a space, and if space is a valid member of that field's domain, I would let the space remain.

BTW, dates get pretty screwed up too when you try to use a default to represent Nulls.

Sometimes the grass is greener on the other side because there is more manure there - original.
 
Right on SQLBill.

It is that behavior that you see many db designers quote as rational for having very few columns that are nullable.

Frankly, the most frequent use of nullable columns that I support are for END_DATE type datetime columns where the end date might not be known (especially useful for tables with a temporal dimension).

Most everything else I require values for. It makes stored proc development and view development easier as it avoids extra checks for IS (not) NULL and use if ISNULL(). However, this standard isn't for everyone. And, the UI guys/gals go nuts because it makes everything a required field...which then gets us into the notion of default values.

TR
 
Thanks, Guys - I appreciate the info!!!

[bugeyed]
rexolio@yahoo.com
"I'm not dumb. I just have a command of thoroughly useless information." - Calvin, of Calvin and Hobbes
 
I checked two different books on performance tuning (SQL Server Query Performance Tuning Distilled by Sajal Dam and SQL Performance Tuning by Gulutzan and Pelzer) and they both say that the use of null will slow down processing. Certainly checking for nulls and replacing them with some other value (the isnull function) adds an additional processing step and could make a query against a long table slower. (I suspect however, you would have to have a very large number of records before seeing a noticable difference.) Therefore it should be avoided if possible. The if possible is the important part though. It is better to use a null if the data is not required or able to be provided at the time the record is created than to use "" or 0 or some impossibly large number or 1/1/1900 to avoid the use of null. But it is best to make required fields required and have them disallow the use of null not only for performance reasons, but for data integrity reasons.
 
Someone who has a little time should write a FAQ on this subject. How about it, Bill? Sister? Jay? Nigel?

Sometimes the grass is greener on the other side because there is more manure there - original.
 
see also thread183-816502

Sometimes the grass is greener on the other side because there is more manure there - original.
 
In my opinion, slight differences in processing are less of an issue than the logical one. Nulls are an important concept in logical processing, especially with aggregate sql statements, which (correctly) ignore nulls when, for instance, a Count() or Avg() is used.

I gave examples in thread183-803508

Trying to use 'default' values where a value simply can't be known will cause problems down the road, and the slowdown will be when the query has to

1: look up the 'default' value
...then
2: restrict fields (field-by-fiels) whose values match that 'default'

With nulls, the sql engine is built to deal with this in a known, predictable way.

The predictable way is simply, in vertical processing, ie
SUM:
Code:
2
NULL
10
15
NULL
5
-----
32
(Nulls are ignored)

In linear processing:
Code:
2 + NULL + 10 + 15 + NULL + 5 = NULL
(Null poisons the entire expression)

--jsteph
 
Personally I have no problems replacing NULLs with 0, or even "" (though it reminds me on bloody dBase), but for dates... no thanks. Two days ago I had to pull out simple age distribution report from existing database. Most of people were 104 years old. Guess why :).

Other than that, too many NULLs indicate flaws in DB/application design (rarely used fields, low normalization, loose data validations etc).
 
vongrunt,
<...flaws in DB/application design (rarely used fields, low normalization,...>>

In the real world, however, the type of normalization that would be needed to eliminate nulls is simply not feasible. A NULL is ideal for a field that may apply to all records in a table, but may not. Now, techically if you were a Normalization fanatic, you'd have that field in a separate table and join it on the pk/fk.
So take the example (granted, not a real-world example but for academic purposes...) a table 'TruckDrivers' with the field 'TruckModelYear'. Now, some drivers may not have been issued a truck yet. What do you put there? What happens when you run a query that needs 'Average age of trucks assigned to drivers' ?

The 0's instead of nulls will pull the average down to the point where it's a useless query. If it's Null, those are ignored.

And granted--in my example you may well normalize, but there many times when it's just not feasible, but most of us live and work and develop in the real world. And the hard-line ef codd followers with the elbow patches who have never seen the outside of a classroom may not stop arguing the point until their tenure ends (meaning never), but that's the way it goes.


Now, you could add the criteria WHERE truckmodelyear > 0...BUT, what if the query needs other info besides that? ie, 'Give me a query that lists the number of all Drivers, grouped by 'DriverCategory' with the average truck age for the group'

It starts to get hairy with the sql needed to exclude the 0's from the average, but include those records in the count for the group--and that's just a simple example.

Having NULL's in these fields make it a no-brainer. And it's my contention that once one understands what null mean to the application and what nulls mean to the database engine, then they are no problem at all.
--jsteph
 
> In the real world, however, the type of normalization that would be needed to eliminate nulls is simply not feasible.

There is a difference between "too many NULLs" and no NULLs at all.

> Now, some drivers may not have been issued a truck yet. What do you put there?

This is no-brainer - foreign key values are either valid (from primary table) or NULL.

> The 0's instead of nulls will pull the average down to the point where it's a useless query. If it's Null, those are ignored.

IMO good point. I said "I have no problem..." because my servers at work are full of data inherited from COBOL days, and some black-box apps that rely upon this data are sensitive to NULLs.

> And granted--in my example you may well normalize, but there many times when it's just not feasible, but most of us live and work and develop in the real world.

Look... saying "normalize" doesn't mean I am phD in Vorlon technologies. Everything is matter of balance.
 
vongrunt,
I didn't mean to paint you specifically with that brush, I just never know when I'm going to have to justify what others would call bad design.

And I know what you mean about the legacy stuff, I spent quite a bit of time dealing with old as/400 data where a space (not even an empty string, but an actual space) was used as 'null'. Somehow they got by but it wasn't pretty.
--jsteph
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top