Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Anyone can explain null in database concept? 6

Status
Not open for further replies.

prosper

Programmer
Sep 4, 2001
631
HK
I know primary key field is not null. I think the field for record the time of transaction is also not null.
Are the fields that are not for primary key can be allowed null?
But other field seems to be not null for me. Are there any web tutorial on explaining on null?
Thanks for any help!
 
Do a search here...

It's confusing at first.

Null simply means "I don't know what the value is"
That is not the same thing as blank or zero-length string because those ARE values.

Most significantly for me, when a query encounters a null value and you're not specifically looking for nulls, that row will NOT be returned, even if all other criteria are satisfied. That can lead to problems in calculations - averages will not be correct if the row needed to be included.

Most folks recommend that allowing nulls be avoided.

It can be useful if you know part of the information, but not all. E.G., if you have an address table that has a separate field for box number, blank would mean there isn't one whereas null means the information is not available.



HTH,
Bob [morning]
 
Null can also interpreted as 'there is no definable value'. In contrast a blank or zero length string is a definable value.

zemp
 
Just to further complicate your life ... some DBMSs set a text field to NULL when you enter a zero length text string even though ... as BobJacksonNCI states ... those two should be different things.

Further NULLs have different rules ...

Technically, NULL is not a value ... even though phrases like "... has a NULL value ..." are common.

Because it is not a value, the operators that work for real values do not work for NULL.

For example If NULL = NULL ... will return FALSE because NULL is not equal to anything ... not even itself. Similarly 10 + NULL = NULL. Any arithemetic operation that involves NULL returns NULL.

To test for a NULL you must use If [Value] IS NULL or If IsNull([Value]) .... You cannot test If [Value] = NULL for the reasons stated above.
 
As NULL is for the database an "empty" value, may be useful when you are using referential integrity between tables or even on a date field when you don´t have a date to store in.

In the referential integrity case is useful because at the moment of entering some data the user don't know certain information and need to leave it blank.
Example:

You have this two tables:
Table Customers (idCustomer integer Not Null, Name char(50) not null default '', idCustType integer)
Table Customer_Types (idCustType integer not null, Description char(50) not null default '')

idCustomer and idCustType are primary keys in Customers and Customer_Types respectively.
idCustType in Customers is a reference to idCustType in Customers_Types with a restricted integrity.

1- If you set idCustType in Customers as NOT NULL the user will be forced to enter an ID that exists in Customer_Types, so if the user does't know the type type of customer the Customer_Types table must have an 'Unknown' type already created.
2- If you set idCustType as NULL the user can leave the field blank and change it when the type is defined.

Several programmers use the first approach, but personally I prefer the second one as for me it has several advantages over the first.

Gerardo Czajkowski
ltc.jpg
 
several clarifications:

null is not the same as an empty string

foreign keys can (and often must) be null -- here's a good example: Optional foreign key can be NULL (registration may be required, but it's free)

most people do not advise avoiding nulls (well, except the folks over at dbdebumph.com, but let's not go there, okay?)


rudy
SQL Consulting
 
r937

My statement "Most folks recommend that allowing nulls be avoided." was actually a quote from a respected contributor - unfortunately I cannot find the original reference.

Just do an Advanced Search here using "null" and selecting "Subject Line" and you'll get hundreds of threads back. Look through some of them and you quickly realize many of the problems would have been avoided by not allowing nulls.

The ratio of problems resulting from allowing nulls versus useful applications is substantial.

If you're a novice, as it appears prosper is, he will avoid grief by disallowing them until he has a situation that warrants them.

While I am replying to r937, the reason I took my time is for the benefit of the less experienced readers, not to fan flames of practicing database experts.


HTH,
Bob [morning]
 
i know the thread you are referring to, i was in on it, and it was SQLSister who said that

see thread183-803508

i respect her too, but avoiding nulls was not one of her better suggestions

i.m.h.o. of course

rudy
SQL Consulting
 
Just another twist on avoiding NULLs ...

You should use NULLs when the information is really unknown or not applicable. For example, the ship date of an order that has not yet been shipped is unknown and is a legitimate candidate for a NULL.

You should not use nulls as some internal flag that has some private meaning. For example, setting an employee's last appraisal date to NULL means that there was no last appraisal but setting it to "12/31/1899" means that the last appraisal did happen but its date is unknown. Those kinds of distinctions tend to be buried inside applications and the contents of the database cannot be correctly interpreted without examining the application code.
 
what a beautiful and concise description

better than anything i've written, so i hope you don't mind if i steal it (with attribution, of course)

another star from me, golom



rudy
SQL Consulting
 
Rudy
Be my guest ... but please see if you can straighten out my tortured grammar.
 
For an example of the difficulties that can arise, see:

thread705-822355

I'm not arguing, just illustrating. I managed programmers for a number of years and it was necessary to encourage productivity over perfection.

If a developer has a firm grasp of the concept, let'er rip. Otherwise...



HTH,
Bob [morning]
 
That example is someone mis-using nulls.

They're really simple... they're non-values. Something is null or not null, there is no concept of greater than or less than null. You should never use math on a Null field, you should never use anything other than a boolean on the field.

In my opinion...

If a field does not require data to retain data integrity in your DB/Application you should allow it to be Null, simple as that.

 
I don't know if NULLs are simple: however, the NULL concept is fundamental to understanding RDBMSs, and there is additional logic that must used to handle them correctly.

Oh, if you completely control the domain you can add values that mean "Dunno" and "Not Applicable", but then those values will vary from column-to-column. NULL is consistent throughout the database for dates, strings, numbers, etc.
 
skiflyer, could you please explain a couple of your remarks

"You should never use math on a Null field"

what about calculating an average value? aggregate functions like COUNT() and SUM() ignore nulls, so why would you say you shouldn't do math on these columns, just because a few column instances have NULL?

"If a field does not require data to retain data integrity in your DB/Application you should allow it to be Null, simple as that."

this merely shifts the burden from deciding what NULL means to deciding what "retain data integrity" means

what exactly did you have in mind there?

by the way, i am one of those people who feels that all columns except for primary keys should allow nulls

:)

rudy
SQL Consulting
 
Hi
The database guru most critical of nulls in database theory is C.J.Date. Or at least in the system of three valued logic that SQL does so poorly. As C.J.Date points out that SQL itself is not completely logical but that's another story. So yes, be careful when counting and summing - check the query results.

The best you can say, is that they represent "missing values". We agree.

Now you can attach all kinds of meanings to missing values, just be aware that in theory that is completely open.
In other words there is not supposed to be any specific meaning apart from "missing".

We know that the database can generate nulls when doing queries, but then we also know from the SQL manuals that these missing values are put there to represent the fact that the query engine could not find a matching row. So there is no mystery about the nulls that the engine inserts during query processing, they mean "not matched". SQL has attached a meaning in this case! Luckily we can have the manuals to tell us that.

A value that is mandatory cannot be missing, so it is fair to associate nulls with optional columns. If we dont have a value, SHOULD we use a null? That is the question.

Here is a simple case involving numbers. Suppose we have a rating scale -2 -1 0 1 2. If this was a Likert scale, -2 might mean "strongly disagree" and the zero might mean "neither agree nor diagree". It would obviously be wrong to use the value zero when we dont have a rating.

There are two solutions to this problem.

A denormalised "solution" is to attach a status column to the row, that carries the meta-data about the rating. There could be one, or a whole bunch of status reasons why the value was missing. Then it would not matter what value was carried in the rating, the meta-data is available in the status column. Ideally some sort of "set-valued" datatype would do nicely. (Is this denormalised? You could argue that the metadata in the status column is in fact about the meaning of the whole row. Its a messy issue, and this is a messy solution anyway.)

The preferred solution is to simply assign a null value and use whatever SQL lets you do to manage them.

That's all very well, but what about strings? Strings have a third competing solution, the empty string! But if we use nulls for missing numbers, then by rights we should use nulls for missing strings. Some databases simply enforce that. We should view the empty string in the same light as we viewed our zero rating value.

So then do we avoid empty strings altogether?
Well no. Just as we can attach meanings to zero, we can attach meanings to the empty string.

Suppose we have a customer who submits a web form.
You know for sure that they have been given the opportunity to comment. But the comment arrives at the server empty. The user has exercised their right to not comment.

And what if some of the existing rows have been obtained from another table using an outer join, with missing comments as nulls because the users of that database were simply not invited to comment.

The empty string is an opportunity here - it is tempting to capture additional semantics with empty strings to represent "no comment" metadata in this instance, with particular semantics. Fair enough. That doesn't conflict with the use of nulls.

So IMHO the long and the short of it is, that whatever your design means, try to avoid using nulls, let alone empty strings. When you use nulls, avoid attaching meanings.

And if you are using internal meanings at all, given the world is not ideal, then document them. Caveat emptor.
 
Sorry r937, this thread fell off my radar somehow.

What I meant is... no math on nulls.

If you want a SUM, you don't include the row with the NULL value. Sounds pretty unimportant. But same when you take an average... so

Avg(5,0) = 2.5
Avg(5,NULL)=5

this merely shifts the burden from deciding what NULL means to deciding what "retain data integrity" means

It means the database is incorrect if that field is unknown. Primary and foreign keys are the obvious first point here... but also, say there's a field for a timestamp, meant to be updated everytime a row is written... this field also cannot be NULL, say it's a gender field for a person... this field cannot be NULL because every person has a gender (you may need to go beyond simple genders, but that's a discussion for a different thread)... of course maybe in your database that's unimportant information, so your definition of integrity changes.

And so and so forth... it all depends on the DB, personally in most of DB it's all NOT NULL, and as posted at the end of the last post... if I need to have null, it's more likely that value is stored in its own record which just doesn't get created when that value isn't needed.

 
you and i shall have to agree to disagree

i will give you just one simple example but i assure you this same argument applies to all your other columns, including foreign keys

gender can be male, female, unknown

one option is to create a "dummy" value for unknown

another option is to split gender off into its own table (you suggested this, too) so that an unknown value is represented by a row that isn't there

yet another option is NULL

in my view, when you add the pros and cons, there is only one sane choice

by the way, when you compute an aggregate like an average, the aggregate functions like SUM() automatically exclude NULL, there's nothing you have to do to avoid including those rows (like you would with a dummy value)

:)

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top