Should I have empty records or just get rid of them? 1

BigRed1212 · Dec 1, 2010

I have payments to entities over years. Not each entity gets a payment in each year.

For spreadsheet purposes I like to see an entry and a value even if no payment is made. This reassures me that the entity wasn't overlooked.

Simplified example:

Code:

entity year payment
1      1    10
2      1    20
3      1    0
4      1    10
1      2    20
2      2    0
3      2    10
4      2    20
1      3    0
2      3    10
3      3    15
4      3    10

For database purposes, however, as I am trying to think good thoughts about normalization, I wonder if I ought to drop the rows with zero for a payment. We're not talking huge numbers of records or a major commercial application, I'm more just interested in what is best practice.

If one replaced the zero payments with blanks (functionally the same thing- no payment) I would view the database as not rectangular and not normalized and drop those rows.

But like I say, I sort of like the zeros. If the records were just missing instead, I would feel like I didn't know if the payment amounts were really zero or if the data were actually really missing.

tcsbiz · Dec 1, 2010

I would set the fields with no payments to NULL. that would indicate to me there was no payment. A zero indicates to me that a payment was made, but, it was zero. Some applications need to know whether the payment was entered in error (zero or any other number for that matter) or if a payment was never made (NULL).

Why would having a record with zero make the database un-normalized?

[rockband]

alvechurchdata · Dec 1, 2010

In general you should drop the rows holding zero because there's sometimes a difference between zero and null. We for example sometimes have a sale of zero value between sites. A zero in the sales file means "There has been a sale but we didn't charge" whereas a null would mean "Nothing at all has happened".

This isn't to say you should be explicitly adding null values. A Left Outer Join between sites and sales will return all sites whether or not they've got any sales and and will automatically give null as the sales value for those sites that have no sales.

Geoff Franklin

http://www.alvechurchdata.co.uk

JarlH · Dec 2, 2010

What does the payment column data mean? (The 10, 20 etc numbers.) Is it a transaction id, or number of sales that year, or customer id, or year revenue, or year profit???

If its a sum of something I'd say keep 0 value if the entity existed that year. Set to null if the entity was (temporarily) shut down. Remove row if entity didn't exist.

BigRed1212 · Dec 2, 2010

Why would having a record with zero make the database un-normalized?

I don't know about these things. I just always thought that if some of your records had data in x number of fields and some in x+2 fields and some only in x-3 fields that you weren't fully normalized yet.

BNPMike · Dec 2, 2010

Your table is in Strong Third Normal Form.

I don't know precisely what your data is but it looks like the zeros are valid, and better than null. (Null means 'doesn't apply'/'don't know'/'not available'/'could be allsorts'. If you use nulls you can't easily ask Access for those months when turnover was less than 15).

Also remember that normalisation is not mandatory. De-normalisation is just as valid a DBA activity as normalisation.

Olaf Doschke · Dec 6, 2010

I'd add, the question to keep or delete these rows is not a normalization question.

I assume this table stores which entity got which payment in which year and you would not store records with no payment. I further assume there is another table with all entities and to get the information which entity got paid what in which year you'd do a left join of the entities with this, filtering for a certain year and thereby get a NULL for each entity with no payment in that year at all.

If you add that record with 0 payment you get a joined value of 0 instead of a non join leaving the field NULL.

Neither is wrong and I assume it's more natural to store no record for no payment in some cases and to explicitly store a record with 0 in other cases. Your reason to assures that no entity was overlooked seems plausible. Eg if that data reflects bonus payments of employees it's plausable to have that extra record there to denote no bonus is assigned, but the employee wasn't overlooked.

What I would never do is make the payment NULLable, you'd rather not at all insert a record in case an employee isn't among those receiving a bonus anyway instead of storing that with 0 or NULL payment. The NULL is created by a join of the entities table with this payments/transactions table.

Bye, Olaf.

BNPMike · Dec 6, 2010

I'd add, the question to keep or delete these rows is not a normalization question."

Normalisation is about avoiding substructure in your data.

Examine each table and see if you can find a set of fields which always determines the value of another field (a determinant). If there isn't one, then you don't have First Normal Form ie you must have a key of some sort for a table to be a relation. If on the other hand you find such a group of fields that isn't also a key (not necessarily the declared key - you can have more than one), then you don't have Third Normal Form which is the one we most commonly refer to.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Should I have empty records or just get rid of them? 1

BigRed1212

Technical User

tcsbiz

Programmer

alvechurchdata

Programmer

JarlH

Programmer

BigRed1212

Technical User

BNPMike

Technical User

Olaf Doschke

Programmer

BNPMike

Technical User

Similar threads

Part and Inventory Search

Sponsor