Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

To normalize or not to normalize

Status
Not open for further replies.

Prosperity4

IS-IT--Management
Oct 6, 2003
3
GB
Hi All,

I'm currently designing the tables and relationships for a new database application. It is coming apparent to me that one of the tables should not be normalised and nulls are necessary. I can not find any guidelines on when not to normalise. Does anyone know of where I can find information on when not to normalise tables?

Thanks
 
Hi,

The basic rules as far as normalisation goes is use your gut reaction. Remember that non normalised databases will be slower and queries more complex to write (I did a degree project to prove this) with less flexibility as a result. However, for something like data warehouses/OLAP cubes, denormalisation is very acceptable as the data is stored only in summary format - rather than actual details.

John
 
I agree that it is basically up to the developer. You have to factor in or consider the database and app performance, your development skill (how much time will it add and is it worth it), the size of the project (present and possible growth), the tools and hardware available, etc., etc..

There is no real right or wrong. Just make the best decision that you can under your circumstances.

Thanks and Good Luck!

zemp
 
I also agree with John (jrbarnett). A highly normalized database is, in general, slower and harder to write queries for. As John said, you need to go with your gut. If you anticipate any possible changes to a table, I think it's better to err on the side of normalization.

When you need to make changes to the underlying table structure, the scripts needed to make the changes aren't hard to write. The hard part is explaining to your customer, or your customers IT department why you have to.

Steve
 
sfvb, i believe john said a nonnormalized database will be slower and harder to write queries for

let me add a wrinkle to this discussion

whether the database is normalized or not shouldn't matter to the application developers, as all they should see are views anyway

thus, the database structure can change underneath the views and the application should not need to be modified

no need to explain anything to anybody

;)
 
My take on this "grey" or it "gray" issue...

If you need to frequently get info on the the many side that is only available on the parent side, then it make sense.

If need to grab a calculated number on a customer or account table... For example, current outstanding amount owing. Or monthly balances in a GL account. Sure you can go through the transaction file and calculate the value, but code for this can be cumbersome, and slow if there are tons of transactions. A real good example would be ending balances for an account for each year. Are you going to go through the transactions for all of 2001 to grab a number that will never change, or just include it in a balance file?

If you are end up having almost a one-to-one for a one-to-many relationship. For example, selling products at a craft show where every invoice will result in a new customer.

In the first case, if the information is changed on the parent, then most likely code will have to update the child records to reflect the same value.

For the second, updates to the transaction have to also update the total on parent table, and have to have the ability to recalculate.

For the third, a proper normalized table would include a cutomer table plus the invoice and shipping table(s). Yea -- you have been there, done that -- relationship databases 101. Since just about every sale at a craft show will involve a new customer, then storing this info on the invoice table alone would seem to be more pragmatic.

So I guess that is the rule -- when it is more pragmatic to do so.

Richard

 
r937,

You're right. I misread the post. I'm not really sure my post is at odds with John's. From practical experience, extremely normalized table(s) are generally slower to query from than moderately normalized table(s), and the queries definitely harder to write, (at least write properly from a speed standpoint.)

As far as "views" are concerned. A view is just a virtual table based on actual table(s). A query on a view can't be any faster than a query on the actual table(s). Tables can be optimized (read normalized) for fastest query time, fastest update time, etc... Properly normalized tables will yield the best time results based on the expected frequency of a particular DML statement. For that reason, the application developer cares how normalized the table is.

Once the underlying table is changed, it may or may not effect the view. If it effects the view, then that view has to be replaced, and thus, any application that relies on that view.

As far as not needing to explain anything to anyone, try changing the table structure to a 24/7 database that uses existing applications to write to a table that has been changed. Just see how fast the IT department calls you with all sorts of complaints reported by the staff. I can guarantee, they'll want to know what your did, and why you did it.

Steve
 
steve, you're right, since speed is so important, there's not much point in trying to define views that are independent of the underlying structure

thus guaranteeing continued employment for all

sarcasm?

only a little

 
Data normalization aims to eliminate data redundancy. But in many cases it results in processing redundancy.

Say you are producing a report that shows total sales for each of the past 12 months for each product. In a normalized database with no data redundancy you would have to repeat each month's processing 12 times. In a case like this it would be better to de-normalize and store each month's summarized sales figures in the database the first time you calculate them and simply retrieve those summarized figures for the next 11 months.

This does mean that if any of the past months' data changes the change(s) will need to be programmatically applied to the stored summary figures. That should be relatively rare. Accounting audit restrictions serve to make it more rare. However, it does mean extra programming.

I actually faced such a situation in one of my projects. At the end of each month the computer spent 30 hours doing nothing but producing this report. I changed it to store the monthly summary figures and modified the nighly batch transaction entry program to update the current month's summary figures at the same time.

This increased nightly processing from 20 minutes to 40 minutes but reduced month-end processing to less than two hours for just retrieving the summary figures and formatting and producing the report. This gave the users the additional benefit of having upto-date MTD figures for the current month whenever they wanted it.

Gunny
 
Prosperity4 (if you are still following),

Why don't you give us a little more information about your table and why you believe it shouldn't be normalized? Why do having NULL values in the table indicate to you that it shouldn't be normalized?

Then maybe we can help you determine the best thing for you to do.

Leslie
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top