Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Should I normalize

Status
Not open for further replies.

jtrembla

ISP
Jul 6, 2006
104
US
I have the following data, and I was wondering if I should create a ErosionRateType and ErosionModel (with name and rate type) table or just leave the data as is since we are only going to have 3 erosion models at max. Btwm geography is an id field to the geography table.

Example data

Geography | Year | ErosionModel | RateType
==============================================================
France 1 GenericErosion Aggressi
France 2 GenericErosion Aggressi
France 3 GenericErosion Aggressi
France 4 GenericErosion Aggressi
France 1 GenericErosion Moderate
France 2 GenericErosion Moderate
France 3 GenericErosion Moderate
France 4 GenericErosion Moderate
USA 1 GenericErosion Aggressi
USA 2 GenericErosion Aggressi
USA 3 GenericErosion Aggressi
USA 4 GenericErosion Aggressi
USA 1 GenericErosion Moderate
USA 2 GenericErosion Moderate
USA 3 GenericErosion Moderate
USA 4 GenericErosion Moderate


 
I would, but then I don't trust Strings to be entered correctly every time. Also, it is a lot faster to look up "1"
than it is 'GenericErosion'. Not to mention you have left yourself room to expand in the event you do go beyond 3 types.

Just my 2 cents.

--------------------------------------------------
"...and did we give up when the Germans bombed Pearl Harbor? NO!"

"Don't stop him. He's roll'n."
--------------------------------------------------
 
Should you normalize? Yes. Absolutely.

Here's the thing. SQL Server stores data on the disk in 8K sections (called pages). With your table as is, you have a relatively wide table because of all the string data. If you normalize this data, 'GenericErosion' would only be stored once, and this table would have an INT ErosionModelId column (4 bytes) instead of the string data (more than 4 bytes).

This means, SQL Server would be able to store more rows from this data in a single page. By storing more rows per page, SQL Server will be able to read more rows when it actually goes to the disk (because it reads the entire page when it does this).

For a better improvement, use a TinyInt column which can store numbers in the range of 0 to 255. This will allow you to pack even more data in to a page, further improving your performance.

Suffice to say, your database will be smaller, and your performance will be better. If you only have a couple thousand rows, you may not notice the difference, but with millions of rows, you will.

The other benefit is that you can change the data in the lookup tables in one place without having to update all the data in your "big" table. Example... suppose you want to change Aggressi to Aggressive, with normalized tables, you change one column in one row of the table. Since the normalized data would have an ID pointing to the rate type table, you wouldn't need to change any data in the "big" table. Just the lookup table.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
thanks everyone. Points well taken.

So my final tables would be this.

ErosionModel
------------
ErosionModelID
ErosionName
ErosionRate (should i have a seperate table for slow, medium, fast values?)

Geography
----------
GeographyID
GeographyName

GeographyErosionModel
--------------------
ErosionModelValuesID
GeographyID
ErosionModelID
Value
 
should i have a seperate table for slow, medium, fast values?

Read this:


Without knowing your data, I would say, "YES". You should have a separate table for this. But... it all depends on your data.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
well I am concerned I am introducing more joins than necessary. At max I will only have 12 models.

I think a full normalization would consist of 5 tables. All erosion models are year based and only last for the first 4 years.

Should I break out the time info into a separate table to account for possible future changes?

example,

ErosionPeriodID
PeriodType (Yearly, Monthly)
PeriodOrder

Final Tables
===============
ErosionPeriod
ErosionModel
ErosionRateType
ErosionModelRateType
Geography




Final table: GeograpyErosionModelRates

Fields:

GeographyID
ErosionPeriodID
ErosionModelRateTypeID
Value
 
I once heard:

Normalize till it hurts, de-normalize till it works

When normalizing tables, it's important to think about the future. I first created my database 10 years ago. There are parts of my database that are not normalized. Clearly, I know better. Unfortunately, normalizing those parts will take a lot of effort. Had I normalized the DB 10 years ago (and wrote lots of code for normalized tables), I wouldn't be in the mess I'm in now.

Personally, I don't think 5 tables is any big deal. It wouldn't bother me.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top