Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

When to break a table and when not to 1

Status
Not open for further replies.

tdion

MIS
Dec 18, 2003
61
US
We are taught to break tables into multiple tables to reduce redundancy. However, sometimes it's hard to know if you are really benefiting when you exercise that practice or not. Specifically, you are told to break down tables according to functional dependency.

In looking at my table, I can't tell if I am saving any storage space by breaking it down. It has 4,000 records in it, with only 1,700 entries. There are 900 dupes.

Does anyone have a method for deciding when to break their tables down further and when to leave them alone?

 
Use normalisation. Design your tables to Boyce-Codd Normal Form and only deviate if you really need to. Forget space considerations until you get a message saying you're out of space.

 
No BNPMike.

I can look at my table and tell that I've got redundancy. The question is, in the long run, will I benefit more from leaving it alone or splitting it up.

When you split a table into two seperate tables, it is costing you four bytes per record in each table. In other words, the "link" fields will take four bytes whether you link or not.

I'm not one to just "follow rules" when I design my dB. I think it through. Anyone else out there with any ideas?
 
"When you split a table into two seperate tables, it is costing you four bytes per record in each table"

As it happens, the link takes up whatever the space of the join key is. There is no 'link field' per se in a relational database. That was one of the fundamental designs 30 years ago.

The issue of space is of no consequence. In fact generally normalisation reduces space required. The problem of normalisation is you have to carry out joins which create huge amounts of i/o, and also complicate your coding.

Normalisation gives you the greatest flexibility to develop your data model as time goes on without ruining all the programs you have written so far.

Breaking this rule is called 'de-normalisation' or 'collapsing'. This used to be common years ago but with huge and fast computers like modern pcs are, you hardly need to do it nowadays - anyway not with an Access/Jet application.

 
Of course there is a link field! How else can one table reference another one? The fact is, there are 8 bytes spent on every link in one table to another.

Normalization is great in many circumstances, but I believe it can bloat the database when there is little duplication in a field of a table.

Lastly, the fact is that when the table grows, it becomes slower, which is my primary concern. My concern is not to design a database in BCNF that is larger than if I hadn't followed all the rules.

TD
 
"Of course there is a link field! How else can one table reference another one?"

There are no links. When you issue a join the database will examine the two fields you have mentioned for each join and will carry out a series of calculations on how it might link them together. It may involving sorting the two sub-tables and merging them, carrying out repeated loops or it might manipulate indexes (Jet is proud of its Rushmore Technology).

Sometimes you can influence the decision the database takes (as with Oracle and MySQL) and usually you can get the database to explain how it will carry out the link at run time (DB2 and MySQL for example). As far as I am aware Jet cannot be directly influenced about how it carries out joins and I don't think you get it to do EXPLAINs.

As you appear to be somewhat sceptical about how relational databases work you can verify what I am saying for yourself. Create a database with just two tables. Put a million rows in each. Look at the size of the mdb. Now 'create links' ( I presume you mean declare Access relationships) and you'll see the size has gone up a little bit but not by 1 million times 4 bytes.

Also if you declare 'links' on two tables of million rows each, you'll notice that there isn't a long pause and a load of disc activity. So how does the information get into the 'link' fields?

 
tdion

Just found this in
For queries involving more than one table, the optimizer must consider the cost of joins, choosing from the following five types of joins:

Nested iteration join
Index join
Lookup join
Merge join
Index-merge join
The Jet query optimizer uses statistics about the tables to determine which join strategy to use. Each possible join combination is considered to determine which will yield the least costly query execution plan.
 
This is my rule: when I start repeating data, it's time for a new table.

space is cheap. I've been doing this for years and have never had problems. I have also taken over a few non-normalized databases and spent a long time getting them up to snuff. i vote that normalization takes precidence over any other concerns.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top