
database design question


AlbertAguirre (Programmer)
I am building a table that has 26 columns. This is uncommon, in my opinion, because I have found that if you use normalization techniques you will usually create smaller tables that relate to each other.

In this case I have no choice but to create this large table, but only half of the columns are normally used in reporting.

The question is: does it make sense to split this large table into two tables with a one-to-one relationship?
One table for the commonly used data elements, the other for the uncommon data.
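For concreteness, here is a minimal sketch of the kind of split being asked about, in Postgres syntax. All table and column names are hypothetical: the commonly used columns stay in the main table, and the rarely used ones move to an extension table that shares the primary key.

    CREATE TABLE orders (
        order_id     integer PRIMARY KEY,
        customer_id  integer NOT NULL,
        order_date   date    NOT NULL,
        total_amount numeric(12,2)
        -- ...the rest of the commonly reported columns...
    );

    CREATE TABLE orders_extra (
        -- same key, at most one row per orders row: the 1:1 relationship
        order_id      integer PRIMARY KEY REFERENCES orders (order_id),
        internal_memo text,
        audit_flag    char(1)
        -- ...the rarely used columns...
    );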

 
How many rows do you expect to have in the table? If just a few thousand, then probably no difference. Also, how many rows will the average report return? If small, again, no difference. If large, might want to consider offline (datamart) reporting.

You could also set up a prototype and do your own performance testing, which since it would be in your environment, would be almost ideal.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Good questions. I should have included that in my description.

Expected records are 43,000 to start, but the count will increase daily. My projection is 200,000 in the first year.

The average report will return 10 to 30 records, but could potentially return a few hundred for certain users.

Yes, performance testing is in the plan, BUT I am under a time crunch and need to get this out. It will be released BEFORE the performance testing would be complete.

Your thoughts?
 
200k is not a lot of rows for Oracle, SQL Server, etc. It might be for Access. Did you do any capacity planning with the hardware or database vendors, or is this an ad-hoc project on existing hardware using existing database licenses? If it's a new purchase, the DB and/or hardware vendors each have a questionnaire that will determine the optimal hardware for your plans (number of CPUs, memory, etc.). In other words, the reporting size is not large, but if the database is also being hit by large numbers of online users processing transactions, it could be a problem.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
I am told the hardware is top of the line, new, and has horsepower. The database is Postgres.
I am expecting approximately 1,000 users.

What do you think?
 
And really, the table design should be based on the data, not on the capabilities... Are all 26 pieces of data information about a single entity?

Do you have any fields like SomeIdentifier1, SomeIdentifier2, SomeIdentifier3, etc.? That's an indication you need at least one other table.
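To illustrate with made-up names: repeating-group columns like those usually normalize into a child table with one row per value. A sketch might look like this:

    -- Instead of repeating columns such as
    --   some_identifier1, some_identifier2, some_identifier3
    -- on the main table, move them to a child table:
    CREATE TABLE widget (
        widget_id integer PRIMARY KEY
    );

    CREATE TABLE widget_identifier (
        widget_id  integer NOT NULL REFERENCES widget (widget_id),
        identifier varchar(20) NOT NULL,
        PRIMARY KEY (widget_id, identifier)
    );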



Leslie

Anything worth doing is a lot more difficult than it's worth - Unknown Induhvidual

Essential reading for database developers:
The Fundamentals of Relational Database Design
Understanding SQL Joins
 
lespaul
No, not at all. I am quite good at database design, and normalization comes easily to me.
There are no columns that I can move into a child table. I wish there were.
Yes, all columns contain data unique to this record.

So given all the above, what do you guys think? Should I split the table into two and query the second only as needed? Will this increase performance? Will it make things more efficient?

-A
 
I'm with you; normalization has always been something that comes naturally to me. I'd have to say that keeping it all in one table would be my choice.

Good luck!

leslie
 
It's killing me to see a table with 26+ columns. It's not natural.

Anyone else?
 
I see tables in vendor-supplied systems with column counts exceeding 26 on a very regular basis (there are some with over 200 columns). It makes analysis and queries a bit harder when you have to scroll through lots of columns. I would think about splitting the table into two if some column values are optional: instead of storing nulls, those rows could simply have no record in the second table. But I would only do this if it logically made sense. For example, I wouldn't put AddressLine2 in the second table just because it allows nulls while keeping AddressLine1 in the first table.

I'll end this by stating that, in my opinion, 26 columns in a single table will probably make things easier on your developers than constantly joining, inserting into, and deleting from two tables.

You can always use views for your report writers to select from.
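As a minimal example of that last point, a reporting view only has to expose the columns the report writers actually use. The names here are placeholders, reusing the hypothetical tables sketched earlier:

    CREATE VIEW order_report AS
    SELECT order_id,
           customer_id,
           order_date,
           total_amount
    FROM   orders;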
 
Agreed. It will be easier for the developers, but what about performance for the users?
Is there an advantage to splitting up the table?
 
AlbertAguirre -

I have some tables that contain as many as 50 columns (holding customers' transactional data). I don't need to report on all of these columns, so I set up views, containing only the columns needed for reporting, to run my reports against.

I have been experimenting with adding indexes to these views lately (not sure if this is an option in Postgres), and I'm noticing a nice little performance boost in my testing. Only when this becomes too slow would I consider adding a table.
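For what it's worth, indexed views in this sense are a SQL Server feature: the view must be schema-bound and get a unique clustered index, after which its result set is stored and maintained automatically. A minimal sketch with hypothetical names:

    CREATE VIEW dbo.order_report
    WITH SCHEMABINDING
    AS
    SELECT order_id, customer_id, order_date, total_amount
    FROM   dbo.orders;
    GO

    -- The first index on an indexed view must be unique and clustered:
    CREATE UNIQUE CLUSTERED INDEX ix_order_report
        ON dbo.order_report (order_id);

Plain views in Postgres cannot be indexed; the closest equivalent there is a materialized view carrying its own indexes, which comes up again further down the thread.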

Hope this helps,

Alex

Ignorance of certain subjects is a great part of wisdom
 
In SQL Server there is an absolute limit to the number of bytes per record. You can declare more fields if you are using varchar-type data, but once a record goes over the total number of bytes allowed, it cannot be inserted. You might check your database's record limits to see if you need to split this up into a one-to-one table to prevent possible insert problems. Only you can assess whether this is needed for your particular data. Sometimes you know that if one field has a large value in it, then other fields will not, based on your business rules; but without seeing your data and business rules it would be hard to assess how much of a risk you are taking by having all the fields in one table that potentially exceeds the record size limit.
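A hedged illustration of the limit being described (the exact behavior depends on version; a classic SQL Server page holds roughly 8,060 bytes per row, and newer versions can push variable-length overflow off-page). The table below can be created, because the varchar data might fit, but a row whose actual data exceeds the in-row limit fails at insert time. All names are hypothetical.

    -- Declared maximum is over 10,000 bytes, beyond the ~8,060-byte
    -- row limit; CREATE succeeds (possibly with a warning) because
    -- the varchar columns *might* fit.
    CREATE TABLE wide_row_demo (
        id    int PRIMARY KEY,
        col_a varchar(5000),
        col_b varchar(5000)
    );

    -- This row really is ~10,000 bytes, so on versions without
    -- row-overflow storage the INSERT is rejected:
    INSERT INTO wide_row_demo (id, col_a, col_b)
    VALUES (1, REPLICATE('x', 5000), REPLICATE('y', 5000));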

Questions about posting. See faq183-874
 
Albert, I'm no DB expert but this stands out:

In this case I have no choice but to create this large table, but only half of the columns are normally used in reporting.

Does that mean that only half of the columns are reported on? If so, why are the other half in the table?

I've been working on a project very similar to this: I am given log files which have about 30 columns for import into my DB. Of those, only 5 or so columns are used in reporting, but we need to store all the columns in the DB in case someone wants to report on them later. Is that what you're facing?

For me, the answer was to use materialized views. Since I don't need to query all the columns, I just create 'subject area' views which are grouped differently depending on the analysis being done.
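A sketch of one such 'subject area' view as a materialized view, in Postgres syntax (native materialized views are a newer Postgres feature; on older versions people simulate them with a table refreshed by a script). All names are made up:

    CREATE MATERIALIZED VIEW report_subject_a AS
    SELECT order_id, customer_id, order_date, total_amount
    FROM   orders
    WHERE  order_date >= date '2007-01-01';

    -- Unlike a plain view, the stored result set can be indexed:
    CREATE INDEX ix_report_subject_a_date
        ON report_subject_a (order_date);

    -- Re-run on whatever schedule the reports tolerate:
    REFRESH MATERIALIZED VIEW report_subject_a;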

Another, less ideal, solution is vertical partitioning. Although not fully supported by most SQL DBMSs, it is similar to horizontal partitioning, where you split your table on ranges or lists; vertical partitioning splits a table on columns instead. I think this is what you alluded to earlier, but I wasn't sure if it was what you meant. If so, the main drawback is maintenance: updates have to be made to each table you create, since there isn't a relationship between the tables (see the sketch below). Google 'vertical partitioning' for more info to see if it's a solution that might make sense for you.
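In practice, that maintenance cost might look like this: every write has to touch both halves, so the application wraps them in one transaction (reusing the hypothetical split tables sketched earlier in the thread):

    BEGIN;

    INSERT INTO orders (order_id, customer_id, order_date, total_amount)
    VALUES (42, 7, CURRENT_DATE, 19.95);

    INSERT INTO orders_extra (order_id, internal_memo)
    VALUES (42, 'loaded from nightly feed');

    COMMIT;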

-Jose
 
What speaks against a 1:1 split of the table? You don't need a record in the secondary table every time, which saves space. Do you expect the join performance to be bad? Or do you expect that, due to Murphy's law, there will be some records with more than one detail record?

Make those two tables and a view that joins them back into one.
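A minimal sketch of that suggestion, again with the hypothetical split tables from above: a LEFT JOIN view hides the split from readers, and rows without a detail record simply show nulls for the extra columns.

    CREATE VIEW orders_full AS
    SELECT o.*,
           x.internal_memo,
           x.audit_flag
    FROM   orders o
    LEFT JOIN orders_extra x ON x.order_id = o.order_id;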

Bye, Olaf.
 