
Should I split a "many columns" table into some "fewer columns" tables?

Status
Not open for further replies.

florindaniel

Programmer
Dec 4, 2009
Hello,
I've searched for this topic, but the approach I've found is about splitting a table into many small tables BY ROW;
the issue here is about the necessity of splitting a large table BY COLUMN.

My problem is that I have this 500-column table with, let's say, 300,000 records.
Should I use it like this, or is it better to split it into five 101-column tables (one extra column for the primary key in each),
each having 300,000 records? This has nothing to do with normalisation; it's simply about 500 different parameters
for each record. Actually, the storage space increases by dividing the table (due to the extra primary key field needed
in each table).

Thank you,
Daniel
 
500 columns in one table?
WOW!

What do you keep in that table?


Borislav Borissov
VFP9 SP2, SQL Server
 
I find it difficult to believe that in any mainstream business application there's an item with 500 different parameters.

A table should have one purpose, and one only. Likewise, a column should have one (and only one) purpose.
Can you post some information about what the table and its fields do?

This does indeed seem like a candidate for splitting into numerous lesser tables.

Regards

T
 
Well, 500 parameters it is. They are all different, non-redundant values.
As you can see, the row count is not so big, but each sample is 500 parameters wide.

So, is it more efficient one table like:

ID, p001, p002, p003, ..., p500

or 5 related by ID tables like

ID, p001, p002, ..., p100
ID, p101, p102, ..., p200
ID, p201, p202, ..., p300
ID, p301, p302, ..., p400
ID, p401, p402, ..., p500
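For concreteness, the two candidate layouts could be sketched in T-SQL like this (the table names and column types are illustrative assumptions, not from the actual schema; float stands in for whatever the parameters really are):

```sql
-- Option A: one wide table
CREATE TABLE Samples (
    ID   int NOT NULL PRIMARY KEY,
    p001 float NULL,
    p002 float NULL,
    -- ... p003 through p499 ...
    p500 float NULL
);

-- Option B: five narrower tables sharing the same ID value
CREATE TABLE Samples_1 (
    ID   int NOT NULL PRIMARY KEY,
    p001 float NULL,
    -- ... p002 through p099 ...
    p100 float NULL
);
-- Samples_2 holds p101..p200, and so on, up to
-- Samples_5 holding p401..p500, each keyed on the same ID.
```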

Thank you
 
Can you split it logically (i.e. the first 150 fields are parameters for this, the next 99 are parameters for that)?
If you can, then you can do this physically as well.
If not, then better keep this table single.


Borislav Borissov
VFP9 SP2, SQL Server
 
Why are you asking? Are there performance problems? I know it would be very difficult to work with such a wide table, but unless you have some compelling reason to break it up, I'd leave it alone. Would there be other problems caused by breaking it up? What about stored procedures? Would splitting this table into multiple tables cause more work in changing stored procedures than the change is worth?

Just a few thoughts.

Tom

[frosty]




I keep trying to do something about my procrastination but I keep putting it off until tomorrow.
 
Thank you, Tom

I'm asking because I am in the design phase and I (still) have a choice:
one or several tables (with the same number of records).
I am not familiar with the intimate aspects of how SQL servers work, and I am
trying to avoid future problems.
The truth is I lean more toward the one-table formula, since splitting it into 5
(or more, or fewer) related tables with the same number of records seems to make
no sense. But if SQL Server manages 5 tables better than a single larger one...
The parameters can be logically grouped (internal, environment, external), but
that doesn't change the situation, since there is the same number of readings/records
in each table.
Thank you,
Daniel
 
p001, p002, ..., p100 etc. looks like misuse. Even though just storing one parameter field in each record would mean 500 times as many keys and records. But then I have the same question as Thargy: what has 500 parameters? What is the real-world purpose of the table? Are all p-fields always filled with 500 values, or are they nullable?

Bye, Olaf.


 
Thank you all for help,
(pxxx can sometimes be null, but there's only one index key per table: the ID)
It seems it's better to stick to my initial idea and keep the values in
a single table.

Thank you,
Daniel
 
Daniel,

you should be aware that RIGHT NOW is your best-ever chance to get this right. If you want to make a change now, it's relatively easy.
Once you've created this monster table, and someone has written code to use it, change will be much harder, more expensive, and slower, and therefore will likely not be allowed.

If you are sticking to a 500-column table just because it's what you first thought of, then that's wrong. You should invest a lot of time in design now, as it will save even more time later on.

Please describe (in an English narrative) what you're trying to do here. Please post a few of the columns and their purpose so we can understand your situation.

Regards

T
 
One question to ask yourself when presented with a table like this is: what do I do if parameter 501 is required at some point in the future? If the answer is "Update the table schema", then the design is wrong.

So each row in your table represents a unit of something. It has a description. It also has (today anyway) five hundred describing elements or "parameters".

Think about a table that describes the thing, and then a related table that contains the parameters. I'll post an ERD in a few minutes.

-----------
With business clients like mine, you'd be better off herding cats.
 
Ugh, can't post my drawing here.

Anyway, the idea is (tables in CAPS)

Code:
WIDGETS <--> WIDGETS/PARAMETERS (param values go in this table) <--> PARAMETERS

So you can have as many parameters defined as the DBMS can handle. This is a bit more high-maintenance than a flat table, but it makes more sense from a design and storage standpoint.
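As a rough T-SQL sketch of that three-table model (all names and types below are illustrative, not from the thread):

```sql
CREATE TABLE Widgets (
    WidgetID    int NOT NULL PRIMARY KEY,
    Description varchar(100) NULL
);

CREATE TABLE Parameters (
    ParameterID   int NOT NULL PRIMARY KEY,
    ParameterName varchar(50) NOT NULL   -- e.g. 'p001', 'inlet_temp', ...
);

-- The junction table holds the actual values
CREATE TABLE WidgetParameters (
    WidgetID    int NOT NULL REFERENCES Widgets(WidgetID),
    ParameterID int NOT NULL REFERENCES Parameters(ParameterID),
    ParamValue  float NULL,
    PRIMARY KEY (WidgetID, ParameterID)
);

-- Adding "parameter 501" later is a row insert, not a schema change:
INSERT INTO Parameters (ParameterID, ParameterName) VALUES (501, 'p501');
```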

-----------
With business clients like mine, you'd be better off herding cats.
 
Thank you all for help and kindness,

My concern was/is related to the way SQL Server optimises queries. In other words,
the question would be: what's more efficient, a single table with (1 ID + 5n) fields, or five
tables with (1 ID + n) fields each, related by ID, n being an integer? (5 is also just an example.)

It seems to me there's no gain from splitting a wide table into several narrower ones, since
the number of records is the same in each table (and the same as in the one large table).

Thank you again,
Daniel
 
Why would it not be advisable to normalize this table?
[tt]
ID, ParameterNbr, ParameterValue
[/tt]
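With a narrow table of that shape (the table name ParameterValues here is hypothetical), a wide row can still be reconstructed on demand, e.g. with a conditional-aggregation pivot:

```sql
-- Pull three parameters for one sample back into columns.
-- Assumes one row per (ID, ParameterNbr) pair.
SELECT ID,
       MAX(CASE WHEN ParameterNbr = 1 THEN ParameterValue END) AS p001,
       MAX(CASE WHEN ParameterNbr = 2 THEN ParameterValue END) AS p002,
       MAX(CASE WHEN ParameterNbr = 3 THEN ParameterValue END) AS p003
FROM   ParameterValues
WHERE  ID = 42
GROUP BY ID;
```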


Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
When you hear people talk about "expensive" queries, they are usually referring to performance. Expensive = slow.

Usually, the slowest part of SQL Server is interacting with the hard disk (reading or writing). Therefore, it is usually in your best interest to minimize this interaction (usually called I/O).

SQL Server stores data on disk in 8KB pages. When you have a narrow table (just a few columns), you will get multiple rows of data per 8KB page. When you have a wide table with a lot of columns, you will get fewer rows (maybe even just one). Because of the I/O cost, it's best to fit as many rows per data page as possible.
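As a back-of-the-envelope check of that arithmetic (the table name in the query is hypothetical), SQL Server exposes the actual page density through a DMV:

```sql
-- ~8,060 bytes of an 8KB page are usable for row data, so:
--   500 float columns ~ 500 x 8 = 4,000 bytes/row -> only 1-2 rows per page
--   100 float columns ~   800 bytes/row           -> ~10 rows per page

-- Inspect real page density and record size for a table:
SELECT avg_page_space_used_in_percent,
       avg_record_size_in_bytes,
       page_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.Samples'), NULL, NULL, 'DETAILED');
```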

Now... If you split your horrendously wide table into narrow tables, it COULD give you better performance as well as REDUCE the size of the database.

Performance
I have no doubt that there are likely to be several queries that use all of the columns. However, you will probably have other queries that only use some of them. If you split your monster table into smaller tables, the queries that use all of the columns will probably not get slower, but the queries that only use some of the columns are likely to get faster.

Database Size
Your database size will depend on a lot of things, and it is possible that the database will get smaller. It's not likely to get smaller, but it is possible. Remember earlier when I said that SQL Server stores data on disk in 8KB pages? SQL Server will not split a single row's data across multiple pages. Suppose your 501 columns require 4.01 KB to store. SQL Server will not be able to store 2 rows per page, so it will only store 1 row per page. This means that almost half of the storage space is wasted on nothing. There's almost always some wasted space in a database, but wide tables are likely to waste more space than narrow tables.

You said earlier that there are 300,000 rows. If your ID is an integer, which takes 4 bytes to store, duplicating this integer in multiple tables will only require 1.2 megabytes per extra table. If you split your table into 20 smaller tables, this would require (at most) 30 megabytes of additional storage, which is practically nothing at all.

My suggestion
Split your monster table into smaller tables, but group the data logically so that some of the queries you may have only need the data from 1 (or 2) of the smaller tables. For example, if you are storing data about people, you could put address information in one table, demographics (like date of birth) in another, and so on. This way, if you are writing a query that requires the address but not any demographics, you will be able to use just the smaller table.
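A minimal sketch of that kind of logical split, using hypothetical "people" tables, with a view on top so code that genuinely needs the full width still has one-stop access:

```sql
CREATE TABLE PersonAddress (
    ID     int NOT NULL PRIMARY KEY,
    Street varchar(100) NULL,
    City   varchar(50)  NULL
);

CREATE TABLE PersonDemographics (
    ID          int NOT NULL PRIMARY KEY
                    REFERENCES PersonAddress(ID),
    DateOfBirth date NULL
);
GO
-- Full-width access for the queries that really need everything
CREATE VIEW PersonFull AS
SELECT a.ID, a.Street, a.City, d.DateOfBirth
FROM   PersonAddress      a
JOIN   PersonDemographics d ON d.ID = a.ID;
```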

Now... please understand that this is my suggestion. You should do what is right for you. My advice is based on my many years' experience of working with SQL Server, as well as my in-depth knowledge of the database engine. Please take this for what it is: advice.


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I'm not a SQL Guru, nor do I play one on TV, nor did I stay at a Holiday Inn Express last night... that being said...

I agree with Skip on this one. Looks like a job for normalization.
Two tables
Widgets
Widget_Params

Let's say that each of the parameters is 10 bytes. That means that each time you read a single row, you're reading 5,000 bytes plus the widget information (name? description?), parsing it, and loading up a bunch of fields (depending on how you're reading it).

BUT... if it's normalized and then joined, you would be reading the widget information plus the 10 bytes (or a few more) for the parameter. The perceived decrease in speed because of the join would be made up for by the increase in speed from not reading 5K every time. *PLUS*, it allows for future expansion to that 501st parameter later.
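Sketched as hypothetical queries against the two layouts (table and column names are illustrative):

```sql
-- Wide layout: fetching one parameter still reads the whole ~5KB row
SELECT ID, p042
FROM   WideWidgets
WHERE  ID = 7;

-- Normalized layout: reads the widget row plus one small parameter row
SELECT w.ID, wp.ParamValue
FROM   Widgets       w
JOIN   Widget_Params wp ON wp.WidgetID = w.ID
WHERE  w.ID = 7
  AND  wp.ParameterID = 42;
```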



Just my 2¢

"What the captain doesn't realize is that we've secretly replaced his Dilithium Crystals with new Folger's Crystals."

--Greg
 