Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Database design tip

Status
Not open for further replies.

kim1969

Programmer
Oct 6, 2010
4
DK
Hi,

I am new to designing large databases. I could use some advice regarding the best database structure.

We are in a database design plan, so how would be the best way to do this.

I have a table, called transactions and this table holds all financial transactions within a company.

Our plan (dream) is to have thousands of companies on the system, that each has 1,000-5,000+ transactions, so that means we can potentially have up to 25,000,000+ records in the table.

My questions is – should I have one table for all companies ie:

table.transactions – holds all 5,000 companies * 5000 transactions = 25,000,000 records

or should I give each company their own table ie:

table.transactions_company_0000001 – holds company 1 and 5,000 transactions (records)
table.transactions_company_0000002 – holds company 2 and 5,000 transactions (records)
table.transactions_company_0000003 – holds company 3 and 5,000 transactions (records)
etc. * 5,000+ tables

If I give each company their own table it would also be more secure I think, but the administration of the tables will be more difficult, as if I need to add a field or make changes, I need to go through all tables or at least program a script that does it.

Is it possible for each table to inherit a scheme or something from a main table, so I only need to make changes to one table.

What is the best way to do this, performance wise and secure wise.

Any feedback is very much appreciated.

Kind regards,
Kim
 
In my opinion, you should have all the data in one table. 25M rows is kind-of-a-lot, but certainly not unheard of. A well designed table with 25 million rows can outperform a poorly designed table with 25 thousand rows.

This table (with 25 million rows) should be VERY narrow. What does this mean? Well... a narrow table is one that does not necessarily store a lot of data in each row, but is designed to store a lot of rows.

For example, you mentioned multiple companies. You'll need to store the company in the data, but how do you do this? You should have a company table with a CompanyId and CompanyName. The large table should store the CompanyId. The CompanyId column should be a simple integer data type (which takes only 4 bytes to store).

When you start considering other columns for this table, think small. smaller is better. For example, you mentioned 5000 different companies. An int has a range of -2 billion to + 2 billion, an small int has a range of -32 thousand to +32 thousand. An int uses 4 bytes, a smallint uses 2 bytes. By using a smallint for your CompanyId, you will save 2 bytes per row. Over 25 million rows, this means 50 megabytes.

And you know what... I couldn't even care less about 50 megabytes because hard drive space is very cheap (you can buy a terrabyte drive for less than $100). In this case, storage space is important because disks are slow, and minimizing disk access is important for maximizing performance. By using smaller data types, you will be able to fit more data in to a data page, which will cause less physical I/O and therefore better performance.

Anyway, general guidelines for large tables.... Try to avoid string data (if possible). Use the smallest data type possible. Carefully determine what the indexes should be. Setup regular maintenance for things like updating statistics and rebuilding indexes.

25 million rows in a table is not at all scary. You just need to be a little bit careful.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Oh yeah. I forgot to mention. With the help of friends, I create an application called SQLCop. This application checks for database design problems. SQLCop is absolutely free. You can download it here:

SQLCop

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I have a table, called transactions and this table holds all financial transactions within a company.

If I give each company their own table it would also be more secure I think, but the administration of the tables will be more difficult, as if I need to add a field or make changes, I need to go through all tables or at least program a script that does it.

This leads me to believe these companies are clients, unless your company has thousands of subsidiaries? Anyways, if that is the case, I don't really think this is a database design issue. No, 25 million rows isn't really even that large, and can easily perform just fine.

If I were a client sending your company my financial records, I would want complete assurance as far as data security is concerned, and mixing my data up with that of other companies' wouldn't really give me much assurance.

But if it turns out there isn't anything to worry about as far as those types of security concerns, then I'd definitely use one transaction table.

If you do need to separate the data for security measures, I would probably go with separate databases for each company. Yes, rolling out changes is more of a pain, but it would actually be much easier to accomplish than if you had a system residing at your clients' sites. And trust me, many systems vendors deal with that just fine.
 
Thanks for the feedback I really appreciate it.

My database is going to host individual companies, but there are many other tables in the database, but the most important one is the transactions table. They are all important, but thought of doing some extra security to the transaction table. For now I will keep it in one table I think.

If I run the system on a database for each company, I think the administrative would be too much to handle, but if I gave them each a transaction table, it could be easier to manage, rather than having 1000 databases.

I am affraid of the SQL injections - even we are installing a webapplication firewall too and working with stored procedures.
 
kim1969 said:
or should I give each company their own table ie:
Oh, please don't do that!

Don't graft "data" into the database design. Imagine some day you want to have a report where you dynamically pick wich company's data you want to view. What kind of SELECT statement could you use to fill a list box with company names?

One you start making your database schema part of the "data", you loose much of the power and usefulness of a database (you basically have a glorified spreadsheet).

If you haven't already done so, you should study the rules of Database Normalization.

 
okay, I thought if lets say I have two companies:

id:1 = Company A
id:178 = Company B

I could just do a SQL string like

select from transactions_table0000001 (compnay A)
select from transactions_table0000178 (company B)

And have all company main information (id etc) on a shared table.

But I see the idea - and if ie. 25 mill. records is no big deal, then it is much easier to have just one table to manage and optmizie on.

Thanks.
 
ohh, by the way - I just spoke with a company today and they want us to import all their transactions - and they have 27,000 transactions.

That is just one company - so it is a lot of data - and I hope to have thousands of companies!!
 
Congratulations on the order.

but I see the idea - and if ie. 25 mill. records is no big deal

25 million records CAN be a big deal or not. I all depends on how you build your tables, create your indexes, and generate your queries.

JoeAtWork suggested that you read up on [google]database normalization[/google]. If you don't know what this is, then you should stop everything immediately and begin reading up on it. You cannot "accidentally" get a large table to perform well. You need to plan it!

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
kim1969 said:
okay, I thought if lets say I have two companies:
id:1 = Company A
id:178 = Company B

I could just do a SQL string like

select from transactions_table0000001 (compnay A)
select from transactions_table0000178 (company B)

And have all company main information (id etc) on a shared table.

Suppose you want to find all transactions that occur on a particular date (regardless of customer)? On a normalized database the query would be very simple:

SELECT * FROM Transactions WHERE TransDate = '12-Oct-2010'

You could write a stored procedure that takes the date as a parameter, and once you have it tested and working never need to worry about it again.

Now, with the system you are proposing, to do the same thing you would need to write a really long query where you UNION all your customer tables together:

SELECT * FROM
(
SELECT Field1, Field2, TransDate FROM transactions_table0000001
UNION
SELECT Field1, Field2, TransDate FROM transactions_table0000178
)
WHERE TransDate='12-Oct-2010'

And every time you add a new customer you would need to update this SQL statement to add their table. It's a maintenance nightmare!

I've seen many posts where people have made things very difficult for themselves because they did not use a normalized design. What would otherwise have been very simple becomes very complicated if the design is not correct.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top