
Separate tables?


barbola

Technical User
Feb 27, 2003
1,132
CA
We have a huge database with one table having over 1,000,000 records (okay, that's huge for us!).

We get data from about a dozen different customers and put it all into an SQL table. The data is for the sale of the same product (livestock), but each customer has its own way of reporting to us as to the quality and grade of the animals.

As an example, Customer 1 populates 25 of the table's 50 fields, and Customer 2 may populate 30, but all in all there are only about a dozen fields common to all the customers.

The data from Cust 1 in column Z has a totally different meaning from Cust 2's data in the same column, even though they call it the same thing, so technically they should be in different fields.

I am in the process of creating separate tables for each customer and copying the data over. Then I will create one master table (or maybe a view) with the comparable common data.

Does this sound like it will be more efficient or should we keep it all in one table? Any ideas? Anyone?

thanks.

Thanks!
Barb E.
 
I would keep it separate and create a view or views to unify the data for those purposes where the data needs to be seen as one table.
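
For example, something along these lines would do it (the per-customer table names and columns here are only placeholders for whatever you end up with):

-- Unifying view over the per-customer tables; only the fields
-- common to every customer are exposed.
CREATE VIEW vwAllSales
AS
SELECT 'Cust1' AS CustomerCode, SaleDate, LotNo, AnimalTag, LiveWeight
FROM dbo.Cust1Sales
UNION ALL
SELECT 'Cust2', SaleDate, LotNo, AnimalTag, LiveWeight
FROM dbo.Cust2Sales
GO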

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was.
- Steven Wright
 
I think it depends on what you are using the data for. If, in your reporting and querying, you are usually only looking at one customer at a time, then separate tables can make sense. But if what you really need is the consolidated data, then separating the data will make things even less efficient (and much harder to code against, unless you use a view, which may negatively affect performance).

In the second case, what I would do is transform the data into the correct fields on import. It doesn't matter what the customers call the information; it matters what you call it. I have two different databases which rely heavily on data received from multiple sources, and it arrives in different formats and structures.

As a result I have very complex transformations built. However, the data is all consolidated as it needs to be for the purposes I'm using it for. Data fields they send me that don't fit my data model, I throw away. Field sizes are determined by whichever of my data providers has the largest field size, and then a fudge factor is thrown in, as they are known to change structure on me without warning. Non-relational data is made relational as part of the transform, and their fields are mapped carefully to my fields, with the exact mapping differing from data provider to data provider. If there is different data from different providers that I want to use in a consolidated fashion, I put the field in my data structure but allow nulls, so that data sources which don't have that information will not fail in the import.
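
A rough sketch of what that kind of transform-on-import can look like (the staging tables and column names are invented for illustration):

-- Each provider's raw feed lands in its own staging table, then gets
-- mapped into the one target table using *my* field names.
INSERT INTO dbo.Sales (CustomerCode, SaleDate, AnimalTag, LiveWeightLbs, GradeCode)
SELECT 'Cust1', s.sale_dt, s.tag_no, s.live_wt, s.grade
FROM dbo.Stage_Cust1 s

INSERT INTO dbo.Sales (CustomerCode, SaleDate, AnimalTag, LiveWeightLbs, GradeCode)
SELECT 'Cust2', s.SaleDate, s.EarTag,
       s.WeightKg * 2.20462,   -- this provider reports metric, so convert
       NULL                    -- this provider sends no grade, so allow the null
FROM dbo.Stage_Cust2 s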

With tables that size, you need to become an expert in performance tuning. I suggest you look at some books on the subject. Some immediate things to consider: Put indexes and transaction logs on separate drives from the main tables. Get rid of cursors. Avoid IN, NOT IN, LIKE '%sometext%', etc. Never ask for more columns than you need, and especially never use SELECT * if you have an inner join. Find ways to reduce all information sent to the bare minimum. Change subqueries to derived tables (check the results; this improves efficiency in most but not all cases). Don't use UNION or DISTINCT unless you have no other choice. Use table variables instead of temp tables. Make sure you have good indexing. Avoid dynamic SQL. If you can archive off old information that is not used frequently, do so (this is much better than creating multiple customer tables). If there are multiple ways to get the same result, try all of them for any poorly performing query until you have the best performing one.
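
As an example of the subquery-to-derived-table rewrite (table and column names are made up; always compare the two versions yourself):

-- Correlated subquery version (often slower on big tables)
SELECT s.AnimalTag, s.SaleDate
FROM dbo.Sales s
WHERE s.SaleDate = (SELECT MAX(s2.SaleDate)
                    FROM dbo.Sales s2
                    WHERE s2.AnimalTag = s.AnimalTag)

-- Same result via a derived table; usually, but not always, faster
SELECT s.AnimalTag, s.SaleDate
FROM dbo.Sales s
INNER JOIN (SELECT AnimalTag, MAX(SaleDate) AS LastSale
            FROM dbo.Sales
            GROUP BY AnimalTag) d
    ON d.AnimalTag = s.AnimalTag
   AND d.LastSale = s.SaleDate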

Questions about posting. See faq183-874
 
If performance becomes an issue, and if disk space is not an issue, you could consider having three tables: one for each customer's detail and one combined table containing only the fields common to both. This could easily be maintained by using replication. Again, as I mentioned, this will require more disk space than the views.
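
Roughly like this, with invented field names (the detail tables keep everything each customer sends; the combined table carries only the shared fields and is kept in sync by replication or by the import job itself):

CREATE TABLE dbo.Cust1SalesDetail
    (SalesID int PRIMARY KEY, SaleDate datetime, AnimalTag varchar(20),
     LiveWeight int, Cust1OnlyField varchar(50))

CREATE TABLE dbo.Cust2SalesDetail
    (SalesID int PRIMARY KEY, SaleDate datetime, AnimalTag varchar(20),
     LiveWeight int, Cust2OnlyField varchar(50))

CREATE TABLE dbo.CommonSales
    (SalesID int, CustomerCode char(5), SaleDate datetime,
     AnimalTag varchar(20), LiveWeight int)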

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was.
- Steven Wright
 
Thank you - that is a lot of valuable info. Performance is a huge issue, and we report on the consolidated info by Sales Origin, not by customer. Importing the data into one table will be more efficient for processing, just more difficult to import, maintain, and customize. We end up with a lot of blank fields that should probably be thrown away, so I might look into that.
Our custom import programs use Perl and Phantom, which I know very little about - I can make minor changes, but I have tried to create some from scratch for new customer file formats and can't get my own programs to work. We don't have the expertise nearby anymore, so I'm just going to have to keep trying.
I'm taking VB courses right now (took a few DB courses as well) and have worked with SQL a lot, so I am learning!



Thanks!
Barb E.
 
Well, personally I do all my importing using DTS or T-SQL code. One technique that I find helpful is to import the data into a holding table and then use T-SQL code from there to parse it correctly into the real tables. This is especially helpful if you get flat-file data where different lines may mean different things and use different fields and field lengths.
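
A rough sketch of that holding-table idea (the file path, table names, and column positions are just examples):

-- One wide column to hold each raw line of the flat file
CREATE TABLE dbo.ImportHold (RawLine varchar(500))

BULK INSERT dbo.ImportHold
FROM '\\server\imports\cust1_sales.txt'
WITH (ROWTERMINATOR = '\n')

-- Parse the fixed positions into the real table; the offsets are
-- whatever that customer's file layout dictates.
INSERT INTO dbo.Sales (CustomerCode, AnimalTag, SaleDate, LiveWeightLbs)
SELECT 'Cust1',
       LTRIM(RTRIM(SUBSTRING(RawLine, 1, 10))),
       CONVERT(datetime, SUBSTRING(RawLine, 11, 8)),
       CONVERT(int, SUBSTRING(RawLine, 19, 5))
FROM dbo.ImportHold
WHERE SUBSTRING(RawLine, 1, 1) <> 'H'   -- skip header lines, for example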

Questions about posting. See faq183-874
 
Time to pull out my big fat SQL book I guess!

thanks.

Thanks!
Barb E.
 
Just a thought,
I had to build a database that kept track of product dimensions. Seems simple enough until you find out that the products were of every size and shape imaginable. My solution, and I think this could apply for you, was to make a table that held the dimension types, then a third table that had Part Number, Dimension Type, and the true dimension.

For you this could mean, for instance, that if Farmer A has a column named Weight (meaning live weight) and Farmer B has a column named Weight (meaning butchered weight), you could have a table for something like Detail Type, with one detail type of Live Weight and another of Butchered Weight. Then have a correlation table with CowID, DetailType, and Detail.
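
A quick sketch of that idea, with made-up names and values:

CREATE TABLE dbo.DetailType
    (DetailTypeID int PRIMARY KEY, Description varchar(50))

CREATE TABLE dbo.CowDetail
    (CowID int, DetailTypeID int, DetailValue decimal(9,2),
     PRIMARY KEY (CowID, DetailTypeID))

INSERT INTO dbo.DetailType VALUES (1, 'Live weight')
INSERT INTO dbo.DetailType VALUES (2, 'Butchered weight')

-- Farmer A's Weight column loads as type 1, Farmer B's as type 2
INSERT INTO dbo.CowDetail VALUES (1001, 1, 1250.00)
INSERT INTO dbo.CowDetail VALUES (1002, 2, 740.00)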

HTH

Bassguy
 
I tried posting earlier to this but timed out...

First, protect the data. Since you have several customers, all of whom submit their data in a non-standard format, keep the import processes separate. It seems you have very little control over this part of the process. The last thing you want is for Uncle Bob to change their format and kill good data from the others. This means separate input routines and temp or final target tables.

An import routine, regardless of platform, should verify the integrity of the data -- import the good data and flag the bad. Accuracy is the most important, but it sounds like speed and performance are not far behind in terms of importance. You may find that you can improve the routines as you work with and become more familiar with the data, and as your skill set improves.
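
As a sketch of the import-the-good, flag-the-bad idea (the staging, error, and target table names are invented here):

-- Rows that fail the basic checks get logged for review...
INSERT INTO dbo.ImportErrors (CustomerCode, RawTag, RawWeight, Reason)
SELECT 'Cust1', s.tag_no, s.live_wt, 'Missing tag or non-numeric weight'
FROM dbo.Stage_Cust1 s
WHERE s.tag_no IS NULL OR ISNUMERIC(s.live_wt) = 0

-- ...and only the rows that pass move on to the real table
INSERT INTO dbo.Sales (CustomerCode, AnimalTag, LiveWeightLbs)
SELECT 'Cust1', s.tag_no, CONVERT(int, s.live_wt)
FROM dbo.Stage_Cust1 s
WHERE s.tag_no IS NOT NULL AND ISNUMERIC(s.live_wt) = 1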

Second, to produce meaningful and comparative information, you do need to see which fields can be compared. No point in comparing fat indexes if different measures are used. Using accepted standards within the agriculture industry, you should be able to find some fields that match.

Third, can you normalize part of your database? A million records is a lot. You may be able to improve consistency, reduce redundancy, and perhaps improve performance by breaking your data up into different tables. For example, you can break up Sales into

tblSalesOrder
SalesOrderID - pk
SaleDate
Supplier
Packer
LotNo

tblSalesOrderDetail
SalesOrderDetailID - pk
SalesOrderID - fk
AnimalTag
Breed
Weight

This is just something I dreamt up, but I hope you see what I am getting at. Your import routine(s) should be able to take the raw data and parse it out as appropriate.
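
The same structure written out as table definitions, just to show the relationship (the data types are guesses):

CREATE TABLE dbo.tblSalesOrder (
    SalesOrderID int IDENTITY(1,1) PRIMARY KEY,
    SaleDate     datetime,
    Supplier     varchar(50),
    Packer       varchar(50),
    LotNo        varchar(20))

CREATE TABLE dbo.tblSalesOrderDetail (
    SalesOrderDetailID int IDENTITY(1,1) PRIMARY KEY,
    SalesOrderID       int REFERENCES dbo.tblSalesOrder (SalesOrderID),
    AnimalTag          varchar(20),
    Breed              varchar(30),
    Weight             int)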


Fourth, take control. Surely you can attempt to provide some guidelines for your customers to comply with. Reference industry and legislated standards where applicable. The more standard your data is across the board, the more meaningful the information you can generate. A million raw data records is an awesome amount of information just waiting to be extracted by a talented person such as yourself.

Richard
 
Thank you, Richard. I am talented! LOL

The info we get pretty much follows industry standards, except we sell to both Canadian and US packers - but the data is similar. Not sure yet if I'll have separate tables for this.

Personally, it would be a huge accomplishment if I could rewrite the import programs in either SQL or VB (or Access).
I know the data and what I want done to it; I just don't know how to program it. The programs for the original two packers work great, but now we have over half a dozen.

My first step will be to review the data that is coming in and write out step by step everything that has to get done to each packer's data to get to the final target tables.

Then I will decide whether I need the fat book or a real programmer to help me....or just you guys!

thanks again!

Thanks!
Barb E.
 
Ahh, those Canadians and their metric weights, and two languages ;-) Since you work with two federal levels of legislation, and possibly different levels of state and provincial, you are "stuck".

And I would suspect you need to focus on VB or another procedural language before you can use SQL. SQL will be more efficient for extracting information from relational databases, but you need to get the data into a reasonable relational format first. You need to work with the data dynamically. If speed is important, then you may need to move away from an interpreted language like VB to something compiled, such as C. Maybe FoxPro might be a good mix?

Lastly, "understanding the data" is really important. There are many I have seen who do not understand this concept. However, I suspect you should hit your boss for some training. A two week or six course might be perfect to augment your skills and help you resolve your problem. If you "understand the data", during the course, you should be able to connect the dots very quickly when something very applicable to your situation is presented.

Richard
 
I have resources for Perl, so I might stick with that.

thanks.

Thanks!
Barb E.
 
