
Table design/Storing history


RosieMP (MIS) · Jul 20, 2007

Hi folks--

I'm having a hard time choosing between design options that may or may not make my life miserable in six months.

I need a database to hold a significant amount of data about aging accounts. I'll add about 12,000 rows each month describing how much money various customers owe and how long they've owed it. (I don't have control over how the raw data is calculated; it's a BW thing and I just live with what I get.)

So each month I'll have a file with:

customer type, collector name, customer name, amount owed 1-90 days, amount owed 91-180 days, amount owed 181-365 days, etc.

I want to be able to produce reports that tell me trends within those buckets, i.e., these customer types owed this much in Nov 07 versus what they owed in Oct 07 (one month ago), and also what they owed in Nov 07 versus Nov 06. Same for collector name: how does this month compare to last month, and how does this month compare to this time last year?

So--do I want a separate table for each new month or do I want a single table with a field that indicates the "report month", which is the month that the 'picture was taken' of the data?

I hope that this makes sense. I kind of think the latter makes more sense, but I'm not sure, because the whole thing breaks a lot of rules (it's going to be a failure of normalization, for example, but there's nothing I can do about that...).

I'm lousy with cross tab queries, which doesn't help.

Thanks so much
Rosie
 
You want a single table with a field that indicates the report month....no brainer....
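
Something like this, say (just a sketch; the table and field names are placeholders to adjust to your data):

    CREATE TABLE tblAging (
        ReportMonth   DATETIME,
        CustomerType  TEXT(50),
        CollectorName TEXT(50),
        CustomerName  TEXT(100),
        Owed1to90     CURRENCY,
        Owed91to180   CURRENCY,
        Owed181to365  CURRENCY
    )

Each monthly file then just gets appended into the same table, with ReportMonth set to the snapshot month.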

Leslie

In an open world there's no need for windows and gates
 
the decision (for me at least) was a no brainer...not trying to insult anyone!
 
You definitely want all the data in one table with the snapshot month as a field. This is as close to normalized as you can get.

Also, part of any design decision is reporting, and sometimes you consider denormalizing for reporting (although I recommend normalizing until you have some good experience or someone in the know recommends otherwise). In any case, the goal in denormalization is to get everything down to one table.

Definitely go with one table.

The alternative is either writing queries that explicitly deal with each month's table or writing VBA that writes those queries for you. Neither is a worthwhile endeavor until you have no other choice, and here you do.
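
To make that concrete, here's a rough sketch of the kind of query the single-table design buys you (assuming a tblAging table with a ReportMonth date field; all names invented). Month versus month and year versus year come out of one statement:

    SELECT CollectorName,
           Sum(IIf(ReportMonth = #11/1/2007#, Owed1to90, 0)) AS Nov07,
           Sum(IIf(ReportMonth = #10/1/2007#, Owed1to90, 0)) AS Oct07,
           Sum(IIf(ReportMonth = #11/1/2006#, Owed1to90, 0)) AS Nov06
    FROM tblAging
    GROUP BY CollectorName;

And for what it's worth, the crosstabs you're lousy with are at their friendliest against this tall layout:

    TRANSFORM Sum(Owed1to90)
    SELECT CollectorName
    FROM tblAging
    GROUP BY CollectorName
    PIVOT Format(ReportMonth, "yyyy-mm");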
 
Thanks both!

I took the 'no brainer' as intended :)

I'm glad it's a clear answer. I think what got me is that it seemed like a giant Excel spreadsheet.


 
Rosie,

What you've asked about is achieved by partitioning in SQL Server and Oracle. If your data volumes get truly enormous, you may end up with no choice but to implement a "poor man's partition". I hope it doesn't come to that, but bear in mind that de-normalising for performance reasons is fine. After all, Data Warehouses are designed that way.
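
For illustration, an Oracle range partition looks roughly like this (a sketch only; the table and columns are invented):

    CREATE TABLE transactions (
        txn_date DATE,
        amount   NUMBER(12,2)
    )
    PARTITION BY RANGE (txn_date) (
        PARTITION q1_2007 VALUES LESS THAN (TO_DATE('2007-04-01','YYYY-MM-DD')),
        PARTITION q2_2007 VALUES LESS THAN (TO_DATE('2007-07-01','YYYY-MM-DD')),
        PARTITION q3_2007 VALUES LESS THAN (TO_DATE('2007-10-01','YYYY-MM-DD')),
        PARTITION q4_2007 VALUES LESS THAN (TO_DATE('2008-01-01','YYYY-MM-DD'))
    );

A query restricted to one quarter then touches only one partition. The "poor man's partition" is the same idea done by hand: one physical table per period, stitched together with a UNION ALL view.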

FWIW, I must concur with lespaul, and say "keep it normalised" until you have a very good reason not to.

Do bear in mind the limitations of Access. You can't turn a system designed for a small number of users and a modest amount of data into an enterprise solution without some serious hassle.

Regards

Tharg

Grinding away at things Oracular
 
"de-normalising for performance reasons is fine." Since Codd wrote up the theory of normalization in 1970, I've never seen where that's been true. I'd like to see an example of it.
 
fneily,

Since I don't have the mass storage to demonstrate SQL timings with a 1 billion row table and a simple WHERE clause, I suggest you try any book on data warehousing techniques.

Please note that I did use the word "reason" and not "purist drivel".

If you're handling a table with, say, 1 billion transaction records for the current year, and without, say, a range partition by date, the query to produce quarterly results would have to wade through all 1 billion rows to find the relevant 250 million.

Oh, silly me, we could index by date, and hence cut the job down to a mere 250 million rows. But oh, silly me, a 250 million row index might take some little while to update when we carry out inserts. Since this is coming from an OLTP system (after all, I don't recall mentioning data warehouses for OLAP, yet another sad omission in my original response), performance can't really be a problem, can it?

Sheesh, I really am dumb today.

T

Grinding away at things Oracular
 
Books and your discussion are interesting. I just haven't seen an actual implementation of a de-normalized database in front of me that is efficient. All I ask is that you actually show me one. Very simple. For reference, here is Codd's track record:
"Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", IBM Research Report RJ599(Aug. 19th, 1969)

"A Relational Model of Data for Large Shared Data Banks," CACM 13, No.6(June 1970)

His "Data Sublanguage ALPHA" was predecessor to Quel and SQL.

He was a major contributor to Online Analytical Processing (OLAP). As of today, Microsoft has approximately 32% of the OLAP market; Oracle, about 3%.

"Chief among them was Larry Ellison, who along with Ed Oates and Bob Miner produced the world's first commercially available relational database management system in 1977, and in the process, started the company that would become Oracle. The rest is database history.
 
fneily,

that's a fair enough question. Clearly de-normalised systems are not efficient when compared with normalised ones. However, they do offer performance at the expense of efficiency.

With the above hypothetical quarterly outturn report, if one took the normalised data from one table and split it into the four quarters, the report would run significantly quicker, for obvious reasons. Effectively the system does pre-calculation, knowing that a certain query will have to be run.

A monthly breakdown would require yet more pre-calculation (de-normalisation) and inefficient storage. For a given processor power, data retrieval speed and network, there is clearly a hardware-defined limit to performance. The only way to circumvent this is by pre-calculation or other techniques that overcome one or more of these limitations (e.g. data compression).
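
By pre-calculation I mean something of this sort (illustrative only, using the same imaginary transactions table as before):

    CREATE TABLE quarterly_outturn AS
    SELECT TRUNC(txn_date, 'Q') AS quarter_start,
           SUM(amount)          AS total_owed
    FROM   transactions
    GROUP  BY TRUNC(txn_date, 'Q');

The report then reads a handful of summary rows instead of a billion detail rows; the price is a redundant, de-normalised copy of the data that must be refreshed after every load.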

I agree that from first principles, anything other than normalised storage, in a system designed in accord with the illustrious Mr Codd's principles, is sure to be inefficient.

Regards

T

Grinding away at things Oracular
 
fneily,

When I say denormalize, I really mean that sometimes, in considering reporting, you simply stop at 3rd Normal Form, or denormalize back to it or towards it.

Sometimes further normalization only serves to make queries complex, something Jet is more susceptible to, performance-wise, than a high-end product like SQL Server.

I doubt that in a Jet database I would bother with data warehousing or anything completely denormalized. However, as the original poster has indicated, I may use Access as a tool to manage reports and do additional analysis of what is in effect the data warehouse of a DIFFERENT system.
 
I just haven't seen any actual implementation of a de-normalized database in front of me that is efficient.
It depends what you mean by efficient. Do you mean efficient in terms of storage space, efficient in terms of user response time, or efficient in terms of developer time?

Geoff Franklin
 
What do I mean by efficient? Whatever proves my point. Don't you pick definitions to suit your arguments?
 
I doubt any programmer in the real world is going to argue database design using anything other than facts.

Choosing definitions for arguments is something politicians and litigators do.

The point of denormalizing is that you give up storage space but gain response time in reading (reporting). It has its place, and I doubt anyone would try to design a system around it, other than for reporting, of course, which is the topic of this thread.
 
I see a lot of debate here about normalized and denormalized design. RosieMP may be overwhelmed by some of this. Paul Litwin and Ken Getz have updated and expanded what I consider the seminal work on designing and building Access applications, called Access 2002 Developer's Handbook. It has been updated since Access 2.0. In this book there is a very good 22-page chapter on the basics of relational database design. I would suggest getting this book for that chapter alone, and using the book as a reference work.
 
I have the 97 and 2000 editions of the Developer's Handbook. They are definitely the best texts I have read on Access. These fora are also very helpful. For what it is worth, my two favorite technical publishers are Sybex (the above series) and O'Reilly.

I think RosieMP got the right advice early on from lespaul and myself. In this case, add a month field. In general, normalize if possible; reporting considerations may change how normalized you make a database, but until you have a good feel for it, normalize unless someone advises against it. Obviously, I agree that book series is excellent. Most of what I know about Access is probably covered in there. I learned quite a bit through this forum and trial by fire.

Denormalizing has its place, but I think the confusion is between its being rare and its never being good. A good maxim: think long and hard about denormalizing before you do it.
 
In my opinion, for a data warehouse with transactional data drawn from sets of 'normalized' tables in a separate production system, you've got to denormalize.

For instance, I have a payroll data warehouse with every check to every employee every week for 20 years. Using the mailing address of the check as an example of something that might typically be normalized, I store the complete address in the table. In our system, the address is held in the employee record, but it changes--and very often with some of our employees.

So from a design standpoint, I could do the beg-date/end-date thing with the employee table, or I could make a separate 'address' table and link that to the employee table--there are many ways it could be accomplished. But they all involve complexity, and when the choice came down to complexity versus storing an extra 100 bytes per row in a table which has only 3 million or so records, that was a no-brainer.
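
In other words, something along these lines (a sketch only; column names invented, not my actual table):

    CREATE TABLE PayrollCheck (
        CheckID     INT IDENTITY PRIMARY KEY,
        EmployeeID  INT      NOT NULL,
        CheckDate   DATETIME NOT NULL,
        GrossAmount MONEY    NOT NULL,
        -- the address as mailed, deliberately copied onto every check row
        AddrLine1   VARCHAR(60),
        AddrLine2   VARCHAR(60),
        City        VARCHAR(40),
        State       CHAR(2),
        Zip         VARCHAR(10)
    );

Every row records the address the check actually went to that week, with no joins and no effective-date logic needed.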

A data warehouse (and I'm using SQL Server, not Access, in this case) can be optimized with large block sizes and many other such things, to the point where performance is extremely fast when chunking through many records of a large per-record size.

I really don't care if I could save a few megabytes or even gigabytes--this data is static, I lay it down in a big horizontal table, its performance is superb, and it is not complex. It's just a different set of rules when you go from OLTP to OLAP.
Just my opinion,
--Jim
 
I've always found that data warehousing is a specific denormalization of a database in order to extract aggregate information. Data warehouses are NOT normally used in the day-to-day transactions of the business. The day-to-day data is denormalized and set up in a way that makes it easier for the statisticians to extract trends and make decisions based on those trends.

I think that any decisions made for data warehousing efforts should be postponed until after the normalized working database is set up and then the data warehouse can be developed to extract the required information into a view.


Leslie

In an open world there's no need for windows and gates
 
I think that any decisions made for data warehousing efforts should be postponed until after the normalized working database is set up and then the data warehouse can be developed to extract the required information into a view.

The best and clearest way I have seen it put in this thread. I agree completely.

I think in the case of Access it is doubtful you'd build a data warehouse from a normalized Access system. Clearly, someone may use Access to warehouse data they get from another system. Because this is an Access forum, that is the chief reason I suggested one might consider denormalizing (i.e. not using the beg- and end-date thing jsteph mentioned), and later had to clarify not going below third normal form for a system. Reporting is not the only reason to consider not normalizing completely, as user entry and application limitations have to be considered too, although reporting is where people will feel the most pain... enter the data warehouse in a large RDBMS like SQL Server.

Alas, so as not to thoroughly confuse the original poster... the original post is about building a data warehouse for data from a separate system, not a typical system design.
 
Leslie,
I agree to a point, but I wouldn't use views. I think you may be talking about making cubes, which at their simplest are just pre-aggregated tables. They may be based on views or crosstab queries or some such source, but they're loaded into a physical table so the view/query needn't be rerun each time someone looks at the data.
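
At its simplest, 'loading the cube' is just an aggregate run landed in a real table, rebuilt after each load (a sketch only; the table and column names are invented):

    -- empty and repopulate the pre-aggregated table
    TRUNCATE TABLE PayrollCube;

    INSERT INTO PayrollCube (PayYear, PayWeek, TotalGross, CheckCount)
    SELECT YEAR(CheckDate),
           DATEPART(week, CheckDate),
           SUM(GrossAmount),
           COUNT(*)
    FROM PayrollCheck
    GROUP BY YEAR(CheckDate), DATEPART(week, CheckDate);

Browsing the cube then never re-runs the expensive GROUP BY, and drill-down is just a second query against the detail table, filtered to the cube row the user picked.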

However, as part of that, I believe you should still have the ability to drill down and see the transactional detail on what went into the cubes. (In my example it's weekly payroll data but it's still the lowest level).
--Jim


 
