Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

DB must OLAP and OLTP?

Status
Not open for further replies.

iwease

IS-IT--Management
Sep 4, 2006
104
CA
I have to build a program that does two things. First, it must have interaction from users to save the current state or roads and road conditions. This involves linking to weather conditions tables, predefined deficiencies, etc. Data can not be deleted from the database and it must allow users to look up road conditions in the past. So, if a deficiency changes, then the historical data must refer to the old deficiency.

Method 1) One way to do this would be to have our deficiency table (and weather table, etc) look like this

DeficienciesTable
-------------
DeficiencyID
OrginalID (Null unless the deficiency gets updated)
DeficiencyName
Description

If a Deficiency gets changed, a new deficiency gets inserted into the database and then the original deficiency gets referenced through OrginalID. I could create indexed views to show only the current information for the transactional side of things, but use this method to store historical information.

Method 2) Another method would be to have 2 different databases, a transactional one for storing the current data and a historical one for past data. Data from the transactional database would be periodically copied into an historical database for storing historical data. The thing is, the table structure for the transactional database would not be that different from the historical one since there is only one data source.

So, does method 2 have any advantage over method one.
 
I should mention that I am by no means an expert in using the terminology OLAP and OLTP. I have created lots of fully normalized databases for handling short term transactions (so, I assume this is OLTP) and I have also created a star schema database that was only for reporting but that is my only experience with OLAP.
 
I presume that if a deficiency gets updated more than once, each new deficiency will have the OriginalID of the most recent one?

And what's the primary key that relates deficiencies to each other? That is, how do you know, looking at the deficiencies table, that two deficiencies code for the same instance/event/condition except for following the OriginalID value backward?

Let's say a deficiency has been updated 8 times. What would a query look like that tries to show all 8 instances at once? You'd have to join the table to itself eight times. That's not a general solution that would work for any deficiency, so you'd have to have a looping process that adds history values until none are left.

Now, your second method has some benefits. First of all, I wouldn't put them in different databases, and I wouldn't "periodically" copy the data from one to the other. I would put the history tables in the same database as the main tables, for example Deficiencies and DeficienciesHistory. I would most emphatically not make it two databases.

Then create a trigger on Deficiencies that pushes updates to the History table. The History table would have all the columns of the main table, but no constraints, and additionally it would have a HistoryDate column and a DataAction column. HistoryDate would have the current date pushed to it and DataAction would say whether it was an Insert, Update, or Delete.

Now you can query the history table in one pass to see the history of an item, you aren't bloating your main table with old values keeping your indexes smaller and queries faster, and you don't ever run into situations where the history table is missing data that it should have.

I am currently developing a generic solution for creating history tables and the triggers to fill them. When I am done with it I wouldn't mind sharing it with you.

Here's a sample trigger I hand-wrote that pushes to a history table:

Code:
CREATE TRIGGER TR_Number_IUD ON dbo.Number FOR INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON
INSERT dbo.NumberHistory
SELECT
   IsNull(I.Number, D.Number),
   IsNull(I.AssignedTo, D.AssignedTo),
   IsNull(I.CreateDate, D.CreateDate),
   HistoryDate = GetDate(),
   DataAction =
      CASE
         CASE WHEN EXISTS (SELECT 1 FROM Inserted) THEN 1 ELSE 0 END
         + CASE WHEN EXISTS (SELECT 1 FROM Deleted) THEN 2 ELSE 0 END
      WHEN 1 THEN 'I'
      WHEN 2 THEN 'D'
      WHEN 3 THEN 'U'
      END,
   Agent = System_User
FROM
   Inserted I
   FULL JOIN Deleted D ON I.Number = D.Number

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
That is a nifty little trigger. I might use that for our system but I'm still not entirely sure which method to go with. That trigger could be useful for both.

I have been thinking about this the same way as you have. A co-worker explained to me that for method 1, you don't have to keep referencing all the way back to get the original deficiency. The original ID is the same for all deficiencies that change from the original.

ie
DefID OrigID
1 1
3 1
4 1
5 1

The method you described after that is the method that I had originally pushed for. But, after my co-worker explained method 1 some more, I decided that with an indexed view, I don't really have to worry about preformance losses either for method 1. Plus, our reporting engin doesn't need to look at millions and millions of rows so I think I would have a hard time convincing them to develop a separate design for the OLAP tables when we could just report directly off our original tables (so long as they have the history via the two ids)
 
I still see a problem with your design. How do you do a query for the most recent deficiency value? Tell me how you are doing that or plan to do that, give a query example.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Lets see...I figured I could do something like

SELECT d.* FROM Deficiencies d
INNER JOIN
(SELECT Max(DeficiencyID) as CurrentDefID FROM Deficiencies
GROUP BY OriginalID) c
ON c.CurrentDefID = d.DeficiencyID

Perhaps not the most beautiful query ever but does the job. And with the view indexed, it should be pretty speedy I think.

I'm just saying I could do that. I still don't know the best method and I really appreciate your input.
 
By the way, at first I thought you meant something like this:

[tt]DefID OrigID
1 NULL
3 1
4 3
5 4[/tt]

Anyway. I doubt you can index that view. You and your coworker should actually try it. There are great restrictions on the kind of things that an indexed view can have in it. It takes quite a bit of fuddling and studying to get an indexed view to work the first time, until you really learn what you're doing.

Now, this query is the one you'll be doing the MOST often, right? So why use a design that makes the query you'll be doing the most often the most complicated?

Basically what you've done is make OriginalID your deficiency identifier. But there's no way to know the most recent value except by selecting the max DeficiencyID by OriginalID, as you've done in your query. For one thing, you're relying on the surrogate key always being in order. This is something you should not do. Even though the database enforces lookup, how do you know that some day someone isn't going to insert a record into that table with IDENTITY_INSERT on? How do you know that some records won't be deleted, or two deficiencies will be discovered to be the same and need to be merged? Then you're in a deep hole because of your database design.

What OriginalID and DeficiencyID are doing in your table are mixing two types: one uniquely identifies a series of values that apply to one deficiency. The other one is an arbitrary number (and you should not rely on its ordering) to distinguish the iterations in the series of values for this deficiency. But they are really the same!

I could see a header/detail relationship:
DeficienciesHeader (a new table)
DeficiencyHeaderID (your OriginalID)
CurrentID (FK to Deficiencies table, DeficiencyID)

Deficiencies (your table)
DeficiencyHeaderID (your originalID)
DeficiencyID

Code:
CREATE TRIGGER TR_Deficiencies_I ON Deficiencies FOR INSERT
AS
SET NOCOUNT ON
UPDATE H
SET H.CurrentID = I.DeficiencyID
FROM
   DeficienciesHeader H
   INNER JOIN Inserted I ON H.DeficiencyHeaderID = I.DeficiencyHeaderID

But I wouldn't do this either: it has denormalization problems. The same information is being stored twice: The header points to the detail, but what if the detail row it's pointing to gets updated to point to a different header? Problems.

My experience says do the simpler route as I described before. Use one table for current now information. Use the other for history. I guarantee it will save you serious problems down the road.

Plus, our reporting engin doesn't need to look at millions and millions of rows so I think I would have a hard time convincing them to develop a separate design for the OLAP tables when we could just report directly off our original tables
Wait a minute. Your original tables have millions and millions of rows in them, don't they? How will the history table in my suggested design end up having more rows in it than the main table? Are you thinking about updates as opposed to inserts?

So use the header/detail schema I suggested. It would probably be best to add a CHECK constraint on the header to ensure that the detail row it points to as current has the same header value. And you should rollback updates to the detail that change the header value or propagate those changes to the header table. And you have to decide what to do about deletes.

Another way to do it would be to use the history table as I suggested, but alter the behavior of the trigger so only deletes, or updates that change the "version number" of the deficiency, push the OLD values to the history table. You'd have to do a union with the history table to the current table to get all values, but this is a much less complicated query than the one you proposed.

I have had two big production experiences at jobs with people using history tables as combination "current info" and also historical data. They performed BADLY and were a HUGE PAIN to administer and to query from.

Do yourself a favor and avoid this kind of combo current/history design.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top