Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can there be duplication of data between fact tables. 2

Status
Not open for further replies.

jtamma

IS-IT--Management
Oct 3, 2007
24
IN
I work for a manufacturing company and I am creating a datamart for tracking inventory of products. As part of this datamart I also need to analyse activity based costing of these products .A semifinished product is also a product for the company and we need to keep track of the cost that has occurred till that stage of production. Each time a product undergoes a value addition process - it becomes a new product.

Design Issue 1
================================
I have designed two transaction level fact tables to analyse these two acitvities at the most grainular level.

(1) Inventory Transaction fact table - For a product - each activity that results in invenotry change .
(2) Cost Transaction fact table - For a product each acitivity that results in the cost addition to that product.

Here is the example in its simplest case - say

(1) P1 is the raw material product with cost C1.
(2) P1 undergoes some process and becomes P2. Cost of the process is C2.
(3) P2 undergoes some process and becomes P3. Cost of the process is C3.

Inventory Transaction Table would be..

T1 | Purchased P1 | C1
T2 | Manufactured P2 from P1 | C2
T3 | Manufactured P3 From P2 | C3

Cost Transaction Table would be

P1 | C1 | T1
P2 | C1 | T1
P2 | C2 | T2
P3 | C1 | T1
P3 | C2 | T2
P3 | C3 | T3

Here we can have few Inventory Transactions (20 %) that do not result in Cost Transactions. Also we can have few cost Transactions ( may be 5 %) that are not related to the Inventory transactions.

As both these fact tables are huge and with the intention of avoiding join betweent the two fact tables, I intend to copy all about transaction details that I need for cost analysis ( and they are most of them) in Cost transaction, rather then referring to a row in the inventory transaction table from the cost transaction table. With confirmed dimensions these transaction details consists of mostly dimension keys and there would be around 12 to 13 of these keys. I know query performance would be good, but I am not feeling comfortable with so much of duplication. Do you think this is a good design ?? What alternatives exist ??
 
There are no hard and fast rules regarding duplicate data. We do what we need to in providing timely and accurate access to the information for our user base.

When there are duplicate fact tables, however, there is always the possibility that one table could be different than another, either through physical record count (bad), or by logical counting (nulls, unmatched foreign keys, etc - confusing). If you use two fact tables, I suggest a daily balancing check routine to help insure you don't need to reconcile the two tables sometime in the future.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Thanks Johnherman. This guideline is really helpful. Not only here but for my other design problems as well. A star from me.
 
Johnherman,

Someone in my team has found out about this design tip #13 from Kimball."When a fact table can be used as a dimension table"

Here Kimball uses a fact as dimension in another fact. This is similar to my situation where I need to use Inventory transaction as a dimension to my cost transaction fact table.

I do not understand this design tip. Doesnt it mean that we need to join two large fact tables for queries compromsing on performance ? Also will it be a star schema then ? Have you actually seen such a design in your long career ? In what circumstances we should copy keys between the fact tables and in what circumstances we should refer to other facts as dimension ?
 
Before I comment on the design, I need to know a few more things. Does the product P1 refer to a batch whose quantity never changes except in process to P2? Or can several P1's be combined when making P2?

What I'm trying to understand is the level of granularity for each table, and trying to capture the lowest level. I'm thinking the Cost Transaction Table can be simplified, too.



-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
P1,p2 identify unique instances of company's products. There will not be any other P1 or P2 in the system. Life of the P1 starts when it enters into the inventory because it is procurred from outside and it ends when it shipped out of the company. Even if there is some other product exactly same as P1 - it will identified with different id. So several P1s can not make a P2.. but p1 + p3 can make a p2. Quantity of the p1 is always = 1. Quanity of p2 = 1 etc.

I hope I answered your question.
 
From what I have read so far in your original post it seems that conceptually the inventory transaction table is indeed used as a dimension on the cost transaction fact table.
This is a situation that sometimes occurs (which is why Kimball refers to it). It is indeed something you want to avoid. But don't go to far in this as you may loose understandability of your model.

when looking at the data you presented I think the original tables should be:

inventory:
==========
TransactionKey InventoryAction Cost others attributes
T1 Purchased P1 C1 .....
T2 Purchased P2 C2 .....
T3 Used P1 for P3 C3
T4 Used P2 for P3 C4
T5 Purchased P5 C5

Where it reflects that P3 can be made from P1 + P2 and there is a inventory transaction that does not result in a cost transaction.

Cost:
=====
Product TransactionKey Cost other attributes
P1 T1 C1 .....
P2 T2 C2 .....
P3 T1 C1
P3 T2 C2
P3 T3 C3
P3 T3 C4
P6 -- C6

Where it reflects that P6 has costs not related to inventory transactions.


From the information presented so far it looks like you are filling the cost transaction table with a transitive closure on the inventory. Apart form the fact that there are cost records that do not relate to inventory records.

12 or 13 dimensions is quite a lot. This should make you wonder if some of these dimensions are correlated in some way, so you can place them in 1 dimension. I would not be happy when duplicating 12 keys in a huge table as it takes considerable amounts of space and takes extra time in the ETL-process as well. Because they are dimension keys they should be indexed as well. Indexing may not be a real issue if you cna use bitmap indexes, but still.

BTW: what does huge mean to you? (tens of) thousands / millions / billions of records. Megabytes / Gigabytes / Terabytes

Do the queries you need to run on the join between these two tables use a small portion of the records, or will they use (almost) all records?
The question you have depends a lot on what excactly you have you achieve.
 
Thanks Hanes, what you have described is exactly the situation I have in hand.

sIZES
-------------

We should have around 40 Million records in inventory transaction in an year. We will have 2 years data to begin with. Cost transaction can roughly be 5-6 times of that as one Transaction can result in multiple costs as the product goes thru various cycles.

Query needs
--------------------
For some adhoc queries we might need to join the two tables for all the rows but for all the known requirement we will always have constraints on one or the other dimensions when we join the two tables.


No of dimensions
-----------------
I am at a point where I do not think dimensions can be reduced more. I already have a junk dimension capturing all the status etc. However there is one place where it could be possibility and I describe it below.

Main use of the inventory transaction is to view history of the transactions on a product, so for a shipment transaction if I just say "shipped from A" and do not say "Ship from A to B" then it doesnt make much sense. There is a separate fact table to capture more details of shipment. But in this invetnory transaction I just want to capture the destination facility. This dimension in other kind of transactions doesnt make sense and hence would be NA.

LIke that I have 4 dimensions that would be populated for one or other kind of transactions and would be NA for other transactions. If I take these fields away then it takes lot of query power with it. So do you think to reduce no of dimensions I should remove these dimensions from here ?

============================================

So with this information is your opinion would be to use inventory transaction fact as a dimension for cost transaction fact by creating a key in the first fact.
============================================
 
Well Jtamma, it is a challenging problem you face here.

==========
As to the number of dimensions:
As you have already looked into this and already have a junk dimension, I don't think you should be spending much time this.

==========
I would indeed try to use the inventory transaction fact as a dimension on the cost fact. And I would create some relevant queries on these two tables and measure their performance before your customer gets to see them. That way you can check if the performance is lousy but acceptable or if it is awful (perhaps just plain good). This will depend largely on the number of rows returned when applying the constraints.

And you can use the query plan to check if you can optimize your model (by using indexes, partitions etc. Optimizers may find strange plans for the queries you give them.

==========
Joining 80 million against 400 million with a full table scan may cause a performance issue.

==========
Depending on the requirements, it may be an idea to add an extra column on the cost analysis fact table which holds the cumulative cost for the product up to this time (inventory transaction).
 
Thanks Hans63, your analysis and suggestion is really useful.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top