Multiple Source Systems 1

DataStiletto · Apr 5, 2009

We have just acquired a company that is in the same industry as us but offer a different product and service mix. They are located in New England. We know that we will share some customers and products because we have a large New England presence ourself.

I designed our warehouse from the onset to be capable of handling acquisitions by including the source system as part of the key. What I need to do for now is load all the customers and not look for duplicate because they will have different payment and credit profiles than what we share.
(We will dedupe then later)

What I need to do though is to combine the common products we sell to determine where we can obtain better pricing and profit margins. We have set up a data group to look at each item in our inventory and determine if the new acquisition has the same item.

My question is I intend to create a mapping table that will accept the source system id and product id and return a surrogate key for every item in both us and the new acquisition (18,000 items in total to compare). The mapping table will return the same surrogate key if the items are considered the same regardless of the source system.

My question is what to do if the data group misses an item and there is no surrogate key in the mapping table or an new item is entered and the data group doesn't update the mapping key. I was thinking of creating a surrogate default key of not found if this should occur.

I do not want to stop the flow of data into the dimension tables or fact tables because the data group missed a mapping.

Any ideas would be helpfull.

Hans63 · Apr 5, 2009

If I read your post right you will create a piece of code that will in essence perform a full outer join, to identify which products are the same and which products do not have a match.
Is it possible that one of you products will match several products from your acquisition of vice versa?

If you cannot find a match for a specific product, I would use a special value for the surrogate key with the meaning: not known.

If, in a later stage you have to refine the mapping due to new insights, you can use this key to filter the records you still have to compare.

DataStiletto · Apr 6, 2009

Hello Hans;

I was going to modify the column behavior and defualt the item to the surrogate value for unknow.

This way all the fatcs will be loaded and balances will be correct except. Thne an email can be sent to the data comitte to fix the problem. We can then perfrom an change to set the item to the ocrrect surrogate key or create a new one for it.

Many peopel do not seem to like value sof NA or Unknown though.

I would rather have the data in and have it say the truth which in this case is unknown.

Thanks

Hans63 · Apr 6, 2009

Yes, many people feel unfomrotable with values for 'Unknown'. They probably don't know how to handle this

, what it means in business terminology.

I think that you should indeed load all facts.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Multiple Source Systems 1

DataStiletto

Programmer

Hans63

Programmer

DataStiletto

Programmer

Hans63

Programmer

Similar threads

Part and Inventory Search

Sponsor