danrobertson
IS-IT--Management
How strict should the architecture principles placed on the DW be?
The classic principle that I like to assign to a DW architecture is that it is there for information reporting/analysis and definitely not for information processing/feeding to other systems. The DW is a target system, never a source.
However, I occassionally have a dilemma when a requirement appears for another system that wants to take advantage of the cleansing and integration that has been performed on data in the warehouse. It is difficult to argue that this requirement should be fulfilled independently straight from the source(s) into the new target when that would require replicating the integration and cleansing that has been performed by the DW. I would bend the rules at this point and allow the new target system to use the ODS as the source.
My architectures are almost always Staging Area, Operational Data Store, DW/DM:
- Staging Area would be a copy of source files
- Operational Data Store would be a normalised, relational store of integrated data (based on similar subjects)
- Data Warehouse / Data Marts would be aggregated, star schema stores
The real 'rule-bender' comes when a requirement comes from a target system that wants to take an export from a data mart. In this instance the full length of the DW ETL process has become a dependent part of this new target system. I often find this situation when the request has come from Finance who want to populate their General Ledger (GL) via this route. Their process and reasoning is that this way they can examine the data first before posting it to the GL and if they question any data they have the depth of the DW to drill into it and get their answers (this depth of data would not be available in the GL). My dilemma here is whether I provide the feed direct from the DW (which has been aggregated to the appropriate level, or whether I insist the feed goes from the ODS).
Should I say no to all these requests on the grounds that the dependent system/target will be subject to the validation routines on a bunch of data that it does not need and therefore may not load one day when a piece of data it is not interested fails a check?
What opinions do people have?
The classic principle that I like to assign to a DW architecture is that it is there for information reporting/analysis and definitely not for information processing/feeding to other systems. The DW is a target system, never a source.
However, I occassionally have a dilemma when a requirement appears for another system that wants to take advantage of the cleansing and integration that has been performed on data in the warehouse. It is difficult to argue that this requirement should be fulfilled independently straight from the source(s) into the new target when that would require replicating the integration and cleansing that has been performed by the DW. I would bend the rules at this point and allow the new target system to use the ODS as the source.
My architectures are almost always Staging Area, Operational Data Store, DW/DM:
- Staging Area would be a copy of source files
- Operational Data Store would be a normalised, relational store of integrated data (based on similar subjects)
- Data Warehouse / Data Marts would be aggregated, star schema stores
The real 'rule-bender' comes when a requirement comes from a target system that wants to take an export from a data mart. In this instance the full length of the DW ETL process has become a dependent part of this new target system. I often find this situation when the request has come from Finance who want to populate their General Ledger (GL) via this route. Their process and reasoning is that this way they can examine the data first before posting it to the GL and if they question any data they have the depth of the DW to drill into it and get their answers (this depth of data would not be available in the GL). My dilemma here is whether I provide the feed direct from the DW (which has been aggregated to the appropriate level, or whether I insist the feed goes from the ODS).
Should I say no to all these requests on the grounds that the dependent system/target will be subject to the validation routines on a bunch of data that it does not need and therefore may not load one day when a piece of data it is not interested fails a check?
What opinions do people have?