Real-time data warehousing

Status
Not open for further replies.

jtamma

IS-IT--Management
Oct 3, 2007
24
IN
We are in the process of designing a data mart with a nightly-load ETL that will be developed in Informatica. Management wants us to consider making it a real-time data warehouse. Having experience in batch ETL, and already having an ETL architecture to support it, I am finding it difficult to change my thought process and work out how this could be accomplished. Searching on Google turned up some new terms, such as micro-batch ETL and streaming ETL vs. batch ETL, but no details explaining what they are. Does anyone have experience with real-time data warehousing and know what these terms mean? My current architecture consists of source staging, then an ODS, then a star schema. How are real-time architectures modeled, and how does their ETL differ from batch ETL? Any help would be greatly appreciated.
 
Hi Jtamma,

I have no experience in building a real-time ETL process, but I do have definitions for micro-batch and streaming ETL.

Micro-batch ETL means you gather the source records on a short periodic basis (say, every 5 minutes) and process them with your ETL software.
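To make the micro-batch idea concrete, here is a minimal Python sketch of the common incremental-extract pattern. It assumes the source table carries a last-update timestamp; the `last_updated` column, `SourceRow` and `extract_micro_batch` are hypothetical names for illustration, not Informatica features. Each run picks up only the rows changed inside its time window, and the end of that window becomes the starting point (the "watermark") of the next run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SourceRow:
    key: int
    amount: float
    last_updated: datetime  # hypothetical "last changed" column on the source table


def extract_micro_batch(
    rows: Iterable[SourceRow], watermark: datetime, until: datetime
) -> Tuple[List[SourceRow], datetime]:
    """Return the rows changed in (watermark, until] plus the new watermark."""
    batch = sorted(
        (r for r in rows if watermark < r.last_updated <= until),
        key=lambda r: r.last_updated,
    )
    # The end of this window becomes the start of the next micro batch.
    return batch, until


# Simulated source table and two consecutive 5-minute runs.
t0 = datetime(2007, 10, 3, 0, 0)
source = [
    SourceRow(1, 10.0, t0 + timedelta(minutes=1)),
    SourceRow(2, 5.0, t0 + timedelta(minutes=3)),
    SourceRow(3, 7.5, t0 + timedelta(minutes=7)),
]
watermark = t0
batch1, watermark = extract_micro_batch(source, watermark, t0 + timedelta(minutes=5))
batch2, watermark = extract_micro_batch(source, watermark, t0 + timedelta(minutes=10))
```

The first run loads rows 1 and 2, the second only row 3. In practice the window end is usually set a little behind the current time, so that transactions committing late with an earlier timestamp are not skipped.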

Streaming ETL means you capture EVERY change on the source system and feed it to your ETL process as it happens.
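By contrast, a streaming load has no time window at all: each change event is applied the moment it arrives. The Python sketch below assumes a hypothetical event format with an operation code ("I", "U", "D"), a key and the new column values; a real change-capture feed will have its own format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                        # hypothetical codes: "I" insert, "U" update, "D" delete
    key: int
    values: Optional[dict] = None  # new column values (unused for deletes)


def apply_change(target: Dict[int, dict], event: ChangeEvent) -> None:
    """Apply a single source change to the target table."""
    if event.op in ("I", "U"):
        target[event.key] = dict(event.values or {})
    elif event.op == "D":
        target.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation {event.op!r}")


def stream_load(events: Iterable[ChangeEvent], target: Dict[int, dict]) -> Dict[int, dict]:
    # No batching: every event is processed individually, in arrival order.
    for event in events:
        apply_change(target, event)
    return target


changes = [
    ChangeEvent("I", 1, {"amount": 10.0}),
    ChangeEvent("I", 2, {"amount": 5.0}),
    ChangeEvent("U", 1, {"amount": 12.0}),
    ChangeEvent("D", 2),
]
ods_table = stream_load(changes, {})
```

After the four events only key 1 remains, holding its updated amount.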

Micro-batch ETL does not necessarily mean you have to re-architect your ETL software (although it may turn out that you do).
Streaming ETL calls for a new approach.

All I can say is: use your experience with batch ETL creatively to come up with a new solution.
Keep an eye on robustness (make sure you can buffer any peak in changes that would otherwise overflow your ETL).
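One way to read the robustness advice is to put a bounded buffer between change capture and the ETL, so a burst of changes waits in the buffer instead of overwhelming the load step. The Python sketch below uses the standard `queue.Queue` with a `maxsize`; the buffer size and function names are hypothetical, and integers stand in for real change records. When the buffer is full, the producer blocks (back-pressure) rather than dropping changes, and it records the deepest fill level it saw so you can monitor how close peaks come to the limit.

```python
import queue
import threading
from typing import Iterable, List, Tuple

BUFFER_SIZE = 10  # hypothetical capacity; size it for the largest burst you expect


def capture(changes: Iterable[int], buf: "queue.Queue") -> int:
    """Producer: push source changes into the buffer; return the deepest fill level seen."""
    peak = 0
    for change in changes:
        buf.put(change)  # blocks while the buffer is full: back-pressure, no lost changes
        peak = max(peak, buf.qsize())
    buf.put(None)  # sentinel: end of stream
    return peak


def etl_worker(buf: "queue.Queue", loaded: List[int]) -> None:
    """Consumer: drain the buffer at whatever pace the ETL can sustain."""
    while True:
        change = buf.get()
        if change is None:
            break
        loaded.append(change)  # stand-in for the real transform/load step


def run(changes: Iterable[int], buffer_size: int = BUFFER_SIZE) -> Tuple[List[int], int]:
    buf: "queue.Queue" = queue.Queue(maxsize=buffer_size)
    loaded: List[int] = []
    worker = threading.Thread(target=etl_worker, args=(buf, loaded))
    worker.start()
    peak = capture(changes, buf)
    worker.join()
    return loaded, peak


loaded, peak = run(range(1000))
```

With a single producer and a single consumer the FIFO queue preserves the order of the changes. A production setup would typically use a durable buffer (a staging table or message queue) so that buffered changes survive a restart.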

Hope this helps a bit.
 
It does help. Thanks Hans63.
 
Another thing to keep in mind with streaming ETL is that a single transaction in the source system may enter the ETL process as several smaller transactions. Consider a transaction that updates 3 tables in the source system.
Your ETL system may receive 3 separate transactions, in the wrong (and illogical) order. You must be prepared to handle this.
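A minimal Python sketch of one way to handle this: hold back the pieces of each source transaction until all of them have arrived, then release the transaction as a whole, sorted back into source order. It assumes every captured change carries a transaction id, its position within the transaction and the transaction's total number of changes; those fields and the names below are hypothetical, and real capture tools expose this information in their own ways (often as a commit marker rather than a count).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, Iterator, List


@dataclass(frozen=True)
class ChangeRow:
    txn_id: str    # hypothetical: id of the source transaction this change belongs to
    seq: int       # hypothetical: position of the change inside that transaction
    txn_size: int  # hypothetical: number of changes the transaction produced
    table: str


def reassemble(rows: Iterable[ChangeRow]) -> Iterator[List[ChangeRow]]:
    """Hold back partial transactions; release each one complete and in source order."""
    pending: DefaultDict[str, List[ChangeRow]] = defaultdict(list)
    for row in rows:
        parts = pending[row.txn_id]
        parts.append(row)
        if len(parts) == row.txn_size:  # all pieces of this transaction have arrived
            del pending[row.txn_id]
            yield sorted(parts, key=lambda r: r.seq)


# Source transaction T1 updated 3 tables; its changes arrive split and out of order,
# interleaved with a second, single-table transaction T2.
arrivals = [
    ChangeRow("T1", 3, 3, "stock"),
    ChangeRow("T2", 1, 1, "customers"),
    ChangeRow("T1", 1, 3, "orders"),
    ChangeRow("T1", 2, 3, "order_lines"),
]
complete = list(reassemble(arrivals))
```

T2 is released as soon as its only change arrives; T1 is released only after its third piece shows up, with its changes back in the order orders, order_lines, stock. Transactions that stay in `pending` too long should be flagged, since they indicate lost or delayed changes.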
 