Question about ETL processing / data loading...

lmctech · Jan 19, 2010

First off, let me be very clear: I am working with a new dataset and am trying to understand why something 'is' the way it IS...and our DBA team is not too keen on answering questions of this nature.

Background:
Source environment (A)
DW environment (B) - data direct from A (B1) and transformed fields/values (B2)

Our processes are set so that
step1= B1 is populated from A, step2= separate processes work on A to transform and send over (B2).

In the event of an issue with one of the data loads in step 1, B2 could still be okay - or vice versa.

This does not make sense to me from a processing perspective. I would think that step 1 is correct,
but step 2 should actually occur in the DW environment using A1 data to create and load A2. This would make it so that all of 'B' would be consistent at a given point in time, and would also reduce the amount of data that needs to move, wouldnt it?

Any comments that would help me understand the need / advantage of doing it one way or another would be most helpful. I'm just tryin' to learn! =)

Turkbear · Jan 19, 2010

Hi,
Our 'far' is A from B, networkwise ( Same segment, subnet, ??)

How many transactions that change ( or could change) the data in A are going on when the DW loading on B is taking place ?
( Usually DW loads happen during off hours for the source system)

As to your thought about where to get B2's data from, the above questions may help you decide if a change is really needed- it would aslo be helpful to find out how many of A's fields are used for B2's transforms and how efficient is the read from A and write to B2 versus the read from B1 and write to B2.

To Paraphrase:"The Help you get is proportional to the Help you give.."

lmctech · Jan 19, 2010

They're physically in the same location, so 'distance' shouldnt be a huge challenge.

The golden rule on B is that data is one day old. Data updates occur at night while A (production) is relatively quiet.

All transformations are using A data and writing to B2.
The issue really popped in my head because we had a situation where A1 data was incomplete (they had to reload it due to some issues) but A2 data was all correct.
This lead to reports being out of sync, and general chaos.
LOB is becoming skitting about trusting data due to differences in data they can SEE (rightfully so).

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Question about ETL processing / data loading...

lmctech

Programmer

Turkbear

Technical User

lmctech

Programmer

Similar threads

Part and Inventory Search

Sponsor