First off, let me be very clear: I am working with a new dataset and am trying to understand why something 'is' the way it IS...and our DBA team is not too keen on answering questions of this nature.
Background:
Source environment (A)
DW environment (B) - data direct from A (B1) and transformed fields/values (B2)
Our processes are set so that
step1= B1 is populated from A, step2= separate processes work on A to transform and send over (B2).
In the event of an issue with one of the data loads in step 1, B2 could still be okay - or vice versa.
This does not make sense to me from a processing perspective. I would think that step 1 is correct,
but step 2 should actually occur in the DW environment using A1 data to create and load A2. This would make it so that all of 'B' would be consistent at a given point in time, and would also reduce the amount of data that needs to move, wouldnt it?
Any comments that would help me understand the need / advantage of doing it one way or another would be most helpful. I'm just tryin' to learn! =)
Background:
Source environment (A)
DW environment (B) - data direct from A (B1) and transformed fields/values (B2)
Our processes are set so that
step1= B1 is populated from A, step2= separate processes work on A to transform and send over (B2).
In the event of an issue with one of the data loads in step 1, B2 could still be okay - or vice versa.
This does not make sense to me from a processing perspective. I would think that step 1 is correct,
but step 2 should actually occur in the DW environment using A1 data to create and load A2. This would make it so that all of 'B' would be consistent at a given point in time, and would also reduce the amount of data that needs to move, wouldnt it?
Any comments that would help me understand the need / advantage of doing it one way or another would be most helpful. I'm just tryin' to learn! =)