Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data cleanup approach

Status
Not open for further replies.

vfspb

Programmer
Jul 17, 2015
1
US
Hello, experts,

I am trying to wrap my mind around the right data cleanup procedure. I have just implemented my SSIS package, that imports data and populates Slowly Changing Dimensions and Fact tables. Now I am trying to understand how would data analyst clean up data (like removing duplicates and fixing spelling errors that lead to multiple records in Dim tables instead of one).

- Would the analyst query resulting Dim/Fact tables and update the ones that do not look correct? But in this case next iteration of incoming data will overwrite the results of this cleanup.

- Would the analyst need to go into production system and fix the dirty data there? This doesn't look very sustainable as well, from security and data safety perspectives.

Also - is there any specific software that tackles this particular issue?
 
Hi vfspb

Look for a Data Flow Transformations task called Fuzzy Lookup, that will help you clean up your data.
If you want me to explain how the whole process will work out, let me know.

Regards
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top