I would like to write a script that sends duplicates to an ignore file. Putting count(*) > 1 in the constraint of a transform for the ignore file doesn't work.
You've got a few ways of handling duplicates.
- You can use an aggregator stage to group the data and remove duplicates.
- You can put a copy of the data in a hash file and perform a lookup to identify duplicates.
- You can purchase QualityStage, which has a comprehensive set of matching functions.
- You can use the changed data detection CRC32 function to do a fast comparison of a large number of columns. There is an example uploaded on Ascential's DeveloperNet at
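To make the lookup and CRC32 ideas above concrete, here is a rough sketch in plain Python rather than DataStage itself. It assumes rows arrive as dictionaries and that the first occurrence of a key should be kept. The function name `split_duplicates` and the column names are made up for illustration. Python's `zlib.crc32` stands in for the CRC32 function, and the in-memory dictionary plays the role of the hash file.

```python
import zlib

def split_duplicates(rows, key_columns):
    """Return (kept, ignored): the first row for each key is kept,
    later rows with the same key go to the ignore list."""
    # Maps CRC32 value -> set of full keys already seen with that checksum.
    # Storing the full key guards against two different keys sharing a CRC32.
    seen = {}
    kept, ignored = [], []
    for row in rows:
        key = tuple(str(row[col]) for col in key_columns)
        # Join with a separator unlikely to appear in the data (assumption).
        crc = zlib.crc32("\x1f".join(key).encode("utf-8"))
        bucket = seen.setdefault(crc, set())
        if key in bucket:
            ignored.append(row)   # duplicate: route to the ignore file
        else:
            bucket.add(key)
            kept.append(row)      # first occurrence: pass through
    return kept, ignored

# Hypothetical sample data
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]
kept, ignored = split_duplicates(rows, ["id", "name"])
```

The point of the sketch is that the duplicate test is a per-row lookup against what has already been seen, which is the kind of check a row-by-row constraint can express, whereas an aggregate such as count(*) needs the whole group.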