
An easy way to handle duplicate records?


sweetleaf (Programmer) · Jan 16, 2001
Hi there,

I have a Sequential File stage from which I want to read data into two Oracle tables.
I want table 1 to contain records that satisfy the primary key (product#, vendor#).
I'd like records with missing primary-key data, or duplicates, to be routed to table 2.
I already have records with missing primary-key data going over to table 2, but I don't know how to send duplicates there as well.
I'm not even sure where to begin with recognizing duplicates in DataStage.

What's happening is that when I load the data from the flat file into table 1, the process simply stops as soon as a duplicate (product#, vendor#) is encountered, rather than continuing to process the rest of the file.

I'm handling missing data with an expression in the constraints header, so before the primary key can even be enforced by Oracle, DataStage has already routed records with missing data to table 2.

The problem is that I don't know how to write an expression in the constraints header (like the one that traps records with missing data) that can trap a duplicate record so that I can redirect it to table 2.

Any help is greatly appreciated!

 
Try creating an extra column on the output link from an Aggregator stage that performs the Count function on any non-grouped input column. Feed the output through a Transformer stage with at least two output links: one passing all rows output from the Aggregator, the other carrying the duplicates.
The constraint expression on the duplicate-handling link is HowMany > 1 (where HowMany is the new count column).
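The Aggregator/Transformer idea can be illustrated in plain Python (this is a sketch of the logic, not DataStage syntax; the column names product_no and vendor_no are assumptions standing in for product# and vendor#):

```python
# Sketch of the Aggregator + Transformer approach: count rows per key
# ("HowMany"), then route any row whose key occurs more than once to the
# duplicate link (constraint: HowMany > 1).
from collections import Counter

def route_by_count(rows, key=("product_no", "vendor_no")):
    """Return (unique_rows, duplicate_rows) based on a per-key count."""
    how_many = Counter(tuple(r[k] for k in key) for r in rows)
    uniques, duplicates = [], []
    for r in rows:
        if how_many[tuple(r[k] for k in key)] > 1:   # HowMany > 1
            duplicates.append(r)                     # -> table 2 link
        else:
            uniques.append(r)                        # -> table 1 link
    return uniques, duplicates
```

Note that with this constraint every copy of a duplicated key lands on the duplicate link, which matches the goal of keeping table 1 clean.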

Pre-sort your text file on the grouping columns; the DataStage job will use far less memory and run faster.
 
I saw that you were in a discussion in the DS forum. Did you arrive at a solution?
 
If you have duplicates and need to augment the data, you may have a different problem from the one DataStage alone is designed to handle. Ascential also sells a product called INTEGRITY that may help.
 
Try sorting your data; then you can use stage variables to check the previous value: initialise stagevar1 to stagevar2, set stagevar2 to the input column, and compare them.
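In plain Python the stage-variable trick looks roughly like this (a sketch, not DataStage syntax; prev_key and curr_key play the roles of stagevar1 and stagevar2, and the column names are assumptions):

```python
# Sketch of the stage-variable approach: with the input sorted on the key,
# a row whose key equals the previous row's key is a duplicate.
def split_sorted(rows, key=("product_no", "vendor_no")):
    """rows must be pre-sorted on `key`; returns (firsts, duplicates)."""
    firsts, duplicates = [], []
    prev_key = None                              # stagevar1
    for r in rows:
        curr_key = tuple(r[k] for k in key)      # stagevar2
        if curr_key == prev_key:
            duplicates.append(r)                 # repeat key -> table 2 link
        else:
            firsts.append(r)                     # first occurrence -> table 1 link
        prev_key = curr_key                      # carry value to next row
    return firsts, duplicates
```

Unlike the count-based approach, this keeps the first occurrence of each key in table 1 and sends only the repeats to table 2.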

Alternatively, write your data to a hash file and also look up against the same hash file. This is probably less efficient than the sort.
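The hash-file alternative can be sketched the same way (a Python dict stands in for the hash file here; again an illustration, not DataStage syntax):

```python
# Sketch of the hash-file approach: write each key as rows stream through,
# and a lookup hit on an already-written key marks the row as a duplicate.
# No pre-sort is needed, at the cost of holding the keys in the lookup.
def split_with_lookup(rows, key=("product_no", "vendor_no")):
    """Returns (firsts, duplicates); input order does not matter."""
    seen = {}                                    # the "hash file"
    firsts, duplicates = [], []
    for r in rows:
        k = tuple(r[c] for c in key)
        if k in seen:                            # lookup hit -> duplicate
            duplicates.append(r)
        else:
            seen[k] = r                          # write key to the hash file
            firsts.append(r)
    return firsts, duplicates
```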
 