
An easy way to handle duplicate records?


sweetleaf (Programmer) · Jan 16, 2001
Hi there,

I have a Sequential File stage from which I want to read data into two Oracle tables.
I want table 1 to contain records that satisfy the primary key (product#, vendor#).
I'd like records with missing primary-key data, or duplicates, to be routed to table 2.
I already have records with missing primary-key data going over to table 2, but I don't know how to send duplicates there as well.
I'm not even sure where to begin with recognizing duplicates in DataStage.

What's happening is that when I load the data from the flat file into table 1, the process simply stops as soon as a duplicate (product#, vendor#) is encountered, rather than continuing to process the rest of the file.

I'm handling missing data with an expression in the constraints header, so before the primary key can even be enforced by Oracle, DataStage has already routed records with missing data to table 2.

The problem is that I don't know how to write an expression in the constraints header (like the one that traps records with missing data) that can trap a duplicate record so that I can redirect it to table 2.

Any help is greatly appreciated!

 
Try creating an extra column on the output link from an Aggregator stage that performs the Count function on any non-grouped input column. Feed the output through a Transformer stage with at least two output links: one passing all rows output from the Aggregator, the other carrying the duplicates.
The constraint expression on the duplicate-handling link is HowMany > 1 (where HowMany is the new count column).
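The Aggregator/Transformer idea can be illustrated in plain Python (this is a sketch of the logic, not DataStage syntax; the column names product_no and vendor_no are assumptions standing in for product# and vendor#):

```python
# Sketch of the Aggregator + Transformer approach: count rows per key
# ("HowMany"), then route any row whose key occurs more than once to the
# duplicate link (constraint: HowMany > 1).
from collections import Counter

def route_by_count(rows, key=("product_no", "vendor_no")):
    """Return (unique_rows, duplicate_rows) based on a per-key count."""
    how_many = Counter(tuple(r[k] for k in key) for r in rows)
    uniques, duplicates = [], []
    for r in rows:
        if how_many[tuple(r[k] for k in key)] > 1:   # HowMany > 1
            duplicates.append(r)                     # -> table 2 link
        else:
            uniques.append(r)                        # -> table 1 link
    return uniques, duplicates
```

Note that with this constraint every copy of a duplicated key lands on the duplicate link, which matches the goal of keeping table 1 clean.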

Pre-sort your text file on the grouping columns; the DataStage job will use far less memory and run faster.
 
I saw that you were in a discussion in the DS forum. Did you arrive at a solution?
 
If you have duplicates and need to augment the data, you may have a different problem from the one DataStage alone is designed to handle. Ascential also sells a product called INTEGRITY that may help.
 
Try sorting your data; then you can use stage variables to check the previous value: initialise stagevar1 to stagevar2, set stagevar2 to the input column, and compare them.
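In plain Python the stage-variable trick looks roughly like this (a sketch, not DataStage syntax; prev_key and curr_key play the roles of stagevar1 and stagevar2, and the column names are assumptions):

```python
# Sketch of the stage-variable approach: with the input sorted on the key,
# a row whose key equals the previous row's key is a duplicate.
def split_sorted(rows, key=("product_no", "vendor_no")):
    """rows must be pre-sorted on `key`; returns (firsts, duplicates)."""
    firsts, duplicates = [], []
    prev_key = None                              # stagevar1
    for r in rows:
        curr_key = tuple(r[k] for k in key)      # stagevar2
        if curr_key == prev_key:
            duplicates.append(r)                 # repeat key -> table 2 link
        else:
            firsts.append(r)                     # first occurrence -> table 1 link
        prev_key = curr_key                      # carry value to next row
    return firsts, duplicates
```

Unlike the count-based approach, this keeps the first occurrence of each key in table 1 and sends only the repeats to table 2.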

Alternatively, write your data to a hash file and also look up against the same hash file. This is probably less efficient than the sort.
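The hash-file alternative can be sketched the same way (a Python dict stands in for the hash file here; again an illustration, not DataStage syntax):

```python
# Sketch of the hash-file approach: write each key as rows stream through,
# and a lookup hit on an already-written key marks the row as a duplicate.
# No pre-sort is needed, at the cost of holding the keys in the lookup.
def split_with_lookup(rows, key=("product_no", "vendor_no")):
    """Returns (firsts, duplicates); input order does not matter."""
    seen = {}                                    # the "hash file"
    firsts, duplicates = [], []
    for r in rows:
        k = tuple(r[c] for c in key)
        if k in seen:                            # lookup hit -> duplicate
            duplicates.append(r)
        else:
            seen[k] = r                          # write key to the hash file
            firsts.append(r)
    return firsts, duplicates
```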
 