Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Duplicate data

Status
Not open for further replies.

kpmINFO

IS-IT--Management
Apr 1, 2004
77
US
FIle structure
1 a1
2 a2
3 a4
4 a1

Using informatica I would like to identify all the duplicate occurances of the second field ,
In this case I would like to identify the second occurance or a1 same way if there are 3 occurances of a1 , leaving the first occurance I would like to identify the other occurences of a1.
 
What would you want to do once you identify the Duplicates???? What is the action you want to take??? Say there are three occurances for A1 what exactly you want to do with those... When you say leave the First Occurence how do you identify the first occurance??? Do you sort based on some other field... Give us more clarity and will be able to provide meaningful suggestions...

Sri
 
Very brute force:

Create a staging table with primary key on the second field you mention. Now do a hard insert into the staging table and all the duplicates will not be loaded. You will have a lot of failed rows and a huge logfile, but it essentially will do what you want.......

T. Blom
Information analyst
tbl@shimano-eu.com
 
first of all this data is my input, I would like to rout the first occurance to a table and the rest of the occurances to a different tables.
 
There is no 'first occurance' of data in relational databases. Data is stored in random order , so getting a first occurance is pretty meaningless I guess.

However, if you add a sequence generator to your mapping that will assign a dummy-value to each record you can then easily use a rank transformation to assign a rank value.

Description:

1. Sequence generator

Add a counter (choose reset option) that will assign dummy value to each passing row, so output will look like:

a1 1
a2 2
a3 3
a1 4
a4 5
a3 6
a2 7

2. Expression transformation

Subtract the dummy-values from a fixed large value, say 1000000, so out put will look like:

a1 999999
a2 999998
a3 999997
a1 999996
a4 999995
a3 999994
a2 999993

3. Rank transformation

Assign rank with group by on first field biased on the second field:

output:

a1 999999 1
a2 999998 1
a3 999997 1
a1 999996 2
a4 999995 1
a3 999994 2
a2 999993 2

Router transformation:

Use router to distribute to various targets (biased on rank value)


T. Blom
Information analyst
tbl@shimano-eu.com
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top