Duplicate data

kpmINFO · May 4, 2004

FIle structure
1 a1
2 a2
3 a4
4 a1

Using informatica I would like to identify all the duplicate occurances of the second field ,
In this case I would like to identify the second occurance or a1 same way if there are 3 occurances of a1 , leaving the first occurance I would like to identify the other occurences of a1.

Sridharan · May 5, 2004

What would you want to do once you identify the Duplicates???? What is the action you want to take??? Say there are three occurances for A1 what exactly you want to do with those... When you say leave the First Occurence how do you identify the first occurance??? Do you sort based on some other field... Give us more clarity and will be able to provide meaningful suggestions...

Sri

blom0344 · May 5, 2004

Very brute force:

Create a staging table with primary key on the second field you mention. Now do a hard insert into the staging table and all the duplicates will not be loaded. You will have a lot of failed rows and a huge logfile, but it essentially will do what you want.......

T. Blom
Information analyst
tbl@shimano-eu.com

kpmINFO · May 5, 2004

first of all this data is my input, I would like to rout the first occurance to a table and the rest of the occurances to a different tables.

blom0344 · May 5, 2004

There is no 'first occurance' of data in relational databases. Data is stored in random order , so getting a first occurance is pretty meaningless I guess.

However, if you add a sequence generator to your mapping that will assign a dummy-value to each record you can then easily use a rank transformation to assign a rank value.

Description:

1. Sequence generator

Add a counter (choose reset option) that will assign dummy value to each passing row, so output will look like:

a1 1
a2 2
a3 3
a1 4
a4 5
a3 6
a2 7

2. Expression transformation

Subtract the dummy-values from a fixed large value, say 1000000, so out put will look like:

a1 999999
a2 999998
a3 999997
a1 999996
a4 999995
a3 999994
a2 999993

3. Rank transformation

Assign rank with group by on first field biased on the second field:

output:

a1 999999 1
a2 999998 1
a3 999997 1
a1 999996 2
a4 999995 1
a3 999994 2
a2 999993 2

Router transformation:

Use router to distribute to various targets (biased on rank value)

T. Blom
Information analyst
tbl@shimano-eu.com

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Duplicate data

kpmINFO

IS-IT--Management

Sridharan

Technical User

blom0344

Technical User

kpmINFO

IS-IT--Management

blom0344

Technical User

Similar threads

Part and Inventory Search

Sponsor