
search duplicates

Status
Not open for further replies.

malpa

Technical User
Feb 8, 2004
Hi

Order and search duplicates

There are many files from several network elements. Each element generates one file per day.
The same record can appear in several network elements. The copies differ only in the field that identifies the network element and in the duration time, which can differ by up to 5 seconds (more or less) between copies.
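A minimal sketch of the matching rule as I read it, in Python. It assumes two records are copies of the same event when source, target, date and start hour are identical, the reporting elements differ, and the durations (converted to seconds) are within 5 seconds of each other. The `Record` type, its field names and `TOLERANCE_SECONDS` are hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One record; field names are hypothetical, taken from the sample layout."""
    element: str   # network element that reported the record
    source: str
    target: str
    date: str      # YYYYMMDD
    hour: str      # HHMMSS start time
    duration: int  # duration converted from HHMMSS to seconds


# Assumed reading of "more or less 5 seconds": durations may differ by up to 5 s.
TOLERANCE_SECONDS = 5


def is_duplicate(a: Record, b: Record, tolerance: int = TOLERANCE_SECONDS) -> bool:
    """True if a and b look like the same event reported by two different elements."""
    same_event = (a.source, a.target, a.date, a.hour) == (b.source, b.target, b.date, b.hour)
    # Assumption: copies come from different elements, so same-element pairs are ignored.
    different_element = a.element != b.element
    return same_event and different_element and abs(a.duration - b.duration) <= tolerance


first = Record("ne1", "a", "b", "20040208", "101500", 120)
second = Record("ne2", "a", "b", "20040208", "101500", 123)
print(is_duplicate(first, second))  # True: same event, durations 3 s apart
```

Whether the element must differ is a guess; drop that condition if the same element can also emit a record twice.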

Every day the server consolidates the files generated by each network element into one file.

file_1 day one
file_2 day two
file_3 day three
file_4 day four
file_5 day five
file_6 day six
....

The idea is to compare the current file, for example file_6, against file_1 through file_6 and extract the duplicate records.

Each file contains at least 6 million records.
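With 6 million records per file, holding six files in memory at once may be a problem. One way around that, sketched here under the assumption that duplicates always share the same date, is to split all records into one bucket file per date first: records with different dates can never match, so each bucket can then be checked on its own. The comma-separated layout `element,source,target,date,hour,duration`, the function name and the `bucket_<date>.txt` naming are all assumptions.

```python
import os


def partition_by_date(paths, out_dir):
    """Copy every record from `paths` into out_dir/bucket_<date>.txt.

    Assumes comma-separated lines: element,source,target,date,hour,duration.
    Lines without exactly six fields are skipped. Returns the bucket paths, sorted.
    """
    handles = {}
    try:
        for path in paths:
            with open(path) as src:
                for line in src:
                    fields = line.rstrip("\n").split(",")
                    if len(fields) != 6:
                        continue  # malformed line
                    date = fields[3].strip()
                    if date not in handles:
                        # One output file per date; only a handful are open at once
                        # when the inputs cover a few days.
                        handles[date] = open(os.path.join(out_dir, f"bucket_{date}.txt"), "w")
                    handles[date].write(line.rstrip("\n") + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(os.path.join(out_dir, f"bucket_{date}.txt") for date in handles)
```

Each bucket is much smaller than the full set of files, so an in-memory comparison becomes practical per bucket.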

What is the best way to do this? Any suggestions?

Each file contains these fields:

file_1
network element 1, source a, target b, date YYYYMMDD, hour HHMMSS, duration time HHMMSS
network element 2, source b, target a, date YYYYMMDD, hour HHMMSS, duration time HHMMSS
...

file_2

network element 1, source a, target b, date YYYYMMDD, hour HHMMSS, duration time HHMMSS
network element 2, source b, target a, date YYYYMMDD, hour HHMMSS, duration time HHMMSS
....
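The sample above describes the fields rather than showing a real line. Assuming the actual files are comma-separated in the order `element,source,target,date,hour,duration` (an assumption; adjust the split to the real layout), a small parser that also converts the HHMMSS duration to seconds, so that durations can be subtracted, might look like this. `Record`, `parse_line` and `hhmmss_to_seconds` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    element: str
    source: str
    target: str
    date: str      # YYYYMMDD, kept as text
    hour: str      # HHMMSS, kept as text (only used as part of the match key)
    duration: int  # HHMMSS converted to seconds


def hhmmss_to_seconds(value: str) -> int:
    """Convert an HHMMSS string such as '000203' to seconds (123)."""
    value = value.strip()
    if len(value) != 6 or not value.isdigit():
        raise ValueError(f"not HHMMSS: {value!r}")
    return int(value[0:2]) * 3600 + int(value[2:4]) * 60 + int(value[4:6])


def parse_line(line: str) -> Optional[Record]:
    """Parse one comma-separated record; return None for malformed lines."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != 6:
        return None
    element, source, target, date, hour, duration = fields
    try:
        return Record(element, source, target, date, hour, hhmmss_to_seconds(duration))
    except ValueError:
        return None


print(parse_line("ne1,a,b,20040208,101500,000203\n"))
# Record(element='ne1', source='a', target='b', date='20040208', hour='101500', duration=123)
```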

thanks


malpa

What have you coded? There are many ways to compare files, and which one fits best depends on the actual format and content of the files.

If you do not provide sample data and the expected result, we can only give generic solutions.

Generic solution 1) Use diff.
[3eyes]
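One caveat with diff: it only reports lines that are textually identical, and two copies of the same event never are, because the element field and the duration differ. A hedged sketch of the workaround, in Python with the standard `difflib` module standing in for diff: drop the element and duration fields (assuming the comma-separated layout `element,source,target,date,hour,duration`), sort, and keep the lines the diff reports as unchanged. This matches on source, target, date and hour only and ignores the 5 second tolerance, so durations still need a separate check. `difflib.ndiff` is also far too slow for 6 million lines; it only illustrates the idea on small inputs.

```python
import difflib


def normalize(line: str) -> str:
    """Keep source,target,date,hour; drop element (field 1) and duration (field 6).

    Assumes the comma-separated layout element,source,target,date,hour,duration.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    return ",".join(fields[1:5])


def common_lines(a_lines, b_lines):
    """Normalized lines that a diff of the two sorted inputs reports as unchanged."""
    a = sorted(normalize(line) for line in a_lines)
    b = sorted(normalize(line) for line in b_lines)
    # ndiff prefixes unchanged lines with two spaces ("- ", "+ ", "? " mark the rest).
    return [d[2:] for d in difflib.ndiff(a, b) if d.startswith("  ")]


day1 = ["ne1,a,b,20040208,101500,000200"]
day6 = ["ne2,a,b,20040208,101500,000203", "ne2,x,y,20040208,120000,000010"]
print(common_lines(day1, day6))  # ['a,b,20040208,101500']
```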


----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 

Generic solution 2) Use join.
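`join` works on two files that are already sorted on the join field and walks them in step. The same sort-merge idea in Python, as a sketch: sort both inputs on an assumed key of (source, target, date, hour), advance whichever side has the smaller key, and when the keys match, compare every pair in the two groups for a different element and a duration within 5 seconds. Unlike `join`, this sketch sorts in memory; for files of this size the inputs would normally be pre-sorted on disk (for example with `sort`) and streamed. The layout `element,source,target,date,hour,duration` and all function names are assumptions.

```python
from itertools import groupby


def to_seconds(hhmmss: str) -> int:
    """'000203' -> 123."""
    return int(hhmmss[0:2]) * 3600 + int(hhmmss[2:4]) * 60 + int(hhmmss[4:6])


def key_of(fields):
    """Join key: source, target, date, hour (element and duration excluded)."""
    return tuple(fields[1:5])


def sorted_groups(lines):
    """Split, drop malformed lines, sort on the key and group equal keys together."""
    rows = [[f.strip() for f in line.rstrip("\n").split(",")] for line in lines]
    rows = sorted((r for r in rows if len(r) == 6), key=key_of)
    return groupby(rows, key=key_of)


def merge_join(current_lines, history_lines, tolerance=5):
    """Yield (current_row, history_row) pairs that look like duplicates."""
    cur, hist = sorted_groups(current_lines), sorted_groups(history_lines)
    c, h = next(cur, None), next(hist, None)
    while c is not None and h is not None:
        if c[0] < h[0]:
            c = next(cur, None)  # key only in the current input
        elif c[0] > h[0]:
            h = next(hist, None)  # key only in the history input
        else:
            # Materialize the history group; the current group is consumed
            # below, before either iterator is advanced.
            history_rows = list(h[1])
            for crow in c[1]:
                for hrow in history_rows:
                    close = abs(to_seconds(crow[5]) - to_seconds(hrow[5])) <= tolerance
                    if crow[0] != hrow[0] and close:
                        yield crow, hrow
            c, h = next(cur, None), next(hist, None)
```

Usage: `for current_row, history_row in merge_join(today_lines, history_lines): ...`, where both arguments are iterables of text lines.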

Generic solution 3) Use awk.
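The usual awk approach is a hash join: read the first file(s) into an associative array keyed on the matching fields, then stream the second file and look each record up. A Python sketch of the same idea, assuming the key is (source, target, date, hour) and the layout `element,source,target,date,hour,duration`: the history files fill a dictionary whose values are sorted (duration, element) lists, and `bisect` jumps straight to the entries within 5 seconds of each current duration. Function names are hypothetical. Memory grows with the number of history records, so at 6 million records per file this is best combined with splitting the input into smaller pieces first.

```python
import bisect
from collections import defaultdict


def to_seconds(hhmmss: str) -> int:
    """'000203' -> 123."""
    return int(hhmmss[0:2]) * 3600 + int(hhmmss[2:4]) * 60 + int(hhmmss[4:6])


def split_record(line):
    """Return the six stripped fields, or None for a malformed line."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    return fields if len(fields) == 6 else None


def build_index(history_lines):
    """Map (source, target, date, hour) -> sorted list of (duration_seconds, element)."""
    index = defaultdict(list)
    for line in history_lines:
        f = split_record(line)
        if f is not None:
            index[tuple(f[1:5])].append((to_seconds(f[5]), f[0]))
    for entries in index.values():
        entries.sort()
    return index


def find_duplicates(current_lines, index, tolerance=5):
    """Yield (current_line, history_element) for each history entry within tolerance."""
    for line in current_lines:
        f = split_record(line)
        if f is None:
            continue
        entries = index.get(tuple(f[1:5]))  # .get does not add keys to the defaultdict
        if not entries:
            continue
        dur = to_seconds(f[5])
        # ("" sorts first) -> index of the first entry with seconds >= dur - tolerance
        start = bisect.bisect_left(entries, (dur - tolerance, ""))
        for seconds, element in entries[start:]:
            if seconds > dur + tolerance:
                break
            if element != f[0]:
                yield line.rstrip("\n"), element
```

Usage: build the index once from file_1 through file_5 (or whichever days count as history), then stream file_6 through `find_duplicates`.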

What are "de duplicates registers"? What output format are you aiming for?

Annihilannic.
 