Tek-Tips is the largest IT community on the Internet today!

How to locate repeat work in some files?

Status
Not open for further replies.

kcluna

Technical User
Oct 28, 2006
3
BR
I have some files like this:

file1.txt:
APA1 2749 0138
APA1 2950 1234
APA1 2294 0204
APA1 2902 0229

file2.txt:
USD1 2380 1229
USD1 2671 1452
USD1 2672 2182
USD1 2130 1234

file3.txt:
CCD1 2370 0229
CCD1 2371 1152
CCD1 2372 1182
CCD1 2030 1234
CCD1 2374 1701

I need an awk script that finds the lines whose third field (the "word", e.g. 1234, 0229, ...) is
repeated in other lines, then prints those lines out using a line identification ("$1-$2").

The output should look like this:
CCD1-2030,USD1-2130,APA1-2950 Repeat: 1234
APA1-2902,CCD1-2370 Repeat: 0229

P.S. Sorry for my English; I live in Brazil.
 
Hi

You did not explain the sorting criteria, so I list the matches in the order they are found:
Code:
awk -vr='1234 0229' 'BEGIN{split(r,a)}{for(i in a)if($3==a[i])f[i]=f[i](f[i]?",":"")$1"-"$2}END{for(i in a)print f[i]" Repeat: "a[i]}' file?.txt
Tested with gawk and mawk.
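
For anyone who finds the one-liner hard to read, here is a sketch of the same logic spread over several lines. The array names want and found and the scratch directory are my own choices, not part of the original command; the sample data is copied from the question. It walks the wanted values by index (1..n) instead of using for (i in a), because awk does not specify the order of a for-in loop.

```shell
#!/bin/sh
# Recreate the three sample files from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'APA1 2749 0138' 'APA1 2950 1234' 'APA1 2294 0204' 'APA1 2902 0229' > file1.txt
printf '%s\n' 'USD1 2380 1229' 'USD1 2671 1452' 'USD1 2672 2182' 'USD1 2130 1234' > file2.txt
printf '%s\n' 'CCD1 2370 0229' 'CCD1 2371 1152' 'CCD1 2372 1182' 'CCD1 2030 1234' 'CCD1 2374 1701' > file3.txt

result=$(awk -v r='1234 0229' '
BEGIN {
    # Split the space-separated list of wanted values into want[1..n].
    n = split(r, want)
}
{
    # For every input line, append "$1-$2" to the list of each wanted
    # value that equals the third field.
    for (i = 1; i <= n; i++)
        if ($3 == want[i])
            found[i] = found[i] (found[i] ? "," : "") $1 "-" $2
}
END {
    # Print one line per wanted value, in the order they were given.
    for (i = 1; i <= n; i++)
        print found[i] " Repeat: " want[i]
}' file?.txt)

printf '%s\n' "$result"
```

With the sample files this prints the 1234 line first and the 0229 line second, and the identifiers inside each line follow the order in which the files are read.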

Feherke.
 
Hi, feherke


1. What does the option "-vr" do?

2. I don't know in advance which numbers are repeated (1234, 0229, ...). The script has to find them in these files and then print output like this:
CCD1-2030,USD1-2130,APA1-2950 Repeat: 1234
APA1-2902,CCD1-2370 Repeat: 0229
and so on ...

thank you!
 
Hi

kcluna said:
1. What does the option "-vr" do?
It is -v. The r='1234 0229' part initializes the variable r with the given value.
man awk said:
-v var=val
--assign var=val
Assign the value val to the variable var, before
execution of the program begins. Such variable
values are available to the BEGIN block of an AWK
program.
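
A small sketch of what the man page excerpt means in practice, assuming any POSIX-style awk. The variable names with_v and as_operand are mine. It compares -v with the other way of passing a variable, an assignment written among the file operands, which awk only processes when it reaches that argument, after BEGIN has already run.

```shell
#!/bin/sh
# -v assigns before BEGIN runs, so BEGIN can read the variable.
with_v=$(awk -v r='1234 0229' 'BEGIN { print "[" r "]" }')

# An assignment given as an operand is not processed before BEGIN,
# so BEGIN sees r as an empty, uninitialized variable.
as_operand=$(awk 'BEGIN { print "[" r "]" }' r='1234 0229')

printf '%s\n%s\n' "$with_v" "$as_operand"
```

The first command prints [1234 0229] and the second prints []. That is why the script above needs -v: it calls split(r,a) inside BEGIN.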
kcluna said:
2. I don't know the numbers that are repeated (1234, 0229, ...)
Do you mean that the awk script itself has to find the numbers with more than one occurrence? Then:
Code:
awk '{f[$3]=f[$3](f[$3]?",":"")$1"-"$2}END{for(i in f)if(index(f[i],","))print f[i]" Repeat: "i}' file?.txt
Tested with gawk and mawk.
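
Here is a longer sketch of the same idea for readers who want to adapt it. It is not the one-liner itself but a variant: it counts occurrences explicitly instead of looking for a comma with index(), and it remembers the order in which values first appear, because for (i in f) returns keys in an unspecified order. The array names count, ids and order are my own, and the sample data is copied from the question.

```shell
#!/bin/sh
# Recreate the three sample files from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'APA1 2749 0138' 'APA1 2950 1234' 'APA1 2294 0204' 'APA1 2902 0229' > file1.txt
printf '%s\n' 'USD1 2380 1229' 'USD1 2671 1452' 'USD1 2672 2182' 'USD1 2130 1234' > file2.txt
printf '%s\n' 'CCD1 2370 0229' 'CCD1 2371 1152' 'CCD1 2372 1182' 'CCD1 2030 1234' 'CCD1 2374 1701' > file3.txt

repeats=$(awk '
{
    id = $1 "-" $2
    if (count[$3]++ == 0) {
        # First time this value is seen: remember its position and
        # store it as a string so a leading zero is kept.
        order[++n] = $3 ""
        ids[$3] = id
    } else {
        # Seen before: append this line identifier to the list.
        ids[$3] = ids[$3] "," id
    }
}
END {
    # Report only values that occurred more than once, in first-seen order.
    for (k = 1; k <= n; k++) {
        key = order[k]
        if (count[key] > 1)
            print ids[key] " Repeat: " key
    }
}' file?.txt)

printf '%s\n' "$repeats"
```

On the sample data only 1234 and 0229 occur more than once, so two lines are printed. Values such as 1229 and 0229 are kept apart because array subscripts are compared as strings.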

Feherke.
 
OK, Feherke!

That solved my problem.

Here is the output for 3 files:
APA1-2901,USD1-2141 Repeat: 2901
APA1-2250,USD1-2422 Repeat: 2002
APA1-2905,USD1-2192 Repeat: 2905
APA1-2220,USD1-2653 Repeat: 2009
APA1-2255,CCD1-2030,USD1-2437 Repeat: 1234
APA1-2205,USD1-2403 Repeat: 1710
APA1-2226,USD1-2182 Repeat: 0403
APA1-4455,CCD1-4455,USD1-4455 Repeat: 4455
APA1-2736,USD1-2669 Repeat: 4917
APA1-2224,USD1-3510 Repeat: 3510
APA1-2245,USD1-2465 Repeat: 1412
APA1-2716,USD1-2490 Repeat: 1627
APA1-2274,USD1-2171 Repeat: 4553
APA1-2786,USD1-2138 Repeat: 2244
APA1-2233,CCD1-2370 Repeat: 1701
APA1-2244,USD1-2496 Repeat: 1916
APA1-2294,USD1-2618 Repeat: 0204
APA1-4444,CCD1-4444,USD1-4444 Repeat: 4444
CCD1-2032,USD1-2642 Repeat: 2612
APA1-2733,USD1-2495 Repeat: 2404
CCD1-2028,USD1-2412 Repeat: 2387


Now I will apply it to 75 files.
Thank you!
 
Hi

kcluna said:
Now I will apply it to 75 files.
I don't know your shell skills, so I had better mention it. The question mark ( ? ) matches exactly one character, so for 75 files you have to change it to an asterisk ( * ) in the file name wildcard:
Code:
awk ' ... ' file?.txt    # before: ? matches exactly one character
awk ' ... ' file*.txt    # after: * matches any number of characters
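
A quick way to see the difference between the two wildcards, as a sketch. The file names file1.txt, file9.txt, file10.txt and file75.txt are only an assumption about how 75 files might be numbered; the thread does not say what the real names are.

```shell
#!/bin/sh
# Create a few empty, hypothetically named files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
: > file1.txt
: > file9.txt
: > file10.txt
: > file75.txt

# Count how many names each pattern expands to.
set -- file?.txt
single=$#
set -- file*.txt
all=$#

echo "file?.txt matches $single files, file*.txt matches $all files"
```

Here file?.txt picks up only the two single-digit names, while file*.txt picks up all four. Note that the shell expands the pattern in its own sort order, not in numeric order, so the order of the identifiers within each output line may differ from what you expect.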

Feherke.
 