fileA a x b, fileB a x c, how to get a file which is a x b x c

Status
Not open for further replies.

metaphilex

Instructor
Aug 29, 2010
3
US
Hi, all

FileA (each firm has two records)

firm1 rec1
firm1 rec2
firm2 rec1
firm2 rec2

FileB (each firm owned by two institutions)

firm1 ppf1
firm2 ppf1
firm1 ppf2
firm2 ppf2

I want to do a "firm x records x owner" expansion
(for each row of FileB, find every row of FileA that has the same first field, and append that FileA row
to the end of the FileB row; the result should look like this:)

firm1 ppf1 firm1 rec1
firm1 ppf1 firm1 rec2
firm2 ppf1 firm2 rec1
firm2 ppf1 firm2 rec2
firm1 ppf2 firm1 rec1
firm1 ppf2 firm1 rec2
firm2 ppf2 firm2 rec1
firm2 ppf2 firm2 rec2

Thank you,
Meta

 
Show us your code so far... which part are you stuck with?

If you haven't made a start yet there are many threads in this forum about "comparing 2 files" and similar which should give you some clues.

Annihilannic.
 
Hi, Annihilannic
My code, below, is not efficient: it takes a really long time to run on two large input files. I am sure there must be a much more efficient way to write it.

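awk 'BEGIN { FS=OFS="\t" }
# FileA: store each row under (firm, record) and flag the firm as seen
NR==FNR { ara[$1,$2]=$0; ara2[$1]=1; next }
{
  if (ara2[$1]) {
    # scan every stored key and print the rows whose firm matches
    for (s in ara) {
      split(s,con,"\034"); if ($1==con[1] && con[1]!="") { print $0,ara[s] }
    }
  }
  else {
    print $0,"",""
  }
} ' FileA FileB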
 
It looks pretty good to me!

Since you're basically creating a cartesian product of the input files it's bound to be pretty slow when they are large.

How slow are we talking? How many lines in the input files?

Annihilannic.
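
A rough way to answer the size question before running the join is to count how many lines it would produce. The sketch below assumes tab-separated files with the firm in the first field, as in the script above; `estimate_join_lines` is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: estimate the number of output lines of the join.
# Assumes tab-separated input with the key (firm) in field 1.
estimate_join_lines() {
  awk 'BEGIN { FS = "\t" }
  # First file: count the rows stored for each firm
  NR == FNR { n[$1]++; next }
  # Second file: a matched row expands to n[firm] lines, an unmatched row prints once
  { total += ($1 in n) ? n[$1] : 1 }
  END { print total + 0 }' "$1" "$2"
}
```

Called as `estimate_join_lines FileA FileB`, it reads both files once and prints a single number, so it stays cheap even when the join itself is large.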
 
What about this?
Code:
awk  'BEGIN { FS=OFS="\t"}
NR==FNR{ara[$1,++ara2[$1]]=$0; next}
{
if(ara2[$1])
  for(s=1;s<=ara2[$1];++s) print $0,ara[$1,s]
else
  print $0,"",""
}  ' FileA FileB

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
PHV, thanks

I like the idea "ara[$1,++ara2[$1]]=$0": creative and inspiring.
It is much faster.
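
The speed-up most likely comes from the lookup pattern: the first script walks every stored FileA row for each matching FileB line, while the counter-indexed version visits only the rows stored for that firm. Below is a minimal commented sketch of the same idea, assuming tab-separated input; the names `join_files`, `row` and `cnt` are illustrative stand-ins for `ara` and `ara2`.

```shell
# Sketch of the counter-indexed join; join_files is a hypothetical wrapper name.
# Assumes tab-separated input with the firm in field 1.
join_files() {
  awk 'BEGIN { FS = OFS = "\t" }
  # First file: number the rows of each firm 1..cnt[firm] as they arrive
  NR == FNR { row[$1, ++cnt[$1]] = $0; next }
  # Second file: visit only the rows stored for this firm
  ($1 in cnt) { for (i = 1; i <= cnt[$1]; i++) print $0, row[$1, i]; next }
  # No match in the first file: pad with two empty fields, as in the thread
  { print $0, "", "" }' "$1" "$2"
}
```

With the sample data from the first post (tabs between the fields), `join_files FileA FileB` should print the eight expected lines in FileB order.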
 