Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Join command

Status
Not open for further replies.

Romeu

MIS
Jun 25, 2002
9
PT
I'm trying to join this file:

1|304500001||1|0|304500001|(...)
2|304500002||2|0|304500002|(...)
3|304500003||3|0|304500003|(...)
4|304500004||4|0|304500004|(...)
5|304500005||5|0|304500005|(...)
6|304500006||6|0|304500006|(...)
7|304500007||7|0|304500007|(...)
9|304500008||9|0|304500008|(...)
11|304500009||11|0|304500009|(...)
(...)

with this one:

1|304500001|3|105|2956318|(...)
2|304500002|3|105|2057057|(...)
3|304500003|1|105|2980777|(...)
4|304500004|3|105|2980786|(...)
5|304500005|1|105|2066079|(...)
6|304500006|3|105|2057060|(...)
7|304500007|3|105|2981167|(...)
9|304500008|3|105|2981731|(...)
11|304500009|3|105|2981734|(...)
(...)

Strangely the result of join -t "|" file1 file2 if this:

1|304500001||1|0|304500001|(...)
2|304500002||2|0|304500002|(...)
3|304500003||3|0|304500003|(...)
4|304500004||4|0|304500004|(...)
5|304500005||5|0|304500005|(...)
6|304500006||6|0|304500006|(...)
7|304500007||7|0|304500007|(...)
9|304500008||9|0|304500008|(...)
90|304500074||90|0|304500074|(...)
92|304500075||92|0|304500075|(...)
93|304500076||93|0|304500076|(...)
(...)

Troughtout the rest of the file, it will only join lines starting by 9(9). I've tried various sorting before joining the files but I can't seem to get a valid join. What's happening?

Thanks for any help,
Romeu
 
What field do you want to join on? What should the output file look like?

Greg.
 
I was trying to join the two files by the first field ( I believe that's join's default option). The output should look like:

[key field][rest of file 1][rest of file 2]

For istance, the first line would be:

1|304500001||1|0|304500001|(...)|304500001|3|105|29563|(...)

Thanks,
Romeu



 
I see what you mean. The problem seems to be join is treating the join fields as a string data type. From the man page for join ...

file1 and file2 must be sorted in increasing collating sequence as determined by LC_COLLATE on the fields on which they are to be joined, normally the first in each line (see sort(1))

In other words it's joined keys starting 1 to 9, it then comes across one starting with 1 (in this case 11) so it's ignored, and it ignores subsequent lines until it reaches the ones starting worth 9 again.

I suppose a way around would be to sort the input files into the order join expects (a simple "sort filenane" should suffice), then do the join, and then sort the resultant file back into the correct order - see man sort for how to do this properly - I can't remember :-(

Greg.
 
Good thinking, but it didn't work either. Here is the sorted inputs (just the join field):

100000
100001
100002
100003
100004
100005
100006
100007
100008
100009
10000
100010

And the output:

100000
100001
100002
100003
100004
100005
100006
100007
100008
100009
100010

The "10000" is missing... I've tried all sort flags and the result is always the same.

Thanks,
Romeu
 
ok ignore my last post. It does the same as your method of definind the field delim character. I am able go get it to work both ways, though, given the input you posted above. I can't reproduce your output. It worked both with and without the -n flag.

jaa
 
Thanks anyway. From what Greg posted and what I've read, I believe the difference in my join and yours can be related to the global env LC_COLLATE. However I wasn't able to change it correctly. I ended up solving my problem by padding the join field with leading zeros, so it would have always the same lenght. For that I used gawk:

gawk 'BEGIN{FS="|"}{printf "%#07s",$1 ; print "|"$0}'

Do you guys know a simpler/speedier way of doing this?

Thanks,
Romeu
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top