
Comparing file contents unix

Status
Not open for further replies.

ksdh

Technical User
Oct 24, 2007
41
IN
Hi,
I have two files, A and B:
A
1
1234
986740982
78264182
89264162
9128635
5
6
7
8

B
1
4
5
6
7

As you can see, some of the records in B are also in A. What is the best way to compare the two files? Say I want to output the entries that match (or don't match) between the two files. I tried a while loop but could not really get what I wanted. Any kind of help is appreciated.

Thanks
 
Hi

diff ?
Code:
master # diff A B
2,6c2
< 1234
< 986740982
< 78264182
< 89264162
< 9128635
---
> 4
10d5
< 8

master # diff -y A B
1                                           1
1234                                      | 4
986740982                                 <
78264182                                  <
89264162                                  <
9128635                                   <
5                                           5
6                                           6
7                                           7

Feherke.
 
Feherke
Thanks, but the diff command wouldn't work here.
The files contain thousands of entries.

Basically, I need to run a command that increments a counter whenever a number or string from file B is also in file A, and leaves the counter unchanged when an entry from file B does not match anything in A.

A typical example: we have a whitelist and a stream of raw data coming in. The raw data contains all the numbers, and the whitelist contains only the x numbers that are allowed to pass through. Once the raw data has passed through the whitelist filter, I get another file. Now I want to check that this filtered file was actually filtered, i.e. that the whitelist works.

I hope I have been able to explain the situation.

Thanks
 
Hi

In your example, is file A the raw data and file B the whitelist? Maybe like this?
Code:
# allowed by white list
master # grep -f B -x A
1
5
6
7

# rejected by white list
master # grep -f B -x -v A
1234
986740982
78264182
89264162
9128635
8
Tested with GNU grep.
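
If only a count is needed, the same idea can be combined with -c. The following is a minimal sketch, assuming a grep that supports -f, -x, -F and -c together (GNU grep does). The temporary directory and the sample contents of C are hypothetical, made up for illustration.

```shell
# Hypothetical sample data: B is the whitelist, C is the filtered file.
dir=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$dir/B"
printf '%s\n' 1 5 6 7 47 > "$dir/C"

# -f FILE: read patterns from FILE, -x: match whole lines only,
# -F: treat patterns as fixed strings, -c: print the number of matching lines.
matched=$(grep -c -x -F -f "$dir/B" "$dir/C")

# -v inverts the match: lines of C that are NOT in the whitelist.
rejected=$(grep -c -v -x -F -f "$dir/B" "$dir/C")

echo "matched=$matched rejected=$rejected"
rm -rf "$dir"
```

Here 47 is counted as rejected even though it contains 4 and 7, because -x only accepts whole-line matches.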

Feherke.
 
Feherke
Sorry, but the -f option doesn't work on my Solaris:
grep -f IMSI2 -x IMSI1
grep: illegal option -- f

Also, what I want to do here is this:

A------> Raw file
B------> Whitelist
C------> Filtered file

Let's say each of them contains only numbers.
I want to check how many numbers in the filtered file (which came from the raw file) are in the whitelist. Ideally all of them should be in the whitelist (the filter). But I still want to check whether each number in the filtered file is also present in the whitelist, and count the number of entries that matched.


Thanks
 
All I want to do here is run a for or while loop over file C to compare its entries with file B, but I don't know how to do it. Once that's done, I want to count the number of entries that matched in both files.
The output would just be a count of the number of entries that matched, a simple number.
Apologies if I could not explain the problem in my previous messages.

Is it clear now?
 
What have YOU tried so far and where in YOUR code are you stuck ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Do you only care about whether the entries exist? If the original order is not significant, the usual approach is to sort both inputs and then run comm on the sorted files with the -3 option. diff gives the contextual difference, which is not what you want if you only need to check for existence.

If you don't care about repeated entries, use uniq to suppress duplicates in the input.
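
As a rough sketch of that approach: sort -u sorts and removes duplicates in one step, comm -3 (not shown) would list the entries that are in only one of the files, and comm -12 keeps only the entries common to both. The directory and sample data below are hypothetical.

```shell
# Hypothetical sample data: B is the whitelist, C is the filtered file
# (C contains a duplicate 5 and a stray 8).
dir=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$dir/B"
printf '%s\n' 7 5 1 6 5 8 > "$dir/C"

# comm needs sorted input; sort -u also drops duplicate lines.
sort -u "$dir/B" > "$dir/B.sorted"
sort -u "$dir/C" > "$dir/C.sorted"

# -12 suppresses columns 1 and 2, leaving only lines present in both files.
common=$(comm -12 "$dir/B.sorted" "$dir/C.sorted" | wc -l | tr -d ' ')

# -13 suppresses columns 1 and 3, leaving lines found only in C.
only_c=$(comm -13 "$dir/B.sorted" "$dir/C.sorted")

echo "common=$common only_in_C=$only_c"
rm -rf "$dir"
```

The tr call only strips the padding that some wc implementations put around the number.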
 
I was trying this:
cat C | while read line; do grep -i $line B ;done | wc -l

I don't know if I applied the right logic.

Thanks
 
To fix your code, it would be:

cat C | while read line
do
    # anchored pattern: the whole line in B must equal $line
    egrep -i "^$line$" B > /dev/null
    if [ $? -eq 0 ]
    then
        echo "$line in B"
    fi
done

You need to egrep it with the ^ and $ anchors because you don't want a substring match.
(BTW, I am a marginal shell programmer)
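
Since the goal is a single number, here is a minimal sketch of the same loop with a counter. It assumes a POSIX shell such as ksh or bash (for the $(( )) arithmetic), and the directory and sample data are hypothetical. The file is fed to the loop with a redirect instead of cat, because in many shells a piped while loop runs in a subshell and the counter would be lost when it ends.

```shell
# Hypothetical sample data: B is the whitelist, C is the filtered file.
dir=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$dir/B"
printf '%s\n' 1 5 6 7 47 > "$dir/C"

count=0
while read -r line
do
    # ^ and $ anchor the pattern, so 47 does not match the entries 4 or 7.
    # Inside double quotes, \$ is a literal $ for grep.
    if grep "^$line\$" "$dir/B" > /dev/null
    then
        count=$((count + 1))
    fi
done < "$dir/C"

echo "$count"
rm -rf "$dir"
```

This starts one grep per line of C, so for files with thousands of entries it will be noticeably slower than a single sort and comm pass.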
 
Elgrandeperro
Thanks for your post. Your script does the job, but could you please explain the logic behind these two lines:
egrep -i "^$line$" B > /dev/null
if [ $? -eq 0 ]

What does ^$line$ mean,
and what does $? -eq 0 mean?

Thanks
 
grep matches substrings, so for instance a grep like:

grep 47
in a file matches
147
1147

etc. I thought there was an exact match grep, but at least does not have it.

egrep is a regular expression grep. ^ anchors the match at the beginning of the line, and a trailing $ anchors it at the end of the line. So ^47$ means "the line starts with 47 and ends right after it". We use double quotes (") in your example because we want the variable to expand.

$? holds the return value (exit status) of the last command, here the grep. The expected return values of a command are usually listed near the bottom of its man page. Most commands return 0 on success; in this case, grep returns 0 when it finds a match.
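
A small sketch to make both points concrete. The file name and its contents are hypothetical, and grep -c is used only to count the matching lines.

```shell
# Hypothetical sample file with three entries.
dir=$(mktemp -d)
printf '%s\n' 147 1147 47 > "$dir/nums"

# Unanchored: 47 is found inside 147 and 1147 as well.
substring=$(grep -c '47' "$dir/nums")

# Anchored: only the line that is exactly 47 matches.
exact=$(grep -c '^47$' "$dir/nums")

# $? is the exit status of the last command: 0 when grep found a match,
# 1 when it found none.
found=0
grep '^47$' "$dir/nums" > /dev/null || found=$?
missing=0
grep '^99$' "$dir/nums" > /dev/null || missing=$?

echo "substring=$substring exact=$exact found=$found missing=$missing"
rm -rf "$dir"
```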

 