Comparing file contents unix

ksdh · Feb 18, 2009

Hi,
I have 2 files, A and B:
A
1
1234
986740982
78264182
89264162
9128635
5
6
7
8

B
1
4
5
6
7

As you can see, some of the records in B are also in A. What is the best way to compare the 2 files? Let's say I want to output the entries where the two files match (or don't match). I tried a while loop but could not really get what I wanted. Any kind of help is appreciated.

Thanks

feherke · Feb 18, 2009

Hi

diff?

Code:

master # diff A B
2,6c2
< 1234
< 986740982
< 78264182
< 89264162
< 9128635
---
> 4
10d5
< 8

master # diff -y A B
1                                           1
1234                                      | 4
986740982                                 <
78264182                                  <
89264162                                  <
9128635                                   <
5                                           5
6                                           6
7                                           7

Feherke.

http://rootshell.be/~feherke/

ksdh · Feb 18, 2009

Feherke
Thanks, but the diff command wouldn't work out here.
The files contain thousands of entries.

So basically I would have to run a command where, if a number or string in file B is also in file A, a counter is incremented, and if the entry from file B does not match anything in A, the counter stays as it is (no increment).

I can give you a typical example. Let's say we have a whitelist and a stream of raw data coming in. The raw data contains all the numbers, and the whitelist contains only the x numbers that are allowed to pass through. Once the raw data has passed through the whitelist filter, I get another file. Now I want to check that this file was actually filtered and that the whitelist works.

I hope I have been able to explain the situation.

Thanks

feherke · Feb 18, 2009

Hi

In your example, is file A the raw data and file B the white list? Maybe like this?

Code:

# allowed by white list
master # grep -f B -x A
1
5
6
7

# rejected by white list
master # grep -f B -x -v A
1234
986740982
78264182
89264162
9128635
8

Tested with GNU grep.

Feherke.

http://rootshell.be/~feherke/
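
If only the number of entries is needed rather than the entries themselves, the same grep can count instead of print. This is a minimal sketch, assuming a grep that accepts -f and -x (GNU grep does). The -F and -c options are additions not used in the post above, and the sample files are recreated in a scratch directory so nothing real is touched.

```shell
# Recreate the sample files from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 1 1234 986740982 78264182 89264162 9128635 5 6 7 8 > "$tmp/A"
printf '%s\n' 1 4 5 6 7 > "$tmp/B"

# -f B : read the patterns from B, one per line
# -x   : a pattern must match the whole line, not a substring
# -F   : treat the patterns as fixed strings, not regular expressions
# -c   : print the number of matching lines instead of the lines
allowed=$(grep -c -x -F -f "$tmp/B" "$tmp/A")
rejected=$(grep -c -v -x -F -f "$tmp/B" "$tmp/A")

echo "allowed:  $allowed"
echo "rejected: $rejected"
rm -rf "$tmp"
```

With thousands of entries this still runs as a single grep process, which tends to be much faster than starting one grep per line.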

ksdh · Feb 18, 2009

Feherke
Sorry, but the -f option doesn't work on my Solaris:
grep -f IMSI2 -x IMSI1
grep: illegal option -- f

Also, what I want to do here is:

A------> Raw file
B------> Whitelist
C------> Filtered file

Let's say each of them contains only numbers.
I want to check how many numbers in the filtered file (which came from the raw file) are in the whitelist. Ideally all of them should be. But I still want to verify that each number in the filtered file is also present in the whitelist and, if so, count the number of entries that matched.

Thanks

feherke · Feb 18, 2009

Hi

Show us a sample of the desired output too.

Feherke.

http://rootshell.be/~feherke/

ksdh · Feb 18, 2009

All I want to do here is run a for or while loop over file C to compare its entries with file B, but I don't know how to do it. Once that's done, I want to count the number of entries that matched in both files.
The output would just be a count of the entries that matched, a simple number.
Apologies if I could not explain the problem in my previous messages.

Is it clear now?

PHV · Feb 18, 2009

What have YOU tried so far and where in YOUR code are you stuck?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

elgrandeperro · Feb 18, 2009

Do you only care about the entries' existence? If the original order is not significant, then you usually sort both inputs and then use comm on the sorted files. With the -3 option comm leaves out the lines common to both files and shows only the ones unique to each; with -12 it shows only the common lines. diff would give the contextual difference, which is not what you want if you just want to check for existence.

If you don't care about multiple entries, then use uniq to suppress multiple entries on input.
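
The sort-then-comm approach could look roughly like this. It is a sketch only: the sample files from the first post are recreated in a scratch directory, sort -u stands in for a separate uniq step, and LC_ALL=C is set so that sort and comm agree on the ordering. comm -3 prints what is unique to each file, while comm -12 prints only the common lines, which is what a match count needs.

```shell
LC_ALL=C; export LC_ALL          # same collation for sort and comm

tmp=$(mktemp -d)
printf '%s\n' 1 1234 986740982 78264182 89264162 9128635 5 6 7 8 > "$tmp/A"
printf '%s\n' 1 4 5 6 7 > "$tmp/B"

# comm needs both inputs sorted the same way; -u also drops duplicate entries.
sort -u "$tmp/A" > "$tmp/A.sorted"
sort -u "$tmp/B" > "$tmp/B.sorted"

# -3 hides the common column: column 1 = only in A, column 2 (tab-indented) = only in B
comm -3 "$tmp/A.sorted" "$tmp/B.sorted"

# -12 hides columns 1 and 2, leaving only the entries present in both files
matches=$(comm -12 "$tmp/A.sorted" "$tmp/B.sorted" | wc -l | tr -d ' ')
echo "entries in both files: $matches"
rm -rf "$tmp"
```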

ksdh · Feb 18, 2009

I was trying
cat C | while read line; do grep -i $line B ;done | wc -l

I don't know if I applied the right logic.

Thanks

elgrandeperro · Feb 18, 2009

To fix your code, it would be:

cat C | while read line
do
    egrep -i "^$line$" B > /dev/null
    if [ $? -eq 0 ]
    then
        echo "$line in B"
    fi
done

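You need to anchor the pattern with ^ and $ because you don't want a substring match (egrep is not strictly required for that; plain grep understands both anchors too).
(BTW, I am a marginal shell programmer)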
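
Since the goal is a single number, the loop can increment a counter instead of echoing each hit. A sketch under a few assumptions: a POSIX-style shell (ksh or bash, for the $((...)) arithmetic), plain grep in place of egrep since the anchors work there too, and a made-up sample C. The file is redirected into the loop rather than piped, because in some shells a piped while loop runs in a subshell and the counter would be lost when the loop ends.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$tmp/B"      # whitelist, from the first post
printf '%s\n' 1 5 6 7 8 > "$tmp/C"      # hypothetical filtered file; 8 is not whitelisted

count=0
while read -r line
do
    # ^ and $ tie the pattern to the whole line, so an entry such as 5
    # would not be counted just because the whitelist contained 15.
    if grep "^$line\$" "$tmp/B" > /dev/null
    then
        count=$((count + 1))
    fi
done < "$tmp/C"

echo "$count entries of C are in B"
rm -rf "$tmp"
```

This still starts one grep per line of C, so for files with many thousands of entries a single grep or comm over the whole file is likely to be quicker.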

Annihilannic · Feb 18, 2009

/usr/xpg4/bin/grep supports the -f option.

Annihilannic.
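
A script that has to run on Solaris and elsewhere could pick the grep at run time. This is only a sketch: it assumes the XPG4 grep also accepts -x and -c (both are specified by POSIX), that mktemp is available, and it uses a made-up sample C. GREP is just a variable name chosen for the example.

```shell
# Prefer the XPG4 grep where it exists (Solaris); otherwise use the grep found in PATH.
if [ -x /usr/xpg4/bin/grep ]
then
    GREP=/usr/xpg4/bin/grep
else
    GREP=grep
fi

tmp=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$tmp/B"      # whitelist, from the first post
printf '%s\n' 1 5 6 7 8 > "$tmp/C"      # hypothetical filtered file

# Count the lines of C that are matched as a whole line by some pattern in B.
matched=$("$GREP" -c -x -f "$tmp/B" "$tmp/C")
total=$(wc -l < "$tmp/C" | tr -d ' ')
echo "$matched of $total entries in C are whitelisted"
rm -rf "$tmp"
```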

ksdh · Feb 19, 2009

Elgrandeperro
Thanks for your post. Your script does the job, but could you please explain the logic behind
egrep -i "^$line$" B > /dev/null
if [ $? -eq 0 ]

What does ^$line$ mean,
and $? -eq 0?

Thanks

elgrandeperro · Feb 19, 2009

grep matches substrings, so for instance a command like:

grep 47
in a file matches
147
1147

etc. I thought there was an exact match grep, but at least does not have it.

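egrep is grep with extended regular expressions; the two anchors used here work in plain grep as well. ^ means beginning of line, and a trailing $ means end of line. So ^47$ would be "starts with 47 and ends at end of line", i.e. the whole line is exactly 47. We use double quotes in your example because we want the variable $line to expand.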

$? holds the exit status of the last command, here the grep. The expected exit status of any command is listed near the bottom of its man page. Most return 0 on success; grep returns 0 when it finds a match and 1 when it finds none.
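
A small demonstration of both points, using made-up sample data. It is a sketch only; the file name and variable names are invented for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' 147 1147 47 > "$tmp/numbers"   # made-up sample data

# Unanchored: 47 matches anywhere in the line, so all three lines count.
substring=$(grep -c '47' "$tmp/numbers")

# Anchored: only the line that is exactly 47 counts.
exact=$(grep -c '^47$' "$tmp/numbers")

# $? holds the exit status of the most recent command.
# grep exits with 0 when it found a match and 1 when it found none.
status=0
grep '^99$' "$tmp/numbers" > /dev/null || status=$?

echo "substring=$substring exact=$exact status=$status"
rm -rf "$tmp"
```

The last line prints substring=3 exact=1 status=1.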

feherke · Feb 19, 2009

Hi

elgrandeperro said:
I thought there was an exact match grep, but at least does not have it.

Annihilannic said:
/usr/xpg4/bin/grep supports the -f option.

And hopefully supports the -x or --line-regexp option too.

Feherke.

http://rootshell.be/~feherke/

ksdh · Feb 19, 2009

thanks elgrandeperro
