How to efficiently implement file comparison

whn · Aug 5, 2016

Let's say I have a baseline file like this:

Code:

4.1.27-4.1.2-amd64-[COLOR=red]1cccde6e81b2c42b[/color]
cmos2q installed and running
lm-sensors2q installed and running
lspci2q installed and running
mcelog2q installed and running
[COLOR=blue]mpt2q not needed[/color]
pmbus2q installed and running
smartmon2q installed and running

And a newly created file looks like this:

Code:

4.1.27-4.1.2-amd64-[COLOR=red]b2c2cfc7c5cc6d49[/color]
cmos2q installed and running
lm-sensors2q installed and running
lspci2q installed and running
mcelog2q installed and running
pmbus2q installed and running
smartmon2q installed and running
[COLOR=blue]mpt2q not needed[/color]

These two files should be treated as the SAME, because:
1) the first line (in red) can be ignored;
2) the blue line is identical in both files, even though it appears at a different position.

So, in this case, we cannot simply use File::Compare.

What I did was read both files into hashes, with each line as a hash key. The first line of each file is left out of the hashes. Then I compare the hash keys in a loop. I have omitted the implementation because it is too simple.
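
For readers who want to see it, the approach described above might look roughly like the following. This is only a sketch, not the original poster's code: the helper names `lines_as_keys` and `same_key_sets` are made up for illustration, and duplicate lines collapse into a single key.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a hash whose keys are its lines,
# leaving out the first line.
sub lines_as_keys {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $ignored = <$fh>;              # first line is not compared
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;             # duplicates collapse into one key
    }
    close $fh;
    return \%seen;
}

# Hypothetical helper: returns 1 if both files yield the same set of keys,
# 0 otherwise.
sub same_key_sets {
    my ($path_a, $path_b) = @_;
    my $keys_a = lines_as_keys($path_a);
    my $keys_b = lines_as_keys($path_b);

    # Different number of distinct lines: cannot be the same set.
    return 0 if keys(%$keys_a) != keys(%$keys_b);

    for my $line (keys %$keys_a) {
        return 0 unless exists $keys_b->{$line};
    }
    return 1;
}
```

Because the key counts are compared first, one loop over the first hash is enough to confirm that the two sets match.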

I sense there must be a smarter way to implement this. But I don't know how. So I am here to ask experts for help.

Thanks!

prex1 · Aug 6, 2016

First, you don't need to read both files into hashes: read just one, then go through the second file line by line and check whether each line is in the hash.
Also, you should decide what to do with duplicate lines: one file could contain the same line twice and the other only once. Are such files considered the same or different? With plain hashes you won't even notice (unless you explicitly check for this condition).
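
A minimal sketch of the one-hash idea, assuming that duplicate lines do matter: the hash stores how many times each line occurs in the first file, and the second file is streamed against those counts. The function name `same_lines` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if both files contain the same lines with
# the same multiplicities once the first line of each is skipped, else 0.
sub same_lines {
    my ($baseline, $candidate) = @_;
    my %count;

    open my $base_fh, '<', $baseline or die "Cannot open $baseline: $!";
    my $ignored = <$base_fh>;                 # skip the first line
    while (my $line = <$base_fh>) {
        chomp $line;
        $count{$line}++;                      # occurrences in the baseline
    }
    close $base_fh;

    open my $cand_fh, '<', $candidate or die "Cannot open $candidate: $!";
    $ignored = <$cand_fh>;                    # skip the first line
    while (my $line = <$cand_fh>) {
        chomp $line;
        # Line is unknown, or appears more often than in the baseline.
        return 0 unless $count{$line};
        delete $count{$line} if --$count{$line} == 0;
    }
    close $cand_fh;

    # Anything left over was in the baseline but not in the candidate.
    return %count ? 0 : 1;
}
```

If duplicates should not matter, storing 1 instead of a count and dropping the decrement would give plain set semantics.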
If the files are quite similar as in your example, I guess the best procedure would be like this:

1) Start with an empty array of strings.
2) First compare the sizes of both files (after skipping the first line, if relevant) and exit with 0 if they differ.
3) Read the first line of both files.
4) If the array is not empty, read the next line from the first file only and go to 8 (exit with 0 if the first file is at eof).
5) Exit with 1 if both files are at eof, with 0 if only one file is at eof.
6) Read the next line from both files.
7) If the two lines are the same, go to 5.
8) Check whether the line from the first file is in the array (go to 10 if the array is empty).
9) If it is, remove it from the array and go to 4.
10) Push the line of the second file onto the array.
11) Read the next line from the second file only; exit with 0 if the second file was at eof.
12) If the two lines at hand are the same, go to 4.
13) Go to 10.

This should work, but I'm not sure; you'll have to check.
Of course, if the files are big and equal lines can be displaced by a large distance, this procedure could become slow. Using hashes wouldn't necessarily be more efficient, though.
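
The numbered steps are hard to verify by hand (an ordering such as A, C, B against B, A, C is worth testing), so here is a simplified variant in the same spirit rather than a transcription of them. It reads both files in lockstep, skips pairs of equal lines, and keeps only the displaced lines in a signed counter hash, so memory use stays small when the files are nearly identical. The name `same_ignoring_order` is hypothetical, and the size check assumes every line, including the last, ends with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the files match apart from their first
# line and the order of the remaining lines, 0 otherwise.
sub same_ignoring_order {
    my ($file_a, $file_b) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";

    # Skip the first line of each file, remembering its length.
    my $head_a = <$fh_a>; $head_a = '' unless defined $head_a;
    my $head_b = <$fh_b>; $head_b = '' unless defined $head_b;

    # Cheap early exit: the remaining byte counts must agree.
    # Assumption: all lines are newline-terminated in both files.
    my $rest_a = ((-s $file_a) || 0) - length $head_a;
    my $rest_b = ((-s $file_b) || 0) - length $head_b;
    return 0 if $rest_a != $rest_b;

    # line => (times seen in A) - (times seen in B); zero entries are removed.
    my %balance;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last unless defined($line_a) || defined($line_b);
        chomp $line_a if defined $line_a;
        chomp $line_b if defined $line_b;

        # Common case: the lines are in the same position in both files.
        next if defined($line_a) && defined($line_b) && $line_a eq $line_b;

        if (defined $line_a) {
            delete $balance{$line_a} unless ++$balance{$line_a};
        }
        if (defined $line_b) {
            delete $balance{$line_b} unless --$balance{$line_b};
        }
    }

    # Every displaced line found its partner only if nothing is left over.
    return %balance ? 0 : 1;
}
```

The hash only ever holds lines that are currently unmatched, which is the same role the array of strings plays in the steps above, but lookups do not slow down as the number of displaced lines grows.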


whn · Aug 8, 2016

Thank you, prex.
