Sorting log files advice

tar565 · Apr 12, 2005

I have a large daily log file created by merging two smaller log files. I need to sort the file by time. All of the fields are delimited by spaces. Originally I was thinking of using a hash with the time as the key and the line as the value, but I can't, because not all the times are unique.

Any ideas? (Time is in the format hh:mm:ss.)

PaulTEG · Apr 12, 2005

Just use an array. Assuming the time is the first entry in the log, it's an easy job to sort it.

The constraint would be how large is large, and how much RAM is available to the process.
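
A minimal sketch of the array approach, assuming the time is the first space-delimited field and is always zero-padded hh:mm:ss (so a plain string comparison orders it correctly). The sub name sort_by_time and the sample lines are made up for illustration; in practice the lines would be read from the log file.

```perl
use strict;
use warnings;

# Sort log lines by their leading hh:mm:ss field.
# Zero-padded times are fixed width, so cmp orders them chronologically.
sub sort_by_time {
    my @lines = @_;
    my $i = 0;
    # Extract each timestamp once, remember the original position,
    # sort on (time, position), then unwrap the lines again.
    return map  { $_->[2] }
           sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] }
           map  { [ (split ' ', $_)[0], $i++, $_ ] } @lines;
}

# Hypothetical sample data standing in for the real log.
my @log = (
    "12:41:32 First log entry",
    "09:05:10 Early entry",
    "12:41:32 Second log entry",
);

print "$_\n" for sort_by_time(@log);
```

The original position is used as a tie-breaker so that lines with the same time keep the order they had in the file. Everything is held in memory at once, which is where the RAM constraint comes in.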
--Paul


MikeLacey · Apr 12, 2005

Hi,

If you use Time and another, sequential, value for the hash key that should get you around the problem.

So for the two log file entries:

12:41:32 First log entry
12:41:32 Second log entry

Append a value to the Time column before you use it as a hash key, like this:

12:41:32-1 First log entry
12:41:32-2 Second log entry

That will let you maintain the order of your log file entries.

When you come to display it you'll be able to strip the -1, -2, etc. from the Time column.
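
Roughly, in Perl, that idea might look like the sketch below. The sub name index_by_time and the sample lines are invented for illustration, and the counter is zero-padded (an assumption beyond the -1, -2 shown above) so that a plain string sort still works when a time repeats more than nine times.

```perl
use strict;
use warnings;

# Build a hash keyed on "time-sequence" so repeated times stay distinct.
sub index_by_time {
    my @lines = @_;
    my (%entry, %seen);
    for my $line (@lines) {
        my ($time) = split ' ', $line;
        # %seen counts how often each time has appeared so far.
        my $key = sprintf '%s-%06d', $time, ++$seen{$time};
        $entry{$key} = $line;
    }
    return \%entry;
}

# Hypothetical sample data.
my $entry = index_by_time(
    "12:41:32 First log entry",
    "12:41:32 Second log entry",
    "12:41:30 Earlier entry",
);

# The stored value is the untouched line, so there is no suffix to strip.
print "$entry->{$_}\n" for sort keys %$entry;
```

Because the suffix lives only in the key and the value is the original line, the display step here just prints the values in key order.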

Mike


PaulTEG · Apr 12, 2005

Thinking again, it doesn't matter really which log file has precedence, does it?

Unless you have a serious level of granularity on the time stamp?

Are both log files based on the same machine? The same time?

It would be a lot easier, and less memory intensive, if you still had the two original files and merged them in order.
--Paul


tar565 · Apr 12, 2005

I have the two original files, and both are based on the same machine time.

However, I can't use the sequential counter for the hash key, as there is no guarantee that a time and counter combination in file 1 will not also be found in file 2.

E.g. 12:15:32-1 could very well be in file 2 also.
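
One way around that collision, offered only as a sketch and not something proposed elsewhere in this thread, is to drop the counter and let each time map to a list of lines (a hash of arrays). The sub name group_by_time and the sample lines are hypothetical, and the time is again assumed to be the first space-delimited field.

```perl
use strict;
use warnings;

# Group lines by time: each hash value is an array reference, so any
# number of lines from either file can share one timestamp.
sub group_by_time {
    my @lines = @_;
    my %by_time;
    for my $line (@lines) {
        my ($time) = split ' ', $line;
        push @{ $by_time{$time} }, $line;
    }
    return \%by_time;
}

# Hypothetical sample lines standing in for the contents of both files.
my @file1 = ("12:15:32 file 1, first", "12:15:32 file 1, second");
my @file2 = ("12:15:32 file 2, first", "12:15:31 file 2, earlier");

my $by_time = group_by_time(@file1, @file2);
for my $time (sort keys %$by_time) {
    print "$_\n" for @{ $by_time->{$time} };
}
```

Lines that share a time come out in the order they were pushed, so file 1's entries precede file 2's for the same second. Like any hash-based approach, this still holds every line in memory.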

rharsh · Apr 12, 2005

I agree with the approach PaulTEG suggested, and I don't think you really need to worry about storing everything in a hash. This code is a bit messy and has lots of room for improvement, but it might give you a start.

Also, in the smaller log files, I separated the timestamp from the text with a tab, so you'll need to adjust that to match your logs.

Code:

use strict;
use warnings;

open my $log_a, '<', 'loga.txt' or die "Cannot open loga.txt: $!\n";
open my $log_b, '<', 'logb.txt' or die "Cannot open logb.txt: $!\n";

# Return the next non-blank line (chomped) from a handle, or undef at end of file.
sub next_line {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        next if $line =~ /^\s*$/;   # Skip blank lines
        chomp $line;
        return $line;
    }
    return;
}

# The timestamp is the first tab-separated field of a line.
sub stamp { return (split /\t/, $_[0])[0] }

my $line_a = next_line($log_a);
my $line_b = next_line($log_b);

# Two-way merge: always print the line with the earlier timestamp.
while (defined $line_a && defined $line_b) {
    if ((stamp($line_a) cmp stamp($line_b)) <= 0) {   # Ties go to log a
        print "$line_a\n";
        $line_a = next_line($log_a);
    } else {
        print "$line_b\n";
        $line_b = next_line($log_b);
    }
}

# One log is exhausted; print whatever is left of the other.
for my $pair ([$line_a, $log_a], [$line_b, $log_b]) {
    my ($line, $fh) = @$pair;
    while (defined $line) {
        print "$line\n";
        $line = next_line($fh);
    }
}

close $log_a;
close $log_b;
