
Sorting log files advice


tar565 (Programmer), Jan 24, 2005
I have a large daily log file created by merging two smaller log files, and I need to sort it by time. All of the fields are space-delimited. Originally I was thinking of using a hash with the time as the key and the line as the value, but I can't, because not all of the times are unique.

Any ideas? (Time is in the format hh:mm:ss.)
 
Just use an array. Assuming the time is the first field on each line, it's an easy job to sort it.

The only constraint is how large "large" actually is, and how much RAM is available to the process.
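A minimal in-memory sketch of the array approach, assuming the timestamp is the first space-delimited field and is zero-padded (so hh:mm:ss strings compare correctly as plain text). The sub name `sort_log_lines` is hypothetical. Each line is tagged with its original position, so duplicate times are no problem: they sort together and keep their input order.

```perl
use strict;
use warnings;

# Sort log lines on a leading hh:mm:ss timestamp, entirely in memory.
# Assumes the time is the first whitespace-delimited field and is zero-padded,
# so a string comparison (cmp) puts the times in the right order.
sub sort_log_lines {
    my @lines = @_;
    my $pos = 0;
    # Decorate each non-blank line as [timestamp, original position, line].
    my @decorated = map { [ (split ' ', $_, 2)[0], $pos++, $_ ] }
                    grep { /\S/ } @lines;
    # Sort on the timestamp; equal times fall back to the original position,
    # so duplicate times are allowed and keep their input order.
    return map { $_->[2] }
        sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @decorated;
}

# Example with made-up lines; in practice, read the merged file into an array.
my @merged = (
    "12:41:32 First log entry\n",
    "09:05:00 Earlier entry\n",
    "12:41:32 Second log entry\n",
);
print sort_log_lines(@merged);
```

Because every line is held in memory, this is only practical while the file fits comfortably in the RAM available to the process.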
--Paul

 
Hi,

If you use the time plus another, sequential value as the hash key, that should get you around the problem.

So for the two log file entries:

12:41:32 First log entry
12:41:32 Second log entry

Append a value to the Time column before you use it as a hash key, like this:

12:41:32-1 First log entry
12:41:32-2 Second log entry

That will let you maintain the order of your log file entries.

When you come to display them, you can strip the -1, -2, etc. from the time column.
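A minimal sketch of this idea, assuming one running counter is shared by every line from both files (rather than restarting at 1 for each file), so no two lines can ever get the same key. The sub names `keyed_log` and `ordered_keys` are hypothetical. The keys have to be ordered by time and then numerically by counter; a plain string sort would put 12:41:32-10 before 12:41:32-2.

```perl
use strict;
use warnings;

# Build a hash keyed on "time-counter".
# Assumption: ONE counter runs across all lines passed in (pass the lines of
# both files together), so the same key can never be produced twice.
sub keyed_log {
    my %log;
    my $seq = 0;
    for my $line (grep { /\S/ } @_) {
        my ($time, $rest) = split ' ', $line, 2;
        $rest = "\n" if !defined $rest || $rest eq '';
        $seq++;
        $log{"$time-$seq"} = $rest;
    }
    return \%log;
}

# Order keys by time, then numerically by counter ("-10" after "-2").
sub ordered_keys {
    my ($log) = @_;
    return sort {
        my ($ta, $na) = split /-/, $a;
        my ($tb, $nb) = split /-/, $b;
        $ta cmp $tb || $na <=> $nb;
    } keys %$log;
}

my $log = keyed_log("12:41:32 First log entry\n", "12:41:32 Second log entry\n");
for my $key (ordered_keys($log)) {
    (my $time = $key) =~ s/-\d+\z//;    # strip the "-1", "-2", ... for display
    print "$time $log->{$key}";
}
```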


Mike


 
Thinking about it again, it doesn't really matter which log file takes precedence, does it?

Unless you have a very fine level of granularity on the timestamp?

Are both log files based on the same machine time?

It would be a lot easier, and less memory-intensive, if you still had the two original files and merged them in order.
--Paul

 
I have the 2 original files and both are based on the same machine time.

However, I can't use the sequential counter in the hash key, because there is no guarantee that a time-and-counter combination from file 1 won't also appear in file 2.

For example, 12:15:32-1 could very well appear in file 2 as well.
 
I agree with the approach PaulTEG suggested, and I don't think you really need to worry about storing everything in a hash. This code is a bit messy and has lots of room for improvement, but it might give you a start.

Also, in the smaller log files I separated the timestamp from the text with a tab, so you'll need to adjust the split to match your logs.

Code:
use strict;
use warnings;

open my $log_a, '<', 'loga.txt' or die "Cannot open loga.txt: $!\n";
open my $log_b, '<', 'logb.txt' or die "Cannot open logb.txt: $!\n";

# Return the next non-blank line (chomped) from a filehandle, or undef at EOF.
sub next_line {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        return $line if $line =~ /\S/;    # skip blank lines
    }
    return;
}

# The timestamp is the first field. These logs separate it from the text
# with a tab; change /\t/ to ' ' for space-delimited logs.
sub timestamp { return (split /\t/, $_[0])[0] }

my $line_a = next_line($log_a);
my $line_b = next_line($log_b);

# Merge: keep one pending line from each file and always print the earlier
# one. On equal times, the line from loga.txt goes first.
while (defined $line_a && defined $line_b) {
    if (timestamp($line_a) le timestamp($line_b)) {
        print "$line_a\n";
        $line_a = next_line($log_a);
    } else {
        print "$line_b\n";
        $line_b = next_line($log_b);
    }
}

# One file is exhausted; copy whatever remains of the other, in order.
while (defined $line_a) { print "$line_a\n"; $line_a = next_line($log_a); }
while (defined $line_b) { print "$line_b\n"; $line_b = next_line($log_b); }

close $log_a;
close $log_b;
 