How to extract data from a large log file?


stla

I have a log file around 1GB in size.

Obviously this file is too big to open in any text program, and commands such as 'tail' or 'more' don't help much either because of the number of lines.

Each line of the log file contains a date and time entry.

Is it possible to do the following:

"Find me all log entries from 12.12.07 until 13.12.07 and copy this to a new file."

Best regards
 
First I would try the grep command and its variants, e.g.
fgrep 12.12.07 oldfile > newfile

This may not be sufficient, e.g. when the string 12.12.07 also appears in other lines, but you only want it when it is in a certain position.
You may have to look at the awk command, but its syntax is more difficult than grep's.
sed may also be considered.
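For example, a minimal awk sketch, assuming the date is the first whitespace-separated field on each line (filenames just follow the example above):

awk '$1 == "12.12.07" || $1 == "13.12.07"' oldfile > newfile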

hope this helps
 
If the date is at the beginning of each row you can use these commands:

fgrep "^12.12.07" logfile >foobar.txt
fgrep "^13.12.07" logfile >>foobar.txt

or as one command:

grep -E "^12.12.07|^13.12.07" logfile >foobar.txt
 
If you do this a lot (like splitting web log reports) I usually write a perl script to read through the log once and write each entry into its own report file. This is way more efficient: if the file keeps growing, the greps always have to read the whole file because they can't "exit" early, so even if the data of interest is at the beginning, you still read the whole file for each report.

It is probably a 20 line perl script, BTW.
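In awk rather than perl, a rough sketch of the same one-pass split (assuming the date is the first field; the output filenames are just made up for illustration):
Code:
# write each line into a per-date file named after its first field
# (with many distinct dates, some awks may need close() to avoid running out of file handles)
awk '{ print > ("report_" $1 ".log") }' logfile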

Of course if this is a one-time thing, then forget about it, just do it the grungy grep way.
eugene
 
I think you will need to use awk.
You will probably need to test around a bit to find out what field numbers awk assigns to your log lines (use head to make a sample file),
then run it against the whole file, using an IF to determine whether the ip address field falls between 12.12.x.x and 13.12.x.x, and if it does, append that line to a results file.
 
Stfaprc, I think what he wants is the log entries between December 12, 2007 and December 13, 2007. IP addresses wouldn't fall in the 12.12.07 to 13.12.07 range unless that's just a subnet.
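That said, the field-plus-IF idea works fine for dates too. A rough sketch, assuming each line starts with the date as DD.MM.YY followed by a space; the date is rebuilt as YYMMDD so a simple range test behaves:
Code:
# split the leading date on dots, rebuild it as YYMMDD, and keep lines inside the range
awk -F'[. ]' '{ key = $3 $2 $1; if (key >= "071212" && key <= "071213") print }' logfile > results.txt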
 
Grepping for patterns to extract a subset of the log can miss things, especially if some log entries span multiple lines (for example, Java stack traces). [tt]sed[/tt] is a good way to extract subsets of the log file...
Code:
# Select from the first 12.12.07 line up to the first 14.12.07 line (so all of 13.12.07 is included)
sed -n '/^12\.12\.07 /,/^14\.12\.07 /p' orig.log > subset.log

# Select from line 100,000 to line 200,000
sed -n '100000,200000p' orig.log > subset.log
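If the log is in date order, one refinement (just a sketch) is to tell sed to quit at the end of the range instead of scanning the rest of the file:
Code:
# print the range, then quit at the first line past it (still reads to EOF if no 14.12.07 line exists)
sed -n '/^12\.12\.07 /,/^14\.12\.07 /p; /^14\.12\.07 /q' orig.log > subset.log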
 