
Command line Log Parsing

Status
Not open for further replies.

jakrabit

MIS
Jan 4, 2001
133
US
We use an analytics service for our web traffic, which reads the IIS logs. The problem is that we pay by the line, so we needed an automated parsing tool to remove the heartbeat-check, bot, and crawler lines (of which there are quite a few).

With some trial and error, I put together the following batch file (I save it with a .cmd extension; on NT-based versions of Windows, both .cmd and .bat files are run by cmd.exe):

NOTE: You'll need the directory structure you see in the batch file, or you'll need to change the folder references to match your environment. You'll also need an exclude.txt listing, one per line, the search strings for the lines you want excluded from your parsed log.
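
For anyone who wants to see the exclusion logic outside of findstr, here is a minimal Python sketch of the same idea: drop every line that contains any string from the exclusion list, ignoring case. The function names and the sample lines are made up for illustration, and it treats each exclusion as a plain substring, which is a simplification of how findstr matches.

```python
def load_exclusions(lines):
    """Return lowercase exclusion strings, one per non-blank input line.

    Blank lines are skipped because an empty string is a substring of
    every line and would exclude the whole log.
    """
    return [ln.rstrip("\r\n").lower() for ln in lines if ln.strip()]


def filter_lines(lines, exclusions):
    """Yield the lines that contain none of the exclusion strings."""
    for line in lines:
        lowered = line.lower()
        if not any(ex in lowered for ex in exclusions):
            yield line


if __name__ == "__main__":
    # Hypothetical exclude.txt contents and log lines, for illustration only.
    exclusions = load_exclusions(["GET /heartbeat.asp\n", "Googlebot\n"])
    sample_log = [
        "10.0.0.9 GET /heartbeat.asp 200\n",
        "10.0.0.7 GET /default.asp Mozilla 200\n",
        "10.0.0.8 GET /products.asp Googlebot/2.1 200\n",
    ]
    for kept in filter_lines(sample_log, exclusions):
        print(kept, end="")
```

Because filter_lines is a generator, it can be fed an open file object and will process a multi-million-line log one line at a time.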

****************************************************************

@echo off
REM **** Move log files that have not been modified in the past 24 hours to the processing folder
robocopy c:\Extranetlogs\Jan_Logs\ c:\ExtranetLogs\Jan_Logs\processing /MOV /MINAGE:1

REM **** Place all lines not excluded (exclusions found in exclude.txt) in parsed.tmp.
REM **** Writing to a .tmp name keeps the output out of the *.log set being read.
cd /d c:\extranetlogs\jan_logs\processing
for %%i in (*.log) do findstr /v /i /g:"c:\ExtranetLogs\exclude.txt" "%%i" >>parsed.tmp

REM **** Rename the output to parsed.log, move it, and delete the processed logs
ren parsed.tmp parsed.log
robocopy c:\Extranetlogs\Jan_Logs\processing c:\ExtranetLogs\Jan_Logs\Parsed parsed.log /MOV
del /q c:\extranetlogs\jan_logs\processing\*.log

REM **** Determine current date, and subtract 1 (accounting for 28/29/30/31-day months and leap years)
set yyyy=

set $tok=1-3
for /f "tokens=1 delims=.:/-, " %%u in ('date /t') do set $d1=%%u
if "%$d1:~0,1%" GTR "9" set $tok=2-4
for /f "tokens=%$tok% delims=.:/-, " %%u in ('date /t') do (
for /f "skip=1 tokens=2-4 delims=/-,()." %%x in ('echo.^|date') do (
set %%x=%%u
set %%y=%%v
set %%z=%%w
set $d1=
set $tok=))

if "%yyyy%"=="" set yyyy=%yy%
if /I %yyyy% LSS 100 set /A yyyy=2000 + 1%yyyy% - 100

set CurDate=%mm%/%dd%/%yyyy%
set dayCnt=%1

if "%dayCnt%"=="" set dayCnt=1

REM Subtract your days here
set /A dd=1%dd% - 100 - %dayCnt%
set /A mm=1%mm% - 100

:CHKDAY
if /I %dd% GTR 0 goto DONE
set /A mm=%mm% - 1
if /I %mm% GTR 0 goto ADJUSTDAY
set /A mm=12
set /A yyyy=%yyyy% - 1

:ADJUSTDAY
if %mm%==1 goto SET31
if %mm%==2 goto LEAPCHK
if %mm%==3 goto SET31
if %mm%==4 goto SET30
if %mm%==5 goto SET31
if %mm%==6 goto SET30
if %mm%==7 goto SET31
if %mm%==8 goto SET31
if %mm%==9 goto SET30
if %mm%==10 goto SET31
if %mm%==11 goto SET30
REM ** Month 12 falls through

:SET31
set /A dd=31 + %dd%
goto CHKDAY

:SET30
set /A dd=30 + %dd%
goto CHKDAY

:LEAPCHK
set /A tt=%yyyy% %% 4
if not %tt%==0 goto SET28
set /A tt=%yyyy% %% 100
if not %tt%==0 goto SET29
set /A tt=%yyyy% %% 400
if %tt%==0 goto SET29

:SET28
set /A dd=28 + %dd%
goto CHKDAY

:SET29
set /A dd=29 + %dd%
goto CHKDAY

:DONE
if /I %mm% LSS 10 set mm=0%mm%
if /I %dd% LSS 10 set dd=0%dd%

REM **** Rename parsed.log to Previous Date (yyyymmdd.log)
rename c:\extranetlogs\Jan_Logs\Parsed\parsed.log u_ex%yyyy%%mm%%dd%.log


With this batch file I've been able to parse 4,000,000-line logs down to 200,000 lines.
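
The date arithmetic in the CHKDAY / ADJUSTDAY / LEAPCHK section is easier to follow when written out in a language with functions. The Python sketch below mirrors the same borrow-from-the-previous-month approach and the same leap-year rule; the function names are my own, and it is only meant as a way to check the logic, not a replacement for the batch file.

```python
def days_in_month(year, month):
    """Days in a month, using the same leap-year test as LEAPCHK."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def subtract_days(year, month, day, count=1):
    """Step back `count` days by borrowing days from earlier months."""
    day -= count
    while day <= 0:              # same test as :CHKDAY
        month -= 1
        if month <= 0:           # rolled back past January
            month = 12
            year -= 1
        day += days_in_month(year, month)   # :SET28/:SET29/:SET30/:SET31
    return year, month, day


def log_name(year, month, day):
    """Build the u_exyyyymmdd.log name used in the final rename step."""
    return "u_ex%04d%02d%02d.log" % (year, month, day)


if __name__ == "__main__":
    print(log_name(*subtract_days(2012, 3, 1)))
```

In everyday Python you would simply use datetime.date.today() - datetime.timedelta(days=1); the manual version is here only because it matches the batch labels one for one.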



Shane
and now for the impressive abbreviations:
DOA, SOL, AWOL, PEBKAC, id10t, FUBAR
 
Thanks for the article.

I was able to write some alternate code and thought I should post it for anyone interested.

This script will remove all lines that contain a string from the input file.

# Script CompressLog.txt
# Input arguments
var string file # file path
var string str # string to search for

# Read file in
var string in, out, line
cat $file > $in

stex -c ("]^"+$str+"^") $in > $line
while ($line <> "")
do
echo $line >> $out
stex -c ("^"+"\n"+"^]") $in > null
stex -c ("]^"+$str+"^") $in > $line
done

# Write file back
echo $out > { echo $file }

This script is written in biterscripting. Save it as C:/CompressLog.txt and execute it as follows:

# This is all one line ---->
script "C:/CompressLog.txt" file("C:/Server.log") str("GET /heartbeat.asp")

This will remove all lines containing that text from the file C:/Server.log. If you want to use regular expressions, use the -c option of the stex command. Hope this helps.
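
Since the script above rewrites the log file in place, here is a rough Python sketch of one way to do that kind of in-place rewrite without holding the whole file in memory. It writes the kept lines to a temporary file in the same folder and then swaps it over the original. The function name is hypothetical, and it matches a plain substring only.

```python
import os
import tempfile


def remove_lines_containing(path, needle):
    """Rewrite `path` without the lines containing `needle`.

    Returns the number of lines removed. Works on bytes so the original
    line endings and encoding are left untouched.
    """
    needle_bytes = needle.encode("utf-8")
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives next to the log so the final swap stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    removed = 0
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
            for line in src:
                if needle_bytes in line:
                    removed += 1
                else:
                    tmp.write(line)
        os.replace(tmp_path, path)   # swap the filtered copy into place
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)      # do not leave a partial temp file behind
        raise
    return removed
```

Calling remove_lines_containing("C:/Server.log", "GET /heartbeat.asp") would correspond to the command line shown above. If the filtering fails partway through, the original file is left as it was.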
 