Scripting help needed... 5

Cybex1 · Sep 15, 2011

I have a file “capinfos.txt” which contains:
[file]
pcap0000.pcap
Thu Feb 10 15:58:48 2011
Thu Feb 10 19:27:26 2011
pcap0001.pcap
Thu Feb 10 19:27:26 2011
Thu Feb 10 20:35:35 2011
pcap0002.pcap
Thu Feb 10 20:35:35 2011
Mon Feb 14 11:10:43 2011
pcap0003.pcap
Mon Feb 14 11:10:43 2011
Mon Feb 14 12:25:11 2011
pcap0004.pcap
Mon Feb 14 12:25:11 2011
Mon Feb 14 14:31:13 2011
pcap0005.pcap
Mon Feb 14 14:31:13 2011
Mon Feb 14 22:34:23 2011
pcap0006.pcap
Mon Feb 14 22:34:23 2011
Tue Feb 15 11:21:04 2011
pcap0007.pcap
Tue Feb 15 11:21:04 2011
Tue Feb 15 11:33:34 2011
pcap0008.pcap
Tue Feb 15 11:33:34 2011
Tue Feb 15 12:31:50 2011
pcap0009.pcap
Tue Feb 15 12:31:50 2011
Tue Feb 15 12:51:50 2011
pcap0010.pcap
Tue Feb 15 12:51:50 2011
Tue Feb 15 13:25:03 2011
pcap0011.pcap
Tue Feb 15 13:25:03 2011
Tue Feb 15 14:15:05 2011
pcap0012.pcap
Tue Feb 15 14:15:05 2011
Tue Feb 15 14:37:02 2011
pcap0013.pcap
Tue Feb 15 14:37:02 2011
Tue Feb 15 15:18:09 2011
pcap0014.pcap
Tue Feb 15 15:18:09 2011
Tue Feb 15 17:27:15 2011
pcap0015.pcap
Tue Feb 15 17:27:15 2011
Wed Feb 16 13:10:12 2011
pcap0016.pcap
Wed Feb 16 13:10:12 2011
Wed Feb 16 16:03:18 2011
pcap0017.pcap
Wed Feb 16 16:03:18 2011
Thu Feb 17 13:32:06 2011
pcap0018.pcap
Thu Feb 17 13:32:06 2011
Thu Feb 17 19:33:41 2011
pcap0019.pcap
Thu Feb 17 19:33:41 2011
Thu Feb 17 19:38:45 2011
pcap0020.pcap
Thu Feb 17 19:38:45 2011
Thu Feb 17 23:37:42 2011
pcap0021.pcap
Thu Feb 17 23:37:42 2011
Sun Feb 20 11:50:00 2011
[/file]

After running the following, I end up with a file called “count.txt”

Code:

gawk '2==NR%3 {print $2" "$3}' capinfos.txt|uniq -c > count.txt; gawk 'END{print $2" "$3}' capinfos.txt|uniq -c >> count.txt

cat count.txt
Returns:
      3 Feb 10
      4 Feb 14
      9 Feb 15
      2 Feb 16
      4 Feb 17
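
The same per-date counts can probably also be gathered in a single awk pass. This is only a sketch, assuming the three-line record layout shown above (file name, start time, end time). The function name count_by_start_date is made up for illustration, and it counts start dates only, without appending the final end date.

```shell
#!/usr/bin/env bash
# Count capture files per start date in one awk pass.
# Assumes records of three lines: file name, start time, end time.
count_by_start_date() {
  awk 'NR % 3 == 2 {                      # second line of each record = start time
         d = $2 " " $3                    # e.g. "Feb 10"
         if (!(d in n)) order[++k] = d    # remember first-seen order
         n[d]++
       }
       END { for (i = 1; i <= k; i++) printf "%7d %s\n", n[order[i]], order[i] }' "$1"
}

# Usage: count_by_start_date capinfos.txt > count.txt
```

Keeping the dates in first-seen order avoids relying on uniq, which only merges adjacent duplicates.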

By running the following for each date I can determine the starting file name for each date:

Code:

grep -B 2 "Feb 14" capinfos.txt|head -1
Returns: 
pcap0002.pcap

I know that “Feb 14” files start with “pcap0002.pcap” and I know from the “uniq -c” in count.txt that “Feb 14” includes 4 files. How can I update count.txt to show the start and end file names for each date? Basically, I think it would be something like:

Code:

pseudo code: 
in a for loop...
set the file count in count.txt for “Feb 14” as a variable, i.e. FCOUNT in this case =”4” as well as the date... CDATE=First date ++1
grep -B 2 $CDATE capinfos.txt|head -1|gawk '{print $1" pcap"(0002  + $FCOUNT)}'
Hopefully producing “pcap0002.pcap pcap0006.pcap”
then strip out the file count and append the new data to the date in count.txt

All said and done count.txt would ideally look like this:
[file]
Feb 10 pcap0000.pcap pcap0002.pcap
Feb 14 pcap0002.pcap pcap0006.pcap
Feb 15 pcap0006.pcap pcap0015.pcap
Feb 16 pcap0015.pcap pcap0017.pcap
Feb 17 pcap0017.pcap pcap0021.pcap
[/file]

Any help is greatly appreciated!

feherke · Sep 16, 2011

Hi

Code:

awk 'NR%3==1{f=$0;next}d!=$2" "$3{print d,s,f;d=$2" "$3;s=f}' capinfos.txt

Tested with [tt]gawk[/tt] and [tt]mawk[/tt].

Feherke.

http://free.rootshell.be/~feherke/

PHV · Sep 16, 2011

Minor change (an if(NR>2) guard, so the empty first line is not printed):
awk 'NR%3==1{f=$0;next}d!=$2" "$3{if(NR>2)print d,s,f;d=$2" "$3;s=f}' capinfos.txt

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
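
For anyone reading along, the one-liner with the NR>2 guard can be spelled out with comments. This is a sketch of the same logic, not a replacement; the wrapper function name date_ranges is hypothetical, and the three-line record layout of capinfos.txt is assumed.

```shell
#!/usr/bin/env bash
# Expanded form of the awk one-liner, including the NR>2 guard.
# Assumes three-line records: file name, start time, end time.
date_ranges() {
  awk '
    NR % 3 == 1 { f = $0; next }     # file name line: remember it, read on
    d != $2 " " $3 {                 # the date on this time line is new
      if (NR > 2) print d, s, f      # close the previous date (none yet at NR==2)
      d = $2 " " $3                  # current date, e.g. "Feb 14"
      s = f                          # first file holding data for this date
    }' "$1"
}

# Usage: date_ranges capinfos.txt
```

Note that a date is only printed once a later date shows up. In the sample data the final end time (Feb 20) is what triggers the Feb 17 line, and Feb 20 itself is never printed.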

SamBones · Sep 16, 2011

Wow! Both star worthy. [bigsmile]

Cybex1 · Sep 16, 2011

No kidding, these are both remarkable!! Thank you both for taking the time to help.

feherke [medal]

PHV

Thank you!

Cybex1 · Sep 17, 2011

New problem, same script...

First off, I changed “capinfos.txt” a bit. It now has one more field, the year, which is needed in this next step. Here is an excerpt from that file:

Feb 10 2011 pcap0000.pcap pcap0002.pcap
Feb 14 2011 pcap0002.pcap pcap0006.pcap
Feb 15 2011 pcap0006.pcap pcap0015.pcap
Feb 16 2011 pcap0015.pcap pcap0017.pcap
Feb 17 2011 pcap0017.pcap pcap0021.pcap

The last step in this script is to run each date in “capinfos.txt” through an application, “tcpslice”.
Tcpslice will take the files given to it and properly combine them into one file. The biggest problem is that only 32 of our “.pcap” files can be combined into a new file at one time, in order to keep the produced files under 2GB. A loop is needed, and each date present will be passed into “tcpslice” along with each “.pcap” file that contains data for that date. Additionally, if the number of input .pcap files is more than 32, we need to create another tcpslice command, writing to a new file, for the next 32 files. I have one date that has 287 pcap files... I believe a nested loop is needed to place the input pcap file names into the tcpslice command line. Below is what I have been trying with limited success.

Code:

PSTRING=`awk '{print $0}' capinfos.txt`
SFILE=`awk '{print $4}' capinfos.txt`
EFILE=`awk '{print $5}' capinfos.txt`

#Testing..., needs to be inside the other loop after working...
#echo ${SFILE:4:4} "   "${EFILE:4:4}
#echo `seq -w ${SFILE:4:4} ${EFILE:4:4}`
	for a in `seq -w ${SFILE:4:4} ${EFILE:4:4}`
		#  Giving seq" two arguments starts the count at the first one,
		# and continues until it reaches the second.
		do
			AFILES="pcap"$a".pcap"
  			BFILES= echo -n "$AFILES "
	done

#echo $BFILES


# use for loop read capinfos.txt and build command lines from it
	for (( i=0; i<${#PSTRING[@]}; i++ ));
		do
			PDAY=`echo ${PSTRING[$i]}| awk '{print $2}'`
			PYEAR=`echo ${PSTRING[$i]}| awk '{print $3}'`
			PMONTH=`echo ${PSTRING[$i]}| awk '{print $1}'`
			PDATE=`echo ${PSTRING[$i]}| awk '{print $1" "$2}'`
			
		echo "tcpslice -w "$PYEAR"y"$PMONTH"m"$PDAY"d0h0m0s0u.pcap "$PYEAR"y"$PMONTH"m"$PDAY"d0h0m0s0u "$PYEAR"y"$PMONTH"m"$PDAY"d11h59m59s59u "$BFILES
	done

However, the code above is only outputting one line, and it looks like this:
pcap0000.pcap pcap0001.pcap pcap0002.pcap tcpslice -w 2011yFebm10d0h0m0s0u.pcap 2011yFebm10d0h0m0s0u 2011yFebm10d11h59m59s59u

It should look like this:

[ideal output]
tcpslice -w 2011yFebm10d0h0m0s0u.pcap 2011yFebm10d0h0m0s0u 2011yFebm10d11h59m59s59u pcap0000.pcap pcap0001.pcap pcap0002.pcap
tcpslice -w 2011yFebm14d0h0m0s0u.pcap 2011yFebm14d0h0m0s0u 2011yFebm14d11h59m59s59u pcap0002.pcap pcap0003.pcap pcap0004.pcap pcap0005.pcap pcap0006.pcap
tcpslice -w 2011yFebm15d0h0m0s0u.pcap 2011yFebm15d0h0m0s0u 2011yFebm15d11h59m59s59u pcap0006.pcap pcap0007.pcap pcap0008.pcap pcap0009.pcap pcap0010.pcap pcap0011.pcap pcap0012.pcap pcap0013.pcap pcap0014.pcap pcap0015.pcap
tcpslice -w 2011yFebm16d0h0m0s0u.pcap 2011yFebm16d0h0m0s0u 2011yFebm16d11h59m59s59u pcap0015.pcap pcap0016.pcap pcap0017.pcap
tcpslice -w 2011yFebm17d0h0m0s0u.pcap 2011yFebm17d0h0m0s0u 2011yFebm17d11h59m59s59u pcap0017.pcap pcap0018.pcap pcap0019.pcap pcap0020.pcap pcap0021.pcap
[/ideal output]

I am sure my approach is flawed so by all means feel free to recommend any method that I can use. I think the only way my approach would work is if I had a multi dimensional array...

I also just thought of the tcpslice output filenames for each subsequent file, for those dates over the 32 file block limit. I think maybe underscores... tcpslice -w 2011yFebm17d0h0m0s0u_1.pcap

[headache]
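
A likely reason for the single output line: a backtick command substitution yields one string, not an array, so ${#PSTRING[@]} is 1 and the second loop runs once. Also, `BFILES= echo -n ...` runs echo with an empty BFILES in its environment instead of storing anything, which would explain why the file names are printed before the tcpslice text. A minimal sketch of the difference, assuming Bash 4 or newer for mapfile; the variable names and sample file are illustrative only.

```shell
#!/usr/bin/env bash
# Scalar versus array: why the for loop above runs only once.
tmp=$(mktemp)
printf '%s\n' 'Feb 10 2011 pcap0000.pcap pcap0002.pcap' \
              'Feb 14 2011 pcap0002.pcap pcap0006.pcap' > "$tmp"

as_string=$(cat "$tmp")        # one string holding both lines
mapfile -t as_array < "$tmp"   # one array element per line (Bash 4+)

echo "scalar count: ${#as_string[@]}"   # a plain variable counts as 1 element
echo "array count:  ${#as_array[@]}"

# Appending to a variable inside a loop, instead of "VAR= echo ...":
files=""
for n in 0000 0001 0002; do
  files+="pcap$n.pcap "
done
echo "$files"
rm -f "$tmp"
```

No multidimensional array is needed for this; reading the file line by line and building the file list per line is enough.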

feherke · Sep 18, 2011

Hi

I see you used [tt]seq[/tt]. As I found out in the past, [tt]seq[/tt] is not typically present on Unix systems, so I suppose you are using Linux. And on Linux the default shell used to be [tt]bash[/tt]. This is a Bash-only solution:

Bash:

while read m d y f t; do
  eval s=( "${f:0:4}{${f:4:4}..${t:4:4}}${t:8}" )
  for ((i=0,l=${#s[@]};i<l;i+=32)); do
    echo tcpslice -w "${y}y${m}m${d}d0h0m0s0u.pcap" "${y}y${m}m${d}d0h0m0s0u" "${y}y${m}m${d}d11h59m59s59u" "${s[@]:i:32}"
  done
done < 'capinfos.txt'
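
For readers unfamiliar with the two Bash features used here, a small sketch follows, with a chunk size of 2 instead of 32 so the output stays short. The values of f and t are taken from the Feb 14 line. Zero-padded brace ranges such as {0002..0006} assume Bash 4 or newer.

```shell
#!/usr/bin/env bash
f=pcap0002.pcap   # first file for the date
t=pcap0006.pcap   # last file for the date

# Build the text "pcap{0002..0006}.pcap"; eval is needed because brace
# expansion happens before parameter expansion.
eval s=( "${f:0:4}{${f:4:4}..${t:4:4}}${t:8}" )
echo "${#s[@]} files: ${s[*]}"

# ${s[@]:i:n} takes n elements starting at index i; the last slice may be shorter.
for (( i = 0; i < ${#s[@]}; i += 2 )); do
  echo "chunk at $i: ${s[*]:i:2}"
done
```

Because eval executes whatever text it is handed, this approach is best kept to a capinfos.txt whose contents are trusted.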

Feherke.


Cybex1 · Sep 18, 2011

feherke,

This looks remarkable, but I am not even sure how to implement it... You exceeded my scripting understanding with this... Please explain a bit.

Cybex1 · Sep 18, 2011

Never mind, I got it. Thanks, I need a bit of time to test.

Cybex1 · Sep 18, 2011

First off let me say, "This is great!". However, the file name sequencing is not happening when there are more than 32 files.

tcpslice -w 2011yFebm20d0h0m0s0u.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u

I imagine it would attempt to overwrite the prior file. It needs to be something like:

tcpslice -w 2011yFebm20d0h0m0s0u.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u_1.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u_2.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u

The first ones could even be:
tcpslice -w 2011yFebm20d0h0m0s0u_0.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u

Any ideas?

Cybex1 · Sep 18, 2011

This is not ideal but it works...

echo tcpslice -w "${y}y${m}m${d}d0h0m0s0u_$i.pcap"

tcpslice -w 2011yFebm20d0h0m0s0u_0.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u_32.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u_64.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
tcpslice -w 2011yFebm20d0h0m0s0u_96.pcap 2011yFebm20d0h0m0s0u 2011yFebm20d11h59m59s59u
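
Since i advances in fixed steps, the chunk number can also be derived arithmetically rather than taken from i directly. A minimal sketch, using a made-up nine-element list and a chunk size of 4 for brevity (the thread uses 32); the out_ prefix and the names array are placeholders.

```shell
#!/usr/bin/env bash
s=( f0 f1 f2 f3 f4 f5 f6 f7 f8 )   # stand-in for the list of pcap file names
size=4                             # 32 in the thread

names=()
for (( i = 0; i < ${#s[@]}; i += size )); do
  names+=( "out_$(( i / size )).pcap" )   # 0, 1, 2, ... instead of 0, 4, 8, ...
done
printf '%s\n' "${names[@]}"
```

In the tcpslice line this would amount to using _$((i/32)) in place of _$i.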

Cybex1 · Sep 19, 2011

I fixed the multiple output file names for a single day by adding a separate count variable. I also had a lapse in brain function and forgot to notice the month reference was "Apr"... Tcpslice needs a numeric representation for the syntax, so I had to switch them to numbers. Also, for continuity I added leading zeros to the days so everything would look the same. Below are pulled sections to show the changes. Any suggestions or modifications are welcomed! I know my solutions are usually pretty ugly...

Code:

#Replace Mmm with numerical months, then replace single digit days with double digits
sed "s/Jan/01/;s/Feb/02/;s/Mar/03/;s/Apr/04/;s/May/05/;s/Jun/06/;s/Jul/07/;s/Aug/08/;s/Sep/09/;s/Oct/10/;s/Nov/11/;s/Dec/12/" capinfos.txt |
awk '{day=$2; if (length(day) == 1) day="0"day; print $1" "day" "$3" "$4" "$5}' > temp.txt

#Read the converted file and build one tcpslice command per block of 32 files
while read m d y f t; do
  eval s=( "${f:0:4}{${f:4:4}..${t:4:4}}${t:8}" )
  for ((i=0,c=0,l=${#s[@]};i<l;i+=32,c++)); do
    echo tcpslice -w "${y}y${m}m${d}d0h0m0s0u_$c.pcap" "${y}y${m}m${d}d0h0m0s0u" "${y}y${m}m${d}d11h59m59s59u" "${s[@]:i:32}"
  done
done < temp.txt

feherke · Sep 19, 2011

Hi

Glad to see you solved the counter problem in a simple and elegant way.

Regarding the month conversion and day formatting, I would prefer a different approach:
[ul]
[li]Supposing your Bash is version 4 or newer, you can use an [highlight]associative array[/highlight] for months.[/li]
[li]The [highlight pink][tt]printf[/tt] function[/highlight] is more suitable for formatting and it is available in Bash too.[/li]
[/ul]

Code:

declare -A month=( ['Jan']='01' ['Feb']='02' ['Mar']='03' ['Apr']='04' ['May']='05' ['Jun']='06' ['Jul']='07' ['Aug']='08' ['Sep']='09' ['Oct']='10' ['Nov']='11' ['Dec']='12' )

while read m d y f t; do
  eval s=( "${f:0:4}{${f:4:4}..${t:4:4}}${t:8}" )
  printf -v d '%02d' "$d"
  for ((i=0,c=0,l=${#s[@]};i<l;i+=32,c++)); do
    echo tcpslice -w "${y}y${month[$m]}m${d}d0h0m0s0u_$c.pcap" "${y}y${month[$m]}m${d}d0h0m0s0u" "${y}y${month[$m]}m${d}d11h59m59s59u" "${s[@]:i:32}"
  done
done < 'capinfos.txt'
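
One caveat worth knowing if the day field has already been zero-padded (as the earlier awk step does): Bash's printf '%02d' treats a leading zero as octal, so "08" and "09" are rejected as invalid numbers. A hedged sketch that forces base 10 first; the function name pad_day is hypothetical, and associative arrays assume Bash 4 or newer.

```shell
#!/usr/bin/env bash
# Zero-pad a day number, accepting both "8" and "08".
pad_day() {
  local d
  # 10#... forces base 10, so "08" is not parsed as an invalid octal number.
  printf -v d '%02d' "$(( 10#$1 ))"
  echo "$d"
}

# Month lookup as in the post above (associative arrays need Bash 4+).
declare -A month=( [Jan]=01 [Feb]=02 [Mar]=03 [Apr]=04 [May]=05 [Jun]=06
                   [Jul]=07 [Aug]=08 [Sep]=09 [Oct]=10 [Nov]=11 [Dec]=12 )

m=Feb d=8 y=2011
echo "${y}y${month[$m]}m$(pad_day "$d")d0h0m0s0u"
```

With unpadded input such as "8" the plain printf -v d '%02d' "$d" works as posted; the base-10 step only matters for values like "08".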

Feherke.

