Tek-Tips is the largest IT community on the Internet today!

Text parsing question 1

Status
Not open for further replies.

carg1 (MIS, US), Aug 19, 2003
Well I'm convinced I'm using the split function wrong if my output is anything to go by. What I want to do is take lines formed exactly like the following (with the exception that there's multiple spaces between the columns in the real thing):

10.1.3.4 08/13/2003 14:18:30 PASSED
18.9.27.99 08/13/2003 16:08:44 PASSED
212.18.94.22 08/13/2003 17:29:06 PASSED
And have the script look at it as 5 different columns (or parts), so I could take the user IP addresses, and count the individual instances of them. As in if the IP address 10.1.3.4 appeared 12 times in the document, I want it to output that to the screen; if 18.9.27.99 appeared 4 times, tell me, etc. They're all grouped chronologically, so the loop would have to go through the document several times until it got every instance of the IP's. I know it's possible but I can't exactly put the picture together.

Code:
my $User;
my $Dd;
my $Tt;
my $Ap;
my $Dest;

print "Which file: ";
$TheDB = <STDIN>;
chomp($TheDB);

# Open the database file but quit if it doesn't exist
open(INDB, $TheDB) or die "The database $TheDB could " .
  "not be found.\n";

while(<INDB>) {
  $TheRec = $_;
  chomp($TheRec);
  ($User, $Dd, $Tt, $Ap, $Dest) = split(/\t/, $TheRec, 2);
  $SuccessCount++;
  print "$User "
} # End of while(<INDB>)

if($SuccessCount == 0) { print "No records found.\n" }
else { print "$SuccessCount records found.\n" }

print "Program finished.\n";

I'm definitely not sure how to get it to store all the instances in different variables. I'm probably overextending my abilities being a total newbie at this, but I'd appreciate any help or direction anyone could offer. Thanks!
 
save this messy text as activity.txt

10.1.3.4 08/13/2003 14:18:30 PASSED
18.9.27.99 08/13/2003 16:08:44 PASSED
212.18.94.22 08/13/2003 17:29:06 PASSED
18.9.27.99 08/13/2003 17:08:24 PASSED
18.9.27.99 08/13/2003 18:08:34 PASSED
18.9.27.99 08/13/2003 19:08:44 PASSED
212.18.94.22 08/13/2003 21:29:06 PASSED

save this script as activity.pl

open (ACTIVITY, "<activity.txt") or die "Cannot open activity.txt: $!\n";

while (<ACTIVITY>) {
  chomp;
  # Columns: IP, date, time, status, then a fifth column starting with http://
  # (the dots in the IP are escaped so they match literal dots only)
  m/(\d+\.\d+\.\d+\.\d+)\s+(\d+\/\d+\/\d+)\s+(\d+:\d+:\d+)\s+([A-Z]+)\s+(http:\/\/.*)/;
  print "  I.P.: $1\n";
  print "  date: $2\n";
  print "  time: $3\n";
  print "status: $4\n";
  print "   URL: $5\n\n";
}

this will break up the text with the regex

no counting involved... yet!
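
One way the counting step could be layered on top of a regex like this is sketched below. It is only a sketch: the sub names leading_ip and count_ips are made up for illustration, the sample lines are taken from the data above with extra spaces added, and only the leading IP column is matched, so it makes no assumption about what follows the status column.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the leading IP address off one log line.
# Returns undef when the line does not start with a dotted quad.
sub leading_ip {
    my ($line) = @_;
    return $line =~ /^\s*(\d+\.\d+\.\d+\.\d+)\s/ ? $1 : undef;
}

# Hypothetical helper: tally how often each IP appears in a list of lines.
sub count_ips {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my $ip = leading_ip($line);
        next unless defined $ip;    # skip lines that do not match
        $count{$ip}++;              # one pass is enough, order does not matter
    }
    return %count;
}

my @sample = (
    '10.1.3.4      08/13/2003  14:18:30  PASSED',
    '18.9.27.99    08/13/2003  16:08:44  PASSED',
    '212.18.94.22  08/13/2003  17:29:06  PASSED',
    '18.9.27.99    08/13/2003  17:08:24  PASSED',
);

my %count = count_ips(@sample);
print "$_: $count{$_}\n" for sort keys %count;
```

Because the hash is keyed by IP, a single pass over the file is enough even though the records are grouped chronologically rather than by address.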

hope it's of help?

regards
Duncan
 
Why not try something like the following? :

my %IPs; #Added in this HASH
my $User;
my $Dd;
my $Tt;
my $Ap;
my $Dest;

print "Which file: ";
$TheDB = <STDIN>;
chomp($TheDB);

# Open the database file but quit if it doesn't exist
open(INDB, $TheDB) or die "The database $TheDB could " .
  "not be found.\n";

while(<INDB>) {
  $TheRec = $_;
  chomp($TheRec);
  ($User, $Dd, $Tt, $Ap, $Dest) = split(/\t/, $TheRec, 2);
  $SuccessCount++;
  if (exists $IPs{$User}) { #If the key exists, add 1
    $IPs{$User} += 1;
  } else { #Otherwise, basically initialize the count to 1
    $IPs{$User} = 1;
  }
  print "$User "
} # End of while(<INDB>)

if($SuccessCount == 0) { print "No records found.\n" }
else {
  print "$SuccessCount records found.\n"
  foreach $key (sort keys %IPs) { #Print out each key
    print ("IP Address: " . $key . "\tNumber of instances: " . $IPs{$key} . "\n");
  }
}

print "Program finished.\n";
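
A more compact variation on the same idea, offered as a sketch rather than a correction. Since the question says the real file has multiple spaces between columns, splitting on the special pattern ' ' (any run of whitespace, with leading whitespace ignored) may fit better than /\t/ with a limit of 2. The sub name parse_record and the hash field names are made up for illustration; the sample lines come from the question's data with extra spaces added.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record into its whitespace-separated columns.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    # split ' ' splits on runs of whitespace and skips leading whitespace.
    my ($user, $date, $time, $status, $dest) = split ' ', $line;
    return { user => $user, date => $date, time => $time,
             status => $status, dest => $dest };
}

my %ips;
for my $line ("10.1.3.4    08/13/2003   14:18:30   PASSED\n",
              "18.9.27.99  08/13/2003   16:08:44   PASSED\n",
              "18.9.27.99  08/13/2003   17:08:24   PASSED\n") {
    my $rec = parse_record($line);
    $ips{ $rec->{user} }++;   # a missing key starts at undef and becomes 1
}

for my $ip (sort keys %ips) {
    print "IP Address: $ip\tNumber of instances: $ips{$ip}\n";
}
```

The ++ operator creates the hash entry on first use, so the exists check and the explicit initialization to 1 can be dropped without changing the counts.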

 
duncdude, it doesn't actually print anything to the screen, but I can definitely study that and get some direction from it, thanks! Firemyst, it says there's a syntax error on the line
Code:
foreach $key (sort keys %IPs) {  #Print out each key
but I'm gonna look into it and see if I can make it work. Thanks for all your help guys, I appreciate it :)
 
oh, it was just a missing curly bracket
 
Carg:

The syntax error is because there's no ending semicolon ";" on the line before:

print "$SuccessCount records found.\n"

That should work for you now. :-]
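
For illustration, the reporting branch with the semicolon in place could look like the sketch below. The summary sub and its argument order are made up for this example so the logic can be run on its own; it is not part of the script posted earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build the summary text from a record total
# and a hash of per-IP counts.
sub summary {
    my ($total, %ips) = @_;
    return "No records found.\n" if $total == 0;
    my $out = "$total records found.\n";    # statement ends with a semicolon
    foreach my $key (sort keys %ips) {
        $out .= "IP Address: $key\tNumber of instances: $ips{$key}\n";
    }
    return $out;
}

print summary(3, '18.9.27.99' => 2, '10.1.3.4' => 1);
```

Perl only lets the semicolon be omitted after the last statement in a block, which is why the single-line else { print ... } form parses but the multi-line one with a following foreach does not.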
 
If you save the top section as activity.txt and the bottom section as activity.pl, it will print data to the screen. I tested it and it works a treat!

Regards
Duncan
 
Actually I got it working with the curly bracket, either way, works like a charm, and you have my gratitude and a star, hehe:)
 