Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Text parsing question

Status
Not open for further replies.

carg1 (MIS, US), Aug 19, 2003
Well, I'm convinced I'm using the split function wrong, if my output is anything to go by. What I want to do is take lines formed exactly like the following (except that there are multiple spaces between the columns in the real thing):

10.1.3.4 08/13/2003 14:18:30 PASSED
18.9.27.99 08/13/2003 16:08:44 PASSED
212.18.94.22 08/13/2003 17:29:06 PASSED
And have the script treat each line as 5 different columns (or parts), so I could take the user IP addresses and count the individual instances of them. For example, if the IP address 10.1.3.4 appeared 12 times in the document, I want it to output that to the screen; if 18.9.27.99 appeared 4 times, tell me that, and so on. They're all grouped chronologically, so the loop would have to go through the document several times until it got every instance of the IPs. I know it's possible, but I can't quite put the picture together.
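A minimal sketch of the column split, assuming the columns are separated only by runs of spaces or tabs and that the fifth column (not visible in the sample lines) is some kind of destination field. The helper name `parse_record` is hypothetical. Passing the string `' '` to `split`, rather than `/\t/`, makes Perl split on any run of whitespace; note also that a limit of 2, as in `split(/\t/, $TheRec, 2)`, returns at most two fields, so the later variables stay empty.

```perl
use strict;
use warnings;

# Split one record into its columns. split ' ' (a single-space string,
# not a regex) is Perl's special "awk mode": it skips leading whitespace
# and treats any run of spaces or tabs as one separator, so the multiple
# spaces between columns do not produce empty fields.
sub parse_record {
    my ($line) = @_;
    my ($ip, $date, $time, $status, $dest) = split ' ', $line;
    return {
        ip     => $ip,
        date   => $date,
        time   => $time,
        status => $status,
        dest   => $dest,   # undef when the line has only four columns
    };
}

my $rec = parse_record("10.1.3.4    08/13/2003   14:18:30    PASSED");
print "$rec->{ip} on $rec->{date} at $rec->{time}: $rec->{status}\n";
```

For the example line this prints `10.1.3.4 on 08/13/2003 at 14:18:30: PASSED`.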

Code:
my $User;
my $Dd;
my $Tt;
my $Ap;
my $Dest;

print "Which file: ";
$TheDB = <STDIN>;
chomp($TheDB);

# Open the database file but quit if it doesn't exist
open(INDB, $TheDB) or die "The database $TheDB could " .
  "not be found.\n";

while(<INDB>) {
  $TheRec = $_;
  chomp($TheRec);
  ($User, $Dd, $Tt, $Ap, $Dest) = split(/\t/, $TheRec, 2);
  $SuccessCount++;
  print "$User "
} # End of while(<INDB>)

if($SuccessCount == 0) { print "No records found.\n" }
else { print "$SuccessCount records found.\n" }

print "Program finished.\n";

I'm definitely not sure how to get it to store all the instances in different variables. I'm probably overextending my abilities, being a total newbie at this, but I'd appreciate any help or direction anyone could offer. Thanks!
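Counting does not actually need several passes over the file. A sketch, assuming the IP is always the first whitespace-separated field: a hash keyed by IP keeps one counter per address, so a single pass counts every instance no matter how the lines are ordered. The sub name `count_ips` is illustrative, and the sample is read from an in-memory string rather than a real file.

```perl
use strict;
use warnings;

# Count how many times each IP address (first column) appears.
# One pass is enough: a hash keyed by IP holds a counter per address,
# so the records do not need to be grouped or re-read.
sub count_ips {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        my ($ip) = split ' ', $line;   # first whitespace-separated field
        next unless defined $ip;       # skip blank lines
        $count{$ip}++;                 # missing key counts as 0, then increments
    }
    return \%count;
}

# Example run on an in-memory copy of some sample lines; a real file would
# be opened with: open(my $fh, '<', $name) or die "Cannot open $name: $!";
my $sample = <<'END';
10.1.3.4     08/13/2003   14:18:30   PASSED
18.9.27.99   08/13/2003   16:08:44   PASSED
212.18.94.22 08/13/2003   17:29:06   PASSED
18.9.27.99   08/13/2003   17:08:24   PASSED
END
open(my $in, '<', \$sample) or die "Cannot open sample: $!";
my $counts = count_ips($in);
close($in);

for my $ip (sort keys %$counts) {
    print "$ip: $counts->{$ip}\n";
}
```

For these four lines it prints `10.1.3.4: 1`, `18.9.27.99: 2` and `212.18.94.22: 1`, one per line.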
 
Save this messy text as activity.txt:

10.1.3.4 08/13/2003 14:18:30 PASSED
18.9.27.99 08/13/2003 16:08:44 PASSED
212.18.94.22 08/13/2003 17:29:06 PASSED
18.9.27.99 08/13/2003 17:08:24 PASSED
18.9.27.99 08/13/2003 18:08:34 PASSED
18.9.27.99 08/13/2003 19:08:44 PASSED
212.18.94.22 08/13/2003 21:29:06 PASSED

Save this script as activity.pl:

open(my $activity, '<', 'activity.txt')
    or die "Cannot open activity.txt: $!";

while (<$activity>) {
    chomp;
    # skip lines that don't match, so $1..$5 are only read after a successful match
    next unless m/(\d+\.\d+\.\d+\.\d+)\s+(\d+\/\d+\/\d+)\s+(\d+:\d+:\d+)\s+([A-Z]+)\s+(http:\/\/.*)/;
    print " I.P.: $1\n";
    print " date: $2\n";
    print " time: $3\n";
    print "status: $4\n";
    print " URL: $5\n\n";
}
close($activity);

This will break up the text with the regex.

No counting involved... yet!

Hope it's of help.

regards
Duncan
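A sketch of the regex approach with the counting added, assuming the fifth column is an optional URL or destination that may be missing (the sample lines stop at the status). The dots in the IP pattern are escaped, because an unescaped `.` matches any character, and the captures are only read after a successful match. `$LINE_RE` and `%seen` are illustrative names.

```perl
use strict;
use warnings;

# Regex line parser. The dots in the IP are escaped (\.) because an
# unescaped . matches any character. The fifth column is assumed to be
# an optional URL/destination, since the sample lines end at the status.
my $LINE_RE = qr{
    ^\s*
    (\d{1,3}(?:\.\d{1,3}){3})   # 1: IP address
    \s+ (\d+/\d+/\d+)           # 2: date
    \s+ (\d+:\d+:\d+)           # 3: time
    \s+ ([A-Z]+)                # 4: status
    (?: \s+ (\S+) )?            # 5: optional destination or URL
}x;

my %seen;
for my $line (
    "10.1.3.4   08/13/2003  14:18:30  PASSED",
    "18.9.27.99 08/13/2003  16:08:44  PASSED",
    "10.1.3.4   08/13/2003  19:08:44  PASSED",
) {
    # Only read $1..$5 after a successful match: a failed match leaves
    # them unchanged rather than clearing them.
    next unless $line =~ $LINE_RE;
    $seen{$1}++;
}
print "$_ seen $seen{$_} time(s)\n" for sort keys %seen;
```

This prints `10.1.3.4 seen 2 time(s)` followed by `18.9.27.99 seen 1 time(s)`.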
 
Why not try something like the following?

my %IPs; #Added in this HASH
my $User;
my $Dd;
my $Tt;
my $Ap;
my $Dest;

print "Which file: ";
$TheDB = <STDIN>;
chomp($TheDB);

# Open the database file but quit if it doesn't exist
open(INDB, $TheDB) or die "The database $TheDB could " .
  "not be found.\n";

while(<INDB>) {
  $TheRec = $_;
  chomp($TheRec);
  ($User, $Dd, $Tt, $Ap, $Dest) = split(/\t/, $TheRec, 2);
  $SuccessCount++;
  if (exists $IPs{$User}) { #If the key exists, add 1
    $IPs{$User} += 1;
  } else { #Otherwise, basically initialize the count to 1
    $IPs{$User} = 1;
  }
  print "$User "
} # End of while(<INDB>)

if($SuccessCount == 0) { print "No records found.\n" }
else {
  print "$SuccessCount records found.\n"
  foreach $key (sort keys %IPs) { #Print out each key
    print ("IP Address: " . $key . "\tNumber of instances: " . $IPs{$key} . "\n");
  }
}

print "Program finished.\n";
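The exists/else block in the script above can be shortened: incrementing a missing hash entry with `++` treats it as 0, so `$IPs{$ip}++` does the same job. A sketch with a hypothetical `report` sub that, as a variation not taken from the thread, sorts the report by count (highest first, ties by IP) instead of by IP alone; the address list is sample data only.

```perl
use strict;
use warnings;

# Tally with the post-increment idiom ($IPs{$ip}++ treats a missing key
# as 0), then build one report line per IP, sorted by count (highest
# first) and by IP string for ties so the order is deterministic.
sub report {
    my @ips = @_;
    my %IPs;
    $IPs{$_}++ for @ips;
    my @keys = sort { $IPs{$b} <=> $IPs{$a} or $a cmp $b } keys %IPs;
    return map { "IP Address: $_\tNumber of instances: $IPs{$_}\n" } @keys;
}

print report(qw(10.1.3.4 18.9.27.99 212.18.94.22 18.9.27.99 18.9.27.99));
```

Here 18.9.27.99 comes first with 3 instances, followed by 10.1.3.4 and 212.18.94.22 with 1 each.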

 
duncdude, it doesn't actually print anything to the screen, but I can definitely study that and get some direction from it, thanks! Firemyst, it says there's a syntax error on the line
Code:
foreach $key (sort keys %IPs) {  #Print out each key
but I'm gonna look into it and see if I can make it work. Thanks for all your help, guys, I appreciate it :)
 
Carg:

The syntax error is because there's no ending semicolon (;) on the line before:

print "$SuccessCount records found.\n"

That should work for you now. :-]
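Without the semicolon, Perl reads the print line and the foreach line as one statement, and a trailing foreach modifier cannot take a loop variable such as `$key`, which is roughly why the error is reported on the foreach line. A sketch of the corrected else branch, runnable on its own; the counts are made-up example values (the 12 and 4 mentioned in the question), not output from real data, and `my` is added only because the sketch runs under strict.

```perl
use strict;
use warnings;

my %IPs = ('10.1.3.4' => 12, '18.9.27.99' => 4);   # example counts only
my $SuccessCount = 16;                               # 12 + 4 example records

if ($SuccessCount == 0) { print "No records found.\n" }
else {
    print "$SuccessCount records found.\n";   # semicolon ends this statement
    foreach my $key (sort keys %IPs) {         # so foreach starts a new one
        print "IP Address: $key\tNumber of instances: $IPs{$key}\n";
    }
}
```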
 
If you save the top section as activity.txt and the bottom section as activity.pl, it will print data to the screen. I tested it, and it works a treat!

Regards
Duncan
 
Actually I got it working with the curly bracket. Either way, it works like a charm, and you have my gratitude and a star, hehe :)
 