
grep? regular expression??

Status
Not open for further replies.

orangegal
Oct 12, 2005
Hey guys, I am a little confused about regular expressions. Will somebody please help me? Thanks a million! If I have the following line, how would I extract each part of it?

**.**.**.** -> ***.**.**.*** ICMP TTL:127 TOS:0x0 ID:24623

the pattern and variables should be like this:

Variable1 -> Variable2 Variable3 TTL:Variable4 TOS:Variable5 ID:Variable6
 
What are the possible values of what you have masked by **.**.**.** ? Are these actual * you want to match, or will they be numbers, or letters, or both?
 
I bet TrojanWarBlade has a better solution, but here's what I got:
Code:
#!/usr/local/bin/perl -w
use strict;
my $line="12.34.56.78 -> 123.12.12.123 ICMP TTL:127 TOS:0x0 ID:24623";
if ($line=~/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+\-\>\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\S+)\s+TTL:(\d+)\s+TOS:(\S+)\s+ID:(\d+)$/){
     print "$1,$2,$3,$4,$5,$6\n";
}
else{
    print "Could not extract values\n";
}

Now, this is assuming a few things:
That the numbers you have masked with ** will be between 1-3 digits long.
That the values after TTL and ID will be digits.
That there will not be any leading or trailing spaces.

Hopefully that is enough to start you off, and I'm sure others will be able to add to this too!
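
On the first assumption above: \d{1,3} only checks the digit count, so something like 999.1.1.1 would also pass. If that matters, the octets can be range-checked after the match. A minimal sketch, assuming IPv4 dotted quads; the name is_valid_ipv4 is made up for this illustration and is not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true when the string is a
# dotted quad and every octet is in the range 0-255.
sub is_valid_ipv4 {
    my ($ip) = @_;
    # In list context the match returns the four captured octets;
    # an empty list (no match) makes the assignment false.
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return 0;
    # grep in scalar context counts the octets that are out of range.
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

for my $candidate ("12.34.56.78", "999.1.1.1") {
    print "$candidate: ", (is_valid_ipv4($candidate) ? "ok" : "rejected"), "\n";
}
```

For log lines written by a tool that already emits valid addresses, the plain \d{1,3} pattern is probably enough.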
 
Brian,
You kill me! ;-)
I might take a look later if you like, but my first thought is to create scalars of regexes with "qr" for the IP address (I assume that's what it is) and the integers, to make it a little clearer.
Wanna have a go?


Trojan.
 
:) I had already done that, but I didn't know if that would make things more confusing, to embed a regex inside a regex.

Basically, orange, we're both assuming those are ip addresses. If they are, you can set up a regex just for ip addresses, like this:

Code:
my $ipaddr=qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

That way, since we are searching for ip addresses twice in the same string, we can use $ipaddr instead of the long regex above.

Like this:
Code:
#!/usr/local/bin/perl -w
use strict;
my $ipaddr=qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;
my $line="12.34.56.78 -> 123.12.12.123 ICMP TTL:127 TOS:0x0 ID:24623";
if ($line=~/^\s*($ipaddr)\s+->\s+($ipaddr)\s+(\S+)\s+TTL:(\d+)\s+TOS:(\S+)\s+ID:(\d+)\s*$/){
     print "$1,$2,$3,$4,$5,$6\n";
}
else{
    print "Could not extract values\n";
}

(the \s+ means one or more whitespace characters, such as spaces or tabs)

Also, in this version, I added "\s*" at the front and end of the regex - that way if there are any leading or trailing spaces, this will still match.


 
Brian, I tried it and it worked!! Thank you so much~ I might have more questions later, though; I am still a little confused about regular expressions.
 
I just kind of "showed" you what to do, without explaining much.

The \d matches a digit, and for plain ASCII text means the same thing as [0-9]. When I put \d{1,3}, that means either 1, 2, or 3 digits in a row.
So this regex:
Code:
qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
will look for 1,2, or 3 digits followed by a period, followed by 1,2, or 3 digits, followed by a period, etc.

I embedded that regex into the scalar $ipaddr - so anywhere you see $ipaddr in the main regex, just substitute what is in the $ipaddr scalar.

As I mentioned before, \s is a whitespace character (a space, a tab, and so on), and the + plus sign means one or more of them. (The * star means zero or more.)

The \S (capital S) means any non-whitespace character. So when I put "\s+(\S+)\s+" I am looking for a group of non-whitespace characters in between runs of one or more whitespace characters.

The ( ) parentheses are used to capture what was matched inside them; the captured text is later accessed with the $1,$2,$3... variables.

Hopefully that helps more?
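
The pieces described above can be tried one at a time. A small sketch, with made-up test strings and variable names that are not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \d{1,3} accepts one to three digits in a row, and nothing longer.
my $short_ok = ("127"  =~ /^\d{1,3}$/) ? 1 : 0;
my $long_ok  = ("1270" =~ /^\d{1,3}$/) ? 1 : 0;

# \s+ needs at least one whitespace character; \s* is also happy with none.
my $plus_ok = ("a->b" =~ /a\s+->/) ? 1 : 0;
my $star_ok = ("a->b" =~ /a\s*->/) ? 1 : 0;

# \S+ grabs a run of non-whitespace; the parentheses capture it,
# and in list context the match returns the captured text.
my ($token) = "  ICMP  " =~ /\s+(\S+)\s+/;

print "short=$short_ok long=$long_ok plus=$plus_ok star=$star_ok token=$token\n";
```

Only the three-digit string passes the \d{1,3} test, only the \s* version matches when there is no space before the arrow, and the capture picks up ICMP without the surrounding spaces.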

 
not checked - but this might work:-

qr/(\d{1,3}\.)\1\1\d{1,3}/


Kind Regards
Duncan
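
A quick check of the backreference idea, since the post says it was untested: in Perl, \1 matches the exact text that group 1 captured, not the sub-pattern again. So that regex only matches addresses whose first three octets are identical. Repeating the group with {3} avoids this. A minimal sketch; the ^ and $ anchors and the variable names are additions for the test, not part of the post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the post, anchored so the whole string has to match.
my $backref = qr/^(\d{1,3}\.)\1\1\d{1,3}$/;
# Alternative that repeats the sub-pattern three times instead.
my $repeat  = qr/^(?:\d{1,3}\.){3}\d{1,3}$/;

for my $ip ("12.12.12.7", "12.34.56.78") {
    printf "%-12s backref: %-8s repeat: %s\n", $ip,
        ($ip =~ $backref ? "match" : "no match"),
        ($ip =~ $repeat  ? "match" : "no match");
}
```

The first address matches both patterns; the second matches only the {3} version.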
 
Brian, I need help again. I tried your way on its own, and several other ways, and it all worked. But after I put it in my program, it kept telling me "Could not extract values". Anyway, below is my program.

The program first splits the file into paragraphs, then splits each paragraph into lines, and then extracts the values from the lines.

the actual file is shown below the program


==========================================================
$file = 'sampleAlertfile'; # Name the file
open FILE, $file; # Open the file

#################################################
#find each paragraph and save them into a file
#################################################
local $/ = "\n\n";
my $paragraph;
$indexOfParagraph = 0;
while ($paragraph = <FILE>) {

chomp $paragraph;

my @lines=split(/\n/,$paragraph);
#print $indexOfParagraph;
print $lines[3];


#########################################
#find alert type
#########################################
$lookfor=("[**]");
my $ipaddr=qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;
if ($line[0]=~/^\s*($ipaddr)\s+->\s+($ipaddr)\s+(\S+)\s+TTL:(\d+)\s+TOS:(\S+)\s+ID:(\d+)\s+IpLen:(\d+)\s+DgmLen:(\d+)\s*$/){
print "$1,$2\n";
$source_IP[$indexOfParagraph]=$1;
$Destination_IP[$indexOfParagraph]=$2;
$protocol[$indexOfParagraph]=$3;
$TTL[$indexOfParagraph]=$4;
$TOS[$indexOfParagraph]=$5;
$ID[$indexOfParagraph]=$6;
$IPLen[$indexOfParagraph]=$7;
$DmgLen[$indexOfParagraph]=$8
}
else{
print "Could not extract values\n";
}

print @source_IP;

$indexOfParagraph++;
}#end of while loop
close(FILE);
===========================================================

[**] [1:483:5] ICMP PING CyberKit 2.2 Windows [**]
[Classification: Misc activity] [Priority: 3]
09/19-13:35:41.644975 0:3:6C:A8:44:0 -> 0:11:11:5A:F0:9C type:0x800 len:0x4A
111.11.111.111 -> 111.111.111.111 ICMP TTL:127 TOS:0x0 ID:24623 IpLen:20 DgmLen:60
Type:8 Code:0 ID:512 Seq:38080 ECHO
[Xref => ]

[**] [1:1411:10] SNMP public access udp [**]
[Classification: Attempted Information Leak] [Priority: 2]
09/19-13:37:16.503268 0:F:3D:1:2B:4A -> 0:4:0:5D:FD:35 type:0x800 len:0x78
111.111.111.111:61813 -> 111.111.111.111:161 UDP TTL:127 TOS:0x0 ID:9446 IpLen:20 DgmLen:106
Len: 78
[Xref => ][Xref => ][Xref => ][Xref => ][Xref => ][Xref => ]
 
Never mind, I made it work. Thank you so much for your kindness!!
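
The thread does not say what the fix was. One likely culprit, offered here as an assumption: the loop fills @lines but then tests $line[0], a different and empty array, while the IP line is the fourth line of each paragraph, $lines[3]. With "use strict" Perl would have flagged the undeclared @line. A sketch under that assumption, using an in-memory copy of the first sample paragraph so it runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First paragraph of the sample alert file, held in a string for the sketch.
my $sample = <<'END';
[**] [1:483:5] ICMP PING CyberKit 2.2 Windows [**]
[Classification: Misc activity] [Priority: 3]
09/19-13:35:41.644975 0:3:6C:A8:44:0 -> 0:11:11:5A:F0:9C type:0x800 len:0x4A
111.11.111.111 -> 111.111.111.111 ICMP TTL:127 TOS:0x0 ID:24623 IpLen:20 DgmLen:60
Type:8 Code:0 ID:512 Seq:38080 ECHO
[Xref => ]
END

my $ipaddr = qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;
my (@source_IP, @Destination_IP);

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
{
    local $/ = "\n\n";    # paragraph mode: a record ends at a blank line
    my $indexOfParagraph = 0;
    while (my $paragraph = <$fh>) {
        chomp $paragraph;
        my @lines = split /\n/, $paragraph;
        # Test $lines[3], the array that was actually filled, not $line[0].
        if (defined $lines[3] && $lines[3] =~ /^\s*($ipaddr)\s+->\s+($ipaddr)\s+(\S+)\s+TTL:(\d+)\s+TOS:(\S+)\s+ID:(\d+)\s+IpLen:(\d+)\s+DgmLen:(\d+)\s*$/) {
            ($source_IP[$indexOfParagraph], $Destination_IP[$indexOfParagraph]) = ($1, $2);
            print "$1,$2\n";
        }
        else {
            print "Could not extract values\n";
        }
        $indexOfParagraph++;
    }
}
close $fh;
```

This pattern still rejects the second sample record, because its addresses carry a port number after the colon.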
 
Code:
#!/usr/bin/perl

$line = "12.34.56.78 -> 123.12.12.123 ICMP TTL:127 TOS:0x0 ID:24623";

$line =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) -> (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) ([^ ]+) TTL:(\d+) TOS:([^ ]+) ID:(\d+)/;

print "$1\n$2\n$3\n$4\n$5\n$6";


Kind Regards
Duncan
 
thanks, Duncan, i might have some more questions very soon~~ :)
 
Looking at the sample data (specifically the second record), there is an optional port number on the end of the IP address that needs to be accounted for. The IP address regex is kind of a mess, but it works.

Here's one way:
Code:
local $/ = "\n\n";
my (@source_ip, @destination_ip, @protocol, @ttl, @tos, @id, @ip_len,@dgm_len);
my ($counter, $p)=(0,"");
my $ip_addr = qr/((?:\d+\.){3}\d+)(?::\d+)?/;
my $pattern = qr/$ip_addr -> $ip_addr (\w+) TTL:(\d+) TOS:(\w+) ID:(\d+) IpLen:(\d+) DgmLen:(\d+)/;

while (<DATA>) {
    parse_paragraph($_, $counter);
    $counter++;
}
    
sub parse_paragraph {
    my ($text, $index) = @_;
    if ($text =~ /$pattern/) {
        ($source_ip[$index], $destination_ip[$index]) = ($1, $2);
        ($protocol[$index], $ttl[$index], $tos[$index]) = ($3, $4, $5);
        ($id[$index], $ip_len[$index], $dgm_len[$index]) = ($6, $7, $8)
    } else {
        print "Paragraph $index didn't match\n";
    }
}
 
I just noticed that I declared $p and used $_ instead, so disregard that variable.
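
To see the optional port handling on its own, the two patterns above can be run against the fourth line of each sample record. A small sketch; the @lines and @parsed names are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same patterns as in the post: the IP is captured, the ":port" part is
# matched by a non-capturing group and is therefore thrown away.
my $ip_addr = qr/((?:\d+\.){3}\d+)(?::\d+)?/;
my $pattern = qr/$ip_addr -> $ip_addr (\w+) TTL:(\d+) TOS:(\w+) ID:(\d+) IpLen:(\d+) DgmLen:(\d+)/;

my @lines = (
    "111.11.111.111 -> 111.111.111.111 ICMP TTL:127 TOS:0x0 ID:24623 IpLen:20 DgmLen:60",
    "111.111.111.111:61813 -> 111.111.111.111:161 UDP TTL:127 TOS:0x0 ID:9446 IpLen:20 DgmLen:106",
);

my @parsed;
for my $line (@lines) {
    # In list context the match returns the eight captured fields.
    if (my @fields = $line =~ $pattern) {
        push @parsed, \@fields;
        print join(",", @fields), "\n";
    }
    else {
        print "no match\n";
    }
}
```

Both lines match, and the port numbers do not appear in the captured fields. If the ports are needed, changing (?::\d+)? to (?::(\d+))? would capture them, at the cost of shifting the numbering of the later groups.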
 