Handle White Space in File Path

Status
Not open for further replies.

ickypick

Hi All,
I have a parser that reads FTP logs and inserts the entries into a database. Everything parses and inserts fine. However, on occasion a user creates a file or file path with whitespace in the name, e.g. "/userhome/my files/bigfile".
The parser splits on the whitespace between "my" and "files" in this example, so the variable $path is incomplete and all the other variables shift over. As a result, the wrong data is inserted into the wrong table in the database. Does anyone have a suggestion on how to check for whitespace in the file path and account for it? Is there a way to ignore the whitespace at this place in the log line only? Any help is appreciated =)

Here is the snippet of code for reading the log file:


#while (<>) {
while (<FILE>) {
($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = split(/\s+/);

Thanks in advance.

Icky
 
you'll need to escape the space

$path=~ s/\s/%20/g;

where %20 is the percent-escaped (hex) form of chr(32). It also means you'll have to unescape when reading from the DB

A lot of this functionality is covered in the URI::Escape module

HTH

Simply ignoring spaces will lead to data mismatches

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
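
For anyone following along, here is a minimal sketch of the escape/unescape round trip mentioned above, using core Perl only. The helper names escape_spaces and unescape_percent are made up for illustration; URI::Escape's uri_escape and uri_unescape cover the general case. It assumes $path already holds the complete path, and it also escapes a literal "%" so that unescaping stays unambiguous.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode whitespace and literal '%' characters.
sub escape_spaces {
    my ($s) = @_;
    $s =~ s/([%\s])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: turn every %XX sequence back into its character.
sub unescape_percent {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

my $path     = '/userhome/my files/bigfile';
my $stored   = escape_spaces($path);        # what would go into the DB
my $restored = unescape_percent($stored);   # what you get back when reading

print "$stored\n$restored\n";
```

Note that escaping only helps once the whole path is in one variable; it does not by itself stop split from breaking the path apart.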
 
Thanks Paul,
How would I stick this in my loop?

while (<FILE>) {
($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = split(/\s+/);

Icky
 
A little more information:

A log entry may look like this -

Mon Oct 2 11:20:10 2006 23 192.168.0.1 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b _ o r janedoe ftp 0 * c
the next entry may be this -
Mon Oct 2 11:22:10 2006 23 192.168.0.3 47269520 /BB4/2_SPACES IN FILE_PATH/testfile.mp3 b _ o r mikesmith ftp 0 * c
 
OK, a few more entries if you could. I'm not sure there's enough to go on there, and we could be looking at building a few different types of splits. Does the file always have a known extension? Are all the rest of the fields always populated?

The idea: split on whitespace, count in a fixed number of fields from the start until we reach the filename, then count back a fixed number of fields from the end until we reach the filename again. Whatever is left in the middle should be joined with spaces to form the filename.

HTH

Paul
 
The file extension could be anything, or perhaps no extension at all. All the fields are always populated. So there will be eight fields before the file name and nine after. Does that help?
Thanks,
Icky
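
With eight fields before the path and nine after, the "count in from both ends" idea can be sketched with array slices. This is only an illustration under that assumption; the variable names @f, @head and @tail are made up here, and the sample line is the first log entry posted above.

```perl
use strict;
use warnings;

my $line = 'Mon Oct  2 11:20:10 2006 23 192.168.0.1 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b _ o r janedoe ftp 0 * c';

my @f = split ' ', $line;            # split on runs of whitespace
die "too few fields\n" if @f < 18;   # 8 before + path + 9 after

my @head = @f[0 .. 7];               # $dow .. $size
my @tail = @f[-9 .. -1];             # $ttype .. $status
my $path = join ' ', @f[8 .. $#f - 9];   # whatever is left in the middle

print "size  = $head[7]\n";
print "path  = $path\n";
print "ttype = $tail[0]\n";
```

One caveat: rejoining with a single space loses the difference between one space and several consecutive spaces inside a file name.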
 
yup, it's late here, and the code's messy, but you can tidy it up at your leisure ... ;-)

Code:
$str="Mon Oct  2 11:20:10 2006 23 192.168.0.1 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b _ o r janedoe ftp 0 * c";
#($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = split(/\s+/);
@data=split (/\s+/, $str);
$loop=1;
foreach (@data) {
  print "$loop => $_\n"; $loop++;
}
$len=$#data;
print $len."\n";
$dow     = $data[0];
$month   = $data[1];
#...
$size    = $data[7];
$ttype   = $data[-9];
$specialact = $data[-8];
$type    = $data[-7];
$mode    = $data[-6];
$uid     = $data[-5];
$service = $data[-4];
$authm   = $data[-3];
$authu   = $data[-2];
$status  = $data[-1];

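if ($len == 17) {
   $path = $data[8];              # no spaces in the path
} else {
   $offset=$len-17;               # extra fields created by spaces in the path
   print "offset :{$offset}\n";
   # rejoin the pieces of the path with single spaces (no trailing space)
   $path = join(' ', @data[8..(8+$offset)]);
}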
print "[$path]";
print "\nsize [$size]";

HTH

Paul
 
assuming the file path always starts with a forward slash, here is another (rather wacky) way:

Code:
my @data = ('') x 18;
while(my $line = <DATA>) {
   chomp($line);
   @data[0..7] = (split /\s+/, $line)[0..7];
   $line =~ s/^([^\/]+)//;
   my @temp = split /\s+/, $line;
   for my $i (reverse 9..17) {
      $data[$i] = pop @temp;
   }   
   $data[8] = join(' ',@temp);
   print "$_\n" for @data;
}	
	
__DATA__
Mon Oct  2 11:20:10 2006 23 192.168.0.1 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b _ o r janedoe ftp 0 * c
Mon Oct  2 11:22:10 2006 23 192.168.0.3 47269520 /BB4/2_SPACES IN FILE_PATH/testfile.mp3 b _ o r mikesmith ftp 0 * c
 
might be better written with the @data array inside the while loop:

Code:
while(my $line = <DATA>) {
   my @data = ('') x 18;
   chomp($line);
   @data[0..7] = (split /\s+/, $line)[0..7];
   $line =~ s/^([^\/]+)//;
   my @temp = split /\s+/, $line;
   for my $i (reverse 9..17) {
      $data[$i] = pop @temp;
   }   
   $data[8] = join(' ',@temp);
   #do useful stuff with @data
}

or if you already have all those variable names in your script you can assign them values from the @data array:

Code:
while(my $line = <DATA>) {
   my @data = ('') x 18;
   chomp($line);
   @data[0..7] = (split /\s+/, $line)[0..7];
   $line =~ s/^([^\/]+)//;
   my @temp = split /\s+/, $line;
   for my $i (reverse 9..17) {
      $data[$i] = pop @temp;
   }   
   $data[8] = join(' ',@temp);
   my ($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = @data;
   print qq~$dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status\n~;
}

replace <DATA> with <FILE> to use with your existing script.
 
jeez, should have just done it like this to begin with:

Code:
while(my $line = <FILE>) {
   chomp($line);
   my @data = ();
   my @temp = split /\s+/, $line;
   for my $i (0..7) {
      $data[$i] = shift @temp;
   }
   $data[8] = '';
   for my $i (reverse 9..17) {
      $data[$i] = pop @temp;
   }   
   $data[8] = join(' ',@temp);
   #do useful stuff with @data
}

of course you may want to validate your data line by line before blindly running the data through the above code.
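
A rough sketch of what that validation might look like, wrapped around the splice form of the same parse (first eight and last nine fields taken off the ends). The function name parse_log_line is hypothetical, and the checks are assumptions based on this thread: at least 18 fields, a size that is digits or "-", and a path that starts with a forward slash.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 18 fields, or an empty list when the
# line does not look like one of the log entries shown in this thread.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my @temp = split ' ', $line;
    return if @temp < 18;                        # too few fields

    # first 8 fields, a placeholder for the path, then the last 9 fields
    my @data = (splice(@temp, 0, 8), undef, splice(@temp, -9));
    $data[8] = join ' ', @temp;                  # the remainder is the path

    return unless $data[7] =~ /^(?:\d+|-)$/;     # assumed: size is digits or "-"
    return unless $data[8] =~ m{^/};             # assumed: path starts with "/"
    return @data;
}

my @lines = (
    'Mon Oct  2 11:22:10 2006 23 192.168.0.3 47269520 /BB4/2_SPACES IN FILE_PATH/testfile.mp3 b _ o r mikesmith ftp 0 * c',
    'this is not a log line',
);

for my $line (@lines) {
    my @data = parse_log_line($line);
    unless (@data) {
        warn "skipping malformed line: $line\n";
        next;
    }
    print "path = $data[8]\n";
}
```

Lines that fail a check are reported and skipped instead of being inserted with shifted fields.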
 
Was going to post a solution using shift and pop. Then I scrolled to the bottom...

Nice solution, Kevin, have a star.

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
Would this work, using splice instead of the loops?
Code:
my @data = splice(@temp, 0, 7), undef, splice(@temp, -8);
$data[8] = join(@temp);

Steve

 
pretty much what I was thinking, but it was too late at night to articulate it. Just a minor mod: put parentheses around the list being assigned to the array, and increase the initial splice by 1
Code:
$str="Mon Oct  2 11:20:10 2006 23 192.168.0.1 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b _ o r janedoe ftp 0 * c";
#the business end
@temp=split (/\s+/, $str);
my @data = (splice(@temp, 0, 8), undef, splice(@temp, -8));
$data[8] = join(" ",@temp);
#print results
$loop=0;
foreach (@data) {
  print $loop."==> $data[$loop]\n";
  $loop++;
}
Code:
0==> Mon
1==> Oct
2==> 2
3==> 11:20:10
4==> 2006
5==> 23
6==> 192.168.0.1
7==> #
8==> 87269520 /xyz2_00HR_ASD_70804_1_86GB_3ABC/1_SPACE IN_FILE_PATH/70804123401.45210221.865760.tar b
9==> o
10==> r
11==> janedoe
12==> ftp
13==> 0
14==> *
15==> c

Paul
 
BTW, element 7 (shown as "7==> #") is the size, and it does print with the code as posted; that's why the first splice takes one more element

Paul
 

What about splitting the data on spaces, then reconcatenating, with a space, consecutive indexes which contain slashes and/or an extension?
 
Paul,

the last splice should be increased by one:

Code:
my @data = (splice(@temp, 0, 8), undef, splice(@temp, -9));


not sure what happened in your output example: there should be 18 elements (8 + file path + 9), but your output only shows 16.
 
How would I put this splice into my while statement?
Code:
# OPEN LOG FILE DIRECTORY
# -----------------------
opendir(DIR, "$archive_dir/$dir") or die "Can't open $archive_dir/$dir: $!";

while (defined ($file = readdir DIR)) {

        next unless $file =~ /^ftp.log/;
        print "reading file $archive_dir/$dir/$file\n";
        open(FILE, "< $archive_dir/$dir/$file") || die "Cannot open $archive_dir/$dir/$file: $!";

# READ-IN LOG FILE LINES
# ----------------------
        while (<FILE>) {
                ($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = split(/\s+/);

# ASSIGN DATE DATA TO SINGLE VARIABLE
# -----------------------------------
$date = "$dow,$month,$day,$time,$year";

# ATTACH USERS TO MASTER ACCOUNTS
# -----------------------------
$appuid=$uid;
$account = $aliases{$uid};

# ASSIGN UNUSED DATA TO EXTRA VARIABLE FOR FUTURE USE
# ---------------------------------------------------
$extra = "$service,$mode,$authm,$authu,$specialact,$ttype,$status";

# CONVERT INPUT/OUTPUT CODES | DON'T USE ENTRIES WITH NULL SIZE
# --------------------------------------------------------------
                if ($type eq "o") {
                        $method = "RETR";
                } elsif ($type eq "i") {
                        $method = "STOR";
                }
                unless ($size eq "-" || $duration eq "-") {
                        
# INSERT THE DATA IN DB
# ---------------------
$sth->execute($date,$clientip,$duration,$size,$method,$appuid,$path,$account,$extra) or die "Couldn't execute statement: " . $sth->errstr;
                }
        }

        close(FILE);

}


closedir(DIR);

Thanks for your help!

Icky
 
change this part:

Code:
        while (<FILE>) {
                ($dow, $month, $day, $time, $year, $duration, $clientip, $size, $path, $ttype, $specialact, $type, $mode, $uid, $service, $authm, $authu, $status) = split(/\s+/);

to:

Code:
        while (<FILE>) {
                chomp;
                my @temp = split(/\s+/);
                my @data = (splice(@temp, 0, 7), undef, splice(@temp, -9));
                $data[8] = join(' ',@temp);
                my ($dow, $month, $day, $time, $year, $duration,
                    $clientip, $size, $path, $ttype, $specialact, $type,
                    $mode, $uid, $service, $authm, $authu, $status) = @data;

and give it a try.
 
Hi Kevin,

Thanks for all the help BTW. :)

For some reason the $size variable is not getting populated. I get an error saying :

"Use of uninitialized value in string eq at ./test.pl line 109, <FILE> line 1"

This is line 109 :
unless ($size eq "-" || $duration eq "-") {

$duration seems to be fine as I see the entries in the duration column in the DB

It makes the entries in the DB anyways and those indicate that the file $size and the $path are being entered in the path column of the DB as:

"13216664 /ASDF_04_PT1_70804_1_86GB_3/FILE PATH/70804DB1A02.45210D1A.3400B0.zip"

and the duration column is blank.

If I change $data[8] = join(' ',@temp); to
$data[7] = join(' ',@temp);

I get a bigint error as it tries to insert "13216664 /ASDF_04_PT1_70804_1_86GB_3/FILE PATH/70804DB1A02.45210D1A.3400B0.zip" into the duration column in the database.

Any ideas why duration and path are not being parsed into two variables?

Thanks,
Icky
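
A possible explanation, going by the earlier posts in this thread: there are eight fields before the path, but the snippet you were given takes only seven off the front, so the size stays behind and gets joined into the path while element 7 ($size) remains undef. The sketch below runs both variants on one of the sample lines; the names @bad and @good are made up for the comparison.

```perl
use strict;
use warnings;

my $line = 'Mon Oct 2 11:22:10 2006 23 192.168.0.3 47269520 /BB4/2_SPACES IN FILE_PATH/testfile.mp3 b _ o r mikesmith ftp 0 * c';

# As posted: only 7 leading fields are removed, so the size is still in @temp
# when the leftover fields are joined into the path.
my @temp = split /\s+/, $line;
my @bad  = (splice(@temp, 0, 7), undef, splice(@temp, -9));
$bad[8]  = join ' ', @temp;

# With 8 leading fields removed, only the pieces of the path are left over.
my @temp2 = split /\s+/, $line;
my @good  = (splice(@temp2, 0, 8), undef, splice(@temp2, -9));
$good[8]  = join ' ', @temp2;

for my $case ([ 'splice 0,7' => \@bad ], [ 'splice 0,8' => \@good ]) {
    my ($label, $data) = @$case;
    my $size = defined $data->[7] ? $data->[7] : '(undef)';
    printf "%s: %d fields, size=%s, path=%s\n",
        $label, scalar @$data, $size, $data->[8];
}
```

In the seven-field variant the assignment to element 8 also overwrites the first of the nine trailing fields, so everything after the path shifts by one as well. Changing the 7 to 8 in the first splice should give 18 fields with $size and $path separated.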
 