File Manipulation/Parsing in Perl.

Status
Not open for further replies.

Archon

Programmer
May 1, 2001
17
0
0
US
Okay, I'm trying to pull a few values out of a flat txt file, and set them to variables.

Example:

blah blah blah blah weight 2kg blah blah blah

What would be the method/syntax to search the file and set $weight = (in this case) 2 when I don't know what the value will be, or where in the file it will be?

Any help would be greatly appreciated,
Thanks

Archon
 
What is the exact format of the text to be matched? It makes a difference in finding a stable regular expression. Can you give us some indication of what the rest of the file represents, or how the data will be laid out? Based on what you have given me, I would do this:
#!/usr/bin/perl
use strict;
use warnings;

my $fullpathtothefile = 'c:/test/test.txt';

# Read the whole file into one string
open(my $dfile, '<', $fullpathtothefile) or die "Failed to open $fullpathtothefile: $!";
my $data1 = do { local $/; <$dfile> };
close($dfile);

# $1 holds the label, $2 holds the number that follows it
my $weight;
$weight = $2 if $data1 =~ /(\bweight\b[\s]?)([0-9]+)/;

print "Our new variable \$weight = $weight\n" if defined $weight;

I will break down the regular expression for you so you can use it over and over:
(\bweight\b[\s]?) <-- this matches the word preceding the number you want to capture, in this case weight, plus one optional whitespace character. Be careful: without the /g modifier the pattern matches only the first occurrence, so if you have more than one weight in the file only the first one is captured. This match is stored in $1.

([0-9]+) <-- this matches the digits that follow the previous match and stores them in $2.
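
To make the capture groups concrete, here is a minimal sketch. The sample string is made up (the real file contents are not known), and the /g variant is only an illustration of how to collect every occurrence rather than just the first.

```perl
use strict;
use warnings;

# Hypothetical sample text; the real file contents are not known.
my $data = "blah weight 2kg blah weight 15kg blah";

# Without /g, only the first occurrence is matched.
my ($label, $first);
if ($data =~ /(\bweight\b\s?)([0-9]+)/) {
    ($label, $first) = ($1, $2);   # $1 is the label, $2 is the number
}

# With /g in list context, the capture from every occurrence is returned.
my @all = $data =~ /\bweight\b\s?([0-9]+)/g;

print "label='$label' first=$first all=@all\n";
```

Copying $1 and $2 inside the if block matters: after a failed match they still hold whatever the last successful match left behind.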
 
err replace this $data1 =~ /(\bweight\b<\s>?)([0-9]+)/;

with $data1 =~ /(\bweight\b[\s]?)([0-9]+)/;

hope this helps
 
Well, there is apparently an error in the board posting.

The <> that surround the \s should be brackets. It just will not let me post it that way.

It should look like this with no spaces:
[ \s ]
 
I tried the code, even toyed with it.. but I get no value. As if it's not finding the match, but I've been running it on a test file that I know has the match. Would it be easier if I read it into a string first?

An exact example of the text I'm searching through would be:

--- Garbage Line ---
and has weight 2kg with blah blah
--- Garbage Line ---

What I need to do is be able to search through this and pull 2 into a variable (naturally it won't be 2 every time.)
I need to do this with several variables, but they're all set up the same way.

Thanks,
Archon
 
open (FILE, "<$file");
while (<FILE>){
$weight = $1 if ($_ =~ /\bweight\s+(\d+)\s+/);
}
close FILE;
 
Please advise how I would print results to the screen?
I can only get it to print the whole file contents instead of
printing out the variable and number 2.

[tt]

$db = 'tex.doc';
open(FILE, "<$db") or die "File does not open: $!";
#@data = (<DATA>);
#close (DATA);
#open(DATA, ">$db") or die "File not open: $!";

while (<FILE>)
{
$weight = $1 if ($_ =~ /\bweight\s+(\d+)\s+/);
print;
}
close FILE;[/tt]

I used file (tex.doc) as mentioned earlier:
[tt]
test test
and has weight 2kg with blah blah
test[/tt]





 
$weight = $1 if ($_ =~ /\bweight\s+(\d+)\s+/);
print $weight;#<-make this change

Perl has to be told what to print; with no argument, print outputs the contents of $_, which inside the loop is the current line.

If this is a CGI program you will need to print the HTTP header first.

Example:
print "Content-type: text/html\n\n";
print "This is the person's weight: $weight";

If you are just using standard command-line output,

print $weight;

will do.
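
A small sketch of the difference, assuming the three-line sample from earlier in the thread. It reads from an in-memory string instead of a real file so it can be run as is, and the pattern leaves off the trailing \s+ so that it matches the sample line.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the file from the thread.
my $text = "--- Garbage Line ---\nand has weight 2kg with blah blah\n--- Garbage Line ---\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my $weight;
while (<$fh>) {
    # A bare "print;" here would echo the whole line held in $_.
    $weight = $1 if /\bweight\s+(\d+)/;
}
close($fh);

# Print only the captured value, and only if something was found.
print "weight = $weight\n" if defined $weight;
```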



 
Still no luck. I tried:[tt]
print $weight;
print $_;
print FILE;[/tt]

and none of the above printed out weight 2.

Most of the printouts were of the whole file, such as:
--- Garbage Line ---
and has weight 2kg with blah blah
--- Garbage Line ---

Any other suggestions??

 
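There's a problem with the regex. The trailing \s+ requires whitespace right after the digits, but in the sample line the 2 is followed directly by kg, so the pattern never matches. Instead of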
($_ =~ /\bweight\s+(\d+)\s+/)
try
($_ =~ /\bweight\s+(\d+)/)
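
Here is a quick side-by-side check of the two patterns against the sample line from the thread. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $line = "and has weight 2kg with blah blah";

# The trailing \s+ demands whitespace right after the digits; "2" is followed by "k".
my $with_trailing = ($line =~ /\bweight\s+(\d+)\s+/) ? $1 : undef;

# Without it, the digits are captured.
my $without = ($line =~ /\bweight\s+(\d+)/) ? $1 : undef;

printf "with trailing \\s+: %s\n", defined $with_trailing ? $with_trailing : 'no match';
printf "without:           %s\n", defined $without ? $without : 'no match';
```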
 
I got the output I wanted and was wondering if this is the best way to do it?

[tt]
$db = 'aaa.txt';
open(FILE, "$db") or die "File does not open: $!";


while(<FILE>)
{
if ($weight = (/\b(weight)(\s+)(\d+)/))
{
print $1 . $2 . $3;
}
}
close FILE;[/tt]
 
If you are sure that the number (e.g. 2) is always followed by 'kg' (e.g. 2kg), I think you could use the regex:

/\bweight\s+(\d+)kg/

instead of:

/\bweight\s+(\d+)\s+/

because there is no space after the number, is there? If both cases are possible, use:

/\bweight\s+(\d+)\s*kg/


What do you think ?
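
Since the original question mentioned several values that are all laid out the same way, here is one possible sketch that applies the \s*kg idea to more than one field. Only "weight" and "kg" come from this thread; the "height"/"cm" pair, the extract_values helper, and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Field names mapped to the unit that follows the number.
# Only "weight"/"kg" comes from the thread; "height"/"cm" is a made-up example.
my %unit_for = (weight => 'kg', height => 'cm');

# Hypothetical helper: scan a filehandle line by line and
# return the first value found for each field.
sub extract_values {
    my ($fh, %units) = @_;
    my %found;
    while (my $line = <$fh>) {
        for my $name (keys %units) {
            next if exists $found{$name};
            # \s* allows both "2kg" and "2 kg".
            if ($line =~ /\b\Q$name\E\s+(\d+)\s*\Q$units{$name}\E/) {
                $found{$name} = $1;
            }
        }
    }
    return %found;
}

# In-memory sample; a real script would open the file on disk instead,
# using the three-argument form of open.
my $sample = "--- Garbage Line ---\nand has weight 2kg with blah blah\nheight 180 cm\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my %values = extract_values($fh, %unit_for);
close($fh);

print "$_ = $values{$_}\n" for sort keys %values;
```

Keeping the results in a hash avoids one hand-written regex per variable, and the \Q...\E quoting keeps a field name or unit containing regex metacharacters from being misread as part of the pattern.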
 