
Processing HTML documents

Status
Not open for further replies.

danny2785

Programmer
Jun 26, 2006
16
US
I was wondering if anybody knew how, after putting an HTML document into an array, one goes about editing the HTML line by line without having to write it to a text file. I tried using foreach $line (@array) but it didn't work. Thanks.


Danny
 
If you show us what you've got, we might be able to help ;-)

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
I am trying to grab the parts of an HTML document inside specific tags, but with the foreach loop it seems to treat the whole document as one line.
 
foreach $line (@h_list)
{
    my $request  = new HTTP::Request GET => $line;
    my $response = $robot->request( $request );
    @html_source_code = $response->content . "\n";
}

foreach $line1 (@html_source_code)
{
    if ($line1 =~ /<td>\d+\&nbsp\;\&nbsp\;\&nbsp\;<\/td><td>/)
    {
        push(@clean_list, $line1);
    }
}

 
How did you put the doc into the array? Did you store it one line per array element? Or the whole doc in one array element?

If you stored the whole doc in one array element, I would split it on \n into another array, then work on each element of the new array.
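A minimal sketch of that split, assuming the page text is already in a scalar (here a hypothetical `$content` standing in for `$response->content`, with made-up sample rows). It also shows why a loop over the original array runs only once: assigning one string to an array creates a single element.

```perl
use strict;
use warnings;

# Hypothetical page text standing in for $response->content
my $content = "<table>\n<tr><td>1&nbsp;&nbsp;&nbsp;</td><td>Alpha</td></tr>\n"
            . "<tr><td>2&nbsp;&nbsp;&nbsp;</td><td>Beta</td></tr>\n</table>\n";

# Assigning a single string to an array gives a one-element array
my @whole = ($content);
print "elements before split: ", scalar(@whole), "\n";

# Split on newlines so each array element holds one line of the page
my @lines = split /\n/, $content;
print "elements after split: ", scalar(@lines), "\n";

# Now the per-line test from the question works on individual lines
my @clean_list;
foreach my $line (@lines) {
    push @clean_list, $line if $line =~ /<td>\d+&nbsp;&nbsp;&nbsp;<\/td><td>/;
}
print "$_\n" for @clean_list;
```

This only helps if the server actually puts newlines between the rows; if the whole table arrives on one physical line, splitting on \n changes nothing.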

If you stored it one line per element, then you should just run through the array, editing each element.
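In Perl the `foreach` loop variable is an alias to each array element, so changing `$line` inside the loop changes the array itself. A small sketch with hypothetical sample lines and an arbitrary edit (stripping the `&nbsp;` entities):

```perl
use strict;
use warnings;

# Hypothetical lines, one per array element
my @lines = (
    '<tr><td>1&nbsp;&nbsp;&nbsp;</td><td>Alpha</td></tr>',
    '<tr><td>2&nbsp;&nbsp;&nbsp;</td><td>Beta</td></tr>',
);

# $line aliases each element, so the substitution edits @lines in place
foreach my $line (@lines) {
    $line =~ s/&nbsp;//g;
}

print "$_\n" for @lines;
```

If the original lines must be kept, loop over a copy instead (for example `my @edited = @lines;`).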
 
If you're looking for specific tags in your document, you should be using a proper tag-aware parser like HTML::TokeParser or (my favourite) HTML::TokeParser::Simple, rather than processing it yourself.
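A minimal sketch with HTML::TokeParser (from the HTML-Parser distribution on CPAN, which must be installed), pulling the leading number out of each `<td>` cell. The sample HTML is made up for illustration. Because the parser walks tags rather than lines, it doesn't matter that the page arrives as one long string.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN, HTML-Parser distribution

# Hypothetical sample page standing in for $response->content
my $html = '<table><tr><td>1&nbsp;&nbsp;&nbsp;</td><td>Alpha</td></tr>'
         . '<tr><td>2&nbsp;&nbsp;&nbsp;</td><td>Beta</td></tr></table>';

# A reference to a scalar tells the parser to read the document from memory
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser: $!";

my @numbers;
# get_tag('td') skips ahead to the next <td> start tag; undef at end of document
while ( $parser->get_tag('td') ) {
    # Collect the cell's text up to the closing </td>
    my $text = $parser->get_text('/td');
    # Keep only cells that start with a number
    push @numbers, $1 if $text =~ /^\s*(\d+)/;
}

print "$_\n" for @numbers;
```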
 
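That's because the document IS one line...
One very long line: $response->content returns the whole page as a single string, so @html_source_code ends up holding just one element.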

 
With what you have already, try something like...

Code:
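foreach my $line (@h_list)
{
    # Fetch each page; the whole response body comes back as one string
    my $request  = HTTP::Request->new(GET => $line);
    my $response = $robot->request( $request );
    my $html_source_code = $response->content;

    # With /g in scalar context, each pass of the while loop
    # resumes after the previous match and captures the next cell
    while( $html_source_code =~ /(<td>\d+&nbsp;&nbsp;&nbsp;<\/td>)/g ) {
      push @clean_list, $1;
    }
}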

Although ishnid is definitely correct: use somebody else's (debugged) hard work for your own benefit.
 