Processing html documents

danny2785 · Jun 26, 2006

I was wondering if anybody knows how, after putting an HTML document into an array, one goes about editing the HTML line by line without having to write it to a text file. I tried using foreach $line(@array) but it didn't work. Thanks.

Danny

PaulTEG · Jun 26, 2006

If you show us what you got, we might be able to help ;-)

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)

danny2785 · Jun 26, 2006

I am trying to grab the parts of an HTML document that contain specific tags, but the foreach loop seems to treat the whole document as one line.

danny2785 · Jun 26, 2006

foreach $line(@h_list)
{
    my $request = new HTTP::Request GET => $line;
    my $response = $robot->request( $request );
    @html_source_code = $response->content . "\n";

}

foreach $line1(@html_source_code)
{
    if($line1 =~ /<td>\d+\&nbsp\;\&nbsp\;\&nbsp\;<\/td><td>/)
    {
        push(@clean_list, $line1);
    }
}

mikedaruke · Jun 26, 2006

How did you put the doc into the array? Did you store it one line per array element? Or the whole doc in one array element?

If you stored the whole doc in one array element, I would split it on \n into another array and then work on each element of the new array.

If you stored it one line per element, then you should just run through the array, editing each element.
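
To illustrate the split suggestion, here is a minimal sketch. The $content string is a made-up stand-in for what $response->content might return, and the variable names @whole and @lines are assumptions for this example only. It shows that assigning a single string to an array yields one element, while splitting on \n yields one element per line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->content: one string with embedded newlines.
my $content = "<tr>\n"
            . "<td>1&nbsp;&nbsp;&nbsp;</td><td>alpha</td>\n"
            . "</tr>\n"
            . "<tr>\n"
            . "<td>2&nbsp;&nbsp;&nbsp;</td><td>beta</td>\n"
            . "</tr>\n";

# Assigning a scalar to an array stores the whole string as a single element.
my @whole = $content . "\n";

# Splitting on newlines gives one array element per line of the document.
my @lines = split /\n/, $content;

# Keep only the lines that match the pattern from the original post.
my @clean_list = grep { /<td>\d+&nbsp;&nbsp;&nbsp;<\/td><td>/ } @lines;

print scalar(@whole), " element before split, ", scalar(@lines), " after\n";
print "$_\n" for @clean_list;
```

This only helps when the page really does contain newlines between the rows of interest; if the server sends the markup as one long line, splitting on \n will not separate the cells.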

ishnid · Jun 26, 2006

If you're looking for specific tags in your document, you should be using a proper tag-aware parser like HTML::TokeParser or (my favourite) HTML::TokeParser::Simple, rather than processing it yourself.
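
As a rough sketch of the parser-based approach, the following uses HTML::TokeParser (from the HTML-Parser distribution on CPAN, so it must be installed) on a made-up sample table. The sample markup and the rule "keep cells whose text starts with digits" are assumptions chosen to mirror the pattern in the original post, not something the poster specified.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup; in the thread this would be $response->content.
my $html = '<table><tr><td>12&nbsp;&nbsp;&nbsp;</td><td>alpha</td></tr>'
         . '<tr><td>name</td><td>beta</td></tr></table>';

# Passing a reference to a scalar tells the parser to treat it as the document.
my $parser = HTML::TokeParser->new(\$html)
    or die "Could not create parser";

my @clean_list;

# get_tag('td') skips ahead to the next <td> start tag, or returns undef at the end.
while ( my $tag = $parser->get_tag('td') ) {
    # get_text('/td') returns the text up to the closing </td>, with entities decoded.
    my $text = $parser->get_text('/td');

    # Keep only the leading number from cells that start with digits.
    push @clean_list, $1 if $text =~ /^(\d+)/;
}

print "$_\n" for @clean_list;
```

Because the parser works on tags rather than lines, it does not matter whether the document arrives as one long line or many.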

brigmar · Jun 26, 2006

That's because the document IS one line...
One very long line.

brigmar · Jun 26, 2006

With what you have already, try something like...

Code:

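foreach my $line (@h_list)
{
    # Fetch each URL in the list with the existing $robot user agent
    my $request  = HTTP::Request->new( GET => $line );
    my $response = $robot->request( $request );
    my $html_source_code = $response->content . "\n";

    # /g in a while loop finds every match in the string, not just the first
    while( $html_source_code =~ /(<td>\d+&nbsp;&nbsp;&nbsp;<\/td>)/g ) {
        push @clean_list, $1;
    }
}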

although ishnid is definitely correct... Use somebody else's (debugged) hard work for your own benefit.
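
A variation on the loop above, offered as a sketch only: in list context a /g match returns all captures at once, so the while loop can be replaced by a single assignment. The sample string and the choice to capture just the digits are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample; in the thread this would be $response->content.
my $html_source_code =
      '<tr><td>1&nbsp;&nbsp;&nbsp;</td><td>a</td></tr>'
    . '<tr><td>22&nbsp;&nbsp;&nbsp;</td><td>b</td></tr>';

# In list context, a /g match returns every captured group from every match.
my @numbers = $html_source_code =~ /<td>(\d+)&nbsp;&nbsp;&nbsp;<\/td>/g;

print "@numbers\n";
```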
