Reading HTML Pages with perl

jimberger · Oct 22, 2001

Hi all,

I am trying to read an HTML page into my Perl script.

My code is as follows:
print "Content-type: text/html\n\n";
sysopen(STATUS, "$status_page", O_RDONLY)

$html = <STAUTS>;

print $html;

close (STATUS);

However, this doesn't work. Has anyone any ideas?

Many thanks

Jim bob

Kathy1 · Oct 22, 2001

Hi Jim.

There are a couple of ways to do this, depending on what you are trying to do with the HTML. First, if you just want to get at the data that someone has entered into fields on your HTML page, I'd suggest calling the function &ReadParse, which CGI.pm provides for cgi-lib.pl compatibility (import it with use CGI qw(:cgi-lib);). This pulls in the submitted form data and puts it into a hash called %in. To reference information in the hash, you simply access it by the field name used on the HTML page. For example, if you had a field on your HTML page called first_name, to access it after calling &ReadParse, you would simply use

$in{first_name}

If you are trying to read the entire html page into your perl script, including all tags and whatever, open it as a sequential file (or database, if you prefer that term) and read it into your script that way.

Here is a section of code from a subroutine I wrote that does this (note this is not the complete subroutine, just the part to find the file, open it, move it to an array, and close it):

sub ReadSendHTML()
{

#$HTMLPageName contains the directory and file name
#where the HTML page we are processing resides.

#initialize variables
$x = 0;
$i = "";
@indata = ();

#verify that the file we want to read in exists
unless (-e $HTMLPageName)
{
print <<"PrintTag";
<HTML><BODY BGCOLOR="#C0F1EF">
<H2>HTML file not present.</H2>
<P>Cannot retrieve the requested HTML
file. The file $HTMLPageName does not
exist. Please note this error
message and contact the help desk at xxx <P>Program HTMLIO.PM</P>
</BODY></HTML>
PrintTag
exit(0);
}

#open the HTML file - this file should already exist.
open (HTMLFILE, "<$HTMLPageName")
|| die "Cannot find HTML file.
Please note this message and contact the xx help
desk at (xxx)xxx-xxxx or 1-800-xxx-xxxx ext \#xxx.
Program readSendHTML in HTMLIO.PM";

#print "after opening the html file. ";

#store HTML file in array
@indata = <HTMLFILE>;

#close files
close (HTMLFILE);

After this, just process the array @indata in whatever fashion you choose.
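
The same read-into-an-array idea can be sketched more compactly. This is only an illustration, not the original subroutine: it uses a lexical filehandle and three-argument open, and the names read_html_lines and first_title (and the title-matching step itself) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read a local HTML
# file and return its lines as a list.
sub read_html_lines {
    my ($path) = @_;

    # Three-argument open with a lexical filehandle; die with the OS
    # error message if the file cannot be opened.
    open(my $fh, '<', $path)
        or die "Cannot open HTML file '$path': $!\n";
    my @lines = <$fh>;
    close($fh);

    return @lines;
}

# Hypothetical processing step: return the text of the first <title>
# element that sits entirely on one line, or nothing if there is none.
sub first_title {
    my (@lines) = @_;
    for my $line (@lines) {
        return $1 if $line =~ m{<title>(.*?)</title>}i;
    }
    return;
}
```

With these in place, my @indata = read_html_lines($HTMLPageName); fills the array, and first_title(@indata) is one example of processing it afterwards.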

Hope this helps.

Kathy

jimberger · Oct 22, 2001

Hi Kathy,

Thanks for your help on this. What I actually want is for the script to go to a URL, e.g.

http://www.address/status

This URL produces an HTML page whose contents I want to read into my script so I can then manipulate the data. Any ideas?

Cheers, Jim

goBoating · Oct 22, 2001

Check out the LWP module. Try searching this forum for LWP or LWP::Simple.
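
A minimal sketch of that suggestion: LWP::Simple exports a get function that returns the page body as a string, or undef if the request fails. The wrapper name fetch_page is hypothetical, the URL in the usage comment is the placeholder from the question rather than a real address, and the sketch assumes the LWP modules (libwww-perl) are installed.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical wrapper (name chosen for illustration): fetch a URL and
# return the page content as one string, dying if the fetch fails.
sub fetch_page {
    my ($url) = @_;

    # get() returns undef on any failure, so check before using it.
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;

    return $html;
}

# Usage, with the placeholder URL from the question:
#   my $html  = fetch_page('http://www.address/status');
#   my @lines = split /\n/, $html;   # then process line by line
```

Once the page is in a string, it can be split into lines and handled the same way as an array read from a local file.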

HTH

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
