Reading HTML Pages with perl

Status
Not open for further replies.

jimberger

Programmer
Jul 5, 2001
Hi all,

I am trying to read an HTML page into my Perl script.

My code is as follows:
print "Content-type: text/html\n\n";
sysopen(STATUS, "$status_page", O_RDONLY)

$html = <STAUTS>;

print $html;

close (STATUS);

However, this doesn't work. Does anyone have any ideas?

Many thanks

Jim bob
 
Hi Jim.

There are a couple of ways to do this, depending on what you want to do with the HTML. First, if you just want to get at the data that someone has entered into the form fields on your HTML page, I'd suggest calling the function ReadParse, which CGI.pm provides for compatibility with the older cgi-lib.pl (import it with use CGI qw(:cgi-lib)). It pulls in the submitted form data and puts it into a hash called %in. To reference a value in the hash, you use the field name from the HTML page as the key. For example, if you had a field on your HTML page called first_name, you would access it after calling ReadParse with

$in{first_name}
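
For comparison, here is a minimal sketch of the same idea using CGI.pm's object interface instead of ReadParse. The helper name form_fields is made up for this example, and it assumes CGI.pm is installed (it has not shipped with core Perl since 5.22). It builds a plain hash keyed by field name, similar in spirit to %in:

```perl
use strict;
use warnings;
use CGI;

# form_fields is a hypothetical helper, not part of CGI.pm.
# It copies the submitted form fields into an ordinary hash.
sub form_fields {
    my ($query) = @_;
    my %in;
    # param() with no arguments returns the list of field names.
    for my $name ($query->param) {
        # A field can be submitted more than once; scalar context
        # keeps only its first value.
        $in{$name} = scalar $query->param($name);
    }
    return %in;
}
```

In a CGI script you would call it as my %in = form_fields(CGI->new); and then read $in{first_name} as above.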

If you are trying to read the entire HTML page into your Perl script, tags and all, open it as a sequential file (or database, if you prefer that term) and read it into your script that way.

Here is a section of code from a subroutine I wrote that does this (note this is not the complete subroutine, just the part to find the file, open it, move it to an array, and close it):

sub ReadSendHTML
{

    # $HTMLPageName contains the directory and file name
    # where the HTML page we are processing resides.

    # initialize variables
    $x = 0;
    $i = "";
    @indata = ();    # empty list; assigning "" would leave one empty element

    # verify that the file we want to read in exists
    unless (-e $HTMLPageName)
    {
        # the here-document terminator must start in column one
        print <<"PrintTag";
<HTML><BODY BGCOLOR="#C0F1EF">
<H2>HTML file not present.</H2>
<P>Cannot retrieve the requested HTML
file. The file $HTMLPageName does not
exist. Please note this error
message and contact the help desk at xxx <P>Program HTMLIO.PM</P>
</BODY></HTML>
PrintTag
        exit(0);
    }

    # open the HTML file for reading - this file should already exist.
    open (HTMLFILE, "<", $HTMLPageName) || die "Cannot find HTML file.
Please note this message and contact the xx help
desk at (xxx)xxx-xxxx or 1-800-xxx-xxxx ext \#xxx.
Program readSendHTML in HTMLIO.PM";

    #print "after opening the html file. ";

    # store HTML file in array, one line per element
    @indata = <HTMLFILE>;

    # close files
    close (HTMLFILE);

    # ... rest of the subroutine omitted ...
}

After this, just process the array @indata in whatever fashion you choose.
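
As a rough sketch of the same read-into-an-array step with a lexical filehandle, something like the following should work. The helper names read_html_lines and read_html_string are made up for illustration. It also shows why the code in the original question printed at most one line: reading a filehandle into a scalar returns a single line, while reading it into an array returns every line (and the handle name there was misspelled as STAUTS).

```perl
use strict;
use warnings;

# Hypothetical helper: return every line of the file as a list.
sub read_html_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;    # list context: reads all remaining lines
    close($fh);
    return @lines;
}

# Hypothetical helper: return the whole file as one string.
sub read_html_string {
    my ($path) = @_;
    return join '', read_html_lines($path);
}
```

Usage would be my @indata = read_html_lines($HTMLPageName); or, to print the page back out, print read_html_string($HTMLPageName);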

Hope this helps.

Kathy
 
Hi Kathy,

Thanks for your help on this. What I actually want is for the script to go to the URL, e.g . This page produces an HTML page, which I want to read the contents of into my script so I can then manipulate the data. Any ideas?

cheers jim
 
Check out the LWP module. Try searching this forum for LWP or LWP::Simple.

HTH

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
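
To illustrate the LWP suggestion, here is a minimal sketch using the get function from LWP::Simple, which returns the page body or undef on failure. The helper names fetch_page and page_lines, and the URL in the usage line below, are made up for this example, since the thread does not show the real address. It assumes the libwww-perl distribution is installed.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch a URL and return its body as one string.
sub fetch_page {
    my ($url) = @_;
    my $html = get($url);    # undef if the request failed
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Hypothetical helper: split the body into lines (newlines kept),
# so it can be processed the same way as a file read into an array.
sub page_lines {
    my ($html) = @_;
    return split /^/, $html;
}
```

Usage would be along the lines of my @indata = page_lines(fetch_page('http://www.example.com/status.html')); with the placeholder address replaced by the page you want to read.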
 