Getting Text off of a Website

LaneM1234 · Jun 21, 2012

Perl Monks!

I am very new to Perl and am trying to create a script that will allow me to download my homework assignments off of my teacher's website for a specific day. He puts our HW on his website,

http://staweb.sta.cathedral.org/departments/math/mhansen/public_html/1112hcal/1112hcal.htm.

I would like to make a script that, when given a date, finds the corresponding assignment and prints it to a blank text file. I have been able to build all of the mechanics except for the part that copies the assignment.

I have been able to use LWP::Simple to get the page text, but I don't know how to make the script pick out the corresponding assignment, nor how to print it to a blank text file. I don't think this is very complicated, but I'm really bad at Perl, so any and all help would be appreciated!

Annihilannic · Jul 3, 2012

Are you still stuck on this? What is your code so far?


MrCBofBCinTX · Jul 4, 2012

I looked at the web page's source.
Pulling the sections out of this one with regexes looks like a real chore.

Do not worry about getting it into a file until you get it to work. print "$blah"; will let you debug without having to peek inside your new file.

This page is "unique" in a sense, since it follows a strict pattern.
One (of many) ways might be to read the web page line by line.
If a line matches <tr at the beginning, start concatenating lines onto a variable ($cool .= $line) until a line matches </tr at the beginning. Then push $cool onto an array, or just go straight to the next step below.

Then you can pull out (with a regex) the date section and the HW section.

If the date is the one you want, print the HW section into your file. Done.
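
The steps above could be sketched roughly like this. It is only a sketch: the sample HTML, the date format '6/21', and the sub names collect_rows, row_cells and find_homework are all made up for illustration, and it assumes each row opens and closes on its own line and that the date is the first cell with the HW in the second. The real page may need different patterns.

```perl
use strict;
use warnings;

# Collect each <tr>...</tr> block from a list of lines.
# Assumes <tr and </tr each appear at the start of their own line.
sub collect_rows {
    my @lines = @_;
    my @rows;
    my $cool;                              # undef while outside a row
    for my $line (@lines) {
        $cool = '' if $line =~ /^\s*<tr\b/i;
        $cool .= $line if defined $cool;
        if (defined $cool && $line =~ /^\s*<\/tr\b/i) {
            push @rows, $cool;
            undef $cool;
        }
    }
    return @rows;
}

# Return the text of every <td> cell in one row, with tags stripped.
sub row_cells {
    my ($row) = @_;
    my @cells;
    while ($row =~ /<td[^>]*>(.*?)<\/td>/gis) {
        my $text = $1;
        $text =~ s/<[^>]+>//g;             # drop nested tags such as <b>
        $text =~ s/\s+/ /g;                # collapse whitespace
        $text =~ s/^ | $//g;               # trim the ends
        push @cells, $text;
    }
    return @cells;
}

# Return the HW text for a date, or undef if no row matches.
sub find_homework {
    my ($date, @lines) = @_;
    for my $row (collect_rows(@lines)) {
        my ($row_date, $hw) = row_cells($row);
        next unless defined $row_date && defined $hw;
        return $hw if $row_date eq $date;
    }
    return;
}

# Invented sample data standing in for the downloaded page.
my @sample = map { "$_\n" } split /\n/, <<'HTML';
<table>
<tr>
<td width="10%">6/21</td>
<td width="85%">Read section 4.2, problems <b>1-15 odd</b></td>
</tr>
<tr>
<td width="10%">6/22</td>
<td width="85%">Review for quiz</td>
</tr>
</table>
HTML

my $hw = find_homework('6/21', @sample);
print defined $hw ? "$hw\n" : "No assignment found\n";
```

With the real page you would split the text you got from LWP::Simple into lines and pass those in instead of @sample.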

Look at:
perldoc perlrequick
perldoc perlretut
perldoc perlfaq6
perldoc perlre
perldoc perlrebackslash
perldoc perlrecharclass
perldoc perlreref

and
perldoc -f open
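
Once the print version works, writing to a file is a small change. A minimal sketch, assuming a file name of homework.txt and a made-up sub name save_assignment:

```perl
use strict;
use warnings;

# Write one assignment to a file, replacing whatever was there before.
sub save_assignment {
    my ($path, $text) = @_;
    # Three-argument open with a lexical filehandle; '>' truncates the file.
    open(my $fh, '>', $path) or die "Cannot open $path for writing: $!";
    print {$fh} "$text\n";
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Placeholder text; in the real script this would be the extracted HW.
save_assignment('homework.txt', 'Read section 4.2');
```

Use '>>' instead of '>' if you would rather append to the file than overwrite it.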

Zhris · Jul 4, 2012

I would probably make use of HTML::TableExtract to break the HTML up before considering other methods, i.e. regexes, to extract the specific elements. An alternative, or a combination, would be to use HTML::TreeBuilder / HTML::Element, which have look_down and address methods for finding elements. From the supplied webpage I can immediately see common groups, e.g. each date's container cell has a width of 10% and each description's container cell has a width of 85%.
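
As a rough sketch of the HTML::TableExtract route (the module comes from CPAN, not core Perl): the sub name homework_for, the sample HTML and the '6/21' date format are invented for illustration, and I am assuming the date is the first cell of a row and the description the second, which you would need to check against the real page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Return the description for a date, or undef if no row matches.
sub homework_for {
    my ($html, $date) = @_;
    my $te = HTML::TableExtract->new();    # no constraints: look at every table
    $te->parse($html);
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Cells can be undef, so default them to empty strings.
            my ($row_date, $hw) = map { defined $_ ? $_ : '' } @$row;
            next unless defined $hw;       # row had fewer than two cells
            s/^\s+|\s+$//g for $row_date, $hw;
            return $hw if $row_date eq $date;
        }
    }
    return;
}

# Invented sample data standing in for the downloaded page.
my $sample = <<'HTML';
<table>
<tr><td width="10%">6/21</td><td width="85%">Read section 4.2</td></tr>
<tr><td width="10%">6/22</td><td width="85%">Review for quiz</td></tr>
</table>
HTML

my $hw = homework_for($sample, '6/22');
print defined $hw ? "$hw\n" : "No assignment found\n";
```

For the real page you would pass in the string returned by LWP::Simple's get() instead of $sample.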

Chris
