Perl Script

Status
Not open for further replies.

perlcoder

Programmer
Jul 1, 2001
US
I am in desperate need of a Perl script that reads a list of website URLs from a text file, fetches the source of each page, and removes all the HTML tags. Do you have one handy by any chance? If not, do you know where I could get one? I would greatly appreciate any help.
 
I have not run this, but it should be close. It is very simplistic and relies on LWP::Simple. LWP::UserAgent and related modules give you more tools, but maybe this does enough.
Code:
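#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple;

# read the input file, one URL per line
open(my $ipf, '<', 'your_input_file') or die "Failed to open the input file, $!\n";
while (my $line = <$ipf>)
    {
    chomp $line;                # drop the trailing newline from the URL
    next unless $line =~ /\S/;  # skip blank lines

    # retrieve the page into the var, $content (get returns undef on failure)
    my $content = get($line);
    unless (defined $content)
        {
        warn "Could not fetch $line\n";
        next;
        }

    # remove everything inside <>
    $content =~ s/<.*?>//gs;

    # I assume you would print it somewhere, to a file or other....
    print $content;
    }
close $ipf;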
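
If the plain "remove everything inside <>" substitution leaves junk behind (JavaScript, CSS, comments, words run together), something along these lines might help. This is only a sketch: strip_tags is a name I made up, it is not part of LWP, and a regex approach like this is an approximation that will not cope with every page. A real parser module would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the post above):
# reduce an HTML string to its visible text using regexes only.
sub strip_tags {
    my ($html) = @_;
    return '' unless defined $html;

    # Drop comments and whole <script>/<style> blocks first; their
    # contents are not page text and may contain stray '<' or '>'.
    $html =~ s/<!--.*?-->//gs;
    $html =~ s/<(script|style)\b[^>]*>.*?<\/\1\s*>//gis;

    # Replace each remaining tag with a space so that text from
    # neighbouring elements does not run together.
    $html =~ s/<[^>]*>/ /g;

    # Collapse runs of whitespace and trim both ends.
    $html =~ s/\s+/ /g;
    $html =~ s/^ | $//g;

    return $html;
}

print strip_tags("<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"), "\n";
```

In the loop above you would call strip_tags($content) in place of the single substitution. One trade-off: because every tag becomes a space, an inline tag in the middle of a word will split that word.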

HTH


keep the rudder amid ship and beware the odd typo
 