Pattern Matching

Status
Not open for further replies.

pedros007

Programmer
Nov 14, 2004
8
0
0
GB
Hi,

I am trying to extract a list of URLs from a website and store them in a file. I have written a regular expression that extracts all of the a href links from the site and stores them in a file, but I would like to format them better.

Current format when they are extracted:

E.g.
<a href="http://www.karting.co.uk">Karting UK</a>

The format I would like:
"Karting UK", "www.karting.co.uk"
Is this possible? If so, do you know the best way to do it? I have a couple of the SAMS guides to Perl, but they don't really cover formatting.

Thank you

Pete
 
I hate regexes. They give me nightmares, and people always correct me on them. But here I go! This is untested.

Code:
# Link text is [^<]+ (a class like [^<\/a>] would also reject the letter "a").
my ($url, $text) = $string =~ m/<a href="([^"]+)">([^<]+)<\/a>/i;
 
I think this does what you want.
Code:
#!perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    my ($addr, $co) = m|//([^"]+)">(.*?)<|;
    print qq("$co", "$addr"\n);
}

__DATA__
<a href="http://www.karting.co.uk">Karting UK</a>
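
If the page holds several links, possibly more than one per line, something along these lines might help. It is only a sketch: the helper name format_links, the sample HTML, and the second URL are made up for illustration, and it keeps the full URL (including http://) rather than trimming it as the code above does. It also assumes href is the first attribute and is double-quoted, which real pages do not guarantee.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: returns one formatted line per link found in $html.
sub format_links {
    my ($html) = @_;
    my @lines;
    # /g in a while loop visits every match in turn; /i ignores tag case;
    # /s lets the link text span a line break.
    while ($html =~ m{<a\s+href="([^"]+)"[^>]*>(.*?)</a>}gis) {
        my ($url, $text) = ($1, $2);
        push @lines, qq("$text", "$url");
    }
    return @lines;
}

# Sample input (assumed; the second link is invented for the example).
my $html = <<'HTML';
<p><a href="http://www.karting.co.uk">Karting UK</a> and
<A HREF="http://www.example.com/">Example</A></p>
HTML

print "$_\n" for format_links($html);
```

To store the result in a file instead of printing it, open a filehandle with open my $out, '>', $file and print to that. For anything beyond simple, regular markup, a proper HTML parser is likely to be more reliable than a regex.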
 