How do I make hyperlinks out of urls in plain text?

by icrf
Taken and edited from this thread: thread219-751655
Code:
use strict;
use warnings;

#URI specification from RFC 2396 : http://www.ietf.org/rfc/rfc2396.txt
my $reserved = ';?:@&=+\\$,';            #removed / from spec
my $unreserved = q{\\-_!~*'()a-zA-Z0-9}; #removed . from spec
my $escaped = '%[a-fA-F0-9]{2}';

#a single allowed character or a complete escape sequence
my $allowed_all = qr{(?:[$unreserved$reserved./]|(?:$escaped))};
my $allowed_limited = qr/(?:[$unreserved$reserved]|(?:$escaped))/;

#optionally, could add gopher: mailto: news: telnet:
#the dot in www\. is escaped so it only matches a literal dot; https: is accepted too
my $start = qr{(?:https?:|www\.|ftp:)};

#easy one is where it starts with something predictable from $start
my $easy = qr/$start$allowed_all+/;

#harder case: one or more "label." parts, then a 2 or 3 character top level domain
#examples: com edu uk jp
#longer TLDs such as .museum or .info are not caught, and I'm ignoring them here
#(widening {2,3} would catch them, but would also match strings like Q.Letter)
my $hard = qr/(?:$allowed_limited+\.)+(?:$allowed_limited){2,3}(?:\/$allowed_all*)?(?!$allowed_all)/;

#make a little test suite of link-like and not-link-like lines to run it on
$_ = <<'EOF';
It needs to recognize, links in the following patterns: 

subdomain.mydomain.com/page1.html 
http://subdomain.mydomain.com/page1.html
www.mydomain.com/page1.html
http://www.mydomain.com/page1.html
subdomain.mydomain.com

Any ideas?

But, certain other combinations are being filtering into hyperlinks as well, such as:

>.<
Q.Letter

Some combinations don't, like the following:
!.!
A.B

Any thoughts?
EOF

#find an easy one or a hard one
s/($easy|$hard)/<a href="$1">$1<\/a>/g;
print;

The substitution at the end wraps each match in an HTML anchor tag, but you can use whatever wrapper you wish. There's a module on CPAN called [link http://search.cpan.org/~cwest/HTML-FromText-2.05/lib/HTML/FromText.pm]HTML::FromText[/link] that handles the most common cases, as well as a slew of other things. I'd probably suggest using it unless it doesn't pick up as much as you'd like (and this code does).
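
As one possible alternative wrapper, here is a minimal sketch that moves the wrapping into a callback and prepends http:// to matches that have no scheme, since a bare www.mydomain.com in an href would otherwise be treated as a relative link. The names wrap_link and linkify are hypothetical and not part of the original code. The patterns are repeated here, with www\. escaped and https: accepted, so that the block runs on its own.

```perl
use strict;
use warnings;

# Patterns repeated from the code above so this block is self-contained.
my $reserved   = ';?:@&=+\\$,';
my $unreserved = q{\\-_!~*'()a-zA-Z0-9};
my $escaped    = '%[a-fA-F0-9]{2}';
my $allowed_all     = qr{(?:[$unreserved$reserved./]|(?:$escaped))};
my $allowed_limited = qr/(?:[$unreserved$reserved]|(?:$escaped))/;
my $start = qr{(?:https?:|www\.|ftp:)};
my $easy  = qr/$start$allowed_all+/;
my $hard  = qr/(?:$allowed_limited+\.)+(?:$allowed_limited){2,3}(?:\/$allowed_all*)?(?!$allowed_all)/;

# Hypothetical wrapper: builds the anchor tag for one matched URL.
sub wrap_link {
    my ($url) = @_;
    # Prepend a scheme when the match does not already start with one.
    my $href = $url =~ m{^(?:https?|ftp):} ? $url : "http://$url";
    # The patterns cannot match <, > or ", so & is the only character
    # that needs escaping inside the double-quoted attribute and the text.
    $href =~ s/&/&amp;/g;
    (my $text = $url) =~ s/&/&amp;/g;
    return qq{<a href="$href">$text</a>};
}

# Hypothetical helper: replaces every match with whatever the callback returns.
sub linkify {
    my ($text, $wrapper) = @_;
    $text =~ s/($easy|$hard)/$wrapper->($1)/ge;
    return $text;
}

print linkify("See www.mydomain.com/page1.html or ftp://files.mydomain.com/pub\n", \&wrap_link);
```

In this run the first link gets the href http://www.mydomain.com/page1.html while its visible text stays www.mydomain.com/page1.html, and the ftp link keeps its own scheme. Passing a different code reference to linkify swaps in another wrapper without touching the patterns.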

For completeness, here's the [link http://forums.devshed.com/t65634/s.html]DS thread[/link] where it was originally created and discussed.

Feedback is welcome.