How can I locate URIs and prepend specific text to all URIs on a page?

Status
Not open for further replies.

diodotus

Programmer
Sep 30, 2002
US
I am writing a web-cache program which I hope to make "plug-and-play" for webmasters, so that people with a modicum of web knowledge can easily confound the Great Firewall of China (and other attempts to limit information).

The script can presently process and view any file served via HTTP and send it on to the viewer. See
However, I also want the script to find and rewrite URIs within HTML documents so that those URIs will be linked to and loaded through the selfsame program.

The current source of the script is at
The place where I want the script to alter links and URIs is near the bottom (you'll see a lengthy comment there).

Help is much appreciated!
 
hi,
I saw this program in the Perl Cookbook once. If I've understood correctly what you want to do, this should do the trick:

Code:
#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";
$cgi  = 'http://www.unseelie.org/cgi-bin/china_buster.cgi?url=';

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{$cgi$1}igox;
    print;
}
----
san.
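
One thing to watch when prepending the CGI prefix: the matched URL ends up as a query-string value, so characters such as `&`, `?` and `#` in the original URL may be read as part of the proxy's own URL. Here is a minimal sketch of escaping the URL first. It assumes the CGI script decodes its `url` parameter in the usual way, which this thread does not confirm; `escape_url` and `wrap_url` are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Prefix taken from the urlify script above.
my $cgi = 'http://www.unseelie.org/cgi-bin/china_buster.cgi?url=';

# Hypothetical helper: percent-encode every character that is not an
# RFC 3986 "unreserved" character (letters, digits, - . _ ~).
# Assumes the input is a byte string (no wide characters).
sub escape_url {
    my ($url) = @_;
    $url =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $url;
}

# Hypothetical helper: build the proxied link for one absolute URL.
sub wrap_url {
    my ($url) = @_;
    return $cgi . escape_url($url);
}

print wrap_url('http://example.com/search?q=perl&lang=en'), "\n";
# prints http://www.unseelie.org/cgi-bin/china_buster.cgi?url=http%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dperl%26lang%3Den
```

To use it in the loop above, the replacement `$cgi$1` would become a call such as `wrap_url($1)`, with the `e` modifier added to the substitution so the replacement is evaluated as code.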
 
This works beautifully on absolute URLs once hacked up a bit! Thanks!

Now, I have to figure out how to find all relative URLs and convert them into absolute URLs. But I have some ideas where to check for that.
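
For the relative-to-absolute step, one possible approach is the `URI` module from CPAN (not part of core Perl), whose `new_abs` constructor resolves a reference against a base URL. The sketch below is only an illustration: `absolutize_links` is a hypothetical helper, the base URL and sample HTML are made up, and the regex only handles quoted `href` and `src` attributes, so a real HTML parser would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;    # CPAN module; not part of core Perl

# Hypothetical helper: rewrite quoted href="..." and src="..." values in a
# chunk of HTML so that each one is absolute, resolved against $base.
sub absolutize_links {
    my ($html, $base) = @_;
    $html =~ s{
        \b (href|src) \s* = \s*   # attribute name and equals sign
        (["'])                    # opening quote
        (.*?)                     # the URL as written in the page
        \2                        # the matching closing quote
    }{
        my ($attr, $quote, $url) = ($1, $2, $3);
        $attr . '=' . $quote . URI->new_abs($url, $base)->as_string . $quote;
    }gisex;
    return $html;
}

my $base = 'http://example.com/docs/index.html';   # assumed page address
my $html = '<a href="../about.html">About</a> <img src="/img/logo.png">';
print absolutize_links($html, $base), "\n";
# prints <a href="http://example.com/about.html">About</a> <img src="http://example.com/img/logo.png">
```

URLs that are already absolute pass through unchanged, so the output of this step can be fed straight into the absolute-URL rewriting.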

:)
 
BTW, thanks for your assistance.

After hacking away and getting this script to do most of what I wanted, I found another script out there which already does everything on my TODO list and would be easier for people to place on their web servers plug-and-play! So now, instead of continuing to write my own script, I am promoting this one to webmasters.

Since you expressed some interest, here's the URL for the better script:

-- Scott David Gray
reply-to: sgray@sudval.org
 