Help Perl CGI: website navigation.cgi

Kachoo · Apr 27, 2008

Ok guys, I don't know what's going on. Something I figured would be a simple CGI script has become a nightmare that I can't figure out without an extensive chain of "if" statements. Here is what I am trying to do. I am far from an expert in Perl (I come from a VB background), so my code is probably lacking and extremely ugly, so beware!

1.) My program needs to interpret three portions of a URL request and pass them on to various functions (seems simple).

GENERAL REQUEST SYNTAX EXAMPLES
a.) Example: External import request, external site
href = http://www.iterm.mobi/cgi-bin/nav.cgi?http://www.msn.com/fraud/update.htm

b.) Example: External import request, local site
href = http://www.iterm.mobi/cgi-bin/nav.cgi?inform/victim.htm

c.) Example: Internal import request, local site
href = http://www.iterm.mobi/cgi-bin/nav.cgi?inform/victim.htm

href = http://www.iterm.mobi/cgi-bin/nav.cgi?inform/victim

TO BE INTERPRETED
a. $root: the http root path, e.g. http://www.iterm.mobi (the default if not present in the request)

b. $path: the extended directory path, e.g. /inform

c. $file: the file name, e.g. /victim.htm

THE GOLDEN RULES
a.) $root . $path . $file should form a valid URL even when $path or $file is not present in the request.

b.) URLs are case sensitive, so they should be stored in the case received.

PROBLEMS
$file should be empty if the last part of the request does not contain a ".", and it should never equal the domain name itself.

If $file does not contain a ".", it should stay empty and its contents should be appended to $path instead.
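A minimal sketch of the rules above, assuming the request string has already been pulled out of the query string. The helper name `split_request` and its two-argument signature are hypothetical, and it also accepts `https://` roots, which the original rules do not mention. It takes the root from the request when one is present, drops empty segments so doubled slashes disappear, and only treats the last segment as `$file` when it contains a dot, so a bare domain never ends up in `$file`.

```perl
use strict;
use warnings;

# Hypothetical helper: split a request into ($root, $path, $file).
# $default_root is used when the request carries no http://host part.
sub split_request {
    my ($request, $default_root) = @_;
    my $root = $default_root;

    # Take the root from the request if it has one (case is preserved)
    if ($request =~ s{^(https?://[^/]+)}{}i) {
        $root = $1;
    }

    # Non-empty segments only, which also collapses doubled slashes
    my @segments = grep { length } split m{/}, $request;

    # The last segment counts as a file only if it contains a dot;
    # otherwise it stays part of $path and $file remains empty
    my $file = '';
    if (@segments && $segments[-1] =~ /\./) {
        $file = '/' . pop @segments;
    }

    my $path = @segments ? '/' . join('/', @segments) : '';
    return ($root, $path, $file);
}

for my $req ('http://www.msn.com/fraud/update.htm', 'inform/victim.htm',
             'inform/victim', 'http://www.msn.com') {
    my ($root, $path, $file) = split_request($req, 'http://www.iterm.mobi');
    print "$req => $root$path$file\n";
}
```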

I hope this is clear; I tried my best to explain my issue. To make a long story short, here is exactly what I am trying to do.

I need to parse a URL request to http://www.iterm.mobi/cgi-bin/nav.cgi into these portions:

http://|domain|/directory|/file|

Any help would be greatly appreciated!

#!/usr/bin/perl -wT
use strict;

use LWP::Simple;

use CGI;
my $cgi = CGI->new;

#Default if not provided
my $root = 'http://www.iterm.mobi';

my $path = '';
my $count = 0;
my @nav = $cgi->param();
my $nav = $cgi->param($nav[0]);

$path = '/' . $nav . '/';

#Remove all the //'s
while (index($path, '//') != -1){
    my $pos = index($path, '//');
    $path = substr($path, 0, $pos) . '/' . substr($path, $pos + 2);
}

#Remove the trailing / if applicable (eq, not ==, compares strings)
my $test = length($path) - 1;
if (substr($path, $test) eq '/'){
    $path = substr($path, 0, $test);
}

#Process a root from $path ("http://" became "http:/" above)
$test = rindex(lc($path), 'http:/');

if ($test != -1){
    $test = $test + 5;
    $root = substr($path, $test);
    my @test = split('/', $root);
    $root = "http://$test[1]";
}

print "Content-Type: text/html\n\n";

print "Root: $root\n\n Dir: $path";

exit (0);

Kachoo · Apr 27, 2008

Ok, I got it. I started over for the 100th time after finding a sweet reverse() function, and this does exactly what I wanted it to do.

Like I said, I'm not good at Perl. It's a new language for me (I've been using it for about a week now), so if anyone can improve on this please let me know, but it operates exactly the way I want it to with no hitches!

#!/usr/bin/perl -wT

use LWP::Simple;

use CGI;
my $cgi = new CGI;

#Default if not provided
my $domain = 'http://www.iterm.mobi';

my $directory = '/';
my $file = '';
my $path = '';
my $tmp = '';

my @nav = $cgi->param();
$nav = $cgi->param($nav[0]);

my @path = split('/',$nav);

#Snag the end array and remove it from list
if (@path > 3){
@path = reverse(@path);
$file = shift(@path);
@path = reverse(@path);
}

#Snag the domain from array and remove it from list
if (@path){
$tmp = rindex(lc($path[0]),'http:');

if ($tmp > -1){
$domain = shift(@path) . '//' . shift(@path) . shift(@path);
}
}

$directory = join('/',@path);

if ($directory){
$directory = '/' . $directory;
}

if ($file){
$tmp = rindex($file, '.');
if($tmp != -1){
$file = '/' . $file;
}else{
$directory = $directory . '/' . $file;
$file = '';
}
}
# End of URL Processing

print "Content-Type: text/html\n\n";

print "Domain: $domain Directory: $directory File: $file\n";

print " - URL = $domain$directory$file\n";

exit(0);

travs69 · Apr 28, 2008

You do know you can "pop" an array instead of "shift"ing it? If the only reason you are using reverse is to get/remove the last element, you should just use pop.

http://perldoc.perl.org/functions/pop.html


Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
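To illustrate the pop suggestion, here is a small comparison of the reverse/shift/reverse sequence from the earlier script with a single pop on the same list (the sample path is made up).

```perl
use strict;
use warnings;

my @path = split m{/}, 'inform/sub/victim.htm';

# The reverse/shift/reverse sequence from the script above...
my @copy = reverse @path;
my $via_reverse = shift @copy;
@copy = reverse @copy;

# ...gives the same result as one pop on the original list
my $via_pop = pop @path;

print "$via_reverse $via_pop\n";   # prints "victim.htm victim.htm"
print join('/', @path), "\n";      # prints "inform/sub"
```

pop removes and returns the last element in place, so there is no need to build two reversed copies of the array.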

Kachoo · Apr 28, 2008

Nice, I didn't know about that function.

Will save some processing time, great catch!

Kachoo · Apr 28, 2008

Ok, here is my script, finalized the way I need it. If anyone sees an error or something that could make it better, please let me know.

#!/usr/bin/perl -wT

use LWP::Simple;

use CGI;
my $cgi = new CGI;

#Default if not provided
my $localdomain = 'http://www.iterm.mobi';

my $domain = $localdomain;
my $directory = '';
my $file = '';
my $path = '';
my $tmp = '';

my @nav = $cgi->param();
$nav = $cgi->param($nav[0]);

#Remove the initial "/" if present for processing so it doesn't double
$tmp = substr($nav, 0, 1);
$tmp = rindex($tmp, '/');
if ($tmp != -1){
$nav = substr($nav,1);
}

my @path = split('/',$nav);

#Snag the domain from array and remove it from list
if (@path){
$tmp = rindex(lc($path[0]),'http:');

if ($tmp > -1){
$domain = shift(@path) . '//' . shift(@path) . shift(@path);
}
}

#Get last element of path
$file = pop(@path);

#Set mid elements as directory
$directory = join('/',@path);

if ($directory){
$directory = '/' . $directory;
}

#If $file does not contain a "." then append $file to
#$directory and clear $file
if ($file){
$tmp = rindex($file, '.');
if($tmp != -1){
$file = '/' . $file;
}else{
$directory = $directory . '/' . $file;
$file = '';
}
}

# --------Print test results-------------

print "Content-Type: text/html\n\n";

print "Domain: $domain Directory: $directory File: $file\n";

print " - URL = $domain$directory$file\n";
# -------End of Testing -------------

exit(0);

# --- Questions/Comments Kachoo@live.com ----
# --------End of URL Processing-------------

prex1 · Apr 29, 2008

There are various issues you should resolve in your last code.
One is that the so-called default $localdomain is never used.
Another is that the slashes in the search part of the URL (after the ?) might be encoded, and you wouldn't catch them.
Also, are you sure that your URLs will always start with http:// ? (It might be implied or defaulted.)
There are other special conditions you should consider (unless you positively know they'll never occur): for example, in your first URL example above, if the URL terminated at http://www.msn.com, you would catch this one as $file.
Assuming that, as you state, your $domain is always http://www.iterm.mobi, and that all URLs start with http... (unless the domain is not present), you can do everything in one (!) line using regexps. It won't necessarily be faster, but as the code is much shorter, it is easier to maintain (if you know regexps, of course...).

Code:

my $domain='http://www.iterm.mobi';
  #unencode
$nav=~s/\+/ /g;
$nav=~s/%(..)/pack("c",hex($1))/ge;
  #add domain and slash if not present
unless(index($nav,$domain)==0){
  $nav='/'.$nav unless ord($nav)==47;
  $nav=$domain.$nav;
}
  #search for $directory and $file
$nav=~/^$domain(.*?)(\/[\-\w]+\.[\-\w]+){0,1}$/;
$directory=$1;
$file=$2;

Note that the code above allows only letters, numbers, underscores and dashes (and a single dot) in the filename. If you need multiple dots, use $nav=~/^$domain(.*?)(\/[\-\w\.]+){0,1}$/;

Franco
http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads
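One caveat that applies to the patterns in this thread (for example /^$domain(.*?).../): $domain is interpolated into the regexp as-is, so each dot in www.iterm.mobi matches any character. Wrapping it as \Q$domain\E makes it literal. A small sketch; the helper name dir_and_file is hypothetical, and it uses the single-dot filename pattern from the post above.

```perl
use strict;
use warnings;

my $domain = 'http://www.iterm.mobi';

# Hypothetical helper: directory and file of a local URL, with $domain
# quoted so its dots only match literal dots
sub dir_and_file {
    my ($nav) = @_;
    return unless $nav =~ m{^\Q$domain\E(.*?)(/[-\w]+\.[-\w]+)?$};
    return ($1, defined $2 ? $2 : '');
}

my $lookalike = 'http://wwwXitermXmobi/inform/victim.htm';
print "unquoted pattern matches the look-alike\n" if $lookalike =~ /^$domain/;
print "quoted pattern matches the look-alike\n"   if $lookalike =~ /^\Q$domain\E/;

my ($dir, $file) = dir_and_file("$domain/inform/victim.htm");
print "dir=$dir file=$file\n";    # prints "dir=/inform file=/victim.htm"
```

Only the first message is printed: the unquoted pattern accepts the look-alike host, the quoted one does not.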

Kachoo · Apr 29, 2008

Right you are! I think I bit off a little more than I can chew on this project, not being native to Perl, but it's been a really good learning experience.

This code segment is actually a vital part of a project I am building. The project basically acts like server-side frames. I have a four-sided HTML table (template) with a blank center, split into files holding my HTML: header.htm (header info), left.htm, top.htm, bottom.htm and right.htm.

In the center, or body, of the table is the response from get(the requested URL), which has to be in http://domain/directory/file format.

navigation.cgi accepts any string and processes it as a URL via the get command. If http: is not present, the request is considered local and $domain never changes; if http: is present, it uses the domain after http://.

My updated code partially works until I get a request with query elements like you said, e.g.:

http://domain/directory/file?element1=value&element2=value
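For the query-element case, one option is to cut the URL at the first ? before splitting it into directory and file, then reattach the query afterwards. A minimal sketch; split_query is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: separate the query elements (everything from the
# first "?" on) from the URL before it is split into directory/file
sub split_query {
    my ($url) = @_;
    my ($base, $query) = split /\?/, $url, 2;
    return ($base, defined $query ? "?$query" : '');
}

my ($base, $query) = split_query('http://domain/directory/file?element1=value&element2=value');
print "$base\n";    # prints "http://domain/directory/file"
print "$query\n";   # prints "?element1=value&element2=value"
```

Parsing $base with the existing code and appending $query afterwards keeps ? and & out of the directory/file split.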

The reason I have to parse the URL request into portions is that the href and src values returned by the get command, which I use in the body (center) of my HTML template, will treat http://www.iterm.mobi/cgi-bin/ as the working directory, when the true working directory is http://www.domain.com/directory (if applicable).

In other words, if I pull a URL that is external to my domain into my website, I fill in the src= and href= values that lack http: in the output HTML document so they remain functional.

In the href tags I place http://www.iterm.mobi/cgi-bin/nav.cgi?http://www.domain.com/ followed by the rest of the original href= value. This way I can keep these hrefs inside my website template.

The srcs are different: a src= value has to remain the same if http: is present in it; if not, I prefix it with the http://www.domain.com/<working directory>/ URL so the links don't break.
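A rough sketch of the href/src rewriting described above, using plain regexps on quoted attributes. The proxy prefix, $domain and $directory values are assumptions for illustration, and the helper names are hypothetical. Unquoted attributes, mailto: or #anchor links, and <base> tags are not handled; a real HTML parser such as HTML::Parser would be more robust.

```perl
use strict;
use warnings;

# Assumed values for illustration only
my $proxy     = 'http://www.iterm.mobi/cgi-bin/nav.cgi?';
my $domain    = 'http://www.domain.com';
my $directory = '/directory';

# Make a reference from the fetched page absolute on the remote site
sub absolutize {
    my ($ref) = @_;
    return $ref if $ref =~ /^http:/i;           # already absolute
    return $domain . $ref if $ref =~ m{^/};     # root-relative
    return "$domain$directory/$ref";            # relative to the working directory
}

# Build the replacement for one attribute: href values are sent back
# through the proxy, src values are only made absolute
sub rewrite_attr {
    my ($attr, $quote, $value) = @_;   # copy the captures first
    my $url = absolutize($value);
    $url = $proxy . $url if lc($attr) eq 'href';
    return "$attr=$quote$url$quote";
}

# Rewrite every quoted href="..." / src='...' attribute in a chunk of HTML
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{\b(href|src)=(["'])(.*?)\2}{rewrite_attr($1, $2, $3)}gie;
    return $html;
}

print rewrite_links('<a href="page.htm">x</a> <img src="/img/a.gif">'), "\n";
```

Copying the captures into lexicals at the top of rewrite_attr keeps the later matches inside absolutize from interfering with them.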

I also have to merge the headers of the website with my header.htm file so everything remains a working website. It's a lot more complicated than I expected, not knowing the language, but I am getting there.

Kachoo · Apr 30, 2008

Hey prex1,

Thanks for your input! Your code worked like a dream until I realized I would be dealing with query elements in a URL too. I have been racking my brain trying to learn regex.

Would it be too much if I asked you to finish this code for me? I've been plugging away at this for hours, getting there one step at a time.

The rule for $domain: anything between http:// and the first /.

The rule for $directory: anything after $domain up to the next / whose following segment contains a ".", "&", "?" or any other invalid directory character.

The rule for $file is simple: it's anything after $domain.$directory, since I'm only looking for the working directory.
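The three rules above can be expressed as one regexp once the http://host part is stripped off. A minimal sketch: the helper name parse_nav is hypothetical, a directory segment is assumed to consist only of letters, digits, underscores and dashes, and each segment must be followed by / or the end of the string; the first segment that breaks that rule starts $file.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the three rules above.
# Returns ($domain, $directory, $file).
sub parse_nav {
    my ($url, $localdomain) = @_;
    my $domain = $localdomain;

    # Rule 1: $domain is "http://" plus everything up to the first "/"
    if ($url =~ s{^(http://[^/]+)}{}i) {
        $domain = $1;
    } elsif ($url !~ m{^/}) {
        $url = "/$url";    # local request: make sure it starts with "/"
    }

    # Rule 2: $directory is a run of "/segment" pieces made only of letters,
    # digits, "_" and "-", each followed by "/" or the end of the string.
    # Rule 3: $file is whatever is left over.
    my ($directory, $file) = $url =~ m{^((?:/[-\w]+(?=/|\z))*)(.*)\z}s;
    return ($domain, $directory, $file);
}

for my $req ('http://www.msn.com/regex/is.a?B=otch&(:=)',
             '/boards/file.htm?bar=data', 'boards/sub') {
    my ($domain, $directory, $file) = parse_nav($req, 'http://www.iterm.mobi');
    print "Domain: $domain Directory: $directory File: $file\n";
}
```

The lookahead (?=/|\z) is what stops /file in /file.htm from being counted as a directory.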

#----------- Start Code -------------------

#!/usr/bin/perl -wT

use LWP::Simple;

use CGI;
my $cgi = new CGI;

#Default if not provided

my $localdomain = 'http://www.iterm.mobi';

my $domain = $localdomain;
my $directory = '';
my $file = '';
my $url = '';
my $tmp = '';

#parse URL request
my @nav = $cgi->param();
foreach $nav (@nav){

$tmp = rindex($nav,'keywords=');
if ($tmp != -1){
$url = $url . $nav . '=' . $cgi->param($nav) . '&';
} else {
$url = $url . $cgi->param($nav);
}
}

#unencode $url
$url =~ s/\+/ /g;
$url =~ s/%(..)/pack("c",hex($1))/ge;

#Append local domain if http:// is not present
if (($url =~ m/^http:\/\//i) == 0){
$domain = $localdomain;
$url='/'.$url unless ord($url)==47;
$url = $domain . $url;
$url=~/^$domain(.*?)(\/[\-\w]+\.[\-\w]+){0,1}$/;
$directory=$1; #After $domain upto a / followed by invalid file chars
$file=$2; #Everything after $domain.$directory
} else {
#To do
#Extracts domain from $url since http:// was found

}

# --------End of URL Processing-------------
# --------Print Test Output ----------------

print "Content-Type: text/html\n\n";

print "Domain: $domain Directory: $directory File: $file\n";

print " - URL = $domain$directory$file\n";

exit (0);
#-------------End Code -----------------

Thanks in Advance,

Ken

This code is close. You can check its output at these two links:

http://www.iterm.mobi/cgi-bin/old.cgi?/boards/file.htm

http://www.iterm.mobi/cgi-bin/old.cgi?/boards/file.htm?bar=data

Kachoo · Apr 30, 2008

Heh, after rereading this I know it sounds like an obscene request, but my coding platform is Notepad and my testing platform is FTP uploads to my remote domain.

I'm not sure if everyone codes like this, but having 5 browsers open for regex references, plus an FTP client, Notepad and Windows Explorer, is a major drag.

Kachoo · May 1, 2008

Well, I never got a response. I learned a lot of regex, and after many, many hours of learning this code works. It plays out cleanly on everything I throw at it and parses the request perfectly.

Code:

#!/usr/bin/perl -wT

#use strict;

use LWP::Simple;

use CGI;
my $cgi = new CGI;

#Default if not provided

my $localdomain = 'http://www.iterm.mobi';
my $domain = $localdomain;
my $directory = '';
my $file = '';
my $url = '';
my $tmp = '';



#parse Actual URL request
my @nav = $cgi->param();
foreach $nav (@nav){
	$url = $url . $nav . '=' . $cgi->param($nav) . '&';
}

#Remove additional build elements
$url=~ s/^(loc=)//i;
$url=~ s/(&)$//i;

#unencode $url
$url =~ s/\+/ /g;
$url =~ s/%(..)/pack("c",hex($1))/ge;

#Append local domain if http:// is not present
$tmp = 'http://';
if (($url =~ m/^$tmp/i) == 0){
	$domain = $localdomain;
	$url='/'.$url unless ord($url)==47;
	$url = $domain . $url;

	$url=~/^$domain(.*\/)(.*){0,1}$/;
	$directory = $1;
	$file = $2;

}  else {
#Extract domain from $url since http:// was found
$tmp = 'http://';
	$url=~/^$tmp([^\/]+)(.*)$/;
	$domain = $tmp . $1;
	if (ord($2)==47){
		$url=~/^$domain(.*\/)(.*){0,1}$/;
		$directory = $1;
		$file = $2;
	}
}

$directory =~ s/\/$//i;

$tmp = index($file,'.');
if ($tmp != -1){
	$file = '/' . $file;
} elsif($file) {
	#No "." in the last part: append it to $directory and clear $file
	$directory = $directory . '/' . $file;
	$file = '';
}
# --------End of URL Processing-------------

# --------Print Test Output ----------------
print "Content-Type: text/html\n\n";
print '<html>';
print "Domain: $domain Directory: $directory File: $file\n";
print '<br>';
print "Segmented URL = $domain$directory$file\n";
print '<br>';
print "Original  URL = $url\n\n";
print '</html>';

exit (0);

Test out the results here:

http://www.iterm.mobi/cgi-bin/old.cgi?loc=http://www.msn.com/regex/is.a?B=otch&(:=)
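In a test URL like the one above, CGI.pm splits the query string on & and = into separate parameters, and a query with no = at all (such as nav.cgi?inform/victim.htm) is treated as a keywords search; this is what the keywords= check and the rebuild loops above work around. An alternative is to read the raw string from the QUERY_STRING environment variable that the web server sets for CGI scripts. A sketch, with requested_location as a hypothetical name; it decodes %XX escapes once and leaves + alone, since the passed-through URL may use + itself.

```perl
use strict;
use warnings;

# Hypothetical helper: the requested location exactly as it appeared
# after the "?" of the request, minus an optional "loc=" prefix
sub requested_location {
    my $qs = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    $qs =~ s/^loc=//i;                           # optional loc= wrapper
    $qs =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # decode %XX escapes once
    return $qs;                                  # "+" is left untouched
}

# Simulate the request from the test link above
$ENV{QUERY_STRING} = 'loc=http://www.msn.com/regex/is.a?B=otch&(:=)';
print requested_location(), "\n";
```

The result can then go straight into the domain/directory/file parsing without rebuilding it from individual parameters.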

Tek-Tips is the largest IT community on the Internet today!
