Sitemap Generator 3

Status
Not open for further replies.

audiopro

Programmer
Apr 1, 2004
3,165
GB
Can anyone recommend a sitemap generator script?
I know there is the Google 500 page one but I need to add some extra functionality and would rather not re-invent the wheel by writing an already existing front end.

Keith
 
Cheers 1MDF
Being a shared server, I have no control over the config.

It is only one particular website that locks up the Mechanize routine. `die` doesn't help, because the script is already in the middle of the Mechanize routine when the error occurs. On all the other existing domains I have tried, the correct pages are listed. If a domain does not exist, Mechanize still works, returns no pages as expected and hands control back to my main script.

I will have to drop some breakpoints into the mechanize script to see where it is failing. If all else fails, I will have to commit the cardinal sin and edit the module.

Keith
 
[quote]I will have to drop some breakpoints into the mechanize script to see where it is failing. If all else fails, I will have to commit the cardinal sin and edit the module.[/quote]
ooohh, sounds like fun, don't forget to keep a backup copy, just in case ;-)

Perhaps there are some timeout settings you could set. I've only dabbled with Mechanize and TokeParser and have not had a problem with the application I wrote, so I have not really looked into the flag settings or config attributes.

Let us know how you get on.
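
On the timeout idea: WWW::Mechanize is a subclass of LWP::UserAgent, so it should accept that class's `timeout` option (in seconds), for example `WWW::Mechanize->new( timeout => 30 )`. That timeout covers inactivity on the connection rather than the total request time, so a hard upper limit can be sketched with Perl's built-in `alarm`. The helper name `with_timeout` is made up for this example, and the sketch assumes a Unix-like host where `alarm` works.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds.
# Returns (1, result) on success, or (0, error text) on failure or timeout.
sub with_timeout {
    my ( $seconds, $code ) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out after ${seconds}s\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending if $code died
    return $ok ? ( 1, $result ) : ( 0, $err );
}

# Stand-in for a slow fetch. In the real script this would be
# something like sub { $mech->get($url) } (assumption, not tested here).
my ( $ok, $msg ) = with_timeout( 1, sub { select( undef, undef, undef, 5 ); 'done' } );
print $ok ? "finished: $msg\n" : "gave up: $msg";
```

If the fetch hangs, control comes back to the main script after the limit instead of locking the whole run.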



"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
 
The module locks up on the line highlighted in red but I do not understand what that line of code does.
Is SUPER::get in another module?

Code:
sub get {
    my $self = shift;
    my $uri = shift;
    $uri = $uri->url if ref($uri) eq 'WWW::Mechanize::Link';
    $uri = $self->base
            ? URI->new_abs( $uri, $self->base )
            : URI->new( $uri );
    # It appears we are returning a super-class method,
    # but it in turn calls the request() method here in Mechanize

   [red]return $self->SUPER::get( $uri->as_string, @_ );[/red]
}

Keith
 
hmm, never heard of that 'SUPER::get'

is it a 'get' method in a 'SUPER' package space perhaps?

 
Is the Mechanize module installed on your shared hosting, or are you doing the good old "stick it in your bin"?

If so, go and scout CPAN and copy each additional module it uses.

The way it works is this: if it requires an additional module, say MIME::Lite,

then in the root of your cgi-bin you create a folder MIME and put the module file Lite.pm in there.

Each word before a double colon (::) evaluates to a folder.

Hope that makes sense.
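
The folder rule above can be written out as a small sketch. The helper name `module_to_path` is made up for illustration; Perl looks for the resulting relative path under each directory it searches.

```perl
use strict;
use warnings;

# Convert a module name to the relative file path Perl looks for,
# e.g. MIME::Lite becomes MIME/Lite.pm
sub module_to_path {
    my ($module) = @_;
    ( my $path = $module ) =~ s{::}{/}g;
    return "$path.pm";
}

print module_to_path($_), "\n" for qw(MIME::Lite WWW::Mechanize::Link);
```

This prints `MIME/Lite.pm` and `WWW/Mechanize/Link.pm`, so the second module needs a `WWW` folder containing a `Mechanize` folder containing `Link.pm`.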


 
I made a list of all the dependencies and created this test file to see if any were missing. I get 'start' and 'ok' with all the files 'used' except for Test::Warn; if I include that, I get nothing printed.
As a test, I changed the name of Test::Warn on the server and got a 'file not found' error as expected.

Just my luck to have a problem with a warning module, of all things.


Code:
#!/usr/bin/perl

print "Content-type: text/html\n\n";	# prepare for HTML output
use CGI::Carp qw(fatalsToBrowser);

print "start<br>";

	use WWW::Mechanize;
	use FindBin;
	use Pod::Usage;
	use HTTP::Status;
	use Compress::Zlib;
	use Net::FTP;
	use URI;
	use MIME::Base64;
	use Digest::MD5;
	use HTML::Parser;
	use Test::More;
	use XSLoader;
	use HTML::Tagset;
	use Getopt::Long;
	use ExtUtils::MakeMaker;
	use Test::Harness;
	use File::Spec;
	use HTTP::Server::Simple;
	use Array::Compare;
	use Sub::Uplevel;
	use Test::Exception;
	[red]use Test::Warn;[/red]

print "end ok";

exit;

Where do I go from here?

Keith
 
hmm, find the module and put it in your bin ?
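
Another option, offered as a sketch: `use` statements run at compile time, before any `print` executes, so a module that dies or hangs while loading can leave the page blank. Loading each module at run time inside `eval` reports them one at a time. The helper name `check_module` is made up, and the module list is shortened here; the full dependency list, including Test::Warn, would go in its place.

```perl
use strict;
use warnings;

# Try to load one module at run time.
# Returns 'ok' or the text of the error that stopped it loading.
sub check_module {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 'ok' : "FAILED: $@";
}

$| = 1;    # unbuffered, so output appears even if a later module hangs
print "Content-type: text/plain\n\n";

for my $module (qw(File::Spec Getopt::Long)) {
    print "$module: ";    # printed before loading, so a hang is visible
    print check_module($module), "\n";
}
```

The last module name printed without a result after it is the one that is hanging or killing the script.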

 
It is already there, in a folder called Test.

I wonder if the full script is detecting an error and it is the warn module which is locking up?

I get the same blank screen in both cases.

Keith
 
With my rusty recall of C++ object oriented programming I think "super" was a special keyword referring to the class that this object was inherited from... perhaps it has a similar meaning here.
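
That guess is close to how Perl works. `SUPER::` is not a module: `$self->SUPER::get(...)` tells Perl to look for `get` in the parent class of the package the code was written in. WWW::Mechanize inherits from LWP::UserAgent, so the highlighted line ends up in LWP::UserAgent's `get`. A minimal sketch with two made-up classes, `Fetcher` and `LoudFetcher`:

```perl
use strict;
use warnings;

package Fetcher;    # plays the role of the parent class

sub new { my ($class) = @_; return bless {}, $class }

sub get {
    my ( $self, $uri ) = @_;
    return "Fetcher::get($uri)";
}

package LoudFetcher;    # plays the role of the subclass
our @ISA = ('Fetcher');

sub get {
    my ( $self, $uri ) = @_;
    # Tidy the argument, then hand over to the parent's get()
    return $self->SUPER::get( uc $uri );
}

package main;

print LoudFetcher->new->get('example'), "\n";    # prints Fetcher::get(EXAMPLE)
```

So a breakpoint for `SUPER::get` belongs in the parent class's file, not in a file called SUPER.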

I've long since decided that OO programming is not compatible with my brain. :)

Annihilannic.
 
I have no idea what it is.
Whilst trying to find the point at which the script falls over I put break points into the various modules. This was going very well until I encountered this one. I assumed that SUPER::get was just another module but it turns out not to be the case.

I was starting to think I was getting somewhere with all this but now I am forming the same opinion as you, Annihilannic, maybe I would need some more grey matter in order to fully understand these concepts.

By the lack of people coming into this thread, it would seem that there are many others in the same boat unless they are taught at Perl College that this part of scripting is magic and the knowledge should not be passed on to lesser mortals.

Keith
 
Don't know your situation, but sounds like your simplest solution may be to just run this script somewhere else - home computer? office computer?
Once you get this working well somewhere else, should be fairly easy to import working solution onto that server.

If you want to quickly know what perl modules are installed on shared server, there is a script called perldigger.cgi that will list all of them.
 
The script works ok in most cases. It is just 1 particular website which locks the whole process somewhere.
I need to find out which module is locking up as there are no apparent errors being generated.

I keep having a look at this in between jobs and have got to this point. Ignore my breakpoint in red; the script locks on the green line.
I think the subroutine 'request' is being called again here, so the next stage is to insert an increment counter unless someone can help out.

Code:
sub _make_request {
    my $self = shift;

[red]print "exit at _make_request @_<br>";
exit;
[/red]
    [green]$self->SUPER::request(@_);[/green]
}
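
The counter idea could look something like the sketch below. It uses a made-up recursive sub, `fetch_page`, as a stand-in for the real routine and a hypothetical limit of 5 nested calls; `Carp::longmess` adds a stack trace showing who called whom.

```perl
use strict;
use warnings;
use Carp qw(longmess);

our $depth    = 0;    # how many calls are currently in progress
my $max_depth = 5;    # hypothetical limit, pick whatever suits the script

# Stand-in for a routine that may end up calling itself
sub fetch_page {
    my ($hops) = @_;
    local $depth = $depth + 1;    # restored automatically on return or die
    die longmess("fetch_page nested more than $max_depth deep")
        if $depth > $max_depth;
    return $hops > 0 ? fetch_page( $hops - 1 ) : 'fetched';
}

print fetch_page(2), "\n";    # completes normally

my $ok = eval { fetch_page(50); 1 };
print "stopped, trace follows:\n$@" unless $ok;
```

Because the counter is `local`, it drops back by itself when the sub returns or dies, so it never needs resetting by hand.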

Keith
 
Well out of my league now, I'm afraid, Keith :-(

Hope one of the Perl wizards pops in to help!

 
My latest experiments reveal that the lock-up is occurring during a call to HTTP::Request.
HTTP::Request is called earlier in the application without a problem.
The problem I have is that the module is the one installed on the shared server, so I cannot insert breakpoints into it.

Is there a way to force a version of the module, in my domain, to be used?

Keith
 
Yes, cheat, but depending on what other modules it uses it might screw up paths/dependencies.

Copy the module into a folder under your cgi-bin.

Use the format I explained with your use command:

use MYMOD::HTTP::Request

So you would have this folder layout under the cgi-bin:

Code:
MYMOD-|
      |
      HTTP-|
           |
            Request.pm

does that make sense?
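
One caveat, as far as I understand the trick: `use MYMOD::HTTP::Request` only changes which file Perl opens. The `package` line inside the file still decides the class name, and other modules that say `require HTTP::Request` still search for `HTTP/Request.pm`, so they can load the server's copy anyway. The sketch below uses made-up names and builds a throwaway `MYMOD/Demo/Thing.pm` in a temp directory to show that the file path and the package name are tracked separately.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Build <tmp>/MYMOD/Demo/Thing.pm; note the package line says Demo::Thing
my $root = tempdir( CLEANUP => 1 );
my $dir  = File::Spec->catdir( $root, 'MYMOD', 'Demo' );
make_path($dir);
my $file = File::Spec->catfile( $dir, 'Thing.pm' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "package Demo::Thing;\nsub hello { 'local copy' }\n1;\n";
close $fh or die "Cannot close $file: $!";

unshift @INC, $root;           # let Perl find the MYMOD folder
require MYMOD::Demo::Thing;    # opens MYMOD/Demo/Thing.pm

print "Class defined by the file: ", Demo::Thing->hello, "\n";
print "Recorded in %INC as: ",
    join( ', ', grep { /Thing\.pm$/ } keys %INC ), "\n";
print "Demo/Thing.pm counted as loaded? ",
    ( exists $INC{'Demo/Thing.pm'} ? 'yes' : 'no' ), "\n";
```

Perl records the load under `MYMOD/Demo/Thing.pm` only, so a later `require Demo::Thing` elsewhere would still go looking for its own file.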

You were having problems with the FindBin command, though; how did you get on with including your own modules in your cgi-bin?

 
Not really a problem with findbin, it just finds the modules without it.

I assume that when a script issues a 'use' command, it looks through a list of likely paths to find the module.
If this is the case, what controls the search order?

The annoying thing about this whole thing is I have only found 1 website in which it locks up, for every other website it works ok (including tek-tips but I have limited the number of pages of course).

Keith
 
the reason it will force yours, is because you've accessed it via...

use MYMOD::HTTP::Request

not

use HTTP::Request

;-)
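
On the search-order question: the list is the `@INC` array. `use` and `require` try each directory in it, in order, and load the first matching file. `use lib` puts a directory at the front, which is another way to make a private copy of a module win over the server's copy without renaming it. A sketch, assuming a hypothetical folder named `mylib` next to the script; `find_in_inc` is a made-up helper.

```perl
use strict;
use warnings;
use FindBin qw($Bin);
use lib "$Bin/mylib";    # hypothetical folder; searched before the system paths
use File::Spec;

# Return the first file in @INC that would satisfy "use $module"
sub find_in_inc {
    my ($module) = @_;
    ( my $rel = "$module.pm" ) =~ s{::}{/}g;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks
        my $candidate = File::Spec->catfile( $dir, $rel );
        return $candidate if -f $candidate;
    }
    return;
}

print "Search order:\n";
print "  $_\n" for @INC;
print "HTTP::Request would load from: ",
    ( find_in_inc('HTTP::Request') // 'not found' ), "\n";
```

With a copy at `mylib/HTTP/Request.pm`, a plain `use HTTP::Request` anywhere in the program, including inside Mechanize, should pick up that copy first.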

It could be something returned by the site that is causing the problem, if it seems to work fine with all the others.

 
I was a bit slow there, busy doing other jobs, sorry. I understand now, it has to use the MYMOD version because it would be the only version.

I must make careful notes of what I change though as the calls are from within the Mech module. I am determined to track down this problem as it may well come back to bite me.

Keith
 