Broken Link Check

Status
Not open for further replies.

DaveyRichards

Programmer
Nov 8, 2006
14
GB
Hey all!

I was wondering how to get my PHP code to check for broken links when fed a URL. I can get my code to spider my website for links but am unsure how to separate the broken links from the long list of links that I am currently extracting.

Thanks for any help.
 
If I were writing something like that, I would likely take the URLs grabbed and try to connect to each one with either cURL or via sockets. I would want a way that would allow me to know the HTTP response header from the foreign servers.
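As a rough illustration of the cURL route, here is a minimal sketch. It assumes the cURL extension is available and PHP 7 or later; the function names `fetchStatusCode` and `isBrokenStatus` are made up for this example, and treating every status of 400 or above as "broken" is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: ask the server for headers only and return the
// HTTP status code. Returns 0 when no response was received at all.
function fetchStatusCode(string $url, int $timeout = 10): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // do not download the page body
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything
        CURLOPT_FOLLOWLOCATION => true,  // report the status after redirects
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Assumption: no response (0) or any 4xx/5xx status counts as a broken link.
function isBrokenStatus(int $code): bool
{
    return $code === 0 || $code >= 400;
}
```

Looping over the spidered list and keeping only the URLs for which `isBrokenStatus(fetchStatusCode($url))` is true would separate the broken links from the rest.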



Want the best answers? Ask the best questions! TANSTAAFL!
 
cURL needs to be installed for that, does it not?

What if installing a new library wasn't an option?
 
Hmmmm ok. At the moment I have a line of code like this:

$sock = fsockopen("...", 80, $errno, $errstr, 30);

When I test !$sock, I only avoid an error when the URL doesn't point at a specific page, i.e. a bare site address is fine, but a URL for a specific page returns an error, even if the page exists.

How do I adapt this code into a script that will check for broken links?
 
My previous post has a typo. It should have provided a link to the PHP online manual page on fsockopen(). Here it is again: link. That page has example code on it.

You might want to take a look at section 2.6 of my FAQ in this forum to see how to fetch an HTML page. See "Debugging PHP": faq434-2999



 
Hmmmm, still having problems. Is there not some way to fetch just a header, so that if the page does not exist I receive a 404-type header?
 
Sure. Issue a HEAD request.

Instead of issuing a GET request like:

GET / HTTP/1.1
Host: www.mit.edu

which will return both the HTTP headers and the content of the page, issue the request:

HEAD / HTTP/1.1
Host: www.mit.edu

which will return only the HTTP headers, not the content itself. It's faster, uses less bandwidth, and, if I recall correctly, uses HTTP the way the spec intended for this purpose.
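To make the HEAD request concrete without any extra library, here is a minimal sketch using fsockopen(). It assumes PHP 7 or later and plain HTTP on port 80 unless the URL names another port; the function names `buildHeadRequest`, `parseStatusCode` and `headStatus` are invented for this example. Note that fsockopen() takes a host name rather than a full URL, so the sketch splits the URL with parse_url() first.

```php
<?php
// Build the raw HEAD request text for a host and path.
function buildHeadRequest(string $host, string $path = '/'): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Pull the numeric status code out of a status line such as
// "HTTP/1.1 404 Not Found". Returns 0 if the line is not recognised.
function parseStatusCode(string $statusLine): int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Send a HEAD request and return the status code; 0 means the URL could
// not be parsed or the connection failed.
function headStatus(string $url, int $timeout = 30): int
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return 0;
    }
    $path = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    // Connect to the host only; the page path goes in the request line.
    $sock = @fsockopen($parts['host'], $parts['port'] ?? 80, $errno, $errstr, $timeout);
    if (!$sock) {
        return 0;
    }
    fwrite($sock, buildHeadRequest($parts['host'], $path));
    $statusLine = fgets($sock);  // first line of the response
    fclose($sock);
    return $statusLine === false ? 0 : parseStatusCode($statusLine);
}
```

A status of 404 from `headStatus()` would mark the link as broken; what to do with redirects (301, 302) is left open here, since this sketch does not follow them.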



 
I am managing to display the headers for a specific webpage, but when I try looping through pages to display their headers I get:

SSL operation failed with code 1. OpenSSL Error messages:
error:0306B067:bignum routines:BN_div:div by zero

PHP Warning: get_headers(): Failed to enable crypto

Segmentation fault
 
Have you enabled the php_openssl.dll extension in your php.ini (and restarted your web server)?
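One way to check that from inside the loop is sketched below. This is only an illustration, not a diagnosis of the segmentation fault: it assumes PHP 7.1 or later (for the context argument of get_headers()), and the function names `needsOpenSsl`, `lastStatusCode` and `checkLinks` are invented for the example. It skips https URLs when the openssl extension is not loaded instead of letting get_headers() attempt them.

```php
<?php
// True when the URL's scheme is https, which needs the openssl extension.
function needsOpenSsl(string $url): bool
{
    return strtolower((string) parse_url($url, PHP_URL_SCHEME)) === 'https';
}

// get_headers() lists one status line per hop when redirects are followed;
// return the code from the last one, or 0 if none is found.
function lastStatusCode(array $headers): int
{
    $code = 0;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Check each URL with a HEAD request and report a short result string.
function checkLinks(array $urls): array
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => 10],
    ]);
    $results = [];
    foreach ($urls as $url) {
        if (needsOpenSsl($url) && !extension_loaded('openssl')) {
            $results[$url] = 'skipped: openssl extension not loaded';
            continue;
        }
        $headers = @get_headers($url, false, $context);
        $code = $headers === false ? 0 : lastStatusCode($headers);
        // Assumption: no response or any 4xx/5xx status counts as broken.
        $results[$url] = ($code === 0 || $code >= 400) ? "broken ($code)" : "ok ($code)";
    }
    return $results;
}
```

If every https URL comes back as skipped, that points at the extension not being enabled in php.ini.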
 