perl html syntax checker

welshspoon · Mar 9, 2004

Please could somebody help me write some code to check the syntax of another HTML page? I'm not very good at Perl, so I don't know where to start.

Please help.

PaulTEG · Mar 10, 2004

What exactly are you looking to do?

If it's a one-off, look for an editor that colour-codes your syntax, which will make it easier to check manually.

On search.cpan.org, see HTML::TreeBuilder, HTML::Parser, HTML::SimpleParser

HTH
--Paul

welshspoon · Mar 10, 2004

I have a text box in a form in which I want to enter a URL. The CGI script then checks the HTML at that URL for syntax errors and returns any it finds.

thank you for your time

duncdude · Mar 10, 2004

Surely this is one hell of a task? While it's very easy to fetch the page and return the HTML, it is another thing entirely to evaluate whether the code is syntactically correct... or am I talking !@£$%

Kind Regards
Duncan

duncdude · Mar 10, 2004

Check this out:

http://watson.addy.com/

Kind Regards
Duncan

PaulTEG · Mar 10, 2004

How are you going to check inline Javascript?
--Paul

icrf · Mar 10, 2004

Aren't there already several sites and tools that do (x)html validation? I've seen links on pages that say "this site is compliant" and it goes to some verification tool. If that's all your goal is, then it's already been done and would be wasteful to do again.

A quick Google turns up

http://validator.w3.org

________________________________________
Andrew - Perl Monkey

Phalanx1 · Mar 10, 2004

a little cut out from my COM component

Code:

use HTML::TreeBuilder;

# $buf (the file name with extension) and $s (the output handle) come from
# the surrounding code, which is not shown in this excerpt.
my $File_With_Ext = $buf;
chomp $File_With_Ext;
print $s "$File_With_Ext almost done.\r\n";
my $path = "C:/docume~1/Administrator.PHANTOMX/Desktop/$File_With_Ext";

# Write the test page, then close the handle so the data is flushed before parsing.
open my $CWork, '>>', $path or die "Can't Open a FileHandle: $!";
print {$CWork} '<html><head><title>Test Environment</title></head><body bgcolor="black" text="ffffff"><h1>For PhantomX Developers.</h1><input type="button" onclick="VBExec()" value="Execute VBScript"><textarea id="EVBS"></textarea><br><input type="button" onclick="PSEval" value="Eval PerlScript"><textarea id="EPS"></textarea><br><input type="button" onclick="JSEval()" value="Eval JScript"><textarea id="EJS"></textarea></body><script language="VBScript">Function VBExec() : Execute(document.getElementById("EVBS").innerText) : End Function </script><script language="PerlScript">sub PSEval {eval($window->document->getElementById("EPS")->innerText);}</script><script language="JavaScript">function JSEval() {eval(document.getElementById("EJS").innerText);}</script></html>' or die "can't write to filehandle: $!";
close $CWork or die "Can't close $path: $!";

# Parse by file name: the write handle above cannot be read from.
my $tree = HTML::TreeBuilder->new();
$tree->parse_file($path) or die "Can't Parse: $!";
my $html = $tree->as_HTML;
print $s "checking ...$html..( File Done. ) \r\n";
$tree->delete;
}   # closes an enclosing block that is not shown in this excerpt

Hope it helps.
Type CWORK.htm (or .hta or .mht) to build the web page with that extension.
Definitely not for a Unix system though, sorry.

PaulTEG · Mar 10, 2004

Welshspoon, #this six nations'll never catch on

According to watson.addy.com, an HTML page containing fully functioning inline JavaScript fails on almost every line of script, but that's because the checker was built to check HTML. (PS: it should have picked the script up as a comment, unless the engine has become more or less forgiving than it used to be ...)

Can you be more specific about your requirements?

--Paul

welshspoon · Mar 11, 2004

Thank you all for the hard work and effort so far, I'll hand out stars at the end of the thread...

One last thing - isn't there a simple way of just getting the Perl script to look at the HTML and find things that are wrong, such as an unclosed tag?

Maybe by counting the opened and closed tags, but that doesn't take into account the horizontal rule, images and so forth.

To cut this short - can I have a script that counts opening and closing HTML tags, but ignores <IMG SRC =""> and the other ones that don't get closed?

Then, at the end of it, if there is a difference between the two numbers, there is something wrong with the coding?

philote · Mar 11, 2004

You could probably use HTML::TokeParser or something similar so you don't have to re-invent the wheel.

http://search.cpan.org/~gaas/HTML-Parser-3.35/lib/HTML/TokeParser.pm

For example, do a loop that gets a token until no more tokens are found. You'll have a hash that holds HTML tag names as the keys and a count as the values. If the type of the token is "S" (start tag), and the tag is one that requires an end tag, put that tag's name in the hash with a value of 1 if it's not already there. If it's already there, increment the value for that key by one. For each end tag (type "E"), do the same, but decrement the value for that key.

After that loop's done, loop through the hash and make sure each value is 0. If it's not, you have a problem. You'll also know there's a problem if, in your first loop, you go to decrement a value that's already zero or whose key isn't in the hash.

That's a very simple way to verify HTML code, and it won't tell you where the problem is. But I'm sure you could create something much more robust.
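
The counting loop described above could be sketched roughly as follows. This is only a minimal sketch: the sub name check_tag_balance, the %void list of tags that never take an end tag, and the sample HTML are all made up for illustration, and it assumes HTML::TokeParser is installed from CPAN. It also treats every other tag as needing an end tag, so valid HTML that leaves out optional end tags (such as </p> or </li>) would be reported as a problem.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Tags that never take an end tag (illustrative list, not exhaustive).
my %void = map { $_ => 1 } qw(area base br col hr img input link meta param);

# Returns a list of problem descriptions; an empty list means the counts balance.
sub check_tag_balance {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Can't create parser: $!";

    my (%open, @problems);
    while (my $token = $parser->get_token) {
        my ($type, $tag) = @{$token}[0, 1];

        if ($type eq 'S') {            # start tag
            next if $void{$tag};
            $open{$tag}++;
        }
        elsif ($type eq 'E') {         # end tag
            next if $void{$tag};
            if ($open{$tag}) {
                $open{$tag}--;
            }
            else {
                push @problems, "</$tag> has no matching start tag";
            }
        }
    }

    # Anything still above zero was opened more often than it was closed.
    for my $tag (sort keys %open) {
        push @problems, "<$tag> opened $open{$tag} more time(s) than closed"
            if $open{$tag};
    }
    return @problems;
}

# Hypothetical sample input: <b> is never closed and </i> was never opened.
my @problems = check_tag_balance(
    '<html><body><p>One<br><b>two</p><img src="x.gif"></i></body></html>'
);
print "$_\n" for @problems;
```

As noted above, this only counts tags per name. It does not check nesting order and it cannot say where in the page the problem is.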

philote · Mar 11, 2004

Or you could probably also use HTML::Lint. Do some searches at http://search.cpan.org to find this and other helpful modules.
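
For anyone wanting to try the HTML::Lint route, a minimal sketch might look like the one below. It assumes HTML::Lint is installed from CPAN, and the sample HTML is made up for illustration. Which problems get reported, and how they are worded, depends on the module version.

```perl
use strict;
use warnings;
use HTML::Lint;

# Hypothetical sample page: <b> is never closed and </i> was never opened.
my $html = '<html><head><title>Test</title></head>'
         . '<body><p><b>unclosed bold</p></i></body></html>';

my $lint = HTML::Lint->new;
$lint->parse($html);   # feed the document to the checker
$lint->eof;            # signal the end of the document

# Each error is an object; as_string gives a readable one-line message.
my @errors = $lint->errors;
print $_->as_string, "\n" for @errors;
print scalar(@errors), " problem(s) reported\n";
```

Unlike a plain tag count, the messages include the position of each problem in the input.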

duncdude · Apr 2, 2004

It is the input record separator ($/).

By default it is the newline (\n), but undef overrides that. If I had not done that, the scalar would only hold the first line of the HTML.
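
The script being discussed is not shown in this thread, so the following is only an illustrative sketch of the idiom, not that script. It reads the same made-up HTML string twice through an in-memory filehandle: once with the default separator and once with $/ undefined.

```perl
use strict;
use warnings;

# Made-up sample content standing in for a fetched page.
my $html = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n";

# Default $/ is "\n", so a scalar read returns only the first line.
open my $line_fh, '<', \$html or die "Can't open in-memory handle: $!";
my $first_line = <$line_fh>;
close $line_fh;

# With $/ undefined, a scalar read returns everything up to end of file.
# "local" restores the old separator when the do block ends.
open my $slurp_fh, '<', \$html or die "Can't open in-memory handle: $!";
my $whole = do { local $/; <$slurp_fh> };
close $slurp_fh;

print "line mode read:  ", length($first_line), " characters\n";
print "slurp mode read: ", length($whole), " characters\n";
```

Using local rather than a bare undef $/ keeps the change from leaking into other code that reads line by line.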

did you try the script?

Kind Regards
Duncan
