perl html syntax checker

Status
Not open for further replies.

welshspoon

Technical User
Dec 14, 2002
34
GB
Please could somebody help me write some code to check the syntax of another HTML page? I'm not very good at Perl, so I don't know where to start.

Please help.
 
What exactly are you looking to do?

If it's a one-off, look for an editor that colour-codes your syntax, which will make it easier to check manually.

On search.cpan.org

See HTML::TreeBuilder, HTML::Parser, HTML::SimpleParse

HTH
--Paul
 
I have a text box in a form in which I want to enter a URL. The CGI script should then check the HTML at that URL for syntax errors and return any it finds.

thank you for your time
 
Surely this is one hell of a task? While it's very easy to locate the page and return the HTML, it is another thing entirely to evaluate whether the code is syntactically correct... or am I talking !@£$%


Kind Regards
Duncan
 
How are you going to check inline Javascript?
--Paul
 
Aren't there already several sites and tools that do (x)html validation? I've seen links on pages that say "this site is compliant" and it goes to some verification tool. If that's all your goal is, then it's already been done and would be wasteful to do again.

A quick google turns up
________________________________________
Andrew - Perl Monkey
 
a little cut out from my COM component

Code:
use HTML::TreeBuilder;

# $buf (the requested file name) and $s (the client socket) are set up
# earlier in the component and are not shown in this excerpt.
my $File_With_Ext = $buf;
chomp $File_With_Ext;
print $s "$File_With_Ext almost done.\r\n";

my $path = "C:/docume~1/Administrator.PHANTOMX/Desktop/$File_With_Ext";
open my $cwork, '>>', $path or die "Can't open a filehandle: $!";
print {$cwork} '<html><head><title>Test Environment</title></head><body bgcolor="black" text="ffffff"><h1>For PhantomX Developers.</h1><input type="button" onclick="VBExec()" value="Execute VBScript"><textarea id="EVBS"></textarea><br><input type="button" onclick="PSEval" value="Eval PerlScript"><textarea id="EPS"></textarea><br><input type="button" onclick="JSEval()" value="Eval JScript"><textarea id="EJS"></textarea></body><script language="VBScript">Function VBExec() : Execute(document.getElementById("EVBS").innerText) : End Function </script><script language="PerlScript">sub PSEval {eval($window->document->getElementById("EPS")->innerText);}</script><script language="JavaScript">function JSEval() {eval(document.getElementById("EJS").innerText);}</script></html>' or die "Can't write to filehandle: $!";
# Close the write handle first so the data is flushed before it is parsed.
close $cwork or die "Can't close filehandle: $!";

my $tree = HTML::TreeBuilder->new();
$tree->parse_file($path) or die "Can't parse: $!";   # parse by name, not via the write handle
my $html = $tree->as_HTML;
print $s "checking ...$html..( File Done. ) \r\n";
$tree->delete;
}   # closes a block opened earlier in the component (not shown)
Hope it helps.
Type CWORK.htm, .hta or .mht to build the web page with the added extension.
Definitely not for a Unix system though, sorry.
 
Welshspoon, #this six nations'll never catch on

According to watson.addy.com, an HTML page containing fully functioning inline JavaScript fails on almost every line of script, but that's because the tool was built to check HTML. (PS: it should have treated the script as a comment, unless the engine has become more or less forgiving than it used to be ...)

Can you be more specific about your requirements?

--Paul
 
Thank you all for the hard work and effort so far, I'll hand out stars at the end of the thread...

One last thing: isn't there a simple way of just getting the Perl script to look at the HTML and find things that are wrong, such as an unclosed tag?

Maybe by counting the opening and closing tags, but that doesn't take into account the horizontal rule, images and so forth.

To cut this short: can I have a script that counts opening and closing HTML tags, but ignores <IMG SRC=""> and other tags that don't get closed?

Then, if the two numbers differ at the end, there is something wrong with the coding?
 
You could probably use HTML::TokeParser or something similar so you don't have to re-invent the wheel.

For example, do a loop that gets a token until no more tokens are found. You'll have a hash that holds HTML tags as the keys and a number for each value. If the type of token is "S" and the tag is one that requires an end tag, put that token's name in the hash as a key with a value of 1 if it's not already there. If it's already there, increment the value for that key by one. For each end tag (type "E"), do the same, but decrement the value for that key. After that loop's done, loop through the hash and make sure each value is 0. If it's not, you have a problem. You'll also know there's a problem if, in your first loop, you go to decrement a value that's zero or whose key isn't in the hash.

That's a very simple way to verify HTML code, and it won't tell you where the problem is. But I'm sure you could create something much more robust.
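
The loop described above could be sketched roughly like this. It is only a minimal illustration, not a real validator: the sub name check_tag_balance and the list of tags that never take an end tag are my own assumptions, and the count will also flag tags such as <p> or <li> whose end tags HTML lets you leave out.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Elements that never take an end tag, so they are left out of the count.
# (Assumed list for illustration; extend it to suit your pages.)
my %no_end_tag = map { $_ => 1 }
    qw(area base br col hr img input link meta param);

# Takes a string of HTML and returns a list of problem descriptions.
sub check_tag_balance {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Can't create parser: $!";
    my (%open, @problems);

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        next unless $type eq 'S' || $type eq 'E';   # skip text, comments, etc.
        my $tag = $token->[1];                      # tag name, lower-cased
        next if $no_end_tag{$tag};

        if ($type eq 'S') {
            $open{$tag}++;
        }
        elsif (!$open{$tag}) {
            # End tag seen while the count is zero or the key is missing.
            push @problems, "</$tag> has no matching <$tag>";
        }
        else {
            $open{$tag}--;
        }
    }

    # Anything still above zero was opened more often than it was closed.
    for my $tag (sort keys %open) {
        push @problems, "<$tag> opened $open{$tag} more time(s) than closed"
            if $open{$tag} > 0;
    }
    return @problems;
}

# Example: the <b> tag is never closed.
print "$_\n" for check_tag_balance('<html><body><p>Hi<br><b>bold</p></body></html>');
```

An empty list back means the counts balanced. As noted, this says nothing about where the problem is or whether the tags are nested in the right order.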
 
It is the input record separator ($/).

By default it is the newline (\n), but undef overrides that. If I had not done that, the scalar would only hold the first line of the HTML.
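
For anyone following along, here is a rough sketch of what undefining the input record separator does. The helper name slurp_file and the temporary test file are assumptions made purely for this illustration; they are not taken from the script under discussion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Reads a whole file into one scalar by temporarily undefining $/.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    local $/;                # undef: no separator, so <$fh> reads to end of file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Build a small multi-line HTML file to read back (hypothetical test data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<html>\n<body>\n</body>\n</html>\n";
close $tmp_fh or die "Can't close temp file: $!";

# With the default separator, a scalar read returns only the first line.
open my $in, '<', $tmp_name or die "Can't open $tmp_name: $!";
my $first_line = <$in>;
close $in;

# With $/ undefined, the same kind of read returns the whole file.
my $whole_file = slurp_file($tmp_name);

print "default read:  ", length($first_line), " characters\n";
print "undef \$/ read: ", length($whole_file), " characters\n";
```

Using local keeps the change confined to the sub, so any other reads in the script still get one line at a time.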

Did you try the script?


Kind Regards
Duncan
 