I need to parse an HTML document using JTidy. I have found some examples on the web and am making good headway; I just need a steer on how to traverse the resulting XML DOM.
The HTML that I am trying to parse looks like the following:
<html>
<body>
<div class="person">
<H2>Charlie</H2>
</div>
<div class="address">
<H2>1 Acacia Avenue</H2>
</div>
</body>
</html>
The Java code I am using opens the page with the URL class, gets an input stream, and then uses JTidy to build an XML DOM from it. So far so good.
The only lookup method I can find on the Document that JTidy returns (tidyDOM) is getElementsByTagName. Here the tag name is "div", but what if I only want the div tags with a class of "person"?
The code so far is:
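import java.io.InputStream;
import java.net.URL;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

URL u = new URL(pageUrl); // pageUrl: the String address of the page to fetch
InputStream is = u.openStream();
Tidy tidy = new Tidy();
tidy.setQuiet(true);
tidy.setShowWarnings(false);
Document tidyDOM = tidy.parseDOM(is, null); // null: no tidied output stream wanted
NodeList divTags = tidyDOM.getElementsByTagName("div");
for (int i = 0; i < divTags.getLength(); i++) {
    // This is where I am having problems
}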
Please note that all the usual exception handling is in place; it is just omitted here for clarity.
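Since tidy.parseDOM returns a standard org.w3c.dom.Document, one way to fill in the loop is to cast each node to org.w3c.dom.Element and check its "class" attribute with getAttribute. The following is a minimal sketch, assuming JTidy's DOM implementation supports Element.getAttribute (a DOM Level 1 method). To keep it runnable without the JTidy jar, the JDK's own DocumentBuilder stands in for JTidy (the sample markup happens to be well-formed XML); the class and method names DivClassFilter, parse, hasClass and divsWithClass are hypothetical. In the real code, divsWithClass would simply be given tidyDOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DivClassFilter {

    // Sample markup from the question; it is well-formed XML, so the JDK
    // parser can stand in for JTidy in this sketch.
    static final String SAMPLE =
        "<html><body>"
        + "<div class=\"person\"><H2>Charlie</H2></div>"
        + "<div class=\"address\"><H2>1 Acacia Avenue</H2></div>"
        + "</body></html>";

    // Hypothetical stand-in for tidy.parseDOM(is, null): both yield an org.w3c.dom.Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // True if cssClass is one of the whitespace-separated tokens in the
    // element's class attribute, so class="person vcard" also matches.
    static boolean hasClass(Element el, String cssClass) {
        // getAttribute returns "" when the attribute is absent.
        for (String token : el.getAttribute("class").trim().split("\\s+")) {
            if (token.equals(cssClass)) {
                return true;
            }
        }
        return false;
    }

    // The loop from the question: keep only the <div> elements with the given class.
    static List<Element> divsWithClass(Document doc, String cssClass) {
        List<Element> matches = new ArrayList<>();
        NodeList divTags = doc.getElementsByTagName("div");
        for (int i = 0; i < divTags.getLength(); i++) {
            Node node = divTags.item(i);
            // getElementsByTagName returns elements, but check before casting anyway.
            if (node instanceof Element && hasClass((Element) node, cssClass)) {
                matches.add((Element) node);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        for (Element div : divsWithClass(doc, "person")) {
            System.out.println("class=" + div.getAttribute("class"));
        }
    }
}
```

Running main prints class=person. Splitting the attribute on whitespace, rather than comparing it with equals, is a design choice so that divs carrying several classes are still found.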
Thanks in advance.
Charlie
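Traversing the DOM can also be done without getElementsByTagName, by walking the tree recursively through getChildNodes and checking each node's type. The sketch below is one way to pull the H2 text out of the "person" div under stated assumptions: tag names are compared case-insensitively because HTML cleaners may normalise case (the question's sample mixes "div" and "H2"), the class is matched exactly for simplicity, and text is gathered with the DOM Level 1 calls getNodeType/getNodeValue rather than getTextContent, on the assumption that older JTidy builds may not implement the DOM Level 3 methods. As before, the JDK parser stands in for JTidy, and DomWalk, textOf, collect and headingsIn are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalk {

    // Sample markup from the question (well-formed, so the JDK parser accepts it).
    static final String SAMPLE =
        "<html><body>"
        + "<div class=\"person\"><H2>Charlie</H2></div>"
        + "<div class=\"address\"><H2>1 Acacia Avenue</H2></div>"
        + "</body></html>";

    // Hypothetical stand-in for tidy.parseDOM(is, null).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Concatenates all text nodes below n using only DOM Level 1 calls.
    static String textOf(Node n) {
        StringBuilder sb = new StringBuilder();
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(textOf(child));
            }
        }
        return sb.toString();
    }

    // Depth-first walk: records the text of every h2 (any case) that sits
    // somewhere inside a div whose class attribute equals cssClass exactly.
    static void collect(Node n, String cssClass, boolean insideMatch, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) n;
            String name = el.getTagName();
            if (name.equalsIgnoreCase("div") && cssClass.equals(el.getAttribute("class"))) {
                insideMatch = true;
            }
            if (insideMatch && name.equalsIgnoreCase("h2")) {
                out.add(textOf(el).trim());
                return; // the heading's own children are already covered by textOf
            }
        }
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), cssClass, insideMatch, out);
        }
    }

    static List<String> headingsIn(Document doc, String cssClass) {
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), cssClass, false, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(headingsIn(parse(SAMPLE), "person"));
    }
}
```

With the sample markup, main prints [Charlie]; passing "address" instead would collect 1 Acacia Avenue.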