
HTML, XSLT and XPath

Status
Not open for further replies.

aarochita

Programmer
May 19, 2003
US
Hi All,

I've got the following HTML fragment:

<td>
<b>Author: </b>
<a href="show.asp?author=Firstname+Lastname">Firstname Lastname</a><br/>
<b>Format: </b>
Paperback<br/>
<b>ISBN:</b>0999999999 <b>Pages:</b> 44<br/>
<b>Date:</b> 1/1/1999<br/>
<b>Publisher: </b>
<a href="show.asp?publisher=Publisher+Company">Publisher Company</a>
<br/> <br/>
</td>

If I analyze this with XPath, I get the following results:
XPath: /td/b
Result:
Author:
Format:
ISBN:
Pages:
Date:
Publisher:

XPath: /td/text()
Result:
(blank line)
Paperback
0999999999
(blank line)
44
1/1/1999
(blank line)

XPath: /td/a
Result:
Firstname Lastname
Publisher Company

I can access any of these records with XPath, like this: /td/b[1] (the word "Author:") and /td/a[1] (the author's name), or /td/b[2] (the word "Format:") and /td/text()[2] (Paperback).
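For anyone who wants to try these positional lookups outside an XSLT processor, here is a minimal sketch using Python's standard-library ElementTree. It assumes the fragment is well-formed XML (it is, once the quotes are restored); `FRAGMENT` is a name introduced here for illustration. ElementTree implements only a subset of XPath: it has `[n]` position predicates but no `text()` node test, so loose text is read from an element's `.tail` instead.

```python
import xml.etree.ElementTree as ET

# The fragment from the question with its quotes restored; it is
# well-formed XML, so ElementTree can parse it directly.
FRAGMENT = """<td>
<b>Author: </b>
<a href="show.asp?author=Firstname+Lastname">Firstname Lastname</a><br/>
<b>Format: </b>
Paperback<br/>
<b>ISBN:</b>0999999999 <b>Pages:</b> 44<br/>
<b>Date:</b> 1/1/1999<br/>
<b>Publisher: </b>
<a href="show.asp?publisher=Publisher+Company">Publisher Company</a>
<br/> <br/>
</td>"""

td = ET.fromstring(FRAGMENT)

# ElementTree supports [n] position predicates (1-based), roughly like
# /td/b[1] and /td/a[1] with td as the context node.
label = td.find("b[1]").text.strip()
value = td.find("a[1]").text.strip()
print(label, value)  # Author: Firstname Lastname

# There is no text() node test; the loose text after an element is kept
# in that element's .tail, e.g. "Paperback" after <b>Format: </b>.
print(td.find("b[2]").tail.strip())  # Paperback
```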

The problem is that all of these fields are optional, and the two linked fields (Author and Publisher) are not always links: they can also appear as plain text, like the other fields (ISBN, Format, etc.). So the positions used above are not fixed from one record to the next.

If possible, I would like to create an XSLT stylesheet that handles all of these variations.
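One way to cope with optional fields and optional links is to stop relying on positions and instead pair each `<b>` label with everything that follows it up to the next `<b>`. The sketch below does this with Python's ElementTree; `fields_by_label` is a hypothetical helper, not part of any library, and it assumes every label is a `<b>` element directly under the `<td>` and that `<b>` is used for nothing else. In XSLT 1.0 the same grouping is commonly written with an `xsl:key` whose `use` attribute is `generate-id(preceding-sibling::b[1])`.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<td>
<b>Author: </b>
<a href="show.asp?author=Firstname+Lastname">Firstname Lastname</a><br/>
<b>Format: </b>
Paperback<br/>
<b>ISBN:</b>0999999999 <b>Pages:</b> 44<br/>
<b>Date:</b> 1/1/1999<br/>
<b>Publisher: </b>
<a href="show.asp?publisher=Publisher+Company">Publisher Company</a>
<br/> <br/>
</td>"""


def fields_by_label(td):
    """Map each <b> label under td to the text that follows it, up to the next <b>.

    The value may be plain text (kept in the <b>'s .tail) or wrapped in an
    <a> link; fields that are absent simply do not appear in the result.
    """
    pieces = {}      # label -> list of text fragments
    current = None   # label of the field currently being collected
    for child in td:
        if child.tag == "b":
            # "Author: " -> "Author", "ISBN:" -> "ISBN"
            current = (child.text or "").strip().rstrip(":").strip()
            pieces[current] = [child.tail or ""]
        elif current is not None:
            pieces[current].append(child.text or "")  # link text, if any
            pieces[current].append(child.tail or "")  # text after <a> or <br/>
    # Join the fragments and collapse line breaks and repeated spaces.
    return {label: " ".join("".join(parts).split()) for label, parts in pieces.items()}


print(fields_by_label(ET.fromstring(FRAGMENT)))
# {'Author': 'Firstname Lastname', 'Format': 'Paperback', 'ISBN': '0999999999',
#  'Pages': '44', 'Date': '1/1/1999', 'Publisher': 'Publisher Company'}

# Same code, different shape: Author without a link, Format missing.
variant = ET.fromstring("<td><b>Author: </b>Jane Doe<br/><b>Date:</b> 2/2/2002<br/></td>")
print(fields_by_label(variant))
# {'Author': 'Jane Doe', 'Date': '2/2/2002'}
```

Because each value is gathered from the label's tail plus any following `<a>` and `<br/>`, the same code works whether Author is a link or plain text, and ISBN and Pages sharing one line is not a problem.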


Thanks,

Arty
 
Wow, this is quite the opposite of what I am used to seeing.
You are using HTML to store data. The most common use of XML and HTML I have seen is to store the information in an XML file (with an appropriate XML schema so the structure can be verified), something like this:

<book isbn="0999999999">
<author>
<firstname>Firstname</firstname>
<lastname>Lastname</lastname>
</author>
<format>Paperback</format>
<pages>44</pages>
<date>1/1/1999</date>
<publisher>Publisher Company</publisher>
</book>

An XSL stylesheet could then be used to generate the HTML shown in your example, while keeping the data easily readable by other people. If you are generating the HTML directly from a database and now find that you want to do something extra with the data, I suggest you retrieve it from the database the same way you did to generate the HTML.
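To illustrate the direction suggested here (structured data in, HTML out), below is a rough Python stand-in for such a stylesheet, assuming the `<book>` layout above. `book_to_html` is a hypothetical name, and the output is simplified to one field per line without links; an XSL stylesheet would express the same rules declaratively, for example with `xsl:if` tests around the optional elements.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOK = """<book isbn="0999999999">
<author>
<firstname>Firstname</firstname>
<lastname>Lastname</lastname>
</author>
<format>Paperback</format>
<pages>44</pages>
<date>1/1/1999</date>
<publisher>Publisher Company</publisher>
</book>"""


def book_to_html(book):
    """Render one <book> record as a <td> block; missing elements are skipped."""
    author = book.find("author")
    name = None
    if author is not None:
        # Join whichever name parts are present.
        name = " ".join(filter(None, (author.findtext("firstname"),
                                      author.findtext("lastname"))))
    fields = [
        ("Author", name),
        ("Format", book.findtext("format")),
        ("ISBN", book.get("isbn")),          # stored as an attribute
        ("Pages", book.findtext("pages")),
        ("Date", book.findtext("date")),
        ("Publisher", book.findtext("publisher")),
    ]
    # Only fields with a non-empty value produce a line.
    rows = [f"<b>{label}: </b>{escape(value.strip())}<br/>"
            for label, value in fields if value]
    return "<td>\n" + "\n".join(rows) + "\n</td>"


print(book_to_html(ET.fromstring(BOOK)))
```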

I am not entirely sure what you intend to do, but extracting the correct information from the HTML will be difficult because the input records vary so much.
 
Hi kibje,

Actually, I'm not using HTML to store data. I'm writing a screen-scraping program and ran into this problem on one of the pages.
If you or anybody else has an idea of how to associate each field name with its data, I would appreciate it.

Thanx,

Arty
 