
parsing html

Status
Not open for further replies.

dubBeat

Programmer
Apr 2, 2007
21
IE
Hi,

I've just completed some code that sends a POST request to a servlet. The servlet then generates a link to a page with further information.

e.g. The link looks like this


This is the page source of the page that the link takes you to.
**********************************************************
<html><head><META HTTP-EQUIV='PRAGMA' CONTENT='NO-CACHE'><title>Position info</title><link href="style.css" type="text/css" rel="stylesheet"></head><body><table border='0'><tr><td class='tGray' valign='top'><table border='0' cellpadding='2' cellspacing='2'><tr><th>Attribute</th><th>Value</th></tr><tr><td>MSISDN</td><td>1234567</td></tr><tr><td>Time</td><td>2007-04-05 20:51:00</td></tr><tr><td>X coordinate</td><td>49 21 09N</td></tr><tr><td>Y coordinate</td><td>6 56 34W</td></tr><tr><td>Inner radius</td><td>825</td></tr><tr><td>Arc width</td><td>1100</td></tr><tr><td>Start angle</td><td>270</td></tr><tr><td>Extent angle</td><td>120</td></tr></table></td><td class='tGray' valign='top'><img src='image?type=arc&in=825&width=1100&start=270&extent=120'></td></tr></table></body></html>
****************************************************

The info on the page is in the format of Attribute Value.

I want to get the value of just the "X coordinate" attribute out of the page and put it in a variable.

I've been looking on the net for ways to parse HTML in C#. Most of what I've found seems overcomplicated for what I want (just getting one attribute whose name I know). I'm not even sure if "parsing" is what I should be investigating.

I tried to access the page with an HttpWebRequest GET request.

Again, I'm not sure if this is the right way to go about it.
************************************************
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        using (StreamReader readStream = new StreamReader(responseStream, Encoding.UTF8))
        {
            // Read the whole response body into a string.
            result = readStream.ReadToEnd();
        }
    }
}
**********************************************

But the source that this code returns is missing all of the attributes and values.

e.g. the source is

************************************************
<html><head><META HTTP-EQUIV='PRAGMA' CONTENT='NO-CACHE'><title>Position info</title><link href="style.css" type="text/css" rel="stylesheet"></head><body><table border='0'><tr><td class='tGray' valign='top'><table border='0' cellpadding='2' cellspacing='2'><tr><th>Attribute</th><th>Value</th></tr></body></html>
************************************************
You can see that it says Attribute and Value but the information is not there.

Could somebody tell me the correct way to get the value from the page where the attribute I want is known?

Thanks

Dub
 
A quick "hack" would be to put this code:

Code:
// Keep everything from the "X coordinate" label cell onward.
int start = result.IndexOf("<td>X coordinate</td>");
result = result.Substring(start);
// Turn every cell boundary into '^' so the cells can be split apart.
result = result.Replace("<td>", "^");
result = result.Replace("</td>", "^");
string[] values = result.Split('^');

// values[1] is the label; values[3] is the value cell that follows it.
string myValue = values[3];

right after your:

Code:
result = readStream.ReadToEnd();


Hope that works for you!


Ron Wheeler
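
One caveat with the substring approach: if the "X coordinate" row is missing from the response (as in the truncated source quoted in the question), IndexOf returns -1 and Substring throws. Here is a slightly more defensive sketch of the same idea, assuming the page keeps its `<td>name</td><td>value</td>` layout. `PositionPage` and `GetValue` are hypothetical names made up for this illustration, not part of any library.

```csharp
using System;

public static class PositionPage
{
    // Returns the text of the <td> cell that follows <td>{attribute}</td>,
    // or null when that row is not present in the HTML.
    public static string GetValue(string html, string attribute)
    {
        string label = "<td>" + attribute + "</td>";

        // Case-insensitive search, since HTML tag case is not guaranteed.
        int labelPos = html.IndexOf(label, StringComparison.OrdinalIgnoreCase);
        if (labelPos < 0) return null;

        // Assumption: the value cell is the next <td> after the label cell.
        int open = html.IndexOf("<td>", labelPos + label.Length, StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        int valueStart = open + "<td>".Length;
        int close = html.IndexOf("</td>", valueStart, StringComparison.OrdinalIgnoreCase);
        if (close < 0) return null;

        return html.Substring(valueStart, close - valueStart).Trim();
    }
}
```

It could be called right after `result = readStream.ReadToEnd();` as `string x = PositionPage.GetValue(result, "X coordinate");`, with a null check to detect a response that lacks the row.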
 
There's no "correct" way to parse HTML, as the standard is very loose. You just have to hack at it with substrings and case-conversion methods.

XHTML 1.0 is an attempt to make it more parsable by requiring it to be a valid XML document, and is a good first step. But sadly, the vast majority of documents out there aren't compliant. :-(

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
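
To illustrate the XHTML point: when markup really is well-formed XML, an XML parser can do the lookup instead of string hacking. The sketch below uses `XDocument` from System.Xml.Linq and assumes a fragment with no DOCTYPE and no named entities such as `&nbsp;`. It would not work on the page quoted in the question as-is, because that page has an unclosed META tag and unescaped `&` characters in the image URL. `XhtmlTable` and `GetValue` are hypothetical names for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlTable
{
    // Finds the row whose first <td> holds the attribute name and returns
    // the text of the second <td>, or null when no such row exists.
    public static string GetValue(string xhtml, string attribute)
    {
        // Throws System.Xml.XmlException if the markup is not well-formed.
        XDocument doc = XDocument.Parse(xhtml);

        // Compare local names so a default XHTML namespace does not matter.
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "tr")
                  .Select(tr => tr.Elements().Where(e => e.Name.LocalName == "td").ToList())
                  .Where(cells => cells.Count == 2 && cells[0].Value.Trim() == attribute)
                  .Select(cells => cells[1].Value.Trim())
                  .FirstOrDefault();
    }
}
```

The header row is skipped automatically because it contains `<th>` cells rather than two `<td>` cells.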
 