
problems in parsing html tags

Status
Not open for further replies.

Kirad

Programmer
Jul 13, 2001
5
CA
If you have an idea or an easier way of doing this, please let me know. I'm trying to parse the following html page. What I want to do is save each row of the table (everything from a <tr> tag to its closing </tr>) on one line of a file, then print the next row on the next line, and so on until the last </tr> at the bottom of the page. In other words, I want to extract the country names and the numbers and save them in table format in a file.
I've tried several things but I'm not getting anywhere. I'm trying to find a way to skip all the tags and get just the country names and the digits.
I would appreciate your help if you have ideas on how to do this. Thanks.

Here is the html page. I left out the <table> and </table> tags.

<tr><td colspan="8"><div align="right">
<b>16 July 2001</b>
</td>
</tr>

<tr bgcolor="#d1deef"">
<td width="31%">Currency Name</td>

<td width="10%">US Dollar</td>

<td width="10%">Euro</td>

<td width="10%">British Pound</td>

<td width="10%">Yen</td>

<td width="10%">Swiss Franc</td>

</tr>

<tr><td width=31% bgcolor=#d1deef>
US Dollar</td>

<td width=10% bgcolor="#bcdef6">
&nbsp;&nbsp;-
</td>

<td width=10% bgcolor="#bcdef6">
0.8542

</td>

<td width=10% bgcolor="#bcdef6">
1.4043

</td>

<td width=10% bgcolor="#bcdef6">
0.008015

</td>

<td width=10% bgcolor="#bcdef6">
0.5651

</td>

</tr>

<tr><td width=31% bgcolor=#d1deef>
Euro</td>

<td width=10% bgcolor="#bcdef6">
1.1707
</td>

<td width=10% bgcolor="#bcdef6">
&nbsp;&nbsp;-
</td>

<td width=10% bgcolor="#bcdef6">
1.6426

</td>

<td width=10% bgcolor="#bcdef6">
0.009355

</td>

<td width=10% bgcolor="#bcdef6">
0.6611

</td>

</tr>

<tr><td width=31% bgcolor=#d1deef>
British Pound</td>

<td width=10% bgcolor="#bcdef6">
0.7121
</td>

<td width=10% bgcolor="#bcdef6">
0.6088
</td>

<td width=10% bgcolor="#bcdef6">
&nbsp;&nbsp;-
</td>

<td width=10% bgcolor="#bcdef6">
0.005692

</td>

<td width=10% bgcolor="#bcdef6">
0.4023

</td>

</tr>

<tr><td width=31% bgcolor=#d1deef>
Yen</td>

<td width=10% bgcolor="#bcdef6">
124.7661
</td>

<td width=10% bgcolor="#bcdef6">
106.8947
</td>

<td width=10% bgcolor="#bcdef6">
175.6852
</td>

<td width=10% bgcolor="#bcdef6">
&nbsp;&nbsp;-
</td>

<td width=10% bgcolor="#bcdef6">
70.4792

</td>

</tr>

<tr><td width=31% bgcolor=#d1deef>
Swiss Franc</td>

<td width=10% bgcolor="#bcdef6">
1.7696
</td>

<td width=10% bgcolor="#bcdef6">
1.5126
</td>

<td width=10% bgcolor="#bcdef6">
2.4857
</td>

<td width=10% bgcolor="#bcdef6">
0.014189
</td>

<td width=10% bgcolor="#bcdef6">
&nbsp;&nbsp;-
</td>

</tr>

 
Sorry, I forgot to include the code I'm using to read in the page.


import java.net.*;
import java.io.*;

public class URLReader
{
    public static void main(String[] args) throws Exception
    {
        String input;
        int count = 0;
        // The URL was cut off in the original post, so it is read from the command line here.
        URL carleton = new URL(args[0]);
        BufferedReader in = new BufferedReader(new InputStreamReader(carleton.openStream()));
        PrintWriter resultFile = new PrintWriter(new FileWriter("results.txt"));
        while ((input = in.readLine()) != null)
        {
            input = input.trim();
            // Count each opening <table ...> tag as it is encountered.
            if (input.indexOf("<table") != -1)
            {
                count++;
                //System.out.println("COUNT:" + count);
                if (count != 2)
                {
                    //System.out.println(input);
                    // Copy every line up to the closing </table> tag into results.txt.
                    while ((input = in.readLine()) != null)
                    {
                        input = input.trim();
                        if (!input.equalsIgnoreCase("</table>"))
                        {
                            //System.out.println(input);
                            resultFile.println(input);
                        }
                        else
                        {
                            System.out.println(input);
                            break;
                        }
                    }
                }
            }
        }
        // Close the writer once, after all tables have been read, so later writes are not lost.
        resultFile.close();
        in.close();
    }
}
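
For the row extraction described above, here is a rough sketch of one possible approach, not a definitive solution. It assumes the table HTML is already in a string (for example, the contents of results.txt written by the code above), that rows and cells are not nested, and that every row has a closing </tr>. The class name RowExtractor, the method extractRows, and the tab-separated output format are all made up for illustration. It uses regular expressions rather than a real HTML parser, so it will only cope with simple markup like the sample page.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class RowExtractor {
    // One <tr ...> ... </tr> block; DOTALL lets '.' match across line breaks.
    private static final Pattern ROW =
        Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // One <td ...> ... </td> cell inside a row.
    private static final Pattern CELL =
        Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one tab-separated line per table row that contains at least one cell. */
    static List<String> extractRows(String html) {
        List<String> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(clean(cell.group(1)));
            }
            if (!cells.isEmpty()) {
                rows.add(String.join("\t", cells));
            }
        }
        return rows;
    }

    /** Strips leftover tags and &nbsp; entities, then collapses whitespace. */
    private static String clean(String text) {
        return text.replaceAll("<[^>]*>", " ")
                   .replace("&nbsp;", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) throws IOException {
        // Assumed arguments: args[0] is the saved table HTML, args[1] is the output file.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        try (PrintWriter out = new PrintWriter(args[1], "UTF-8")) {
            for (String line : extractRows(html)) {
                out.println(line);
            }
        }
    }
}
```

Each output line then holds the cell texts of one row, with the "&nbsp;&nbsp;-" placeholder cells reduced to a plain "-". If the real page has nested tables or unclosed rows, a proper HTML parser would be a safer choice than this pattern matching.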
 