I wrote a program for a friend that goes to music sites and retrieves the top artists for the week (e.g. the muchmusic and billboard charts). It works fine, but I got to thinking about how to make it more flexible. The parse routines work, but they are hard-coded.
I would like to store the site information and parsing information in a file, and I thought XML would be a good choice.
Here is an initial idea for a basic structure:
<musicSite>
<Address url="..."></Address>
<Column name="Last Week"></Column>
<Column name="This Week"></Column>
<Column name="Artist"></Column>
<Column name="Title"></Column>
</musicSite>
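To make the idea concrete, here is a rough sketch of how a program might read a definition like this, using Python's standard xml.etree.ElementTree module. The url value is a made-up placeholder and load_site is a hypothetical helper name, so treat this as one possible shape rather than a finished design.

```python
import xml.etree.ElementTree as ET

# Hypothetical site definition; the url is a placeholder, not a real chart page.
SITE_XML = """
<musicSite>
  <Address url="http://example.com/chart"></Address>
  <Column name="Last Week"></Column>
  <Column name="This Week"></Column>
  <Column name="Artist"></Column>
  <Column name="Title"></Column>
</musicSite>
"""

def load_site(xml_text):
    """Return (url, list of column names) from a musicSite definition."""
    root = ET.fromstring(xml_text)
    url = root.find("Address").get("url")
    # findall returns the Column elements in document order,
    # so the list matches the left-to-right order of the table columns.
    columns = [col.get("name") for col in root.findall("Column")]
    return url, columns

url, columns = load_site(SITE_XML)
print(url)
print(columns)
```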
I need something before the columns to tell the program where to start parsing. On the muchmusic site, the first row of the <table> is of no value to me, so I need to be able to account for that as well.
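One way the "where to start" information might be expressed is as two numbers: which table on the page to read, and how many leading rows to skip. The sketch below uses Python's standard html.parser module; TableRows, extract_rows, table_index and skip_rows are hypothetical names, the sample page is invented, and the code assumes rows and cells are properly closed and that tables are not nested.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect cell text, row by row, from one <table> on a page."""

    def __init__(self, table_index=0):
        super().__init__()
        self.table_index = table_index  # which <table> to read (0 = first)
        self.table_count = 0            # number of <table> tags seen so far
        self.in_table = False
        self.rows = []
        self.row = None                 # cells of the row being read
        self.cell = None                # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <Table> and <TABLE> also match.
        if tag == "table":
            self.in_table = (self.table_count == self.table_index)
            self.table_count += 1
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th") and self.cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def extract_rows(html, table_index=0, skip_rows=0):
    """Return the rows of the chosen table, minus the first skip_rows rows."""
    parser = TableRows(table_index)
    parser.feed(html)
    parser.close()
    return parser.rows[skip_rows:]

# Invented sample page: the first row is a heading of no interest.
PAGE = """
<table>
  <tr><th>Chart for the week</th></tr>
  <tr><td>2</td><td>1</td><td>Some Artist</td><td>Some Title</td></tr>
  <tr><td>1</td><td>2</td><td>Other Artist</td><td>Other Title</td></tr>
</table>
"""
rows = extract_rows(PAGE, table_index=0, skip_rows=1)
print(rows)
```

The two numbers could then live in the site file next to the columns, so nothing about a particular site is hard-coded.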
Take the second example, the billboard site. The Title and Artist are in the same column, separated by a comma. Again, I have a routine for this specific site, but I would like something more general.
The only thing I can see that the sites have in common is that they use HTML table tags to format the information.
Ultimately I would like users to be able to create their own XML files. I am not stuck on XML; a simple plain text file would suffice. I want a system flexible enough to handle 90% of the sites that may be used. I am only worrying about sites that use the HTML <Table> tag to format the data.
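A more general version of the combined-column case might let each table column carry one or more field names plus an optional separator. The sketch below is only an illustration: map_row and the (names, separator) layout are hypothetical, the sample row is invented, and it assumes the title comes before the artist. A title that itself contains a comma would still be split in the wrong place.

```python
def map_row(cells, column_specs):
    """Map the cells of one table row to named fields.

    column_specs holds one (names, separator) pair per table column.
    A separator of None means the cell holds a single field.
    """
    record = {}
    for cell, (names, separator) in zip(cells, column_specs):
        if separator is None:
            parts = [cell]
        else:
            # Split at most len(names) - 1 times, so any extra
            # separators stay inside the last field.
            parts = cell.split(separator, len(names) - 1)
        for name, part in zip(names, parts):
            record[name] = part.strip()
    return record

# Invented example: Title and Artist share the third column.
specs = [
    (["Last Week"], None),
    (["This Week"], None),
    (["Title", "Artist"], ","),
]
record = map_row(["2", "1", "Some Title, Some Artist"], specs)
print(record)
```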
Any comments or ideas would be appreciated.
Troy Williams E.I.T.
fenris@hotmail.com