Structure Problem 1

Status
Not open for further replies.

fenris

Programmer
May 20, 1999
824
CA
I wrote a program for a friend that goes to music sites and retrieves the top artists for the week (e.g. the MuchMusic and Billboard charts). It works fine, but I got to thinking about how to make it more flexible. The parse routines work, but they are hard-coded.

I would like to be able to store the site information and parsing information in a file; I thought XML would be a good choice.

Here is an initial idea for a basic structure:

<musicSite>
<Address url="..."></Address>
<Column name="Last Week"></Column>
<Column name="This Week"></Column>
<Column name="Artist"></Column>
<Column name="Title"></Column>
</musicSite>

I need something before the columns to tell the program where to start parsing. On the MuchMusic site, the first row of the <table> structure is of no value to me, so I need to be able to account for that as well.

Take the second example, the Billboard site. The Title and Artist are in the same column, separated by a comma. Again, I have a routine for this specific site, but I would like something more general.

The only thing that I can see that would be common is that they use the html table tags to format the information.

Ultimately I would like the users to be able to create their own XML files. I am not stuck on XML; a simple plain text file would suffice. I want a system that is flexible enough to address 90% of the sites that may be used. I am only worrying about sites that use the HTML <table> tag to format the data.

Any comments or ideas would be appreciated.

Troy Williams E.I.T.
fenris@hotmail.com
 
I've done something similar in parsing FTP text files received from different sources. The basic description of definitional files follows:

Description

The approach uses three tables that together describe the structure of the data files coming from each source.

SiteNames

The SiteNames table will contain one record for each site that you could visit.

Site ID           The ID of the site
Site URL          The URL of the site

SiteRecords

For each site defined within the SiteNames table, it will be necessary to define the rules for how that data will look upon retrieval. There would be one record in this table for each type of data line that you would parse. This table might look something like the following:

Site ID           The ID of the site
Record Type ID    A specific Record Type ID
Record Format     The layout of this record (F = Fixed, V = Variable)
Record Delimiter  If Variable format, the delimiter between the fields
Record Type Key   A value within the data indicating the type of record (HTML tag?)
Record Key Pos    Where to find the key value

SiteFields

For each record type defined within SiteRecords, it will be necessary to define the layout of that data line. The columns in this table might look something like this:

Site ID           The ID of the site
Record Type ID    A specific Record Type ID
Field ID          An individual Field ID
Start Column      Either starting column (Fixed) or ordinal position (Variable) of the field
Field Length      Number of columns allocated for this field (Fixed only)
Field Value       What this data item represents
Field Justify     Justification (Left, Center, Right) of this field
Field Mask        Specific field formatting rules
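
As a rough illustration of how the SiteRecords and SiteFields definitions could drive parsing, here is a minimal Python sketch. The class and function names (RecordType, FieldDef, parse_line) are made up for this example, Start Column is assumed to be 1-based and Record Key Pos 0-based, and Field Justify and Field Mask are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class RecordType:
    """One SiteRecords row (names are illustrative)."""
    site_id: int
    record_type_id: int
    record_format: str  # "F" = fixed, "V" = variable
    delimiter: str      # used only when record_format == "V"
    type_key: str       # value in the data that identifies this record type
    key_pos: int        # 0-based offset where type_key is expected


@dataclass
class FieldDef:
    """One SiteFields row (Justify and Mask omitted)."""
    site_id: int
    record_type_id: int
    field_id: int
    start: int          # 1-based start column (F) or ordinal position (V)
    length: int         # number of columns, fixed format only
    value: str          # what this data item represents


def parse_line(line, record, fields):
    """Return {field value name: text}, or None if the line is another record type."""
    if not line.startswith(record.type_key, record.key_pos):
        return None
    # Variable format: split once, then pick fields by ordinal position.
    parts = line.split(record.delimiter) if record.record_format == "V" else None
    out = {}
    for f in fields:
        if parts is not None:
            raw = parts[f.start - 1]
        else:
            # Fixed format: slice by start column and length.
            raw = line[f.start - 1:f.start - 1 + f.length]
        out[f.value] = raw.strip()
    return out
```

For example, with a variable-format record whose delimiter is "|" and key is "D" at position 0, a line such as "D|1|Some Artist|Some Title" and fields at ordinal positions 2, 3, and 4 would yield the three values by name. Lines that do not carry the key are skipped by returning None.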

This is a very high-level view, but I think it sheds some light on the approach and may give you a primer or some ideas on how to build such a system for your needs. Good luck.
--------------
As a circle of light increases so does the circumference of darkness around it. - Albert Einstein
 
I was hoping someone who had done something similar would reply. Thanks!

You have a very interesting approach. I'll have to ponder it for a while to see how I can adapt it. From what I can see, it would lend itself very well to an Access database format.
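
As a rough sketch of what that database layout might look like, here are the three tables written as a relational schema. SQLite (through Python's sqlite3 module) is used here only as a stand-in for Access, and the column names and types are my own guesses based on the descriptions above, not something given in the reply.

```python
import sqlite3

# Assumed column names and types, derived from the table descriptions.
SCHEMA = """
CREATE TABLE SiteNames (
    SiteID  INTEGER PRIMARY KEY,
    SiteURL TEXT NOT NULL
);
CREATE TABLE SiteRecords (
    SiteID          INTEGER NOT NULL REFERENCES SiteNames(SiteID),
    RecordTypeID    INTEGER NOT NULL,
    RecordFormat    TEXT NOT NULL CHECK (RecordFormat IN ('F', 'V')),
    RecordDelimiter TEXT,
    RecordTypeKey   TEXT,
    RecordKeyPos    INTEGER,
    PRIMARY KEY (SiteID, RecordTypeID)
);
CREATE TABLE SiteFields (
    SiteID       INTEGER NOT NULL,
    RecordTypeID INTEGER NOT NULL,
    FieldID      INTEGER NOT NULL,
    StartColumn  INTEGER NOT NULL,
    FieldLength  INTEGER,
    FieldValue   TEXT NOT NULL,
    FieldJustify TEXT,
    FieldMask    TEXT,
    PRIMARY KEY (SiteID, RecordTypeID, FieldID),
    FOREIGN KEY (SiteID, RecordTypeID)
        REFERENCES SiteRecords(SiteID, RecordTypeID)
);
"""


def create_db(path=":memory:"):
    """Create the three definition tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the site/record links
    conn.executescript(SCHEMA)
    return conn
```

With this layout, adding a new site means inserting rows rather than writing a new parse routine, which is the flexibility I was after.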


Thanks again for the informative and very prompt response!

Troy Williams E.I.T.
fenris@hotmail.com

 