Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parse and merge data

Status
Not open for further replies.

JonoB

Programmer
May 29, 2003
147
0
0
GB
Hi all,

Need some tips to help me approach this problem. I would consider myself advanced in Access and VBA, so just looking for conceptual pointers.

In essence, I have a number of text files, each of which contains data that is parsed into various database tables. The data thus stored in the tables contains 7 fields of information (including where the records where derived - i.e. the producer of the data source). I have this part of the system working 100%.

The next part is that there are some peculuarties with regards to the data that is contained in the text files, as follows:
-Each text file comes from a different data source
-Each data source produces only one text file
-Each text file will contain approx 70,000 number of records
-I will parse anywhere from two to 6 text files per run (this takes about 1-2 minutes in total to run), which is all handled in VBA
-Each record in the text file contains a time stamp; however, the clocks betweene each recordsource are not syncronised. These time differences vary from recordsource to recordsource, but are always less than 30 seconds.
-Each record contains 7 fields of information, some of these fields may be null. The time stamp field will never be null and the data source field will never be null.
-There may be identical records from the seperate data sources (with the exception of the abovementioned time stamp). This means that 5 out of the 7 fields will be EXACTLY the same in one recordsource as one (or more) of the others, with the exception of the time and data source fields.
-The amount of time covered by each recordsource is usually around 3 hours.

Basically, what I would like to do is to take these x number of records and compare them to eachother, with the following:
*Where there are identical records, then I would like to delete all but one of the records. A record would be considered identical if 5 of the 7 fields matched, and the time stamp was within 30 seconds, but the data source is different.
*Where there are unique records (i.e. no matches found from the other data sources within the 30 second timeframe), then those records are retained.

I was originally hoping that I could do a SELECT DISTINCT on the records, but the one major flaw is that the time field varies from data source to data source.

Whew. Hope the above makes sense.

Thanks for any pointers that you can provide.
 
Otherwise you are going to have to do a select duplicates query on the 5 fields and sort and compare by time.

You could, of course round the date time to the nearest minute (or 2) to catch the 30 seconds. (int([Date time]*24*60*2+1)/2) if my logic serves.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top