Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can you speed up this algorithm? 2

Status
Not open for further replies.

nathanstevenson

IS-IT--Management
Oct 4, 2002
70
0
0
GB
Hi there,

Below is a routine to create unique IDs from a flat file, which enables direct importing into Access. The problem is that it takes a long time (approx 2 mins for 330 lines) to execute.

Is there someone REALLY bright out there that can help me speed this up? I don't know much about optimizing code.

Here is the routine:

Function CompareAndID(ByVal colID As Integer, NoToCompare As Integer, Optional col1 As Integer, Optional col2 As Integer, Optional col3 As Integer, Optional col4 As Integer, Optional col5 As Integer)

' Description:
' A programme designed to assign a unique ID to a set of data based on
' two values. It will recurse through the entire dataset and assign each
' unique entry the same ID, then progress and do the same for the other data.

' Use:
' When importing data into a clean Access database, it may be necessary to
' split the data up into your predefined tables. To maintain the relationship between
' the data while still making sure all entries are only stored once, unique IDs need
' to be assigned to table data. This programme simplifies the task so that a direct
' import into Access is possible.

' This is algorithm was developed by Mike Wolf, with additions and alterations
' from Rob Broekhuis and Nathan Stevenson in the Tek-Tips (VBA) forum. ' Date: 03 Jan 2003.
'
' Additional Notes:
' One has to set the "global" parameters for i, which is the row to start comparing from and
' totalrows, which has to be the exact number of rows of data. It makes it easier to setup if a row
' with column numbers is inserted ABOVE the column headings. This row can then be removed
' for importing to the staging table.

Dim bVal As Variant ' variable for 1 col to be compared
Dim cVal As Variant ' variable for 2 col to be compared
Dim dVal As Variant ' variable for 3 col to be compared
Dim eVal As Variant ' variable for 4 col to be compared
Dim fVal As Variant ' variable for 5 col to be compared

Dim i As Integer ' the row to start comparing from
Dim j As Integer ' subsequent rows to be compared
Dim IDnumber As Integer ' ID Start number
Dim totalrows As Integer ' the total number of rows to be compared

i = 2
IDnumber = 1
totalrows = 337

' Assign values to your parameters

If IsMissing(col1) Or Null Then


Else
colID = colID
col1 = colID + 1
col2 = col1 + 1
col3 = col2 + 1
col4 = col3 + 1
col5 = col4 + 1
NoToCompare = NoToCompare
End If

Select Case NoToCompare

Case 5

bVal = Cells(i, col1).Value ' Puts value of cell col1i into bVal
cVal = Cells(i, col2).Value ' Puts value of cell col2i into cVal
dVal = Cells(i, col3).Value
eVal = Cells(i, col4).Value
fVal = Cells(i, col5).Value

For i = 1 To totalrows ' If you have vals in rows 1 to 1000
For j = i To totalrows ' look at all the remaining rows
If Cells(j, col1).Value = bVal And Cells(j, col2) = cVal And Cells(j, col3) = dVal And Cells(j, col4) = eVal And Cells(j, col5) = fVal Then ' Loop through all and
Cells(j, colID).Value = IDnumber ' If col1 and col2 match first
End If ' entry then assign currentID
Next
If i < totalrows Then ' If not then go to next entry to be compared
If Cells(i + 1, colID) = 0 Or Null Then ' See if the next cell has been identified as matching
bVal = Cells(i + 1, col1).Value ' a higher row - if 0 then it has NOT been matched
cVal = Cells(i + 1, col2).Value ' So put new entries into bVal and cVal to be used as
dVal = Cells(i + 1, col3).Value
eVal = Cells(i + 1, col4).Value
fVal = Cells(i + 1, col5).Value
Else ' the new set for comparison
Do While i <= totalrows
i = i + 1 ' Find the next row where col1 and col2 have not been matched
If Cells(i + 1, colID).Value = 0 Or Null Then
bVal = Cells(i + 1, col1).Value
cVal = Cells(i + 1, col2).Value
dVal = Cells(i + 1, col3).Value
eVal = Cells(i + 1, col4).Value
fVal = Cells(i + 1, col5).Value
Exit Do
End If
Loop
End If
IDnumber = IDnumber + 1 ' Increment the ID number
End If
Next
End Select
End Function

Any help much appreciated!

Thanks,
Nathan
 
It would appear you have been given a solution to pre-process the data to make it Access ready, which is a good concept, but does not take advantage of some things you can do in Access to speed things up.

The problem with this algorithm is the use of sequential searches.

It makes sense from a database normalization process to create one table with the five unique fields (and any other attributes associated with those unique fields), and import the five columns into this table, allowing any duplicate keys to 'collide out'. This table should have an autonumber field to act as an ID number.

It should run fast, too.

A second import would add ALL data to a temp table.

A query would then be used to generate the second (permanent) table, which would be created by joining the first table to the temp table by the five columns, and setting a numeric ID field in the second table to the autonumber ID field from the first table.
 
Hi Beetee,

Good reasoning there, only thing is that the 5 fields are not unique, so this algorithm is used to create 1 unique field based on 5 different fields (which can be similar within the same column). This unique key is then used in the manner in which you speak of above, along with other keys created using any combination of 1 to 5 similar or unique fields.

Any ideas?
Nathan

 
A solution with no coding at all:

First step (to do only once):
Import (or link) your flat file into your Access database as FlatTable.

Create one query on FlatTable.

Include the 'identification' fields in the grid.

Turn it into a groups/totals query(in design view go toView-Totals).

Turn it into a make-table query

Run it once to create a table with unique combinations between your fields

Go to the newly created table, add an AutoNumber field and save the table.

Go back to your make table query design.

Add the newly created table to the query design.

Turn the make table query to an Append query.

Join the 5 fields, then turn each join into type 2 or 3(arrows should point to the new table).

Include the autonumber field in the query grid.

Set the condition for this field: Is Null

Make sure you do not attempt to append the field (delete it from the Append To line)

Save the query.

Now, all you have to do is run the append query. If you don't have the flat file linked, import the data prior to running the query.

This will append all data that does not exist in the access table and will allocate an ID to every record.

It's not so different from what beetee said...and it should run really fast for 332 lines.

Good luck,

[pipe]
Daniel Vlas
Systems Consultant
 
Hi Beetee,

Good reasoning there, only thing is that the 5 fields are not unique.

*** which is why the first step of the import finds unique combinations. ***

so this algorithm is used to create 1 unique field based on 5 different fields (which can be similar within the same column).

*** i really don't understand this sentence, sorry ***

This unique key is then used in the manner in which you speak of above, along with other keys created using any combination of 1 to 5 similar or unique fields.

*** in any case, kudos to danvlas for either originating his own ideas, or expanding my ideas into a more palatable format. ***
 
I am starting to see the light.. I have been searching for this answer for a very long time.

Just to check.. say the flatfile contains data for 8 tables. Do I create a query for each table with all the fields that need to be unique, but include the Primary ID from the flatfile table (so that it creates a temp table that has it's own key, the key from the flatfile and the rest of the fields).

Then do this for each table

Then make an IDs table by linking all these &quot;new tables&quot; by the flatfile key and putting them all into their own table.

So now each table would have it's own autonum key, plus no duplicates, but how then do you setup relationships aside from those through the flat-file key?

Between the two of you, I feel like I am a breath away from enlightenment. I have a fully functioning database, but because I pre-process the data in Excel it's very difficult to add in new data, as the IDs are assigned in Excel. In the meantime I have been creating a new database for each month, in the hope that I could always just link them all in once I had completed a final updateable version. I have been working on an autonum version for a while now, but it has always been just out of reach.

Thanks for your help!
Nathan
 
Hi beetee,

It appears your advice has been clouded by my confusion :)

For clarity: Is a new make-table query necessary for each set of table data coming from the flat file?

And: Are all the new tables linked via the relationship with the flatfile ID?

Also in response to Danvlas,

Can you skip the make-table bit by creating an append query (from the flatfile to the destination table) where the destination table primary key is an autonum?

I'm sorry if this all seems trivial, it's just kind of like code without comments ... I just don't completely understand. I am trying though ...

Thanks for your help!
Nathan
 
Howdy Nathan,

I think what you are saying seems right on the money.

To further obscure the matter:
1) I would suggest you design the table first, then use append queries. The reasons are: a) you have control over the table structure, and can specify indexes, b) any subsequent imports would be appends in any case.
2) Danvlas makes an excellent point in using 'group by' in the query to eliminate key collisions.

Regarding your other requests for clarification, consider this example:

Customers place Orders
Orders have OrderDetails

So, we need three tables.

The Customers table has some customer ID as the primary key, an autonumber field as a link field (used in orders), and other customer specific attributes.

The order table has some order ID as a primary key, an autonumber field for the link field (note: these *can* be the same field) (used in OrderDetails), the ID of the customer that placed the order, and other order specific data (e.g. order date, shipping address)

The OrderDetails table has an autonumber ID (just in case it's needed elsewhere), the order id, the item being ordered, qty, price paid, etc.

You would set up relationships accordingly, e.g. link the autonumber key in the customer table to the CustomerID field in the order table.

NOW, let's say you were importing a flat file that contained all this information, but with much repeated data.

1) Import the flat file into a temp table.
2) create a query to append fields from the temp table that are destined for the customer table. Use group by, and an append query.
3) create a second query to append to the orders table. Join the temp table with the customer table, add all fields that apply to orders (including the customer ID)
4) create a third query to append to the order details table, joining the temp table with the orders table...


Of course, in the real world you are going to run into problems with the import. Customer names will vary slightly, etc. I would suggest that you go through the imported data carefully, fix the input data, and re-import. This may require backing up the datbase prior to importing (which is a good idea in any case).

also, consider splitting your database so you can keep the code and data separate.

Thus, we should be able to solve your problem without code.
 
Well, the whole idea is to have a destination table. I doesn't make any difference how you create it. The make table query just creates and populates the table first time so you save some time. You will only have to add the AutoNumber field, as it is not created by the query. If you do use a make table query, be sure you delete it as soon as you got your result. See below for the reasons.

After that, everything is append. The grouping creates unique records from the flat file, the outer join with the Is Null condition makes sure the records have not been already imported, as it just selects records from the flat file that do not exist in the destination table.

In your case, as beetee pointed, your flat file would contain data for several related tables. To create a normalized database containing the same data, you will have to think about your data model. Then decide what flat file field should go to what normalized table. Then create the normalized table and its corresponding append query.

We talked here about creating the primary key for each table. However, this is just half of the process. In a relational database, you also have related tables that need a foreign key. So, all 'child' tables, such as Orders, should contain a Long Integer field (default value Null). After importing data to the child table, you need to populate the foreign key with the value from the parent table. This can be done through an update query or code. And this may be a slower, because the update query will need to look up values in the flat file and in the parent table to detect the identifier.

The whole process should have the following structure:
Append query to add data to the 'parent' table-this creates the primary key
Append query to add data to the child table - the foreign key is null so is does not violate the referential integrity rules.
Update query to populate the foreign key in the child table with the value created in the parent table

Now, the child table becomes a parent for another table and the process is repeated (like in real life [smile]).


One note about make table queries used regularly: I personally avoid them like plague and I call them &quot;make-trouble&quot; queries. Reasons:
They always delete the table if it exists. And it exists after running the query for the first time. Deleting one table does not release space in the database. Next, it re-creates the table with new data. Now your database is of course, larger, as it uses new space for data. Every time you run the make-trouble query, it will waste the space for the existing table and will consume some more space for the new one. The database will keep growing until you compact it.
Another reason: think of a multi-user environment. You run the make-trouble query and think [wiggle]you have your data ready for processing. By the time you click the 'Process' button, some friend of yours from another office runs the query again and [cannon]overwrites the table with his information. And when you get the results (based on his data), you [curse]start pulling your hair and blame everybody except the silly programmer that found an easy way to prove the system was working...[hammer]
I have the same appreciation for abusive use of temporary tables in Access (and any other) applications. That's because I've been burnt once badly [mad]and don't want it to happen again.
Disclaimer:[lol]There are cases where you have no other choice...

Just hope I didn't confuse you more...

And prolly someone will kick me for abusive use of smileys...
[pipe]
Daniel Vlas
Systems Consultant
 
I see the light!!

Beetee, your example was crystal clear and Dan, what can I say except classic. Bring on the smiley wars. I especially liked:

Now, the child table becomes a parent for another table and the process is repeated (like in real life :)).
 
Hi Guys,

I have met with a small problem in this process, and I am not sure why it is arising.

Everything is working smoothly except when I try and join on more than 2 tables. Say table no 4 has to include foreign keys for tables 1,2 and 3. What happens is that I get all the keys for tables 1 and 2, but only some of the keys for table 3.

I don't know why this is happening because I am following the same process for each of the tables, thus theoretically I should get all the keys for table 3. But in reality, there are missing keys.

Any ideas?

Nathan
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top