Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Here's a challenge - Comparing 2 databases 1

Status
Not open for further replies.

nathanstevenson

IS-IT--Management
Oct 4, 2002
70
GB
Greetings,

I have 2 databases with the same tables, the same field setups and the same relationships. The first database records all historical data (called A). The second one is a staging database, which records one month's worth of data (called B). I am trying to build a program to import the monthly DB into the historical DB.

As I am in the process of switching over from an old process to this new all-Access process, I want to have this distinction so that the primary DB only contains consistent, robust data with very few inconsistencies.

So the question is.. how do I compare records in db B, with records in db A, if they are the same assign them Primary Key from db A, if not, append. My guess is that I make one HUGE query that creates a new table, with all fields from all tables. Then somehow compare the fields in this "Import" table to each record in each table in A, where there is a match, subset of data in "Import" table is assigned key from A. If there is no match, checks to see how many records in tbl A, and assigns ID, as if it was appended. But it also needs to check internally, to see what num we are at (in terms of appending), because all the keys need to exist BEFORE the data can be imported into the tables, or else referential integrity on the primary DB will be compromised.

E.G. to import you "start at the top", and add in your records as you go, table by table. This way you can maintain integrity.

Again, I am stumped when it comes to comparing tblA, rec1 with tblB, rec1. How is this done programmatically? I have had a bit of experience with VBA in Excel, but I am having difficulties even addressing the different objects (rows, cells, tables etc) in Access. Also I would far prefer to use recordsets as opposed to the macro type code, cause I need to learn!

Any help would be highly appreciated!
Nathan
 
There is a way to create an array from SQL. I'm not at a spot where I can look it up. Once you have the arrays created then you just compare the two row by row and set a flag when a change is found.

I'll try to post more this evening when I get home from work.

Have a great day!

PS. You might also check out the TableDefs collection. It will allow you to iterate through all the fields in each table in a database.

Good Luck!
 
Let's do this in pieces so it doesn't get too ugly to talk about. The following procedure opens Old and New databases and compares the table names to ensure that they are equal in name and number to each other. It does this by spinning through the TableDefs collection and ignoring system tables (eg those starting with msys). Feel free to post again if you have questions about the code.

Public Sub OpenDatabases()
Dim strPath As String
Dim dbsOld As Database
Dim tdfOld As TableDef
Dim avarArrayOld As Variant
Dim dbsNew As Database
Dim tdfNew As TableDef
Dim avarArrayNew As Variant
Dim intX As Integer
Dim intUBound As Integer

strPath = "ValidPath (eg c:\access\my database.mdb)"
Set dbsOld = OpenDatabase(strPath) 'Open old database
strPath = "ValidPathForNewDatabase"
Set dbsNew = OpenDatabase(strPath) 'Open new database

'Get table names for the Old database
intX = 0
For Each tdfOld In dbsOld.TableDefs
If InStr(1, tdfOld.Name, "MSYS", vbDatabaseCompare) = 0
Then 'Skip system tables
If intX = 0 Then
ReDim avarArrayOld(intX)
Else
ReDim Preserve avarArrayOld(intX)
End If
avarArrayOld(intX) = tdfOld.Name
intX = intX + 1
End If
Next tdfOld

'Get table names for the New database
intX = 0
For Each tdfNew In dbsNew.TableDefs
If InStr(1, tdfNew.Name, "MSYS", vbDatabaseCompare) = 0
Then 'Skip system tables
If intX = 0 Then
ReDim avarArrayNew(intX)
Else
ReDim Preserve avarArrayNew(intX)
End If
avarArrayNew(intX) = tdfNew.Name
intX = intX + 1
End If
Next tdfNew

'Are the table names equal in number and name
If UBound(avarArrayOld) > UBound(avarArrayNew) Then
MsgBox "Old table name array is longer"
intUBound = UBound(avarArrayOld)
ElseIf UBound(avarArrayNew) > UBound(avarArrayOld) Then
MsgBox "New table name array is longer"
intUBound = UBound(avarArrayNew)
Else
intUBound = UBound(avarArrayOld)
End If

For intX = 0 To intUBound
If intX <= UBound(avarArrayOld) And _
intX <= UBound(avarArrayNew) Then
If avarArrayOld(intX) <> avarArrayNew(intX) Then
MsgBox &quot;Table names not equal&quot; & vbCrLf _
& &quot;Old: &quot; & avarArrayOld(intX) & _
& &quot; New: &quot; & avarArrayNew(intX)
Debug.Print &quot;Table names not equal Old: &quot;; _
avarArrayOld(intX); &quot; New: &quot;; avarArrayNew(intX)
End If
ElseIf intX > UBound(avarArrayNew) Then
MsgBox &quot;Additional old table: &quot; & avarArrayOld(intX)
Debug.Print &quot;Extra old table: &quot;; avarArrayOld(intX)
Else
MsgBox &quot;Additional new table: &quot; & avarArrayNew(intX)
Debug.Print &quot;Extra new table: &quot;; avarArrayNew(intX)
End If
Next intX

dbsOld.Close
Set dbsOld = Nothing
dbsNew.Close
Set dbsNew = Nothing
End Sub

Good LucK!
 
Thanks for responding!

The code you have posted is super for the first bit. Very clear.

Now that we have checked to see that both the databases have the same table names, and the same field names, the next step is actually comparing the fields.

See it's not only about comparing. It's about putting the data in one DB into another. The relationships are the same, the fields are the same, all is similar. THe problem is the data is not the same.

So for eg. we take one entry, which is a thread made up of relationships between fields in different tables. Now 3 of the 9 fields might exist in the existing DB. So a NEW thread needs to be created that uses these 3 fields, but then appends the other 6 fields, BUT maintains the relationship.

Going on what you have said about SQL arrays, my guess is that we compare the two DBs - old and new. Then create a third array with all the &quot;different&quot; data using the keys of the &quot;INTO&quot; DB, where data is the same, then append this &quot;new&quot; data into the &quot;INTO&quot; DB.

So in summary, there's probably 3 procedures.
1. Check to see if DBs are congruent.
2.a. Make a list of fields that are different.
b. Make a list of fields that are the same.
c. Create a new third array which assigns all the
incoming data the keys of the existing data,
where they exist, or append them if they don't.
3. Insert/append this 3rd array to the existing database.

Am I on the right track? I am going to investigate SQL arrays in the meantime.

Thanks for your help! Now I know where to start.

Nathan
 
Since it sounds like you have multiple relationships defined, it will be very important to make the changes in the right order to ensure referential integrity is maintained. Are all of your relationships defined with cascade for both updates and deletes?

Check out the getrows method for a recordset. It will return an array for you given valid sql. Sorry I can't post in detail now but here is the basic idea:

Dim rds As Recordset
Dim dbs As Database
Dim strSQL As String
Dim avarArray As Variant
strSQL = &quot;Some valid SQL statement, query, etc&quot;
Set dbs = CurrentDB
Set rds = dbs.OpenRecordset(strSQL, other options, etc)
avarArray = rds.GetRows(100000) <= Large to get all

etc.

Good Luck!
 
A quick question to make sure I am understanding you correctly. Do you have additional data and/or additional fields between the databases? How we proceed will vary somewhat dependent upon the answers to the above question.

Different Data
Database1 Employee Table
Joe Smith
Andy Johnson
Sue Winters

Database2 Employee Table
Joe Smith
Andy Johnson
Sue Winters
Sally Jones
Phil Tanner

Different Table Definitions
Database1 Employee Table
LName
FName
SocSecNo
Birthday

Database2 Employee Table
LName
FName
SocSecNo
Birthday
HireDate
ManagerName

The simplest case is if all of the tables in both databases are defined the same (eg same tables, same fields, same field attributes). In that case, it is just a matter of appending new fields to the proper tables in the correct order.

If the table definitions has changed and the data is other wise identical then you do basically the same thing as above only instead of appending you update the new fields in the currently existing rows.

The most interesting case would be a combination of the above scenarios where we would be appending additional rows to the tables as well as updating new column information for the existing rows.

As far as referential integrity goes, you must make sure you start at the top and work your way down if rows will be appended to the tables. For example, say you have a CustomerID in the customer table, invoice table and invoice detail table and an InvoiceID in the invoice table and the invoice detail table. Additionally, say you had a relationship set up from customer table to invoice table to invoice detail table based on CustomerID and also had a relationship set up from invoice table to invoice detail table based on InvoiceID.

In the above case, customer table changes would need to be done first, invoice table changes second and invoice detail table changes last. You start at the highest level and work your way down. That way an invoice table update will be successful because the CustomerID will already exist, etc.

Just to make sure I don't misunderstand, in your next post please identify whether we are appending data rows or adding data columns, or both. Based on what you have posted thus far, I think we are appending data rows.

Have a great day!
 
Also, how many tables are we talking about here? What is the approximate largest number of rows in a single table? Do you need to update query definitions as well?

Sorry this is so disjointed, I can only take a quick look in and around other things.

Have a great day!
 
Hey there,

In answer to your questions:
1. We are appending data rows.
2. There are 10 tables.
3. We are appending approx 300 rows to an existing 700.

One thing I still don't get is that all the keys in db A are different to all the keys in db B (although all table defs and field defs are the same). So while the <data> might be the same, the <keys> are not. So similar data has to be assigned the new keys that correspond to the same existing data, whereas other data that does not exist in db B, has to do a count of the keys in that table and then add the new data on the end.

I understand the cascading fields bit and starting at the top. It took me a while to figure out when I was first importing the data.

Opening the database, running a SQL query and then getting the rows I understand (although bit rusty on syntax). It's the &quot;assigning similar data from DB A the keys of DB B&quot; and the &quot;counting no of keys in DB B and appending the other data from DB A that is not similar&quot;.

See I'm not really sure <where> or <how> the DBA data goes/ gets checked and transformed to be able to be slotted into DB B.

Every post you have added is extremely helpful!

Thanks!
Nathan
 
So to summarize, the table definitions are identical in both number of fields and individual field attributes such as type, size, etc. You need to combine 2 existing databases that are similar but have some differences into 1 database which has all of the data from both databases without anything that was in both of them being duplicated.
Am I understanding you correctly?

If you primary keys are defined as autonumber then the system takes care of that and you can't control it. If you are controlling it, then they can be modified to match. Can you post some examples of the types of data needing to be combined? If your primary key is autonumber, is there another field or fields which could give uniqueness?

For 10 tables, the easiest thing would be to screen print your relationships so you can set them up again later and then delete them while you do this process.

As an example of what we are going to try to do, create 2 identical tables named tblFamily1 and tblFamily2 with the following fields defined in them:

FamilyID - autonumber, primary key
FName - text
LName - text

Create tblFamily1 first and enter several rows of data, then copy it to tblFamily2 and delete all of the rows. Then enter in some data include in tblFamily1 along with some new data not in family one.

Then create a query with the following SQL:

SELECT tblFamily1.FamilyID AS Family1ID,
tblFamily1.FName AS FName1,
(SELECT tblFamA.FName
FROM tblFamily1 AS tblFamA
WHERE tblFamily1.FamilyID = tblFamA.FamilyID) AS FName1A,
tblFamily1.LName AS LName1,
(SELECT tblFamB.LName
FROM tblFamily1 AS tblFamB
WHERE tblFamily1.FamilyID = tblFamB.FamilyID) AS LName1B,
tblFamily2.FamilyID AS Family2ID,
tblFamily2.FName AS FName2,
(SELECT tblFamC.FName
FROM tblFamily2 AS tblFamC
WHERE tblFamily2.FamilyID = tblFamC.FamilyID) AS FName2C,
tblFamily2.LName AS LNameS,
(SELECT tblFamD.LName
FROM tblFamily2 AS tblFamD
WHERE tblFamily2.FamilyID = tblFamD.FamilyID) AS LName2D
FROM tblFamily1 INNER JOIN tblFamily2
ON (tblFamily1.FName = tblFamily2.FName)
AND (tblFamily1.LName = tblFamily2.LName)
ORDER BY tblFamily1.LName, tblFamily1.FName;

This SQL will bring back all of the information from both tables where FName and LName are equal. Once you have verified this to your own satisfaction, you can remove the parts that are redundant. As a result, you will have both table primary keys for the same data.

Using this as a basis, see if you can do the same type of things for the master tables you need to work with. If you have any questions, feel free to repost.

Good Luck!
 
Actually, this is much easier and you can verify it manually. I kind of went overboard on the last example.
Sorry.

SELECT tblFamily1.FamilyID AS Family1ID,
tblFamily2.FamilyID AS Family2ID,
tblFamily1.FName,
tblFamily1.LName
FROM tblFamily1 INNER JOIN tblFamily2
ON (tblFamily1.LName = tblFamily2.LName)
AND (tblFamily1.FName = tblFamily2.FName)
ORDER BY tblFamily1.LName, tblFamily1.FName;

Once you have something like the above setup, then you can select based on one key and update based on the other one.

Good Luck!
 
If you create an empty table tblFamily3 then the following code will insert the unique values from tblFamily1 and tblFamily2 into it. Do the same with your master tables and concatenate as many fields as you need to ensure unique values:

INSERT INTO tblFamily3
SELECT * FROM tblFamily1
WHERE tblFamily1.FName & tblFamily1.LName NOT IN
(SELECT tblFamily2.FName & tblFamily2.LName
FROM tblFamily2
WHERE tblFamily1.FNAME = tblFamily2.FName
AND tblFamily1.LNAME = tblFamily2.LName);

INSERT INTO tblFamily3
SELECT * FROM tblFamily2
WHERE tblFamily2.FName & tblFamily2.LName NOT IN
(SELECT tblFamily1.FName & tblFamily1.LName
FROM tblFamily1
WHERE tblFamily2.FNAME = tblFamily1.FName
AND tblFamily2.LNAME = tblFamily1.LName);

Once you have this done then post some field names so we can get the rest of the data changed as well. As always, repost if you have any questions.

Good Luck!
 
Oops! The code I just posted will insert what is in table1 But NOT in table2 and vice versa but will NOT insert what is in both table1 and table2. Do solve that problem, you can either make one of the tables the base table (eg copy it to tblFamily3 and then only run one of the SQL strings posted above) or you can run this last piece of SQL which will insert records which are found in both tables:

INSERT INTO tbltest
SELECT *
FROM tblFamily1
WHERE tblFamily1.FName & tblFamily1.LName IN
(SELECT tblFamily2.FName & tblFamily2.LName
FROM tblFamily2
WHERE tblFamily1.FNAME = tblFamily2.FName
AND tblFamily1.LNAME = tblFamily2.LName);

Good Luck!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top