Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Software To Automatically Discover Table Relationships? 2

Status
Not open for further replies.

BenSCooper

Programmer
Sep 28, 2009
15
GB
Dear All,
I've taken over a long established Access database with approx 400 tables. All of these tables (I think) have primary\foreign keys columns defined, but for whatever reason the actual relationships between tables were never created. I'm now in a situation where nobody at the company knows exactly how the tables link together without looking at queries\code to figure it out.
I want to actually define these relationships correctly, but first need to figure out how the tables link together. I could do this through a tedious process of analysing the existing db queries to figure it out, but it would be a great help if anyone knew of software that would do this automatically. Probably a long shot I know, but it will save me weeks if there is a product out there to do this.

Thanks in advance
Ben Cooper
 
Ben,
I developed a FAQ that would allow you to do this very easily. See FAQ700-6905. The purpose of this was to document tables and relationships to post on Tek_tips. But if you run the function documentQueries it loops through all the query definitions in the database. From that it pulls out a lot of information to look like this:

Name: {22726D22-DB47-4631-ABB9-920C01E1D9FD}
Table: datTblPersonnel
Foreign Table: joinTblStdyPrjRoles
PK: autoPersonnelID FK:intFKeyPersonID

1)It was not designed to do what you are asking, but some simple modification you could save:

ParentTable
ForeignTable
ParentTablePK
ForeignTableFK

2) I would then push that information into a table.
call that "tblRelationships"

3) Create a distinct query to return unique records from tblRelationships.

4) now loop through tblRelationships and programmatically create your relations.

5) the above is easy because you have all the information

Public Sub relations()
currentdb.CreateRelation(name,table,foreigntable,attributes)

6)The only part that is impossible is setting "enforce referential integrity, cascade updates, and cascade deletes". There is just no way to know the intent from existing data. I guess you could run the query and see if there are any instances where referential integrity is violated, but that does not tell you if it was intentional or inadvertant.

Not sure of your level of coding. If you need help come back, but it should be relatively easy.

Do this first in a copy of the database. Then you can click on all the relationships and determine if you want to enforce referential integrity.
 
Thanks Maj, that's some great info and with a bit of tweaking should save me a heap of time. Apologies for the slow response, I've been sidetracked onto another project. Hopefully I'll find time to give it a try sometime soon...

Ben
 
MajP,
I've finally found time to try your solution and have hit some issues.

Because my database doesn't have any relationships defined "DocumentRelations" doesn't return anything to the immediate window. Also, "DocumentQueries" doesn't output any info on what relationships are defined in the query and I'm not sure how to get at this info. Do you have any ideas?

Thanks again
Ben C
 
Yeah, I think I steered you down the wrong path. I gave you information from the relations object, but as you said they do not exist. Google "mdbdoc". This is a utility that can do a lot of database documenting.

However, if there is no defined relationships, I just do not know if there are any properties that you can pull to help you out. AFAIK, the querydef object does not even store the join type which could help some what in this. Atleast that would be an easy way to find which fields are joined to which fields in a query. Then you would have a list to start with.

I will take a look and see if there is any property that can give some information. I think if I had to do it now, I would loop through the qrydef collection and parse the sql string.
Example:
SELECT Categories.CategoryID, Categories.CategoryName, Products.ProductName
FROM Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID;

Using a few functions I think you could figure out the tables involved, and then search for the joins and figure out the fields involved in the joins.

Using a string manipulation like that would likely be able to give you a 100% solution but probably pull a lot of data. I would think from the above query you could pull

Table1: Categories
Table2: Products
Join Type: Inner Join
Join Fields: Categories.CategoryID = Products.CategoryID;
 


I think you can get a 99% solution from the hidden system tables:
mSysObjects
mSysQueries

mSysObjects lists all the db objects. The important fields are:
id (unique ID)
name (object name)

mSysQueries lists the query properties by attributes. The important fields are
attribute (see below table for some of the attributes)
expression (how it is written in the query)
Name1 (depends on the type of attribute but for a field/table it is its name)
Name2 (it is the alias name if used)
objectID (foreign key to MsysObjects)


To make things a little easier to read build this table, "tblAttributes":
Code:
attributeID AttributeName
6           Field Name
5           Table Name
7           Join Expression
8           Criteria Expression
9           Group By Expression
11          Order By Expression
0           Query Name

since 9 is the attribute of a query with a join expression you can build this query.
qryHasJoin
Code:
SELECT DISTINCT 
 MSysQueries.ObjectId
FROM 
 MSysQueries
WHERE 
 (((MSysQueries.Attribute)=7));
returns the IDs of all queries with a join expression

Then you can get the
Table names, field names, and the join expression with this query:
Code:
SELECT 
 MSysObjects.Id, 
 MSysObjects.Name, 
 MSysQueries.Expression, 
 MSysQueries.Attribute, 
 MSysQueries.Name1, 
 MSysQueries.Name2, 
 tblAttributes.AttributeName
FROM 
 qryHasJoin 
INNER JOIN 
 ((MSysObjects 
INNER JOIN 
 MSysQueries 
ON 
 MSysObjects.Id = MSysQueries.ObjectId) 
INNER JOIN 
 tblAttributes 
ON 
 MSysQueries.Attribute = tblAttributes.attributeID) 
ON 
 qryHasJoin.ObjectId = MSysObjects.Id
WHERE 
 (((tblAttributes.attributeID)=0 
Or 
 (tblAttributes.attributeID)=5 
Or 
 (tblAttributes.attributeID)=6 
Or 
 (tblAttributes.attributeID)=7))
ORDER BY 
 MSysObjects.Name, 
 MSysQueries.Attribute;

You can use a report to sort and group it to give you a printable answer
 
Wow, that's given me exactly what I need! :) I can confidently say I wouldn't have managed that myself, but have learnt a lot about the Access system tables in the process.
Course I've still got 1200 joins to plow through, but just getting to this point would have taken me weeks manually.

Thanks so much for the help. Hopefully this will be of use to some other poor unfortunate too...

Ben
 
If it was me I would build a form like the "edit relathionship form". The form you get when you click on the edit relationship

It would have
4 combos
cboLeftTable
cboRightTable
cboLeftField
cboRightField
1 text box
txtBxJoinExpression

3 Check boxes
chkEnforceReferential
chkCascadeDelete
chkCascadeUpdate

and the command buttons.

Now use the query to run a make table query and make a new table. Now you can add a "relationship completed field". To keep track of what joins you have reviewed.

Now loop the table reading each object ID.
read the join and show it in the txtBox
figure out the left table and set the value of the left combo
figure out the right table and set the value of the right combo
figure out the left field and set the left field combo
figure out the right field and set the right field combo

No you can look at the join and possibly determine if you want to create a relationship. If so check your choices and hit Ok and create the relationship.
Mark the "relationship completed field"

Read next record and populate the form...

Even without the looping you would have a form that would allow you to quickly create relationship.

You could even just simply make this a sub form and have the original query be the main form. You would then have all the information visible to quickly make relationships.
 

Here is a demo for making a "wizard". It steps through each join statement in the database. This includes any saved query, form query, or query used in a control

it shows the Join statement, and defaults to the correct table names and fields names based on the join statement. Then allows you to create the relation. You could create 100s of relations within a few minutes.

1zva2ic.jpg


The problem that I cannot figure out is that although the relations are created and can be verified in the MSysRelations table, they do not appear graphiclly in the Relationship window. I will try to figure this out.

Take a look at some of the additional queries I added. They make what you want to do a lot easier.
 
Create relationships from joins is a promising approach. I know rational rose software does allow you to reverse engineer a database and creates relations based on same field names or similar field names, for example allowing prefixes like PKName and FKName or suffixes.

It's much better to determine relationships from the queries and their ON condition.

I think noone and no program can determine the nature of a relation in terms of whether to cascade deletion to child tables or not, whether to enforce referential integritiy by checking existance of a parent key when used as a foreign key or allowing a foreign key to be optional, eg NULLable. You have to determine these things manually from the semantics of the tables.

This can be very hard for such a large database with 400 tables, especially if there is no description, but that's no job that can be automated I fear.

Good luck.

Bye, Olaf.
 
Olaf,

Not sure if you read my posts, but I already made the point that this can not be automated. For example just because you can show that two tables are joined, and there is a matching record in table 2 that does not mean the intent was to enforce referential integrity.
However, I think I have successfully demonstrated a way to speed up the analysis and create the relationships by several orders of magnitude. It will give you pretty close results. This is assuming you have already developed a large set of queries in the database (as stated by the OP). I believe this wizard approach would make even the analysis on doing 400 tables take min/hours vice days.
For example if you look at the image, there are six records. This is done on the Northwind sample database. It correctly identified 6/7 possible relationships. The missing relationship is not used in any query. If I have some knowledge of my business rules I could create a relationship between customers and orders in seconds.

However, thanks for giving me an idea that I had not considered adding a second analysis tool: Look for possible joins based on naming conventions. That would be the best approach if you had a bunch of tables, but no/few queries to start with. If I get around to it I will write the code for that. I would be easy. Pseudocode:

for each table in tables
for each field in table.fields
for each table2 in tables
if not table2.name = table.name
for each field in table2.fields
compare field1 to field2
'lots of possible ways to include
'instr, soundex, like to compare
if match write table names and field names
to new table
end if

next
next
next

(The code can be sped up by doing a single comparison of the tables. Instead of 400X400, just compare once which is 400+399+398...+1 comparisons)

There are a couple of features that could be added. If you try to enforce referential integrity on a tables that already violate referential integrity this code now throws an error. I could check before hand if the tables are currently candidates for referential integrity and present that information to the user. Then the user can go back and determine if they need to delete orphans so that they can enforce referential integrity.


Now I need to figure out how you create a relationship and get it to show in the relationship window. Any idea? I am confident the relationships are created in MSYSRelationships.
 
Actually you really only need to compare primary keys to other keys, and you therefore would want to check both directions. So the code would be more

for each table in tables
for each field in table.fields
if field is primary key
for each table2 in tables
if not table2.name = table.name
for each field in table2.fields
compare field1 to field2
'lots of possible ways to include
'instr, soundex, like to compare
if match write table names and field names
to new table
end if

next
next
next
 
PHV,
Not sure of your question. I clearly showed that if you have a database with queries I can pull out the joins, show the tables and related fields. Single field or composite.
And quickly allow the user to create a relation.

My assumption would be if you were dumb enough to build a database without any relationships, but using composite keys, and have not created any queries yet, then good luck and start digging through your tables.

However, it would not be hard to modify the second method as well.
Loop the table
find the primary keys.
If only one primary field continue as described
If more than one primary key you have a composite key
modify the search to attempt to locate N similar fields.
Report all possible tables with 1 to N similarities.

Another tool now that you made me think of it would be to
1)Loop the forms collection
2)Return any subforms
3)determine the linkmaster, and linkchild properties, the parent recordsource, and the child recordsource.

Another means for determining a candidate relation.

This is one of those things you cannot fully automate, but make decision tools to make things orders of magnitude easier.
 
Here are the functions you need find similar named fields in the database the are candidate relations. Does not completely handle composite keys.

Code:
Public Sub IdentifyLikeFields()
  Dim db As DAO.Database
  Dim tblOne As DAO.TableDef
  Dim tblTwo As DAO.TableDef
  Dim fldOne As DAO.Field
  Dim fldTwo As DAO.Field
  Dim strSql As String
  Dim strLikeNameType As String
  Set db = CurrentDb
  
  For Each tblOne In db.TableDefs
    'At this point the code to handle a single PK
    If numberPrimaryKeys(tblOne.Name) = 1 And Left(tblOne.Name, 4) <> "MSYS" Then
        Set fldOne = getPK(tblOne)
         For Each tblTwo In db.TableDefs
           If Not tblOne.Name = tblTwo.Name And Left(tblTwo.Name, 4) <> "MSYS" And tblOne.Name <> "tblLikeFields" Then
             For Each fldTwo In tblTwo.Fields
                 strLikeNameType = LikeNameType(fldOne.Name, fldTwo.Name)
                  If strLikeNameType <> "NoMatch" And tblTwo.Name <> "tblLikeFields" Then
                    Debug.Print tblOne.Name & "." & fldOne.Name & " " & tblTwo.Name & "." & fldTwo.Name
                    Debug.Print strLikeNameType
                    strSql = "Insert INTO tblLikeFields (tblOne,tblTwo,fldOne,fldTwo,matchType) Values ('"
                    strSql = strSql & tblOne.Name & "', '" & tblTwo.Name & "', '" & fldOne.Name & "', '" & fldTwo.Name & "', '" & strLikeNameType & "')"
                   'Debug.Print strSql
                    CurrentDb.Execute strSql
                  End If
             Next fldTwo
           End If
         Next tblTwo
    Else
      'future code to handle composite keys
    End If
  Next tblOne

End Sub
Public Function isPK(tblDef As DAO.TableDef, strField As String) As Boolean
  Dim idx As DAO.Index
  Dim fld As DAO.Field
  For Each idx In tblDef.Indexes
    If idx.Primary Then
      For Each fld In idx.Fields
        If strField = fld.Name Then
          isPK = True
          Exit Function
        End If
      Next fld
    End If
  Next idx
End Function
Public Function numberPrimaryKeys(tblName As String) As Integer
  Dim db As DAO.Database
  Dim fld As DAO.Field
  Dim tblDef As DAO.TableDef
  Set db = CurrentDb
  Set tblDef = db.TableDefs(tblName)
  For Each fld In tblDef.Fields
    If isPK(tblDef, fld.Name) Then numberPrimaryKeys = numberPrimaryKeys + 1
  Next fld
End Function

Public Function LikeNameType(nameOne As String, nameTwo As String) As String
  LikeNameType = "NoMatch"
  If nameOne = nameTwo Then
    LikeNameType = "ExactMatch"
  ElseIf nameOne Like "*" & nameTwo & "*" Then
    LikeNameType = "FieldOne Like *FieldTwo*"
  ElseIf nameTwo Like "*" & nameOne & "*" Then
    LikeNameType = "FieldTwo like *FieldOne*"
  ElseIf Nz(InStr(nameOne, nameTwo), 0) > 0 Then
    LikeNameType = "FieldTwo in FieldOne"
  ElseIf Nz(InStr(nameTwo, nameOne), 0) > 0 Then
    LikeNameType = "FieldOne in FieldTwo"
  ElseIf Soundex2(nameOne) = Soundex2(nameTwo) Then
    LikeNameType = "FieldOne sounds like FieldTwo"
  End If
End Function

Public Function getPK(tblDef As DAO.TableDef) As DAO.Field
  Dim fld As DAO.Field
  For Each fld In tblDef.Fields
    If isPK(tblDef, fld.Name) Then Set getPK = fld
  Next fld
End Function

' Computes the "Soundex" value of a string.
' This version produces exactly the same results as the Soundex
' function of Microsoft SQL Server 2000.
' Author: Christian d'Heureuse, chdh@source-code.biz
Public Function Soundex2(ByVal s As String) As String
   Const CodeTab = " 123 12  22455 12623 1 2 2"
   '                abcdefghijklnmopqrstuvwxyz
   If Len(s) = 0 Then Soundex2 = "0000": Exit Function
   Dim c As Integer
   c = Asc(Mid$(s, 1, 1))
   If c >= 65 And c <= 90 Or c >= 97 And c <= 122 Then
      ' nop
    ElseIf c >= 192 And c <= 214 Or c >= 216 And c <= 246 Or c >= 248 Then
      ' nop
    Else
      Soundex2 = "0000"
      Exit Function
      End If
   Dim ss As String, PrevCode As String
   ss = UCase(Chr(c))
   PrevCode = "?"
   Dim p As Integer: p = 2
   Do While Len(ss) < 4 And p <= Len(s)
      c = Asc(Mid(s, p))
      If c >= 65 And c <= 90 Then
         ' nop
       ElseIf c >= 97 And c <= 122 Then
         c = c - 32
       ElseIf c >= 192 And c <= 214 Or c >= 216 And c <= 246 Or c >= 248 Then
         c = 0
       Else
         Exit Do
         End If
      Dim Code As String: Code = "?"
      If c <> 0 Then
         Code = Mid$(CodeTab, c - 64, 1)
         If Code <> " " And Code <> PrevCode Then ss = ss & Code
         End If
      PrevCode = Code
      p = p + 1
      Loop
   If Len(ss) < 4 Then ss = ss & String$(4 - Len(ss), "0")
   Soundex2 = ss
   End Function

You could further beef up the likeNameType function and come up with some new rules.

Running this on Nwind finds all the candidate relations except One.
ShipVia and ShipperID in tables orderDetails and Shippers.

The output is (not so interesting becomes they are exact matches):
Code:
tblOne	tblTwo	fldOne	fldTwo	matchType
Suppliers  Products        SupplierID SupplierID ExactMatch
Products   [Order Details] ProductID ProductID ExactMatch
Orders     [Order Details] OrderID OrderID ExactMatch
Employees  Orders          EmployeeID EmployeeID ExactMatch
Customers  Orders          CustomerID CustomerID ExactMatch
Categories Products        CategoryID CategoryID ExactMatch

With these two tools finding the majority of candidate relations should be pretty quick.
 
WOW!!!

Since few of the databases I have worked on (including ones I created!!) have all of the relationships properly documented in the real relationships process, This could also be used to update even apps which one (I!!) think I am familiar with.

I know that I have, for instance, set up ad-hoc queries to answer some supposedly transien issues and later found they were not so transient after all.

Also, this tool set would be of immense value in translating an MS Access db to another platform.



MichaelRed


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top