General Design Question: Primary Key and Linking 6

Crowley16 · Feb 15, 2005

In situations where the primary key of a table consists of more than 1 field in a table, what would be the best way to deal with it:

1) Create a new primary Key, based on the fields in question?
2) Create a completely new primary key, probably autonumber or something?
3) Use All the relevant fields as a combined primary Key?

And, assuming that this primary key is also used as a key in another table (primary or foreign), how would you link the 2 tables, esp. if you have used multiple fields as primary key?

Please give reasons/examples for your answer, I'm considering making some rather large changes to some complicated tables and am trying to figure out any pitfalls in all the options I can take...

Thanks

--------------------
Procrastinate Now!

willir · Feb 15, 2005

My take is that if you are going to use a multi-key field for the primary key, you should have a good reason to do so. The most common reason is when you create the "joiner" table between two tables that have a many-to-many relationship. For example, patient and illness code...
[tt]
PatientID IllnessCode
1 Flu
1 Cold
1 Allergies
2 Cold
2 Bronchitus
[/tt]

...so the primary key is PatientID + IllnessCode. The primary key now creates a unique key for identifying the record, and easily links to the Patient and Illness tables.

But there is a problem. A person can get sick with the same illness more than once...

[tt]
PatientID IllnessCode InfectionDate
1 Flu 12/15/2001
1 Cold 01/30/2001
1 Cold 02/15/2001
1 Cold 01/10/2002
1 Cold 10/12/2002
1 Allergies 06/01/2001
1 Allergies 09/20/2001
[/tt]

...so now you want to have three keys together that you may want to make unique -- PatientID + IllnessCode + InfectionDate. But some may consider this as too complicated, and assign a unique independent key, say PatientIllnessID. And I would agree in this case.

What I would not do is...
- Create a dual field primary key where there is no reason. For example, on a contact table ConactLastName + PhoneNumber would probably be unique, but I would deffer to ContactID as the primary key. It makes sense when using a joiner table, but not here.
- Create a primary key on something that can change and wreck havic when changed. My classic example is using the GL Account number for a primary key. GL account number do not change very often, and using the GL account number as the primary key does make sense. ...BUT, when the GL Account number does have to be changed, all the transactions and registries that use the account number have to be updated too -- when you have 100,000's of transactions, this can be time consuming, and some may consider, dangerous. Rather, I would make the GL account number unique to prevent duplication, but still use another primary key, GLAccountID.

Note that you can still create an index on multiple fields to ensure uniqueness of the record AND use a different field for the primary key.

Hope I did not confuse.
Richard

BNPMike · Feb 15, 2005

I don't understand the difference between options 1 and 3.

In the relational model you can have as many keys as you like. The model says you should have at least one but relational databases like Access do not enforce that rule.

Ted Codd in his Extended Relational Model wanted there always to be a single field key (called the alias) buit this model never caught on.

If you have two or three fields that are together a natural key then by all means use that. SQL works just fine, and you can create indexes with multiple fields to maintain performance.

Creating a new (and therefore almost certainly artificial) field for a key means you now have another item to maintain which could complicate your business process. However it is often convenient to do so, so I often have tables with an autonumber id as well as a real key.

So I guess the bottom line is there is no right way.

Crowley16 · Feb 16, 2005

thanks willir, that's quite helpful...

The specific problem I have is that I have many tables, that all use the same 4 fields as a primary key, and they all link to a single reference table (not to each other) which also have those 4 fields, and one other, as a primary key.

These fields are a natural key, however when trying to link them, it gets quite complicated, so I was thinking of maybe creating another field that is a concatenation of the 4 pkey fields or some other id number, and using that to link to make things easier. But this would cause redundancy...

The other thing is, I'd like to enforce some sort of referential integrity on the fields, but can you do that with 4 different fields?

--------------------
Procrastinate Now!

SQLSister · Feb 16, 2005

Composite keys are usually a bad idea for performance reasons and for data storage reasons. It is far better to create a unique index on the natural key and use an integer field as the PK. Joins on integers are significantly faster in most databases. Additionally, it avoids the problem of what do when the key changes because it never will. Suppose your natural key is Company Name, Contact Name. Either of these can change multiple times, causing a major changes to the data. Now if you set up referntial integrity, the changes can happen automatically, but as the number of records grow the chances are that these changes will start eating up your processing time and may even time out. If you fail to set up referential integrity, you may have a data integrity problem.

Questions about posting. See faq183-874
Click here to learn Ways to help with Tsunami Relief

willir · Feb 16, 2005

Crowley16
Past life. (This was never my database) A database used a "natural" key for the porimary based on batch number = product code + production week + batch method + site + SequentialNumber. It made sense to make use of this "natural" or "smart" batch number from a management perspective, but from a coding perspective, it was a mess. SELECT ... WHERE ProdCode = "ABC" and ProdWeek = "L" and BatchMethod = "Drum" and "Site" = "MySite". As SQLSister said, very inefficient.

Then the decision was made to combine the relavent fields into the BatchNumber as one field. So now reports became messy. For product reports SELECT ... WHERE ProdCode = Left(BatchNum, 6) = ... --- you got it, reporting became messy. The owner of the database migrated from a flat file system to Access, and did not use the Relationship tool in Access. So all his joins were made through code -- again, messy.

So what happened afterwards? Various people would export the data to a spreadsheet application and even into other Access databases for analysis -- you got it, the creation of subsystems, and a lot of extra work.

My suggestion would have been to keep the required fields. If enforcing uniqueness for Product code, week, method, site and serial number, then uniqueness could have been enforced by defining an index. And then either use a a simple numeric field or the actual batch number as the primary key. Linkage / referential integrity would have been so much easier, and reporting ability would have been maintained. I actually redesigned the database and popuated it with some data to show how it worked -- all reporting and production requirements were met, and then some. ...But the issue became political. So much for progress.

However, since you appear to be in the driver's seat, you may want to go for it. Since you have access to the required fields, writing a conversion module should not be too hard.

BNPMike My point is that for a "joiner" table, using the two foreign keys as the primary key make sense. Your preserve uniqueness, and most queries will reference one foreign key or the other, but not both together. But creating a primary key based on two fields is do-able, but, from my perspective, queries become a tad more awkward. For the one table, this may seem okay, but when you create relationships with other tables, queries become a little awjward. Add a third key, and the complexity increases.

I realize that you have a wealth of experience and knowledge with large databases -- this is just my perspective.

I never knew about Codd's Extended Relational Model. Thanks for the info. Much appreciated.

SQLSister - Thanks for your comments. Which of Heinlein's book did you like the best?

Richard

BNPMike · Feb 16, 2005

SELECT ... WHERE ProdCode = "ABC" and ProdWeek = "L" and BatchMethod = "Drum" and "Site" = "MySite". As SQLSister said, very inefficient

I suspect that's hardly less fast than WHERE Newkey = "967522666". The entry in the index will be longer so you will have more pages for the same number of records but you tend not to access many index pages anyway. An extra i/o or two is hardly going to make much difference.

Crowley16 · Feb 17, 2005

hmm, SQLSister

I didn't know that integer joins are significantly faster, although it does sort of make sense when thinking about it.

however surely using only integer pk's means your pk's aren't (necessarily) meaningful to the table, as most id numbers tend to be alphanumerical...
I guess what I'm saying is, is it really worth the speed increase to create a purely integer pk, which is redundant, for a database that has, say, 1/2 a million records?
not too worried about the size of the db, there's plenty of space to work with...

--------------------
Procrastinate Now!

grande · Feb 17, 2005

If size isn't an issue, then here's what I recommend. It's not the fastest way, but it's still pretty fast, and it saves you creating a meaningless PK.

Have a concatonated PK, BUT, keep the other fields. Yes, there is redundant data in the record, which is a normalization problem, but it keeps things a lot simpler in the long run. You won't have to use Mid() or Left() to get data from the PK, because it's already in other fields.

Once again, this is only a good idea if you do, in fact, have memory to spare.

BNPMike · Feb 17, 2005

With Jet I recommend you concentrate on the logical model and not on trying to limit space. Jet uses something like six different strategies to perform a join so be careful about second-guessing the impact of field format changes.

If you find ignoring space issues causes a problem then migrate to a bigger database eg MS SQL Server.

If you still have a problem you must be building some serious professional application so you don't need my advice...

RoyVidar · Feb 17, 2005

One of the amusing things in the natural vs surrogate key debate, is that one is constantly reminded that surrogates make it so much easier on the developer, the SQL gets so clean and simple (that even a developer is able to write it)... but in a lot of cases, the only thing the user can use when trying to search the database, is the composite candidate key. So to fetch all records from the child table mathcing the criteria, the previous SQL might end up looking something like this:

[tt]SELECT ...
FROM tblParent P INNER JOIN tblChild C ON p.SurrogatePK = c.SurrogateFK
WHERE P.ProdCode = "ABC" and P.ProdWeek = "L" and P.BatchMethod = "Drum" and P.Site = "MySite"[/tt]

So to get at the correct child record(s) - or retrieve the correct surrogate key to use in the join, one need to "look up" the candidate key in the parent table – now, that's efficiency;-)

I see no problem with using a three field composite PK in Willirs first sample, as long as the junction table isn't a parent table to other tables, so the composite PK needs to be propagated to child tables. Even a lot of surrogate key fans, prefer using a composite primary key in junction tables. See for instance r937's answer here Composite Keys in M:N Relationship (you may have to register to enter that site, but I believe it's free).

Here's one quote, if registering isn't an option:
"The only time it makes sense to use a surrogate key is when the relationship table itself has child tables, and the child tables have many rows, and the queries using the child tables are complex. Relationship tables with child tables do occur, but they are rare. Unless you have a compelling reason, use composite keys."

I don't think I've gotten enough information here to decide upon what I would use, but I think, based on the complexity of the structure, I'd proabably include surrogate key in the considerations. Some of the things I'm not entirely sure of is the number of records, but that's another discussion than design - with more than half a million records, and a complex structure - BNPMike's suggestion of another database seems also to be worth considering (the free alternative to SQL-Server, MSDE?).

When I'm in a situation where records in multiple tables can be identified with the same PK structure, I suspect there might be flaws in the design, that there may be something so similar, it might perhaps be recorded in the same table, using a status switch to determine the difference between them, but of course, that's only a suspicion.

Creating an extra concatenated field, derived from the fields of the current natural key, what would that be - a supernatural key? - reduncancy yes, violating normal forms yes, recommendable - perhaps if you're into SM, love headaches and hate leaving work when business hours is over, you may consider that approach;-) I'd say either natural composite or surrogate. I guess BNPMike hit the nail pretty much on the head with "So I guess the bottom line is there is no right way.".

BTW - here's one article that might amuse KEY WARS: EPISODE 1

Roy-Vidar

willir · Feb 17, 2005

Thanks Roy, Mike, SQLSister, Grande and Rudy (r937, indirectly) for your thoughts on this debate. Excellent articles Roy - thank you.

So...
- We tend to agree that a "natural" / composite key makes sense in a "joiner" / "junction" / "intermediary" table.
- A surrogate numeric key is easier to use when writing SQL statements involving more than one table, and takes up less space than a composite key. (For example, One table with 3 fields + 1 surrogate and 3 related tables with the surrogate numeric key as an fk vs. 4 tables without the surrogate numeric key but will have all or part of the 3 "natural" fields as the fk)
- Being able to query a child table directly using part of the composite key as the fk is easier and faster since you can query the child table directly rather than linking the parent table to find the surrogate key and link to the child table with the surrogate numeric key as the fk.
- It is messy when a component of the composite key has to be changed. I did not see this point referenced in the linked dicussions, but I have run into it several times.
- Which to use - surrogate or composite / "natural" keys will depend on the needs and demands of the database and the database schema .

Richard

r937 · Mar 2, 2005

you're welcome

funny thing, i'm on tek-tips several times a day but i don't ever come into this forum (Access Tables and Relationships)

i found this thread via google today while searching for something else regarding natural versus surrogate keys

imagine my surprise to see my own article cited

thanks guys

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

willir · Mar 6, 2005

Ain't it nice to be googled? A star for you too Rudy

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

General Design Question: Primary Key and Linking 6

Crowley16

Technical User

willir

MIS

BNPMike

Technical User

Crowley16

Technical User

SQLSister

Programmer

willir

MIS

BNPMike

Technical User

Crowley16

Technical User

grande

Programmer

BNPMike

Technical User

RoyVidar

Instructor

willir

MIS

r937

Technical User

willir

MIS

Similar threads

Part and Inventory Search

Sponsor