Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Integer or Character as Primary Keys

Status
Not open for further replies.

stanlyn

Programmer
Sep 3, 2003
945
US
Hi,

In my previous thread on comboboxes several contributors starting talking about the behavoir of comboboxes in relation to the data types, (char vs int). This has raised several question for me and would like your feedback...

My previous apps never really had a primary key designated as such, instead I used a field like "acct_id" that held user data and served as a primary key. I'm thinking about having a field designated just for the primary key named something like "key_primary" or "key_parent" and have another field serving as a foreign key named "key_foreign" or "key_child". This naming eliminates any confusion on what the key represents, parent or child. This way children never gets orphaned, no matter what the real world values are or gets changed to. Previously, if I changed the value of "acct_id", I'd first have to change the children's key and then change the parent. That need to update the child records would be eliminated if I used dedicated primary and foreign key fields.

Now, integers or characters is the question? I like integers because they are unique and easily human readable/useable. I like guids because they are guaranteed unique without consulting other resources. I hate guids because I cannot remember them and they are only copy/pasteable... Forget about writing them down...

And, is it better for us to generate the primary id at the client with guid() or at the server with antoinc?

So what are the pros and cons for each, whether we use vfp or sql from a vfper's perspective?

Like to hear your thoughts,
Stanley

 
Good idea to make this a follow up question.

I already mentioned I use guids. Many tables still have int IDs on top of GUIDs, as old systems still refer to records via the int IDs. Also some ean12 barcodes make use of Integer IDs. In the age or QR Barcodes you can also encode GUIDs, but that's also a step need to be done once with a grace period of both types of barcodes.

In regard to your naming idea, there are two standards, naming both IDs as you do already AccountID, UserID, CustomerID, etc. Another standard is naming primary keys ID only, as they are fields of the table, the full name is Account(s).ID, User(s).ID, Customer(s).ID (besides there are two standards of singular or plural in table names).

Anyway, "key_foreign" or "key_child" is a bad name, unless you don't mean it literally. You will have situations with multiple foreign keys in your table and they should refer to the parent in their name already.

There always is just one primary key, so that can be named ID, the con against also using a special name for primary key is, it looks as a foreign key and is only clearly the primary key by technical definition of primary index, or by the convention to have it as first column.

So that's not the discussion you wanted so far, as I rather addressed your plan to name these fields than what type to use. I already mentioned some reasons, you already gave your own preference. MSSQL also offers Integer Identity and uniqueidentifier with newid() default value, which you can also get by choosing rowguid.

There are ways to overcome the hard to remember sequence: Store important ones in a table syskeys or in a header file as constants. And ints also have a disadvantage of meaning different things, depending on what table they come from. You can misplace an ID value into the wrong foreign key field and thereby cause relations you didn't wanted, even though they still are valid relations pointing to existing parent records.

Depending on the DB to generate the key also has pros and cons. Distributed Computing should not depend on a single node. I had problems with disconnected system (notebooks in a lab not in LAN, nor WLAN) and guids were the solution, though we tried a mechanism of reserving int IDs with e newid function incrementing the counter by 100 or 1000, so a client could use 100 or 1000 ids before referring back to the server. It sounds nice, but is easily sabotaged, no matter if by intention or not. Guid has the big advantage, it can be generated anywhere, also on a disconnected system, and doesn't collide.

I tend towards guid for all reasons given, there are needs enforcing integers sometimes, like barcodes or downward compatibility. There is no absolute and single winner, it's always an individual decision. Of course once decided the type should be consistent throughout a database, I'd not stop using guids, just because some tables are very short.

Bye, Olaf.
 
Hi olaf,

>> So that's not the discussion you wanted so far
Not so... as my questions are exploratory in nature to develop a sense of what is best practice, when to use and not, where to use and other pros and cons. Discussion with vfpers...
.
So, creating a primary key of type guid as the logical 1st field with a name of "id" would be good practice. And, don't use any special foreign key fields, instead let the data fields server as the foreign key to their respective parents.

I didn't look far enough, down to a relationship that could be part of another relationship. I had always thought a child can only have one parent. But a child can have relationships with others, of which a single foreign key would most likely be meaningless to the other relationships. Thanks for suggesting that.

Thanks for sharing your experience of the integer blocking to a user and the problems you encountered. Good to see you like generating the primary key at the client, and yes, it solves the disconnected data issues.

Barcodes, should that be handled as a separate field as I'll be using the ean barcodes now that they supersede the upc codes. I've set its field length to 13 to hold their values. I's seen some suggesting 12 and 15, depending if check codes are present. Suggestions please...

>> There are ways to overcome the hard to remember sequence: Store important ones in a table syskeys
What would you consider important ones that needs saved somewhere? How do you implement and for what?

Thanks,
Stanley



 
Hi,

Is it customary to keep the leading and closing brackets in the guid, or strip them out? Ant pros or cons?

Thanks,
Stanley
 
> And, don't use any special foreign key fields, instead let the data fields server as the foreign key to their respective parents.
Who said that? Of course you'll use primary keys as foreign keys. Always. That's the consequence of defining the primary keys separate from net data, to use them as foreign keys, of course also separate from net data. What in my text led you to your conclusion?

>I had always thought a child can only have one paren
Even if you take this from the natural meaning of the word, there would be two.

You don't have to overestimate the meaning of "parent" as inheritance. This is not necessarily a hierarchical parent or header. Consider persons and addresses and a table personsaddresses. This one will point to persons and their addresses with two foreign keys PersonID and AddressID. It's a relation table, not really an object being a child of peoples and addresses, nevertheless an easy to see sample of a table having two foreing keys. This use of the terms parent/child has nothing to do with the oop inheritance chain. Parent tables are merely the table where the primary key coms from and child tables, where it is stored as foreign key. And that is easy to remember as if an ID (your personal ID document) identifies you in a foreign country, where it is a foreign ID.

Theres an even easier to see case where a table has multiple foreign keys. Think of attributes, which you don't denote with a simple type as string or number, but as an item picked from a list of possible values. As I already mentioned for example a list of product types. Add a list of vendors, a list of colors and a product will have a ProductTypeID, a VendorID, and a ColorID to point to the list item of these lists. And so a product has three parents, which indeed are just attributes of the product. The vendor could be seen as a parent in the natural sense, but not so the type or color, yet the primary/foreign key mechanism used is the same. Database relations are any relations, not just inheritance or hierarchy relations, any relations. And which table points to which is decided by the number of records. One vendor has many products, so you can't have a productID in the vendor table, that would limit the vendor to have one product. Of course the product points to the vendor. You can't let the colors point to a product, because then only one product could have this color. So again the product points to the color, not the color to the product. Who points where is not decided by the importance of records, but by these numbers. And in the previous case of persons and addresses both persons can have multiple addresses and addresses are the home or working place for many persons. And in that case you need a third table pointing to both and relating them.

So you only ever have to think about what has many of which and then it's clear where a foreign key goes and it's quite normal you have many or none. Only one foreign key may not be seldom in small databases, but data is typically designed to be related, we're talking about relational databases, so in mid size to large databases only the fewest short item lists have no foreign key and are just referred to and most tables are related with two or more others. Seems you haven't done this at all so far. Have a read on normalization and the four normal forms of databases which besides primary/foreign key also cover where which attributes will go.

Other questions:

>Barcodes, should that be handled as a separate field
In my case it's even a bit more complicated, the codes are composed of sometimes one, sometimes two primary keys and a sequence number. They are not stored but5 generated from the choice of the item(s) and decodded to refer to the items. This makes the printed barcodes foreign keys not stored in other tables but stuck on real objects. In my case I don't have to do with foreign barcodes, but with self defined, so the situation is of course different, your source of barcodes is not a primary key of some table, you get them generated or scan them to get them. I'd not make a UPS barcode a primary key, that's data or a foreign key to UPS data, which is out of the bounds of your database.

>>syskeys table
>What would you consider important ones that needs saved somewhere? How do you implement and for what?
Totally depends on your situation. Eg I have one for german stored there. Anything that is special and needs to be referred to often. If there is a special handling for a certain product type, eg special packaging, you'd put that product type guid there, so you don't hardcode the mere value into your application. So you see, it depends, there is no rule for this.

>Is it customary to keep the leading and closing brackets in the guid
That's a so unimportant detail question you could skip for when you start doing it, but it has a simple answer. I store them as mssql newid generates them, and that is without brackets, eg 33A23FA7-B2F9-4487-A23F-A844EDDDF8C5, but you could add these, you could remove these, so you always will be able to compare them. MSSQL actually stores them as 16 byte binary, not as 36 chars, but MSSQL does implicit type conversion of '33A23FA7-B2F9-4487-A23F-A844EDDDF8C5' to 33A23FA7-B2F9-4487-A23F-A844EDDDF8C5, so you get them as C(36) string in a passthrough result cursor, not as Q(16) and you can also compare guid field with such a string. On the MSSQL side that is case insensitive, on the VFP side of course case sensitive.

Bye, Olaf.
 
My preference is as follows:

- Always use integer for your primary keys.

- In every table, give the primary key the same easy-to-remember name. I use "ID".

- Use the AutoInc data type for primary keys.

- For foreign keys, use the name of the table to which the key relates, plus "_ID", for example, "Customer_ID".

When it comes to comparing and sorting, integer is just about the most compact and the most efficient data type you can have. A 32- or 64-bit integer is the closest thing to a native data type for the processor. By contrast, GUIDs are extremely inefficient, because they take up so much space and take a lot of time in sorting and indexing operations.

By using an AutoInc for the primary key, I don't need any special code to generate the key, and there is no risk of two different processes generating the same value.

By always naming the key "ID", I never have to stop and think what the name is. I know this doesn't sound like a big deal, but I can't tell you how much time it saves and how often it prevents avoidable errors in coding.

Final piece of advice: Give every table a primary key, even if it doesn't strictly need one. This is much easier than constantly having to remember whether a given table has a primary key or not.

Hope this helps.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
About sorting: A sequence number key is tempint to use for chronological order. In DBFs we have that (mostly) via the physical order of data in the dbf file. In other datbases records are stored in nodes of pages, whatever is free because of delted data, so it's in no particular order. If you want chronological order I recommend a timestamp field, no matter if that is a datetime or a int sequential number, like the timestamp type of MSSQL. Sorting on primary keys is of course broeken, if you sometime need to go to another type.

By the way, I recently wrote faq184-7743, which shows a way to use autoinc for primary key without needing it in the table itself, making use of the stable generator, avoiding append problems and some more. But this doesn't free your client from being attached to the database to call into this stored procedure id generator.

Bye, Olaf.

 
When I talked about sorting, that was perhaps a bad choice of word. I didn't mean pysically sorting into a different order. I really meant that integers are much better for index keys. Indexing involves making a lot of tests - for equality or greater or less than - and it's much more efficient to compare two integers than to compare two longer character strings.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
OK, but still, if you sort persons by lastname you neither sort by int or guid key and you can't choose a fast type for indexing, you have to index on what you want to sort by. A Name sorting might be made faster by indexing LEFT(lastname,4), but it will sort inaccurate, of course.

Bye, Olaf.
 
I name my keys similar to how Mike names his. All keys are named K_nameoftable, for example, K_customer. And like Mike noted previously, every table has a defined primary key even if it is not currently used. Once I started naming all of my keys this way, I no longer had headaches trying to remember what is an index key and what is not nor do I have collisions between data and keys.

I don't use integers at all for keys. Since my tables are relatively small (~100,000 records), there is no noticeable time penalty for using character keys.

mmerlinn


Poor people do not hire employees. If you soak the rich, who are you going to work for?

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Raymond
 
We're a bit running in circles, Mike, but you don't sort on primary keys.
Your argument of shorter and faster indexes holds for join conditions, not for sorting purposes.

Bye, Olaf.
 
Olaf, please forget I ever mentioned sorting. My point was that integers are usually an efficient choice for operations involving comparisons. Building indexes, retrieving data from indexes, selecting records according to a criterion, and indeed joins - these all involve comparing values, in one way or another.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike's answer is close to what I would have written. I use autoincrementing integers as PKs; I call mine iID, because my standard is to include the data type in the field name.

Foreign keys are iSomethingID, where "Something" describes the role of the referenced record. Most often, it's simply the name of the referenced table, but there are some situations where that doesn't work. For example, consider an employee table with a field for the Manager, who is also an employee. That field would be iManagerID.

Tamar
 
I also use what Mike describes when it's a DBF system. For naming, I tend to fall back on the Codebook naming conventions. (The first character of a field name gives its data type.)

Do I do it that way because I believe in it fervently? No. It's habit.

For a SQL database, I use GUIDs. It's easy. (And correspondingly difficult, because NOTHING in this business is ever a clear tradeoff.)

But the main thing I try to do is to pick a standard and STICK TO IT. Consistency pays off.



 
So, lets consider...

Customer table fields
ID(I) is the PK and autoinc,
name(c,30)
address(c,30)
other fields........

Orders table fields
ID(I) is the PK and autoinc,
order_num(c,12)
order_date(t,8)
order_amt(n,9.2)
customer_pk(I) is a foreign key that links to the customer's ID field
carrier_pk(I) is a foreign key that links to the carrier's ID field (ie. UPS, FedEx)

This carriers field might not make logical sense here, but the point is that we add a specific field as a foreign key field for whatever table Orders is acting as a child to??? If Orders is in a child like (many) to a parent table, then Orders needs to have a dedicated foreign key to hold the parent's ID value.

Several of you mentioned you use ID as the field name for the PK. Normally, I use ID as the field name for some user data that many would name it as customer_id. Thinking that this use is redundant, I'm no longer use customer_id, customer_name, customer_state, ... If the table name is customer, then why should I repeat it in the fields, therefore I use id, name, address, etc... In code, I refer to them as table.field, customer.id, customer.name. This style conflicts with what has been mentioned already. Any gotchas that I'm unaware of using my convention?

Is it safe to say that naming a foreign key something like customer_fk would be meaningless and confusing?

I like the idea of including the data type in the name, but vfp many times and unpredictable changes the case which would make the name harder to read. iName vs INAME ve iname... If it would stay as iName, I'd switch in a sec...

For me, I've never been able to efficiently work with them by remembering their values (always had to copy/paste them). They are easy to remember, tremendously smaller and probably a lot faster, (at least that's what fox has been teaching for years). The guids probably shines best with disconnected data, but does that outweigh the other arguments.

Also, is there ever a need to index the foreign keys? Looks like a performance hit if you didn't??

Please keep in mind that changing the backend to sql server is a high probability, so, keep that in mind. I did check and MSSQL does a autoinc equivalent so staying with integers should migrate easy enough?

Thanks.
Stanley

 
Now I remember why I avoided all this primary key/autoincrem stuff and worked around them... Because I have to fight it every step of the way...

I started playing around with adding a couple tables and wiring up the primary key stuff. Both tables have data in them. Now, I create the field named pk_id set to integer autoincrem with startat 1 and step 1. Next I created an index names pk of type primary with pk_id. Save it and we see 1st issue, uniqueness is violated. It quits... I open the table and all records in the pk_id field contains a zero. I was expecting the autoincrementer to take care of that when adding a new field even if the table contains data, as typically, vfp creates an empty structure whenever the structure is modified and afterwards it copys in the source to the newly created table. Well, not here... Then it says the field is readonly and the table is opened excl.

Any thoughts?

Will I have these same sort of issues when using inserts and/or appends?

Can I overwrite the values? For example, I'm importing unix records that contains a numeric key for relations. When unix exported it, it created the parent child relationship fields and populated them correctly with numerics. Surely, I don't have to switch into non autoincrem mode before importing and switch back when new non-imported records are added, do I?

What other gotchas are there?

Stanley
 
After some searching and reading articles (both here and MS) discussed in my previous post, using guids is looking better all the time...

Most all of the articles mentions dropping the autoincrem column, make changes, then add it back. Also mentioned was generating cursors on the fly to get the max. Dropping a column orphans others, and I'm sure it won't recreate all the keys exactly as before the drop.

How does all this work when dealing with packs, deletions, records that are already linked to the field?

If a column's autoincrem is dropped, does the column still contain its data for existing relations? If not, wouldn't you be orphaning a lot of relations?

Is there a udf that could generate a unique number much the same was as guids are generated? Place it in the "default value" field.

Or maybe better yet, create our own autoincrementing numeric udf and place a call to it in the default value, that way we are in control??

Currently, it looks like a mess to me...

Stanley
 
Your still confused about this.

Why would you name foreign key fields customer_pk or carrier_pk? They are the primary keys of the customer and carrier table, yes, but here in the orders table they are foreign keys. You still don't get what primary and foreign means. A primary key has the rule to be unique in it's own table, it is the identifier, like every person has an id document, your passport. This must be unique, so that a reference to this value identifies this record and only this record. This uniqueness is enforced for the one table the key is defined in, via an index of type primary. Free tables don't offer that index type, but the index type candidate. It also enforces uniqueness, but you may have multiple candidate indexes.

Now this uniqueness allows you to refer to a record and only that record via this value. And that's why you not only want but must copy it into other tables, when they refer to a record. That's the whole purpose of the key, to be used as reference for the whole records data. And for knowing this reference and remembering it you of course store it on the referencing child tables, which makes the fields you use for it foreign keys. KEys referring to records of other tables. Interpret each table as a country and you can easily see what foreign keys are, they are copies of your id document (or more concrete its ID number) a country makes to refer back to citizens (records) of other countries (tables). So foreign keys are not generated in the child table, keys are always only generated in the origin table and are copied as reference to them, as foreign keys.

Each single table has one primary key fields and 0 to N foreign key fields. Therefore the rule to have just one single name for this field, but more specific names for the foreign keys.

If you don't like ID, but pk and fk, then use these, but then you need to have customer.pk, carrier.pk as the primary keys of these tables and order.customer_fk, order.carrier_fk as foreign keys. Join condition of orders with customers then is order.customer_fk = customer.pk and to join carriers order.carrier_fk = carrier.pk.

>Is it safe to say that naming a foreign key something like customer_fk would be meaningless and confusing?
From what I said: no. Totally the opposite is true, naming foreign keys with fk would be a viable approach and of course not meaningless or confusing, they are foreign keys. You have seen many people including me using ID for both pk and fk, but ID only for pk and a more specific name for fks, like CustomerID, or like Mike does it Customer_ID. These are then the foreign keys of course. But foreign and primary keys differenciate the meaning of the keys, they are both keys and identifier or short ID is just a synonym for key, and that's why it's the norm to name these primary key fields ID and foreign keys <Source>ID, where source denotes the source.

Tamar also has given a good example, why an employee key might not be stored as EmployeeID but as ManagerID. I have InsertedByID and UpdatedByID fields in all my records and they contain the user IDs of the users having inserted and updated the record (of course only the last update can be held in a single field, if a history is needed, that's audit trailing, off topic now).

In regard of type prefixes, they are a matter of taste, but if you'd like to name fields pk and ..._fk, then you are specifying the type of field in the name, so why not be consistent? There is another camp saying these things are meta data you can receive from the schema, and indeed all fks might simply be named fks, if you would be able to name two fields identical, as you will define what table they refer to via the defining clause: ...FOREIGN KEY customer_id REFERENCES customer TAG id (that is, if you name index tags like the field names). Actually the primary index is enforcing the rules for primary key values on the field you index with it. But the mere reasons to not name all foreign keys fk or only id is that you a) can't have multiple fields with same name and b) want to see at a glimpse on the source code without looking up the table definition, where this foreign key comes from. And the latter has is still not arrived in your brain, from what you write to reflect your knowledge. Foreign keys COME FROM a parent table, they are copies of primary keys, the primary key of a table is defining what you need to copy somewhere else to reference the record, so a child table needs all the fields involved in the primary key as copied fields. That's why most of use are in the camp of having this separate id field as primary key, you only copy over a mere identifier value with no other meaning than to identify a record and you never need to update this, neither in the origin table nor in other tables, just because some net data about a record changes, that is part of the primary key.

Bye, Olaf.
 
One more thought:

Referring to the parent record, eg Order.CustomerID referring to the customer making the order, you don't store anything else of the customer record in the order record. This foreign key reference is there to refer to any attributes (fields) of the customer, so you refer back to all other fields aside of the Customer.ID via the Customer_ID foreign key, you refer back to the name, address etc., you even refer to all fields you may only define in the future, you have the key to all this data for joining it to the order. You even refer to all the things the customer refers to in it's record via foreign keys, eg if the Customer.AddressID refers to the customer address, then Order.CustomerID of course also indirectly refers to this address.

Here's the catch: If you look into your data after 5 years and the customer has moved your 5 year old order points to the new address, but not the one you used 5 years ago for shipping. If you need to know this for reasons of some law, you have to copy over net address data somewhere to know the address valid at the time the order was made. But that's not a reason to avoid the key concept of a separate identifier, this is addressed (no pun intended) with audit trailing of data. In this case another easy solution is of course saving PDFs with the generated delivery slip having the address as it was current at order time.

Bye, Olaf.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top