Integer or Character as Primary Keys

stanlyn · Mar 25, 2015

Hi,

In my previous thread on comboboxes several contributors starting talking about the behavoir of comboboxes in relation to the data types, (char vs int). This has raised several question for me and would like your feedback...

My previous apps never really had a primary key designated as such, instead I used a field like "acct_id" that held user data and served as a primary key. I'm thinking about having a field designated just for the primary key named something like "key_primary" or "key_parent" and have another field serving as a foreign key named "key_foreign" or "key_child". This naming eliminates any confusion on what the key represents, parent or child. This way children never gets orphaned, no matter what the real world values are or gets changed to. Previously, if I changed the value of "acct_id", I'd first have to change the children's key and then change the parent. That need to update the child records would be eliminated if I used dedicated primary and foreign key fields.

Now, integers or characters is the question? I like integers because they are unique and easily human readable/useable. I like guids because they are guaranteed unique without consulting other resources. I hate guids because I cannot remember them and they are only copy/pasteable... Forget about writing them down...

And, is it better for us to generate the primary id at the client with guid() or at the server with antoinc?

So what are the pros and cons for each, whether we use vfp or sql from a vfper's perspective?

Like to hear your thoughts,
Stanley

Olaf Doschke · Mar 30, 2015

When you'd use this stored proc you won't be able to initialize new ID integer fields as Tamar suggests with RECNO() per Table, the GAI (global auto inc) would not generate a sequence per table. But you could also modify it that way in making a counter table per normal database table and making the corresponding table name a parameter of the stored proc call.

That's one of the things easier with an autoinc per table, initing a new ID field with RECNO, then ALTER to an AUTOINC Field setting NEXTVALUE to MAX+1 once.

Bye, Olaf.

stanlyn · Mar 31, 2015

Yes, I see what you are saying, but since its generating unique numbers, why does it matter if done at the per tablelevel vs database level?

Now, with Tamar's way, I need to run a script that sets all the record's pk field values to recno(). Once done I alter the field type from int to autoinc and set the nextval property, then create the primary index. And this only needs done once to fix the existing records.

So, now, what are the real benefits of all this? I must admit its a little strange thinking in terms of pks and fks with no relation to the data. The field data that means something to the user or report can be anything and all the wireup is thru the pk and fk fields.

Before this pk-fk stuff, my customer table had a field named user_id and the orders table had a pairing field named user_id and I created a relation on these two fields. But now, we are kind of disconnecting the user data from the relationship and relying on "meta data" (maybe thats the right term) for the relationship...

Now, is it common practice to always had another user value that would help in repairing an out-of-sync condition? Something like keeping another field that could be used to lookup the parent table to get key? Things do go wrong...

What are some "way out there" use cases of doing it this way. I'm just trying to see the big picture...

Thanks,
Stanley

Olaf Doschke · Mar 31, 2015

>Before this pk-fk stuff, my customer table had a field named user_id and the orders table had a pairing field named user_id and I created a relation on these two fields.
You have to have an index on the field to make a relation. This is essentially already making a pk-fk relation. What changes really? That you enforce uniqueness in the primary index type. That's for the better, isn't it?

>Now, is it common practice to always had another user value that would help in repairing an out-of-sync condition?
No, you don't store redundant data. You rely on the right reference in the same way as you rely on the right values within the net data fields. If you copy over any additional data, how are you sure this data leads you to the right parent. How are you sure this additional data is right instead of the key? You canÄt decide. The whole point of removing redundant data is you can't be sure which of two redundantly stored values is the right one. There always is the doubt of errors, that's why we test software before releasing it. There can be sabotage or unintentional data errors, but no matter if using surrogate or "real" keys, the data can be wrong. Are you afraid of toggled bits? We have one wanted redundancy: BACKUPS.

Bye, Olaf.

stanlyn · Mar 31, 2015

>> net data fields

What do you mean by this? "user" or "meta"

>> how are you sure this data leads you to the right parent
By copying the correct "user" fields (you need to know your data). If I know that the invoice table has unique invoice numbers, then adding an invoice number field in the child would give me a way back to the parent, should the fp gets hosed (out of sync)on the child.

Thanks, Stanley

Olaf Doschke · Mar 31, 2015

net data: Data that is not technically for references, data except IDs and other technical fields. Something you display automatically is net data. Some applications I know display primary keys (or foreign keys), which doesn't really make the keys net data, but that's just breaking the rule.

The paytype name is not a key. It may be in your situation now, but it's a name, it's natural/user/net data. The bad thing about storing "AMEX" in invoices instead of ID 1 (of that is the ID for AMEX) is, it could also be amex,AM-EX,Am-Ex or (if the invoice.paytype field is large enough) Amercan Express. Even if you let the user only pipck from a list, in browse or with any DBF aware tool (people used Excel to modify data directly) you can get names not in your paytype list.

> thinking in terms of pks and fks with no relation to the data.
You overlook one very important relation of the pk to the data. The pk is a reference key, which stands for the whole record, for all the net data in it. In the same way as a parameter passed by reference rather then by value, a pk is the reference to the object having the pk and all the other attributes in it: the whole record.

Bye, Olaf.

Olaf Doschke · Mar 31, 2015

Also net as in net profit, net weight, netto (real data you want to store, the content) vs brutto (all data including all the containment and structural data needed, comparing to the packaging of products).

Bye, Olaf.

Olaf Doschke · Mar 31, 2015

>should the fp gets hosed (out of sync)on the child.
Assuming this hosing of fps happens, why are you so sure the invoice number you also copied isn't hosed?
And if there is any discrepancy, which data is right? What data can be stored more reliable? 4 bytes for an int key or 10 to 20 bytes for a alphanumeric invoice number?

This leads nowhere, this just shows your distrust in computing and data storage. Go back to pen&paper if you don't trust your bits.

Bye, Olaf.

Mike Lewis · Mar 31, 2015

Also net as in net profit, net weight, netto (real data you want to store, the content) vs brutto (all data including all the containment and structural data needed, comparing to the packaging of products).

Olaf, just for your information, in English the opposite of net in this context is "gross". We talk about the gross weight of a product or shipment, as opposed to the net weight. I think this is the quivalent to the German Bruttogewicht or the French poids brut.

I just thought you'd like to know.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Olaf Doschke · Mar 31, 2015

Thanks Mike, I seldom use these words, you remind me I once knew it, but I forgot to have forgotten it.

Bye, Olaf.

Olaf Doschke · Mar 31, 2015

In regard of data safety in the sense of the reliability of data storage there are three things mentionable:

1. VFP does no transaction log like most any database severs do. You never know who added, changed, deleted which data when and PACKs are even really removing deleted data.
You can add triggers recording that. I earlier already referred to "audit trailing".

2. Backups. Even if someone intentionally tempers with data, this doesn't temper with all or even any backup. You could theoretically compare data with backups especially data, which should not have changed. Of course this is not as practical as an audit trail of all changes. All printed data can also be evidence of what the data was at a certain time, backups don't only mean file backups. Laws asking you to maintain payment data etc. are more about invoices and other documents themselves, than about your database. So you do exports in the form of reports and store documents about these important things in paper or PDFs or both.

3. Checksums. You can introduce row checksum. And of course you don't let the DBC or database compute them automatically. Then any tempering of the data also would aligns the checksums and the value to checking them is lost. You compute checksums only in your application. Any change coming from elsewhere then is detectable as any tempering will lead to the stored checksum computed at the time your application stored it from the cehcksum you compute for comparison on all fields except of course the checksum field.

All these things are available no matter if you use generated keys or define natural keys from net data. Natural keys are not safer than generated keys, because they are natural and don't add to the data. And your example is even a good example of a generated key, because how natural are invoice numbers anyway? Just because the concept of invoice numbers is older than databases and even computers they still are numbers specifically generated to have a unique reference to an invoice.

Bye, Olaf.

TamarGranor · Mar 31, 2015

Stanley - I'm with Olaf. You don't repeat the data in the child record because that's setting you up for trouble. (The man with one clock knows what time it is. The man with two is never sure.)

I started to write pros and cons here and realized I've already done it:

http://www.tomorrowssolutionsllc.com/Articles/The Right Keys are Primary.pdf

The solution to FKs getting clobbered is UPS's on all machines, plus back-up. FWIW, I can't remember ever having to fix that kind of problem, when both of those criteria were met.

Tamar

stanlyn · Apr 1, 2015

I haven't left, just extra busy today. I'll be back and reply tomorrow...

Thanks, Stanley

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Integer or Character as Primary Keys

stanlyn

Programmer

Olaf Doschke

Programmer

stanlyn

Programmer

Olaf Doschke

Programmer

stanlyn

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

Mike Lewis

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

TamarGranor

Programmer

stanlyn

Programmer

Similar threads

Part and Inventory Search

Sponsor