Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

indexes vs. identity columns 1

Status
Not open for further replies.

Orionov

Programmer
Jul 5, 2001
1
US
What is the best approach to linking tables when thinking about performance and data integrity.
for example, lets say I have a column of type char, and it is a primary key that has a relationship with another table on a column of the same type. Since an index is created between the two tables, is the efficiency of searches between those tables based on the index or the char strings themsleves.
Right now I have an extra column on my tables that is of type numeric called ID. I have this ID in both tables and I base the relationship between the tables on this number. This numeric field really gives no info other than linking the tables becuase the real info I care about is a char field that, as of now, I consider inefficient to place on all tables and base my SQL searches on.
So, the 2 major issues I am concerned about are:
1. Should I eliminate this extra numeric column becuase the indexing of the char fields themselves will perform just as well, and
2. Is having this numeric column unsafe in the case of relationships? If by some system error or user error if we lose an important table we may lose the connection of the numeric field and the char field it is representing. therefore we could lose track of what some of our tables are trying to tell us about.

table 1: table2:
name type name type
--------------- -----------------
ID(pk) numeric ------> ID(fk) numeric
serialnum char task char
Location char date datetime


so the central item is the serial number(ex: 'SN365676')
would I rather have that in every table? or is the ID a better link that would perform better.
Is this good practice? If we lost table 1, table 2 loses any meaning.
Thanks for any help you can give,
Marc
 
A few generalizations:

* Integer columns are usually stored more efficiently and (and assumedly joined more efficiently) than Char columns.

* Standard, but not universal, practice is to NOT use any column that has a business meaning (such as a Social Security Number, Part Number, etc) as a primary key, but instead use a sequential (i.e. Identity column) system-assigned number. Sometimes it makes sense to break this general habit, such as for (in the U.S.) State, since we don't change the names of our states.

* Losing Table 1 in itself should not be a particular design issue, unless you are in some weird environment where you randomly lose tables and not the entire database. In most any database of any complexity, losing one or more of the tables is disastrous.
Robert Bradley
Sr. DBA, some big company
cheap prints and oil paintings:
 
From my perspective, I do subscribe to the idea that as much as possible your primary key of any table SHOULD BE business information. This is what provides data integrity. having proper primary keys of business data is what provides data integrity.

if you are using an identity field as a primary key, it think it is a was of space and provides no particular meaningful value.

I will also say that in my shop, more often then not, the primary keys are business values. Paul
 
Paul, I also presume that in your shop these business values NEVER change? Or do you try to cascade primary/foreign key changes through your tables when they do?

 
yes you are correct, the business values used for primary keys do not change. part of the design. in some cases the primary keys are multipart.

depending on the data, a soft delete can be used and a new row inserted..... i guess in the end it depends on how the app is designed.

something like a acct nbr should not ever change for the life of a customer. same with sin's they do not change. obviously, there are execeptions for everthing, but i always push to try and create 'proper' primary keys first.

in a previous life, we had all kinds of problems due to duplicate user's. in the user table, the primary key was a meaningless number and the app did not check for the exsitence of a user before creating it. if we had a primary key of user, then the db would not have let the user be added.

my two cents at any rate. Paul
 
In many instances, you and I agree, Paul. Things like account number can be used just as easily as a pseudo-key, as long as they never ever ever change.

But what about something like a payment by a customer on an account? Do you use Account Number + Payment Date as the PK? What if they sent in two checks at the same time, which might be considered two payments on the same day? In situations like this, it is often preferred to generate a system-assigned PaymentKey value.

Single-column PKs also can be joined much more efficiently (both in code and in execution) than multi-column keys, which could be an important consideration for tables that are joined to frequently. Robert Bradley
Sr. DBA, some big company
cheap prints and oil paintings:
 
yes, there are times where an identity key or an arbitrary id is the only reasonable solution, but i also think that a well designed model can avoid a lot of this type of thing. but yes, i do agree that there are times that it can't be avoided.

my personal pet peeve are designs that just use identity ids because it is the easy answer.

but a sort of side question... what is more important performance or data integrity? using an arbitrary int key probably performs better but does not naturally provide data integrity. Paul
 
what is more important performance or data integrity? using an arbitrary int key probably performs better but does not naturally provide data integrity.

I don't think one vs. the other is a data integrity issue. Take this example:

I have a database that stores retail store purchases, and it is a hard business/integrity rule that a given customer cannot purchase at the same time at two different stores.

The choices:

A) Use an Identity column as the PK for the table

B) Use CustomerID + StoreID + PurchaseTime as the PK

C) Use an Identity column as the PK, and create an alternate key on CustomerID + StoreID + PurchaseTime

With choice C, I believe you get the performance and storage benefits, plus a data integrity check, plus the ability to tweak your AK without impacting other tables.

I agree that it seems silly to have a sequential PK on a table "because thats the way its done"; but I've also been on the other side of the fence in my younger years, and occasionally got bit in the backside when what I thought was an unchangeable key was, in fact, fluid (example: I used EmployeeID as the PK; turns out that some users re-assign EmployeeIDs as an employee moves up through the ranks). Robert Bradley
Sr. DBA, some big company
cheap prints and oil paintings:
 
I think your design is OK. Usualy is good using NOT data but other column as PK, because all other data could be changed. Also index based on small data like int, or small int is better then using bigger. Processor faster compare 4 bytes against 4 bytes as comparing long strings.Also I read somewhere using Clustered index for colums with Identity is not so good, better performance have non-clustered index.
Loosing data is bad, but is good specify when you may possibe lose your data. Maybe helps using transactions and commiting it after finishing everithing. This protect your data for loose it. Same system use replication process.
 
I guess in the end, it is partly what seems to be the most logical answer and your personal experiences....

in your example, i would tend towards B, not C. I presume by 'alternate key' you simply mean a unique index. Anyway, all things being equal, my design would use the 'business' primary key unless there was a compelling reason otherwise.

I find, generally speaking that developers tend toward 'identity' fields becuase it allows for faster developement as a solid business model is not needed to begin coding. Designers, tend toward a solid business model before development begins.

I personally, tend towards the middle with a design leaning. For me, I have been involved with enough projects that end up with data integrity problems that a good model would have solved that I prefer to have a good model to start from....

For me, enough said on this topic. You could write a book on it. I certainly see your points, and hope you see mine.

Regards,
Paul Paul
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top