Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

performance of text join vs number join?

Status
Not open for further replies.

Crowley16

Technical User
Jan 21, 2004
6,931
GB
Just how much of a performance hit would I be incuring if I were joining a text field, as opposed to a integer field?

bascially, I'm wondering if it's worth while to create a autonumber field, instead of using the exsisting serial number...

Thanks

--------------------
Procrastinate Now!
 
yes, but if there's 20k records, and I'm joining on a 20chr text field, which introduces a .001 sec delay each time, then the time difference will be quite significant...


--------------------
Procrastinate Now!
 
I would have to disagree. Use the autonumber field for all Primary keys. There are various reasons for this, to name but a few:
1. As a general rule, Primary keys are not 'for user consumption' they are designed purely for use by the db to Uniquely identify records and create links between tables.
2. A primary key is always indexed. If you are using a string (or any other number) you will not necessarily be guaranteed incrementing values. This means that the index must 're-jig' itself in order to fit the new value into it's index in the appropriate position, therby taking longer to insert and update PK values. Whereas an autonumber can be guaranteed to increment, thereby reducing this overhead.
3. Using a serial number as a pk could lead to problems down the line. What if serial numbers completely change format further down the line, or even the company stops using these serial numbers for whatever reason. You will have to rebuild the table to utilise a new PK field.
4. Autonumber is just plain EASIER. :)

There was a massive discussion (verging on argument) over surrogate versus natural primary keys on another forum (which I won't mention out of courtesy :) ) it lasted about 4 weeks! the general 'upshot' of the discussion was that you should use surrogate PKs i.e. Autonumbers.
Caveat, everyone has their own opinion, this is simply mine and most of my colleagues.
egards
Jim
 
so you have run timings on your own database?

is that where you got the .001 number from?

i guess you answered your own question, then

if you haven't run your own timings, how do you know it's .001 and not .00001? or .1?

:)

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts May 8 2005)
 
The difference depends on how the join is performed and how long the text field is.

The speed of join will simplistically increase as the field size increases. Numbers can be several bytes but generally will be shorter than text. However if you are making good use of indexes when the join is performed, the difference may be negligible. Alternatively you might have a smallish table that can stay in memory or cache so eliminating a lot of i/o.

I would agree with rudy. Choose a key logically and then worry about tuning only if you have to.

 
Rudy said:
if there is a satisfactory natural primary key, use it
Agreed!

Jim said:
Use the autonumber field for all Primary keys. There are various reasons for this, to name but a few:
Use of Autonumber in a multiuser environment is sometimes problematic if using sequential / incremental.

Jim said:
Primary keys are not 'for user consumption' they are designed purely for use by the db to Uniquely identify records ... Autonumber is just plain EASIER. :)
Agreed

the index must 're-jig' itself in order to fit the new value into it's index in the appropriate position

Interesting. You are saying that the index used by Access is sequential (vs btrieve and other indexing strategies), and the index has to be inserted inbetween other values to preserve the sequential sequence. Do you have a reference on this -- I would be interested in reading up in this further.

Jim said:
Using a serial number as a pk could lead to problems down the line. What if serial numbers completely change format further down the line, or even the company stops using these serial numbers for whatever reason.
What are your thoughts of using character based pk's? Customer or VenderCodes, ProductCodes, CourseCodes, etc.

Use of character based codes have the advantage of not having to include the supplemental table.

For example, in an Order Entry system where you want to find Orders for Product "XYZ"

Using the Autonumber PK...
SELECT tblProduct.ItemName, tblOrderDetail.*
FROM tblProduct INNER JOIN tblOrderDetail ON tblProduct.ProductID = tblOrderDetail.ProductID
WHERE tblProduct.ItemName = "XYZ";

vs use of a ProductCode...
SELECT tblOrderDetail.* FROM tblOrderDetail
WHERE ProduceCode = "XYZ"

Use of ProductCode resulted in not having to use the INNER JOIN.

I believe BNPMike referred to this with his mention of good use of indexes and joins.

...But I think we may be missing the point.
Databases are used mainly for either inserts and for reporting, or a mixture of the two.

Insertions and deletions are slowed down by the number of indexes used on the table.

Queries (reporting and updates -- have to find the record or records) are improved by proper indexing.

Crowley16 - what type of database do you have?
The objective of your database will influence your indexing and primary key strategies.

...Moving on
Objective said:
...there's 20k records...
For large databases, indexing and design strategies become much more important.

For example, the technical level of understanding in this post is high. So we all know that by creating a form based on a table can result in awful performance if the table has a lot of records entered into the table. When the form is opened, the form retrieves all records in the table -- can generate a lot of network traffic, and can generate local desktop performance issues, especially if there is not much room in the TEMP work area. A better approach is to either to only bring other a few fields, or use an unbound form, and only bring other the record in question.

In many cases, these types of approaches to minimizing data movement will have a much bigger impact on performance than small changes to indexes.

I agree with Jim that this is a big topic with many valid views and perspectives.

Richard
 
lots of good points...

currently I'm still in the build stage, so don't have the ability to check timings, but potentially, there could be a LOT of entries in this db. I've got a projected max of roughly 100k within 2 years (this is VERY rough) and that's basically what I'm aiming for, (after this time, hopefully this will be upsized)

There will be a lot of data entry, and reporting, so minimising network traffic is definitely another consideration...

jimirvine is that other thread still active? Have you got a link? and I'd also be interested in some documentation about the re-jigging index thing...

Thanks for the ideas so far, and if there's anything else, then keep them comming, I'm still not really sure which way to take...

--------------------
Procrastinate Now!
 
Okay, my 2 cents...

I'll not add anything to the issue of text vs numeric key. My input is in the form of a recommendation based on personal experience.

Design your database to easily move the back-end to SQL Server or some other similiar DB. When DB size is expected to be an issue and performance is already an issue, you had better be ready to cross over at some point. The right time from a development standpoint is probably NOW. But, only you and your management can make the appropriate call.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top