Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Surrogate keys & join performance

Status
Not open for further replies.

aravindk

Programmer
May 8, 2003
43
US
Hello all,

Lots have been talked about the surrogate keys improving the join performance. Has anybody really tested it? Can someone point me to some benchmarking test results? Any help in this direction is much appreciated.

Thanks & Regards,


 
From my experience an integer surrogate key speeds up join performance since it usually replaces a longer or character-based natural key, thus reducing disk I/O. It also may speed up indexing on that column for the same reason.

Here are some articles (google is your friend) :


And one from this very forum:



Hope that helps.
 
Thanks for the posting. I did come across the first two links with some googling, before posting the message here.

As you may have already noted in Kimball's article he says that by using the surrogate keys that the join performance is *likely* to improve. He also underlines the fact that it has not been proven. I am wondering if anyone experienced any such performance boost.

I am meeting with our DBA this week for whom the datawarehousing principles are a total anathema. If I can see some real benchmark results of surrogate key vs natural key, that would be a big selling point for me.

Thanks & Regards,
 
The best test is to prove it to yourself with your data in your operating environment.

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright
 
When you ask about advantages of staging area and ODS, do you mean compared to a data warehouse? I'll assume such for my response, but correct me if I'm wrong.

An ODS is typically set up for a smaller data set than a data warehouse, and expected to answer queries with fast response times. It's usually set up and optimized for just one operational system, whereas a data warehouse typically covers multiple business areas. An ODS can usually only answer a limited number of questions but does it rapidly.

A staging area is basically where source data for a data warehouse is loaded prior to populating the actual data warehouse tables. An ODS might not use a staging area, but most DW implementations do. Some are persistent staging areas which remain and are archived in that state, and some are deleted after each load.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top