Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL blanking column data when creating composite PK 3

Status
Not open for further replies.

1DMF

Programmer
Jan 18, 2005
8,795
GB
Hi,

Can someone explain why SQL would blank a column on a table, when I try to create a composite PK?

I have a table with 1.5M records (RecID, UserID, Area, Used), I want to replace the identity insert PK (RecID) with a PK made up of the three columns in the table and delete the old identity insert PK.

However, when I edit the table in Management Studio , remove the old PK and select the three columns (UserID, Area, Used) that I want for the composite PK, the UserID is blanked?

Basically it is completely corrupting the table when the PK index is created, why?

Thanks,
1DMF

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
As I understood indexes / hash maps; it's based on the first three chars isn't it?

I think I will go with no PK in either and use a heap table for production and clustered index for the archive.

Regards,
1DMF

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
As I understood indexes / hash maps; it's based on the first three chars isn't it?

Not that I am aware of.

There are a couple of criteria that are necessary for SQL Server to use an index. The Where/Join clause must be "sargable" and the criteria must be selective enough.

Sargable basically means, starts with for alpha/numerics. For numbers, it's a value (equal to) or a range. Selectivity is a little more difficult to explain. Basically, SQL Server maintains statistics regarding your table data. If the data is not different enough, SQL Server will ignore that column for indexing purposes. For example, if you have a table of people, it wouldn't make any sense to include gender in an index (with the possible exception of the include clause for covering indexes). Basically, SQL Server decides that it would be slower to use an un-selective index than it would be to use the table data. Earlier, you said there was a million rows in your table and your query returned ~ 870K rows (roughly 87 % of the data). This is NOT selective enough for SQL Server to use an index.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Well I re-ran the tests with a criteria that returned 125,599 records.

There was no discernible difference, is that selective enough?

I think this has shown that in this situation a surrogate PK has no purpose, there is no FK referencing it, so joins aren't helped and it isn't part of the searchable columns needed for the index.

However it has also highlighted that with over 1M records in a table that has only a few columns, the difference is negligible.

Though it does raise the issue that if a PK was required due to a relationship, and you wanted a clustered index across the searchable columns, it wouldn't be possible if you had a surrogate PK, as only one clustered index is allowed, plus as a surrogate (auto identity) PK acts like a heap, you lose the benefit a clustered index provides when querying / retrieving the records.

I really appreciate all the input and support this thread has provided, so thanks to all who participated.

Incidentally, at what point does a clustered index make a significant difference, is it number of records, number of columns, both, and how many?

Regards,
1DMF



"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
>Though it does raise the issue that if a PK was required due to a relationship, and you wanted a clustered index across the searchable columns, it wouldn't be possible if you had a surrogate PK, as only one clustered index is allowed, plus as a surrogate (auto identity) PK acts like a heap, you lose the benefit a clustered index provides when querying / retrieving the records.

You still think you must use a clustered index on the surrogate PK. No. You can create a clustered index on the searchable columns and a unique index plus a non null constraint on the surrogate PK. That wouldn't make the table a heap and would accelerate queries for certain userids. It would make an insert slower, but what do you expect? Getting fast reads without investing an effort of storing data sorted? If you're concerend how much time an audit record for a user navigating your site costs, it doesn't matter much. You stage an insert to the SQL Server service and your website script continues while SQL Server saves the audit record. It's a parallel process, isn't it?

Bye, Olaf.

 
Olaf,

Every time I create a PK it auto creates a clustered index, why wouldn't I think that?

However, I tested and it means I have to change it or remove it, create the desired clustered index and then re-assign the PK, what a palaver, but that's GUIs for you!

You stage an insert to the SQL Server service and your website script continues while SQL Server saves the audit record. It's a parallel process, isn't it?
Sorry I don't understand this. What do you mean stage an insert and continue parallel processing my script? Can you do that with Perl, does DBIC do this for you? I don't know enough about ORMs let alone DBIC to answer that.

hmm I'll go ask in the DBIC forum and find out.

Regards,
1DMF

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
>Every time I create a PK it auto creates a clustered index, why wouldn't I think that?
Well, that's just the automatisms of SQL Server. You don't need to specify a field or several fields as Primary Key. Yes, that always creates a clusterd index, but that's merely a technical thing. Theory says a primary key is a unique non null value. Don't use the golden/yellow little key, then all is OK.

You are too much adhering to technical implications, this is what I meant as I said you think that you need to define a PK this way. You are not forced to do that, you don't need to take what SQL Server defines as technical primary key as your primary key, you only need to specify a pk field according to theory.

It's just my oppinon, not what some SQL Server purists would do.

>parallel processing my script?
No, not parallel processing your script. I don't know what you program in, but it doesn't matter much. In case an SQL Server is involved you send an INSERT INTO sql command to the sql server and your script continues. SQL Server does the INSERT while your script continues, so the performance hit needed to eventually rearrange the clustered index is not your concern, you shoot your SQL and SQL Server processes it. As SQL Server is a separate process it runs in parallel to your script anyway and at that moment you give it that job of the insert and then go on with your code. No matter if the programming language you use allows multi processing or threading, you use two processes, one executing your script, the other executing SQL Server and sql commends or requests coming to it.

Bye, Olaf.
 
>pulling apart theory for practical implementation.

Indeed that's how you can call it. Taken from that perspective I could also be counted as a fan of a natural compound key clustered index and a surrogate key unique non null field/index, where that makes sense to store data ordered in the way I often need to read it or pick from it.

It's a bit of a compromise in SQL Server. To summarize, this way you use the way the clustered index/primary key is meant for sorting data, but then not use it for defining the real primary key. But you can use it as natural key purist and as surrogate purist as well. Adding the surrogate key field (if you add one) to the natural compound key fields for the clustered index, you still can use the clustered index to find it by the surrogate key and read the full record from the clustered index leaf nodes.

My experience is, fetching very selectively - as I mostly need to, eg a few 10 rows of millions for a formula - I get data very fast, as the clustered index concentrates the records in a few pages mainly sorted by foreign key and not primary surrogate key value. A clustered index on a identity field also puts records generated in sequence together, but if you edit a customers details over a long time period, these detail data is spread with large gaps in the surrogate pk value of the detail table all on the same foreign customer id, while a clustered index on that foreign id plus some other fields and the detail id keeps customer detail data close together at all times.

Other databases don't have the feature of a clustered index at all and still run very well, eg by constant record size.

Random spread data and therefore random access also is no problem with SSDs, in regard of that you also might say the clustered index has no benefit anymore, as it was thought of to optimize hdd heads reading rotating platters, reducing the seek time by reducing seeks and optimising data for sequential reads. But that's not the physics of reading data anymore. Rearranging pages to keep data close together in the clustered index on SSDs rather leads to wearing those drives more than without a clustered index. Indeed that get's into thoughts about other ways to store data, as big data does. But that all is merely the technical perspective on performance and not so much a theoretical perspective.

Bye, Olaf.
 
Every post I learn something new!

Didn't realise you can wear out SSD's so easily simply by having clustered index sorting, but to be fair, I don't really know what's inside them, unlike having smashed apart plenty of HDD's in my time! [hammer]

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
About SSD that was merely a thought, not experience. I have worn out a small one in a netbook, but not in servers, they are still old fashioned HDDs, here, not SCSI, but SATA.

Win7 or Server 2012 are now managing the wear levelling of SSDs good. I assume there is a reason for server SSDs with longer life single level cells, anyway. Indeed the sorted storage of data means rewrite of SQL Server pages and that results in SSD page rewrites, which of course wears them out more than an additional record entry in a page of a heap table. Most probably page sizes of the MDF/LDF files differ from SSD page/cell sizes anyway.

I found some articles having concluded with clustered indexes still being of practical importance for performance, even with SSDs, even if the data is in RAM (cached):

Especially see linked PDFs in the comment on the question from Dec 30 '12 at 11:52:

This is really off topic, but yes, I also learn new things in discussions, you not only learn when asking questions, also when answering :)

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top