Question for gurus: Most efficient way to handle compound incremental numbers? 1

Katerine · Dec 5, 2012

Hi,
I have what I hope is a fairly simple question.

I have a database (SQL Server 2008 R2) that I'm designing from scratch. It's in the beginning coding phase, and this particular table doesn't have any code based on it yet, so I'm pretty flexible with changing things.

This database will potentially have several companies using it, all using the same tables, but black-boxed with procedures and insulated from each other (with the exception of a very few things, nothing any company enters should be able to affect any other company in any visible way). Each company has clients, who send the company jobs. Each company will be inputting their own data.

These jobs come in from various clients by the thousands. Each gets input into the system, and given the following "keys":
[ul=1]
[li]DatabaseJobID - PK of the Jobs table. Just your standard auto-increment. Used entirely behind the scenes, for internal searches and FK contstraints. Not clearly visible to humans.
[/li][li]CompanyJobRefNumber - Reference number used by the company to refer to the job. This consists of the year entered in one of the date fields (we'll call it StartDate), followed by a dash, followed by an incrementing number (displayed with leading 0's) which starts back at 1 each year. Needs to be unique, for that company and for that year. Not really negotiable - this is how the companies want it.
[/li][li]ClientJobRefNum (optional) - Reference "number" (I say number, but it really could be anything) optionally given to the company by the client. The company has no say in this number, but searches do need to be performable on this column.
[/li][/ul]

How to store the data and implement the logic is still pretty flexible at this point, but the behavior that's needed is pretty set in stone.

My question is: regarding #2 above... what is the most efficient way of doing this? The "auto-increment" portion needs to auto-increment within that company, and within that year. It should start each year back at 1, it can't have duplicates (within the same company and year), and it shouldn't skip numbers because of other companies.

Many thanks! [bigears]

Katie

flapeyre · Dec 5, 2012

For CompanyJobRefNumber, how about calculating it as follows (warning: not tested because I don't have SSMS handy):

Create a SP to calculate the unique CompanyJobRefNumber. To do this, you will first need to create a table with one column, call the table CompanyJobRefNumber_Used and the identity columnCompanyJobRefNumber_ID.

Initially, create the table with CompanyJobRefNumber_ID as [tt]IDENTITY(((YEAR(CURRENT_TIMETAMP) * 1000000000) + 1) , 1)[/tt].

Now you have your job number as an integer. Cast it to character, and STUFF a dash after position 4.

Every January 1 thereafter, you just do this to reset the counter to 1:

SQL:

DBCC CHECKIDENT ("CompanyJobRefNumber_Used", RESEED, ((YEAR(CURRENT_TIMETAMP) * 1000000000) + 1) WITH NO_INFOMSGS;

-- Francis
There are laws to protect the freedom of the press's speech, but none that are worth anything to protect the people from the press.
--Mark Twain

flapeyre · Dec 5, 2012

As I said, before, I don't have SSMS. You'll have to play with the syntax (especially the parentheses, which I see don't look right). But this basically what you do. You can have the SP check the current year, and if that plus 1 million is greater than the MAX(CompanyJobRefNumber_Used), run the DBCC command. Then just INSERT INTO the table and it will create a new number.

-- Francis
There are laws to protect the freedom of the press's speech, but none that are worth anything to protect the people from the press.
--Mark Twain

Katerine · Dec 5, 2012

Hi flapeyre,
Many thanks for the quick response.

Unfortunately, I can think of a couple of business issues with this solution. I blame my description, which wasn't clear enough. Sorry about that...

Here's the behavior that we need:

Let's say the system has 3 companies using it, and the current date is 1/3/2013. The following inserts happen in chronological order. These are the Company reference numbers that should automatically be applied:
[ul][li]Company 1 adds a job that they received today (1/3/2013). This happens to be the second job that Company 1 has entered for 2013. The job automatically gets a job reference number of 13-0000000002
[/li][li]Company 2 adds a job that they received today (1/3/2013). This happens to be the first job that Company 2 has entered for 2013. The job automatically gets a job reference number of 13-0000000001. (Again, different companies are completely insulated, so reference numbers should only be unique within each company, not globally).
[/li][li]Company 3 adds a job that they received today (1/3/2013). This happens to be the first job that Company 3 has entered for 2013. The job automatically gets a job reference number of 13-0000000001.
[/li][li]Company 3, working down their pile of papers on their desks, then adds a job that they received last week (12/29/2012) but didn't get around to entering until today (1/3/2012). This happens to be the 704,298th job that Company 2 has entered for 2012. The job automatically gets a job reference number of 12-0000704298.[/li][/ul]

Also, all of this needs to happen automatically. It's for a website, so reseeding manually probably isn't a good idea. [thumbsdown]

So that's the needed behavior.

I know how to get these results with aggregate queries, but I'm sure there must be a more efficient way than that! Isn't there? [ponytails]

I would also like to mention that I'm not married to the reference number being stored in a single field - it can always be formatted for display after the fact. If there's a more efficient way that utilizes a number that's stored as two separate integer fields, then I'm definitely open to doing that.

Again, many thanks for the suggestion; I just don't think it will work with what we need. Any other ideas? Many thanks again!

Katie

flapeyre · Dec 5, 2012

Okay. In that case, maybe just store your Company ID, year, and ID number in that table. Forget the identity column.

On the front end, you can have a dropdown for the year which defaults to the current year (but allow them to change it to the previous year).

Code:

declare @Company int
, @Job_ID int
, @Year int


if not exists (select 1 from CompanyJobRefNumber_Used where CompanyJobRefNumber_Used.Company_ID = @Company and CompanyJobRefNumber_Used.Year = @Year)
begin
insert into CompanyJobRefNumber_Used(Company_ID, Year, JOB_ID) values(@Company, @Year, 0)
end
set @Job_ID = (SELECT MAX(CompanyJobRefNumber_Used.JOB_ID) + 1  from CompanyJobRefNumber_Used where CompanyJobRefNumber_Used.Company_ID = @Company and CompanyJobRefNumber_Used.Year = @Year)

insert into CompanyJobRefNumber_Used(Company_ID, Year, JOB_ID) values(@Company, @Year, @Job_ID)

-- Francis
There are laws to protect the freedom of the press's speech, but none that are worth anything to protect the people from the press.
--Mark Twain

Thargy · Dec 5, 2012

Katerine,

I disagree with the approach because you're making a compound field for no good reason.

First of all, using a surrogate primary key seems fine, because the database is taking care of it for you. In order to avoid using a primary key as a programming value, duplicate the field to another integer column.
That way you can change primary and foreign keys without altering customer's data.

Joining the date to a number is just daft. Put the date in a date field, and the integer in an integer field. When you want to find the jobs added for a company in 2012, then the where clause just has the date in it.
If you do it the way you're intending, you have to split the field into the date part and the number, and then query by date. Why convert a date to a string, when you want to do date-based operations?
Also, indexing won't work on the compound field with any efficiency, compared with indexing an integer and a date.

As a general design principle, each table should have one (and only one purpose), and each field within a table should have one (and only one) purpose.
A date and an integer together is a bastardisation of two perfectly sensible separate fields. Don't do it.
How can you validate a date, when it's stored as a string? I know, you can write your own functions. Since your date is wrapped up with other bits, you can no longer use all the built-in date functions, can you?
How do you validate an integer, when it's stored as a string? I know, you can write your own function.....

If you store things as what they really are, the database does the donkey work for you. Ever tried to put a date in an integer field, or vice versa. The database won't let you do it. Correct data typing automatically and for no effort guarantees your data.
Why make it hard?

Regards

T

gmmastros · Dec 5, 2012

Personally, I would go with the aggregate functions. It's ugly, and you'll need to be careful with multiple jobs being created simultaneously (so you'll need to use transactions).

What I would do is....

I would store the CompanyId, YearNumber, and JobNumber as integers in the database. I would also make sure to have an index that includes those 3 columns. Strictly speaking, this one index would be considered a covering index for the aggregate query because all of the information you need for the query would be contained within the index data.

Then, I would create a computed column for the CompanyJobRefNumber. The advantage with the computed column is that you'll be able to change the number of zeros without needing to change the underlying data.

Ex:

TableName
CompanyId Int,
YearNumber Int,
JobNumber Int,
Description VarChar(100),
StartDate DateTime,
etc.....

Alter Table YourTableName Add CompanyJobRefNumber As Convert(VarChar(4), Year) + '-' + Right('00000000000000000000' + Convert(VarChar(20), JobNumber), 20)

Create Unique index idx_CompanyJobNumber On YourTableName(CompanyId, YearNumber, JobNumber)

Then.....

Code:

Select Max(JobNumber) + 1 As NextJobNumber
From   YourTableName
Where  CompanyId = X
       And Year = Y

This should perform very well even with millions and millions of rows in the table. If you were to go with the string manipulation method (where you have a string for the CompanyJobRefNumber, then the aggregate would likely be a lot slower).

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Katerine · Dec 5, 2012

Many thanks for the replies, flapeyre, Thargy, and gmmastros! :-D

flapeyre and gmmastros: I'll take a closer look at the code you provided later, but on first glance, it looks like it would work. Many thanks again!

Thargy: I think you may have misinterpreted my post. As I mentioned in my initial post, this field is not the primary key!

The PK is just an integer, and is meant to be completely invisible to the outside world (or as near as possible). In contrast, the company reference number is set according to existing rules set by the company I'm working for. Its purpose is a human purpose (much like first name, last name, etc.); it gives the companies something to print on their reports and refer to specific jobs unobscurely within the company, and to their clients. They have their own ideas about what format for reference codes is useful to them.

Anyway, many thanks again, all of you, for your responses!

Katie

Katerine · Dec 5, 2012

Sorry to continue to bug you guys, but I just thought of a couple other questions:

- Is TOP 1 in a query with partitioning faster than MAX?

- Theoretically, I suppose it might be possible for two people from the same company to run the Insert procedure at the same time. How would I go about ensuring that one insert statement waits for the other to finish before it starts?

Thanks again!

Katie

Katerine · Dec 5, 2012

Sorry again about the multiples. I just found gmmastros's comment about transactions, and I think a partitioning query probably would be faster, so never mind the questions in my last post (unless, of course, I'm wrong about the partitioning thing, in which case please feel free to correct me!)

Katie

gmmastros · Dec 5, 2012

What do you mean by a partitioning query?

It is entirely possible that a TOP 1 query will be faster than a Max query in this situation. The change is relatively minor, therefore it should be simple to try it both ways to see which is faster.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Katerine · Dec 5, 2012

I meant a "SELECT ROW_NUMBER() OVER(PARTITION BY ...) ORDER BY ..." query... but now that I've actually looked at it, I realize that it's overkill for these purposes. [dazed]

It's just that there have been so many times since I discovered it that it's been extremely useful, that I just kind of assumed that it would be here, too!

Katie

gmmastros · Dec 6, 2012

That's what I thought you meant, but wasn't sure.

Using Row_Number for this functionality would be really bad because when the Row_number and the Job number get out of synch, you will have problems.

Suppose there are 10 row, and someone delete's the 3rd. The row_number for the next one would be 10 (because after the delete there would only be 9 rows). The Top 1 (or aggregate) method would still work correctly.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Question for gurus: Most efficient way to handle compound incremental numbers? 1

Katerine

Programmer

flapeyre

Programmer

flapeyre

Programmer

Katerine

Programmer

flapeyre

Programmer

Thargy

Technical User

gmmastros

Programmer

Katerine

Programmer

Katerine

Programmer

Katerine

Programmer

gmmastros

Programmer

Katerine

Programmer

gmmastros

Programmer

Similar threads

Part and Inventory Search

Sponsor