I am looking for an elegant solution to the following data validation constraint requirement.
The National Health Service (NHS)in the U.K. uses unique identifiers for every individual in the country, called NHS Numbers. These are validated according to the following rule:-
To summarise, the validation rules are that either the check digit matches the calculated check value, the check sum was 10 and the number is disallowed, or the calculated check value was 11 and the check digit is zero. Strings of repeated numbers are disallowed.
I have implemented this as follows:-
My question is this. Can anyone think of a more elegant solution than my last whopper of a constraint? The sheer amount of text in it is daunting. What is the limit on the size of the text in a constraint, and am I in danger of reaching this limit?
Second, I detest implicit casts, and view them as a bug in code. I always explicitly say "CAST" if I want to do so, but it would be ridiculously cumbersome to do so in the constraint. There is a lot of implicit casting from numbers to chars and back again. Can anyone think of a better way to achieve the constraint condition without such bad practice?
Regards
Tharg
Grinding away at things Oracular
The National Health Service (NHS)in the U.K. uses unique identifiers for every individual in the country, called NHS Numbers. These are validated according to the following rule:-
Code:
NHS NUMBER VALIDATION
The NHS number comprises 10 digits. The first nine are the
identifier and the tenth is a check digit used to confirm the
number's validity. The check digit is calculated using the Modulus
11 algorithm. There are four steps in the calculation:
• Step 1 - multiply each of the first nine digits by a weighting
factor as follows;
Digit Position
(Starting from the left)
Factor
1 10
2 9
3 8
4 7
5 6
6 5
7 4
8 3
9 2
• Step 2 - add the results of each multiplication together
• Step 3 - divide the total by 11 and establish the remainder
• Step 4 - subtract the remainder from 11 to give the check digit
[COLOR=blue]There are two occasions where the check digit calculation process
must be modified slightly:
• if the result of step 4 is 11 then a check digit of 0 is used
• if the result of step 4 is 10 then the number is invalid and not
used.[/color]
Example: Suppose the first nine digits of the number are 401 023 213
• Step 1 - apply weighting factors
Digit
Position
Value Factor Result
1 4 x 10 = 40
2 0 x 9 = 0
3 1 x 8 = 8
4 0 x 7 = 0
5 2 x 6 = 12
6 3 x 5 = 15
7 2 x 4 = 8
8 1 x 3 = 3
9 3 x 2 = 6
• Step 2 - add the results of each multiplication together
40 + 0 + 8 + 0 + 12 + 15 + 8 + 3 + 6 = 92
• Step 3 - divide the total by 11
92 ÷ 11 = 8, remainder 4
• Step 4 - subtract the remainder from 11 to give the check digit
11 - 4 = 7
The complete NHS Number in this example is therefore: 401 023 2137
[COLOR=blue]Please note there are some restrictions
Constant number strings such as 444 444 4444, 666 666 6666 etc are
not issued.[/color]
So, although they pass the checksum, they are invalid.
The reason for excluding numbers based on a single repeating digit is to safeguard the integrity of the numbering system and prevent potential
mistaken use or abuse.
Source
Newcastle Primary Care Trust 6
Data Quality Policy – NHS Number Procedure (PCTP(A)04b)
To summarise, the validation rules are that either the check digit matches the calculated check value, the check sum was 10 and the number is disallowed, or the calculated check value was 11 and the check digit is zero. Strings of repeated numbers are disallowed.
I have implemented this as follows:-
Code:
CREATE TABLE NHS_CONSTRAINT_TEST
(NHS_NUMBER NUMBER(10,0));
/* Disallow ten repeated digits*/
ALTER TABLE NHS_CONSTRAINT_TEST
ADD CONSTRAINT CHK_DIGITS_MUST_DIFFER
CHECK (NHS_NUMBER != RPAD(SUBSTR(NHS_NUMBER,1,1),10,SUBSTR(NHS_NUMBER,1,1)));
/* Apply all the other validation rules */
ALTER TABLE NHS_CONSTRAINT_TEST
ADD CONSTRAINT CHK_VALID_TENTH_DIGIT
CHECK (
/* If checksum is 10, don't use the number */
(MOD(SUBSTR(NHS_NUMBER,1,1)*10 +
SUBSTR(NHS_NUMBER,2,1)*9 +
SUBSTR(NHS_NUMBER,3,1)*8 +
SUBSTR(NHS_NUMBER,4,1)*7 +
SUBSTR(NHS_NUMBER,5,1)*6 +
SUBSTR(NHS_NUMBER,6,1)*5 +
SUBSTR(NHS_NUMBER,7,1)*4 +
SUBSTR(NHS_NUMBER,8,1)*3 +
SUBSTR(NHS_NUMBER,9,1)*2,11) != 1)
OR
/* If number is exactly divisible by 11, check digit = 0 */
(MOD(SUBSTR(NHS_NUMBER,1,1)*10 +
SUBSTR(NHS_NUMBER,2,1)*9 +
SUBSTR(NHS_NUMBER,3,1)*8 +
SUBSTR(NHS_NUMBER,4,1)*7 +
SUBSTR(NHS_NUMBER,5,1)*6 +
SUBSTR(NHS_NUMBER,6,1)*5 +
SUBSTR(NHS_NUMBER,7,1)*4 +
SUBSTR(NHS_NUMBER,8,1)*3 +
SUBSTR(NHS_NUMBER,9,1)*2,11) = 11
AND SUBSTR(NHS_NUMBER,10,1) = 0)
OR
/* Check digit matches calculated checksum */
(11-MOD(SUBSTR(NHS_NUMBER,1,1)*10 +
SUBSTR(NHS_NUMBER,2,1)*9 +
SUBSTR(NHS_NUMBER,3,1)*8 +
SUBSTR(NHS_NUMBER,4,1)*7 +
SUBSTR(NHS_NUMBER,5,1)*6 +
SUBSTR(NHS_NUMBER,6,1)*5 +
SUBSTR(NHS_NUMBER,7,1)*4 +
SUBSTR(NHS_NUMBER,8,1)*3 +
SUBSTR(NHS_NUMBER,9,1)*2,11) = SUBSTR(NHS_NUMBER,10,1)));
/* Sample data, first row fails (repeated digits), second should succeed */
INSERT INTO NHS_CONSTRAINT_TEST (NHS_NUMBER) VALUES (1111111111);
INSERT INTO NHS_CONSTRAINT_TEST (NHS_NUMBER) VALUES (4010232137);
My question is this. Can anyone think of a more elegant solution than my last whopper of a constraint? The sheer amount of text in it is daunting. What is the limit on the size of the text in a constraint, and am I in danger of reaching this limit?
Second, I detest implicit casts, and view them as a bug in code. I always explicitly say "CAST" if I want to do so, but it would be ridiculously cumbersome to do so in the constraint. There is a lot of implicit casting from numbers to chars and back again. Can anyone think of a better way to achieve the constraint condition without such bad practice?
Regards
Tharg
Grinding away at things Oracular