Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Pseudonymisation function 5

Status
Not open for further replies.

UHNSTrust

Technical User
Dec 2, 2003
262
GB
I am new to SQL server so am looking for a bit of guidance (and hopefully in simple terms).

I have to create a pseudonymisation function to turn a 10 digit number into a 10 digit alpha string. This is then has to have an unlock function to be able to turn it back.

Basically it needs to be the first digit of the number can be 0 to 9. This would then equal the first to tenth digit of the alphabet (number 2 would change to the letter C). The next number digit would be the number of the letter between the eleventh and twentieth letter of the alphabet (number 3 would be the letter N). From then on the alphabet would have to repeat (i.e. U,V,W,X,Y,Z,A,B,C,D etc) to give the third number 7 = B (i.e. 28th letter).

Hope that makes sense. Can anybody point me in the right direction as I'm a total noob. I have to be able to pass the 10 digit number into a function to then return the output. This has to be able to be used in queries.

Thanks in advance for any help.
JP
 
Pehaps you could provides some examples and expected results.

Simi
 
So someone is going to give you a number like
3589634210
and you want to return
DPCNUBMUDM
?

Simi

 
CREATE TABLE [dbo].[convert](
[place] [int] NULL,
[number] [int] NULL,
[Character] [char](1) NULL
) ON [PRIMARY]

GO


INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,0,'a')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,1,'b')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,2,'c')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,3,'d')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,4,'e')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,5,'f')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,6,'g')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,7,'h')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,8,'i')
INSERT INTO [dbo].[convert]
([place],[number],[Character]) VALUES (1,9,'j')

select * from [dbo].[convert]

declare @test int

select @test = cast(substring('3589634210',1,1) as int)

print @test

select upper([Character])
from [dbo].[convert]
where Number=@test

Here is the first position in the number you would just need to complete table and concantiate the rest of the positions to a string.

Simi
 
Here's another way...

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select	Convert(VarChar(10), 
          (  Select Char(65 + ((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26)
             From   (Select 1 As Num
                     Union All Select 2
                     Union All Select 3
                     Union All Select 4
                     Union All Select 5
                     Union All Select 6
                     Union All Select 7
                     Union All Select 8
                     Union All Select 9
                     Union All Select 10
                     ) As Numbers
             For XML Path('')
            ))


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Simi/George,

Thanks for responding so quickly. I was just trying to expand on Simi's method when George's came through and seemed to solve it completely.

Thank you both for your help. I can use it but need to spend the time to understand it!

JP
 
I suppose I should explain how this works....

The first thing I do is split the string of characters in to multiple rows. This allows me to easily work on each character separately. To do this, I use a numbers table and the substring function.

First, the numbers table. If you run this in a query window, you will see a simple list of numbers.

Code:
Select Num
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

I then add in the input variable and use the substring function, like this.

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select SubString(@Input, Num, 1)
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

Now, when you run this, you will notice that each character in the string is separated on a different row.

For the next step, we need to realize that each character in the string starts (in the alphabet) 10 positions later than the previous character. Since Num starts at 1, and we want our first position to start at 0, we subtract 1 from num and multiply by 10.

Code:
Select (Num - 1) * 10,
       SubString(@Input, Num, 1)
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

When you run the code above, you will see that the first column in the output is 0, then 10, 20, etc...

Next, we can add the 2 columns together to get the alpha character offset for each input character, like this:

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select (Num - 1) * 10 + SubString(@Input, Num, 1)
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

Notice that our alpha character positions exceed the alphabet (26 characters). The next step is to use the MOD function, like this:

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select ((Num - 1) * 10 + SubString(@Input, Num, 1)) [!]% 26[/!]
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

This will give us a list of numbers in the range of 0 to 25 corresponding to the position of the letter in the alphabet. The next step is to convert this to the actual letter. 65 is the ASCII number for the letter A. We can add 65 to each number in our list and then convert to ASCII. The CHAR function converts a number to it's ASCII equivalent. Like this:

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select [!]CHAR(65 + [/!]((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26[!])[/!]
From   (Select 1 As Num
         Union All Select 2
         Union All Select 3
         Union All Select 4
         Union All Select 5
         Union All Select 6
         Union All Select 7
         Union All Select 8
         Union All Select 9
         Union All Select 10
         ) As Numbers

When you run this, you will see that each input character is converted to the appropriate letter. The final step in the process is to concatenate this back in to one string, that's where the FOR XML part comes in.

Code:
Declare @Input VarChar(10)

Set @Input = '3589634210'

Select	Convert(VarChar(10), 
          (  Select Char(65 + ((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26)
             From   (Select 1 As Num
                     Union All Select 2
                     Union All Select 3
                     Union All Select 4
                     Union All Select 5
                     Union All Select 6
                     Union All Select 7
                     Union All Select 8
                     Union All Select 9
                     Union All Select 10
                     ) As Numbers
             For XML Path('')
            ))

This is a relatively complicated process, so I thought a rather lengthy explanation was in order. If you have any questions about this, please let me know.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Dang, George, no stars for that post?

+1 for ya; nice job!

PH

-------++NO CARRIER++-------
 
I don't care about stars, but I do like to be appreciated. Thanks.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I don't think so. Why?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hey George,

How you like this....

Declare @Input VarChar(10)

Set @Input = '3589634210';

WITH CTE AS
(
SELECT 1 AS Num
UNION ALL
SELECT Num + 1 -- add month number to 1 recursively
FROM CTE
WHERE Num < 10 -- just to restrict the number up to 10
)
Select Convert(VarChar(10),
( Select Char(65 + ((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26)
FROM CTE
For XML Path('')
))


Got some inspiration from


Simi
 
Simi said:
Hey George,

How you like this....

Simi,

Would it surprise you to know that recursion is slow? Generally speaking, I try to avoid recursion because of the performance. For complicated things it may be necessary and it may actually improve performance because the alternative is messy. In this particular case, recursion isn't necessary.

I just tested the performance. I ran this against my development server. Everyone is gone for the day except me, so there is no other activity going on with this server. It's running SQL2005 standard edition.

I have a numbers table in my database with 100,000 rows. I created 2 separate functions. One with your query and another with mine, like this:

Code:
Create Function dbo.Pseudonymisation_gmmastros(@Input VarChar(10))
Returns VarChar(10)
AS
Begin
Declare @Output VarChar(10)
Select    @Output = Convert(VarChar(10), 
          (  Select Char(65 + ((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26)
             From   (Select 1 As Num
                     Union All Select 2
                     Union All Select 3
                     Union All Select 4
                     Union All Select 5
                     Union All Select 6
                     Union All Select 7
                     Union All Select 8
                     Union All Select 9
                     Union All Select 10
                     ) As Numbers
             For XML Path('')
            ))
	Return @Output
End

GO
Create Function dbo.Pseudonymisation_Simian336(@Input VarChar(10))
Returns VarChar(10)
AS
Begin
Declare @Output VarChar(10)
;WITH CTE AS
(
      SELECT 1 AS Num
      UNION ALL
      SELECT Num + 1 -- add month number to 1 recursively
      FROM CTE
      WHERE Num < 10 -- just to restrict the number up to 10
)
Select    @Output = Convert(VarChar(10), 
    (  Select Char(65 + ((Num - 1) * 10 + SubString(@Input, Num, 1)) % 26)
     FROM CTE
     For XML Path('')
     ))
     Return @Output
End

Notice that both functions have the same input and output parameters. Each one set's an internal output variable based on each (different) query, and then returns the output variable. I purposefully made the code as similar as possible so that we could measure (comparatively) the difference in execution speed.

After creating the functions, I then ran this code.

Code:
Declare @Start DateTime,
		@Junk VarChar(10)
---------------
Set @Start = GetDate()

Select	@Junk = dbo.Pseudonymisation_Simian336(Num)
From	Numbers

Select 'Simian336', DateDiff(Millisecond, @Start, GetDate())

---------------

Set @Start = GetDate()

Select	@Junk = dbo.Pseudonymisation_gmmastros(Num)
From	Numbers

Select 'gmmastros', DateDiff(Millisecond, @Start, GetDate())
----------------

Notice the @Junk variable. I added this to the test code so that 100,000 results for each query would not be displayed in the query window, which could affect performance if there was extra network activity going on in my network.

The results were this:

[tt]
Simian336 11330
gmmastros 1056
[/tt]

The only real difference here is that your code is using recursion to make a list of numbers from 1 to 10, and my code is essentially hard-coding that list.

Please don't misunderstand the intent of this post. I am NOT claiming that my code is the fastest possible. My only intent is to demonstrate that recursion is slower than hard coding a list.

I also created a version of the function that used the numbers table I have instead of hard coding the list. That function took approximately 2000 milliseconds (twice as slow as the hard coded list). The reason it's slower is because the function is accessing the numbers table 100,000 times. You'll hear a lot of people complaining about user defined functions being slow. This is true if the UDF is accessing a table. However, if you are only performing calculations, UDF performance is actually pretty good. For example, my function took 1056 milliseconds for 100,000 executions, which is 1.056 seconds for 100,000 executions or 10.56 micro-seconds per execution.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hey George,

Yep, I can see the performance ramifications. I was considering the possiblity of varying length input.

Thanks for the follow up.

Simi
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top