Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Random numbers: reloaded

Random Numbers

Random numbers: reloaded

by  vongrunt  Posted    (Edited  )
From time to time features involving some sort of "randomness" are necessity. Although SQL Server is capable of going random, things don't always behave as expected - at least compared to procedural languages. Purpose of this FAQ is to explore common pitfalls and provide code examples that work in a set-based fashion - with minimal looping of any kind and no dreaded cursors at all.


RAND() fundamentals

OK. We know there is RAND() function. What next?

According to Books Online, RAND() returns "random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not. By using simple linear math this interval can be translated/stretched to suit real-life needs. Here are two common examples:

Code:
-- random float from 0 up to 20 - [0, 20)
SELECT 20*RAND()

-- random float from 10 up to 30 - [10, 30)
SELECT 10 + (30-10)*RAND()

When it comes to random integers, things are a little bit different:

Code:
-- random integer between 0 and 20 - [0, 20]
SELECT CONVERT(int, (20+1)*RAND())

-- random integer between 10 and 30 - [10, 30]
SELECT 10 + CONVERT(int, (30-10+1)*RAND())

Why +1 stuff? Number of integers between 0 and 20 (including boundaries) is 21, not 20. Without added +1 RAND() would never generate value 20. This way it has an equal chance to be generated as 0, 5, 13 or any other number within specified interval.


Function determinism

When called with same input arguments (if any), some functions always return the same value. These functions are deterministic. Some functions don't behave that way and are therefore nondeterministic. Naturally, RAND() belongs to second group. Because of "unpredictable" output such functions cannot be used for creation of persistent DB objects, including indexed views and indexes over computed column.

Arguably no less important, certain ND functions cannot be used within user-defined functions (UDFs). List includes RAND(), GETDATE(), GETUTCDATE() and NEWID(). Why is that so, we can only guess [borg2]. Anyway, wanna write function called fn_rollDice - forgetaboutit :(. *

* Valid claim for SQL Server 2000. Behaviour is changed in 2005; GETDATE() in UDFs is now allowed. [smile].

Seed value

When supplied with optional integer seed value, RAND() shows deterministic behaviour. To illustrate:

Code:
SELECT RAND(2)
WAITFOR DELAY '00:00:02'
SELECT RAND(2)
USE Northwind
SELECT RAND(2)
USE myDB
While generated number appears to be random, it is in fact pseudo-random. This determinism can be used for generating sequence of pseudo-random numbers:
Code:
declare @seed int; set @seed = 391247 
declare @randnum int
declare @t table( randnum int )

DECLARE @cnt int; SET @cnt = 0
WHILE @cnt <=100
BEGIN
	SET @cnt = @cnt + 1
	SET @randnum = CONVERT(int, 1e6 * rand(@seed))
	INSERT INTO @t SELECT @randnum
	SET @seed = @randnum
END

select randnum from @t
In order to reconstruct (remember?) 100 numbers from above, all you have to know is initial seed - and have identical pseudo-random number generator. This opens space for some interesting applications, most of which fall beyond scope of this FAQ.

Another purpose of seed is to "initialize" random number generator with... random value. Kind of shuffling dices before you let 'em roll. Example:

Code:
SELECT RAND(RAND()*999983)

However, with time-based seeds there is always the danger that subsequent calls happen too soon, especially on those 3+ GHz room heaters we're running these days. Example (partially taken from BOL):

Code:
DECLARE @t table( randnum float )

DECLARE @cnt int; set @cnt = 0
WHILE  @cnt <=10000
BEGIN
	set @cnt = @cnt + 1
	INSERT INTO @t
	SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 )
        	   + (DATEPART(ss, GETDATE()) * 1000 )
	           + DATEPART(ms, GETDATE()) )
END

SELECT randnum, count(*)
FROM @t
GROUP BY randnum
As a result subsequent calls may produce identical values - and overall, complete distribution is far from random.


RAND() and set-based operations

Let's make small sample table first:
Code:
CREATE TABLE myTable
(	id int identity(1, 1) primary key, 
	name varchar(20),
	randnum int
)

INSERT INTO myTable(name) VALUES('Joe')
INSERT INTO myTable(name) VALUES('Bill')
INSERT INTO myTable(name) VALUES('Mary')
INSERT INTO myTable(name) VALUES('Fred')
INSERT INTO myTable(name) VALUES('Sue')

Our objective is to assign random number in range [1, 10] to each row individually. Instant answer is usually:

Code:
UPDATE myTable
SET randnum = 1 + 10*RAND()

SELECT * FROM myTable
Ouch. Doesn't work. Optimizer evaluates RAND() only once per query - seed remains unchanged - and all generated values are the same. At this point it seems looping is inevitable.. not.


Using NEWID() and CHECKSUM()

NEWID() function also generates random value. Bad thing is about type of returned value (uniqueidentifier). Good thing is that NEWID() is "super-nondeterministic". It returns different value for each call, even within a single SELECT query. To illustrate:

Code:
SELECT name, NEWID()
FROM myTable

To solve bad thing we must somehow convert GUID to integer seed. That's where CHECKSUM() comes handy:
Code:
UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(NEWID()))

If you need pseudo-random values that depend on row contents, try CHECKSUM(*):
Code:
UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(*))

Both functions - especially NEWID() - are very useful for solving various "random" problems.


Random TOP N rows

Simple:
Code:
SELECT TOP 3 *
FROM myTable
ORDER BY NEWID()


Unique random numbers

How to fill above table with random numbers, and ensure there will be no duplicates? Try this:

Code:
SELECT IDENTITY(int, 1, 1) as randid, id 
INTO #tmp
FROM
(	SELECT TOP 100 PERCENT id
	FROM myTable
	ORDER BY NEWID()
) X

UPDATE myTable
SET randnum = #tmp.randid
FROM myTable
INNER JOIN #tmp ON myTable.id=#tmp.id

DROP TABLE #tmp

SELECT * FROM myTable
Here is singleton SELECT INTO statement used to fill temp table with identity column in random order. After that UPDATE with INNER JOIN gives us unique random values back to source table. Generated values fall in range [1, COUNT(*)], as dictated by identity. TOP 100 PERCENT prevents optimizer from being too smart [pipe].

Side note: INSERT with ORDER BY is... well, not "clean" SQL. While theory says that order of insertions is irrelevant, identity column fundamentally breaks theoretical stuff - as long as SQL Server allows such constructs.


Random numbers per group

To make new playground let's duplicate sample data:
Code:
UPDATE myTable
SET randnum = NULL

INSERT INTO myTable (name, randnum)
SELECT name, randnum
FROM myTable
We'll assume that name is grouping column. The goal is to assign different random numbers in range [1, 10] for each group. Numbers inside group must be the same.
Code:
UPDATE T
SET randnum = Y.randnum
FROM myTable T
INNER JOIN
(	SELECT TOP 100 PERCENT name, 1+10*RAND(CHECKSUM(NEWID())) AS randnum
	FROM
	(	SELECT DISTINCT name
		FROM myTable
	) X
	ORDER BY name
) Y ON T.name=Y.name

SELECT * 
FROM myTable
ORDER BY name
Again, TOP 100 PERCENT with explicit ORDER BY gets the job done. Same code without these additions won't produce desired results (aka: try it).

If that looks clunky, the same thing can be done with temp table:
Code:
SELECT DISTINCT name, CONVERT(int, NULL) as randnum
INTO #tmp
FROM myTable

UPDATE #tmp
SET randnum = 1+10*RAND(CHECKSUM(NEWID()))

UPDATE myTable
SET randnum = #tmp.randnum
FROM myTable
INNER JOIN #tmp on myTable.name=#tmp.name

DROP TABLE #tmp


------------------
Revision history
2005/08/31 - typos, typos, typos
2005/08/30 - initial draft. Added some smileys :p
2006/02/12 - minor addendum about GETDATE()/UDF changes in SQL2005
Register to rate this FAQ  : BAD 1 2 3 4 5 6 7 8 9 10 GOOD
Please Note: 1 is Bad, 10 is Good :-)

Part and Inventory Search

Back
Top