I need to extract a random set of records from a table.
How can i do this in the quickest way?
Te source table is in the order of 1.000.000 records and i whish to extract about 5.000-10.000 records.
Thanks for help
ASSUMING you have a unqiue numeric ID on the Table somewhere then you can Generate a Set of 5,000 - 10,000 numeric IDs and then just join the tables and extract the records. (IF you don't I guess as a one off you could just add them)
Here is code to Generate Random int ID's and then you just join the two tables and hey presto you have your random selection.
Oh by the way you will get some errors due to sometimes the RAND will select the same number hence the retry and primary key, this is due to RAND being executed within the same batch / millisecond etc....
-- Create Temp Table for id's Add Key so we get and error to
DROP TABLE #RandIDs
CREATE TABLE #RandIDs(IDS INT PRIMARY KEY)
-- Declare and init variable
DECLARE @i int
SET @i = 0
-- Process for the number of Unique ID's we want the 5000 could be 10000 or what ever number you want
WHILE (@i < 5000)
BEGIN
retry:
INSERT INTO #RandIDs
SELECT convert(INT,(rand() * 1000000)) + 1
OK... i tworks... but the problem is that i haven't to extract the records from the entire table, i have to do the extractions n times from n subrecordsets (each of about 1M records). To do that i've extracted every time the subset of records i need to consider and inserted it in a temp table with the unique numeric ID... but it works too slow!
Have you looked in the FAQ area of this forum? Topic #6 has three entries for generating Random Numbers. You may find something help there though much of the code is similar to Raj's code.
Just click on the FAQ link in the bottom right corner of this text box. Terry L. Broadbent Programming and Computing Resources
Ok, well i have joined and am no longer a visitor.
Max I assume you don't have an ID on the records already and that is why it is taking so long..... because your creating a temp table with 100000 records to number them.
I would suggest a cursor but that would just take as long...
I am puzzled, so if you could provide any further info about indexes on the table and any Primary keys you may have, and I will see if I can think of something.
However at present that is all I have.... and it works for me on my large database.
In answer to the why, well you sometimes need to sample a random selection of customers. Not you best, Not you worst just a random selection.
I have a table of contracts. Each contract has a unique number (the primary key), a birth date and an end date: i have to extract a percentage of contracts that are "active" (alive) each month of a 5-years period and that have not been selected in the previous extraction (the contracts already selected doesn't have to be counted in order to calculate the number of records to extract in the next periods, they only have to be flagged, then it's like they doesn't exist).
I hope you can understand from my poor englis... i'm sorry if i wasn't clear enough.
This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.