Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Multiple nth based on count of one field in table

Status
Not open for further replies.

ladyhawkfl

Technical User
Jan 31, 2005
9
US
I have a table of 25,000 records and need to nth based on zip5 field with variable quantities per zip5. I've run a select distinct to provide the actual quantities per zip5, but need to nth each group with a unique nth value for each. Any suggestions, for the nthing, other than running each individually?
 
Maybe others might understand what you are after, but I am not clear on what the "nth" or "unique nth value" is referring to.

Maybe is you can give us a simple example it would become clear and we could better assist you.

Good Luck,
JRB-Bldr
 
Do you perhaps mean that you've got a list of unique zip5 fields, and you now want, say, every 5th record (where n = 5) in Zip5 order?

If so, would this meet your needs:

Code:
lnN = 5  && or whatever value you want for n
SELECT Zip5 FROM MyTable ORDER BY Zip5 INTO csrTemp
SELECT Zip5 FROM csrTemp WHERE MOD(RECNO("csrTemp"), lnN) = 0

If I've misunderstood your question (and I probably have), please clarify it.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Sorry for being so vague. I have a distinct count for each zip5 value from my main file. My associate has provided me with a count of the records she wants for each zip5, which means I need to randomly nth each group of records with a zip5 value to her requested quantity so that I will only have a total of 10000 records.

Example:
Zip5 Distinct Count Desired Nth Quantity
10001 3260 1304
10016 2178 871
10017 2394 958
10018 3006 1202
10019 2119 848
10022 2394 958
10036 2244 898
33166 2089 836
92121 1791 716
92618 1814 726
92660 1711 684
Totals 25000 10000
 
I try to understand your request.

Do you mean your table have 25000 records.
3260 of them have Zip5 = 10001
2178 of them have Zip5 = 10016 and so on.

You want to extract 10000 records from these 25000, this way:
Only 1304 with Zip5 = 10001
Only 871 with Zip5 = 10016 and so on

Now you want to extract the first 1304 from the 3260 (having Zip5 = 10001), or to choose randomly those 1304?

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania

 
I will also see if I understand what you want using only a single Zip5 value.

You have:
Zip5 Distinct[tab]Count[tab]Desired Nth Quantity
10001[tab][tab][tab][tab]3260[tab][tab][tab]1304

So there are a total of 3260 records with Zip5 = '10001'
* You either want the 1304th single record out of that group for Zip5 = '10001'.
* Or you want 1304 records from that larger group (random or sequential).
* Or something else?

Once we are clear what you are after, we can readily assist you.

Good Luck,
JRB-Bldr

 
I need to randomly nth each zip5 to the desired quantity.
 
So there are a total of 3260 records with Zip5 = '10001'
* You either want the 1304th single record out of that group for Zip5 = '10001'.
* Or you want 1304 records from that larger group (random or sequential).
* Or something else?

I think what he wants is 1304 records from the group in which Zip5 = '10001'; and so on with the other groups. That would make sense. It looks like this is for a mail shot test, where they want to try mailing to sample of different geographical groups.

The thing I don't understand is the statement: "I need to randomly nth each group of records with a zip5 value". That is a contradiction. Either you pick your sample at random, or you pick it by taking every, say, 5th record from the group - not both.

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
OK. Here is a possible approach. This is totally off the top of my head, and not tested. Obviously, you'll need to modify it to reflect your field names, etc.

Code:
* First,put your table in random order
SELECT RAND() AS lRand, Name, Address, City, State, Zip5 FROM MainTable ;
  ORDER BY 1 INTO CURSOR csrOrdered

* Create a cursor to hold the final results
CREATE CURSOR csrResults (Name C(15), Address C(15), City C(15), State C(2), Zip5 C(5))

* Loop through your Zip5 table; this is the table that gives the 
* target count for each Zip5
SELECT Zip5
SCAN
  lnTarget = Zip5.Target    && No. of records needed for this Zip5
  lcZip = Zip5.Zip5 
  
  * Create a SQL command to extract first N records for current Zip5
  lcCommand = "SELECT TOP " + TRANSFORM(lnTarget) + ;
    " Name, Address, City, State, Zip5 FROM csrOrdered" + ;
    " ORDER BY lRand " + ;
    " WHERE Zip5 = " + lcZip + " INTO CURSOR csrTemp"
    
  * Execute that command
  &lcCommand
  
  * Transfer the records thus selected to your results cursor
  INSERT INTO csrResults SELECT * FROM csrTemp  

ENDSCAN

Remember, this is un-tested. Try to understand what the code does rather than blindly paste it into your program. You are likely to get syntax errors and data type mismatches. Please try to fix these yourself before complaining that the code doesn't work.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Here is my solution.
Randomly select a requested number of rows, per group of data.

Code:
* Create a table with 25000 rows, approximately with your distribution
RAND(-1)
CREATE TABLE MyTable (zip5 I,cWhatEver C(10))
FOR lni = 1 TO 25000
	lnRand = RAND()
	IF m.lnRand <= 3260 / 25000
		INSERT INTO MyTable VALUES (10001,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178) / 25000
		INSERT INTO MyTable VALUES (10016,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394) / 25000
		INSERT INTO MyTable VALUES (10017,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006) / 25000
		INSERT INTO MyTable VALUES (10018,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119) / 25000
		INSERT INTO MyTable VALUES (10019,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119 + 2394) / 25000
		INSERT INTO MyTable VALUES (10022,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119 + 2394 + 2244) / 25000
		INSERT INTO MyTable VALUES (10036,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119 + 2394 + 2244 + 2089) / 25000
		INSERT INTO MyTable VALUES (33166,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119 + 2394 + 2244 + 2089 + 1791) / 25000
		INSERT INTO MyTable VALUES (92121,SYS(2015))
	ELSE
	IF m.lnRand <= (3260 + 2178 + 2394 + 3006 + 2119 + 2394 + 2244 + 2089 + 1791 + 1814) / 25000
		INSERT INTO MyTable VALUES (92618,SYS(2015))
	ELSE
		INSERT INTO MyTable VALUES (92660,SYS(2015))
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
	ENDIF
NEXT

* prepare the original data recno() is used as an Id 
SELECT * FROM MyTable ORDER BY zip5 INTO CURSOR MyCursor
SELECT RECNO() as nRow,* FROM MyCursor INTO CURSOR MyCursor

* create a cursor containing the desired number of rows
SELECT zip5,COUNT(zip5) as no,MIN(nRow) as minRow,MAX(nRow) as maxrow FROM MyCursor GROUP BY 1 INTO CURSOR MyCountCursor
CREATE CURSOR MyTarget (zip5 I,Count I)
INSERT INTO MyTarget VALUES (10001 , 1304)
INSERT INTO MyTarget VALUES (10016 , 871)
INSERT INTO MyTarget VALUES (10017 , 958)
INSERT INTO MyTarget VALUES (10018 , 1202)
INSERT INTO MyTarget VALUES (10019 , 848)
INSERT INTO MyTarget VALUES (10022 , 958)
INSERT INTO MyTarget VALUES (10036 , 898)
INSERT INTO MyTarget VALUES (33166 , 836)
INSERT INTO MyTarget VALUES (92121 , 716)
INSERT INTO MyTarget VALUES (92618 , 726)
INSERT INTO MyTarget VALUES (92660 , 684)
SELECT MyCountCursor.*,MyTarget.Count FROM MyCountCursor,MyTarget WHERE MyCountCursor.zip5 = MyTarget.zip5 ORDER BY 1 INTO CURSOR MyCountCursor

* create a cursor containing the randomly extracted rows from MyCursor
CREATE CURSOR RandRow (nRow I)
INDEX on nRow TAG nRow 
SELECT MyCountCursor
SCAN
	FOR lni = 1 TO MyCountCursor.Count
		lnNewVal = MyCountCursor.minRow + INT(MyCountCursor.no*RAND())
		DO WHILE SEEK(m.lnNewVal,'RandRow')
			lnNewVal = MyCountCursor.minRow + INT(MyCountCursor.no*RAND())
		ENDDO
		INSERT INTO RandRow VALUES (m.lnNewVal)
	NEXT
	SELECT MyCountCursor
ENDSCAN

* Generate the new table

SELECT MyCursor.Zip5,MyCursor.cWhatEver FROM MyCursor,RandRow WHERE MyCursor.nRow = RandRow.nRow INTO TABLE MyNewTable 
* Some tests
*!*	SELECT zip5,COUNT(zip5) as no FROM MyNewTable  GROUP BY 1 
*!*	SELECT distinct * FROM mynewtable

* Some additional tests
*!*	SELECT MyCursor.* FROM MyCursor,RandRow WHERE MyCursor.nRow = RandRow.nRow INTO CURSOR cResult
*!*	SELECT zip5,COUNT(zip5) as no FROM cResult GROUP BY 1

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top