Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Theory Help 4

Status
Not open for further replies.

Scott24x7

Programmer
Jul 12, 2001
2,826
JP
Hi All,
I'm having some difficulty getting my head around an "experiment" I'm trying to run. (This is just for some idea I have related to a thought experiment I'm working on and thought that I could generate a data set based on it, but I'm stuck on how to actually make it work).
The idea is based on Duffey & Saull's "Jar of Life" experiment, which has 25 white balls, and 5 black balls which are in a jar. Each of the 30 balls is drawn in succession. At the end you have a result that would look something like:


Where W = W=white and B = black.
SO this sequence is a random sequence every time, but there will always be exactly 25 White and 5 Black (this is the part that is doing my head in).

I created a routine to randomize 0 and 1 (Where 0 represents white, and 1 represents black) This I will actually store in a text field in a table, C30 in length, so later I can just use SQL commands to pluck out the patterns that I want after I run about 1,000,000 iterations of the test.

So I can do something like:
Code:
White = 0
Black = 1
APPEND BLANK
FOR X = 1 to 30
   REPLACE SEQUENCE WITH ALLTRIM(SEQUENCE)+ALLTRIM(STR((INT((White - BLACK + 1) * RAND( ) + Black))
ENDFOR

I'll of course bury that into a loop that runs 1,000,000 iterations, but this is the idea.

The problem is, HOW do I get it to produce only 25 White outcomes and 5 black outcomes in an iteration???


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
Scott

I tried my code for 1,000,000 iterations, it runs in my computer for aproximattely 8 seconds (including storing the results in a cursor).

These are my benchmarks for you to compare against your findings (I ran the generator 4 times, marked from R1 to R4)

Code:
Runs		R1	R2	R3	R4
-----------------------------------------------
Time		8	9	8	8
%11111%		172	192	185	181
11111%		6	2	9	9
%11111		7	11	7	7

 
I don't see how stuffing in 5 ones into 30 zeros would get a stronger tendency towards one end, that's totally random.
But of course, there are only 26 combinations with all 1s in one rows, out of the 17,100,720. As you rarelyget them even in a million tries there will be a bulking, you just have to do more to get that evenly. Up to simply creating all combinations.

And Tom, I thought of changing the way to stuff, you can start with the 25 "O"s:
Code:
=Rand(-1)
LOCAL lcString, lnI, lnRows

Create Cursor crsCombinations (cCombination C(30))

For lnRows = 1 to 1000000
lcString = REPLICATE( [0] , 25 )
For lnI = 1 to 5
   lcString = STUFF( lcString , INT( RAND() * (Len(lcString)+1))+1, 0 , [1])
EndFor lnI
Insert into crsCombinations values (lcString)
EndFor lnRows

Browse for "11111" $ cCombination
I didn't see a preference for the 1s bulking together more often at the start. There might be a small tendency because of what I said, you can stuff at position 0 and 1, both stuff before the first character of the string, and to get after the last position you have to stuff at position len(string)+1, i.e.:

Code:
Clear
lcString = "abc"
For lnPos = 0 to 4
 ? Stuff(lcString,lnPos,0,"z")
EndFor
*result:
zabc
zabc
azbc
abzc
abcz

So:
1. Avoid using position 0, your random number result should be 1 at minimum.
2. go up to LEN(lcString)+1 to also add a 1 at the right end
3. notice RAND() never is 1, it's always <1. Besides that INT always rounds down, so INT(RAND()*5) will create numbers 0 to 4.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Hi All,
So the next update.
I was running Olaf's Second solution, and it was interesting... after a few minutes, I killed it, as it was still running. Wasn't sure why. So I put an update in to show how often 1,000 results were produced MOD(X,1000) = 0 and for a while it was about 1,000 every 6 seconds. I got tired, let it run while I was asleep, and 8 hours later, it was at 314,000. I watched, and the time to increase from 314,000 to 315,000 was about 5 minutes! So it gets slower and slower and slower as it runs. So I decided to terminate that one, and use another approach. So that was amusing.
I'm now testing Olaf's latest approach, and I'm going to re-run Tom's original approach with 17,000,000 iterations and see if I get any 11111 at the end...


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
So thought you might like to hear some of the findings.
I can now identify that there are 142,506 possible combinations. 1,000,000 iterations doesn't result in all the combinations, but gets close. But 2 million iterations will.
So this helps with my first phase, which is to get this to work. And then I need to determine the number of variables involved, versus number of failure opportunities. THEN I'm on to something.
Many thanks everyone.

Mike Lewis,
I'm still going to run your test as well. I quite like that idea.
I think it may be faster.


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
And by the way Tom, there is something wrong with your solution.
Sequences of 11111 leading occurred 306 times in 17 million iterations, but ending in 11111 resulted in 0 outcomes, and the total distribution was 118,755 unique outcomes, and we know there are 142,506 possible results, so that solution isn't a good random distribution.


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
Hi,


I don't think it has to do with Tom's solution but with the way he implemented the RAND() function. Nevertheless there remain chances that duplicates occur. Please read below form Hacker's Guide to VFP 7.0

All random-number functions need a seed value based on which the function computes a random result. Passing the same number should (and does, in FoxPro) return the same "random" number. This ability allows you to test on the same data set.

To get the most random results, pass a negative number or zero as the initial seed. That seeds the function based on the system clock. On subsequent calls, omit the seed and FoxPro uses the previous number as a seed. You'll always get the same sequence if you start with the same value, but with a negative initial seed, that's very unlikely.

To mathematicians, a "random number" is always between 0 and 1, and that's what RAND() returns. Since you'll usually want a random integer in some range, you need to scale the result appropriately. The formula to convert the result to an integer between 0 and some boundary seems simple:

INT(nBoundary*RAND())

but things are seldom what they seem.

In all versions of FoxPro before VFP 5, RAND() returns both 0 and 1, but very rarely. Using the formula above, you get many fewer occurrences of nBoundary from a sequence of calls to RAND() then any other number in the range. You can fix that by adding 1 to the result (INT(nBoundary*RAND())+1), but then you get too few zeroes. There's no good solution to the problem. The best solution is for RAND() to never return one or the other of the endpoints (that is, RAND() should either never return 0 or never return 1).

Good news. Starting in VFP 5, that's exactly how it's done. RAND() never returns 1, so you can use INT(nBoundary * RAND()) to get values from 0 to nBoundary-1 or INT(nBoundary * RAND()) + 1 to go from 1 to nBoundary. No more ugly correction code that fouls up statistics.

One question you'd think we'd hear a lot is "How random are the results of RAND()?" There are various tests you can run to test the randomness of a sequence of random numbers. The downloads ( contains two tests. Each generates a sequence of random numbers and then computes a statistic about it. The first is the Chi-Squared test. It returns a value you can look up in a table (try a statistics textbook, or check out Excel's CHIDIST() function) to see how random RAND() is. The second test is called the "Coupon Collectors" test. The return value is the average number of random numbers you have to generate in order to be sure you've gotten at least one of each value in a range.

If you want you may delete the multiples (code based on Tom's solution)

Code:
LOCAL lcString as Character

CLOSE ALL 
 
CREATE CURSOR csrSequences (cSequenz c(30))

RAND(-1)

FOR i = 1 TO 10000
	lcString = REPLICATE( [1] , 30 )

	DO WHILE OCCURS( [0] , lcString ) < 5
		lcString = STUFF( lcString , INT( RAND() * 30 ) , 1 , [0] )
	ENDDO 

	Insert into csrSequences VALUES (lcString)
ENDFOR 

DELETE FROM csrSequences where cSequenz in (SELECT cSequenz FROM csrSequences GROUP BY cSequenz HAVING COUNT(cSequenz ) > 1) 

= MESSAGEBOX("N° of duplicates " + TRANSFORM(_Tally), 0, "Duplicates generated by RAND()", 2000)

BROWSE

USE

hth

MK
 
142,506 possible results you're right. The 17 million resulting from being able to put the first 1 on 30 positions, the second on 29, the third on 28, overall 30*29*28*27*26 have to be reduced by the fact the same 5 bits can be set in differing sequences.

142,506 is a number resulting from combinations calculators, yes. You have to divide by 5!=5*4*3*2*1=120 as that is the number of different sequences of the 5 bits.

And sure, using a collection to compute a combination is slower. My cursor solution would be faster.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Even with 2,000,000 iterations, one may find occasionally a run that misses all possible combinations.

I don't see an evident pattern biasing the results one way or another, but having a fast generator will provide more data to analyze.

Code + more 4 runs stats:

Code:
LOCAL NumResults AS Integer
LOCAL ARRAY Stats(1)

m.NumResults = 2000000

CREATE CURSOR Results (Result Char(30))

LOCAL Start AS Datetime
LOCAL Lapsed AS Integer

LOCAL Iteration AS Integer

m.Start = DATETIME()

FOR m.Iteration = 1 TO m.NumResults
	INSERT INTO Results VALUES (CalcResult())
ENDFOR

m.Lapsed = DATETIME() - m.Start

? "Sample size:", m.NumResults
? "Time to calculate:", m.Lapsed
SELECT COUNT(*) FROM Results WHERE Result LIKE '%11111%' INTO ARRAY Stats
? "11111 sequences:", m.Stats
SELECT COUNT(*) FROM Results WHERE Result LIKE '11111%' INTO ARRAY Stats
? "11111 sequences at the beginning:", m.Stats
SELECT COUNT(*) FROM Results WHERE Result LIKE '%11111' INTO ARRAY Stats
? "11111 sequences at the end:", m.Stats
SELECT DISTINCT * FROM Results INTO ARRAY Stats
? "Distinct sequences:", ALEN(m.Stats)
SELECT TOP 1 Result, COUNT(*) FROM Results GROUP BY Result ORDER BY 2 DESC INTO ARRAY Stats
? "Most frequent sequence:", m.Stats(1)
SELECT TOP 1 Result, COUNT(*) FROM Results GROUP BY Result ORDER BY 2 INTO ARRAY Stats
? "Least frequent sequence:", m.Stats(1)

FUNCTION CalcResult

LOCAL Result AS String
LOCAL Whites AS Integer
LOCAL Blues AS Integer
LOCAL White AS Character
LOCAL Blue AS Character
LOCAL StoreBlue AS Integer

m.Blues = 5
m.Whites = 25

m.White = "0"
m.Blue = "1"

m.Result = REPLICATE(m.White, m.Blues + m.Whites)

FOR m.StoreBlue = 1 TO m.Blues
	m.Result = STUFF(m.Result, AT(m.White, m.Result, 1 + RAND() * ((m.Blues + m.Whites + 1) - m.StoreBlue)), 1, m.Blue)
ENDFOR

RETURN m.Result

ENDFUNC

Clipboard01_sx9w54.png
 
atlopes,
You got the important one... that it should generate 142506 in 2,000,000 iterations, which is the maximum number of unique results. I see your last one is missing 1, and I agree that's within a reasonable "variation" for only 2,000,000. In 17 million, as Olaf mentiones 2^30, it should always provide the full range. I see the frequency of leading 11111 and trailing 11111 is reasonably the same as well, so I'm comfortable/confident that the randomness is reasonable. That's the critical point here.

My next step is to determine the number of factors that result in a major outage, and the frequency of events before an outage occurs. That frequency will then be represented by the 0's, and the number of incidents leading to an outage will be the 1's. Then I can change the 25 0's and the 5 1's and begin to predict some data about the outages (human errors) I'm looking to model and predict.


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
> I'm comfortable/confident that the randomness is reasonable. That's the critical point here.

mjcmkrsr quoted about the random number generator. You can seed it in a way you always get a repeatable random sequence:
Code:
Clear
? nonrandomsequence(1)
? nonrandomsequence(1)
? nonrandomsequence(1)
? nonrandomsequence(2)
? nonrandomsequence(2)
? nonrandomsequence(2)

Function nonrandomsequence(tnSeed)
   Local lcSequence
   Rand(tnSeed)
   lcSequence = ""
   For lnI = 1 to 10
      lcSequence = lcSequence+Transform(Int(Rand()*10))
   EndFor 
   Return lcSequence
This outputs:
[pre]2740197506
2740197506
2740197506
0637474573
0637474573
0637474573[/pre]

So this is a typical random number generator that can only be randomized. RAND(-1) makes the seed depend on system time, so do that once and you get different sequences, but only call RAND(-1) once with that -1 seed. System time does not change that fast, so it's also no good idea to always call RAND(-1), this is what happens then:

Code:
Local lnJ
For lnJ = 1 to 100
Clear
? nonrandomsequence()
Wait '' timeout 0.01 
Wait clear
EndFor 

Function nonrandomsequence()
   Local lcSequence
   lcSequence = ""
   For lnI = 1 to 1000
      lcSequence = lcSequence+Transform(Int(Rand(-1)*10))
   EndFor
   Return lcSequence

For certain scientific experiments with real randomness needed there are specific hardware devices for that, TRNG. No programming language I know has a true random number generator. See
It's good enough if you only use it for generating 0s and 1s. It might lack to ever create a longer sequence of same bits as a TRNG would do, albeit as seldom as that happens. The characteristic of PRNG is to use integer arithmetic which covers a whole range of integer, say 2 billion 32bit numbers and then repeats. but the aftermath of converting that to a number range 0..1 and then taking only a few values like 1-6 or just 0s and 1s you can forget that nature, it's random enough.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Hi Olaf,
Yes, I was aware of the non-randomness of the RAND() function, and particularly TRNG not being software based, but for what I need, this will do fine. The idea here is to examine generally speaking how often the 5 1's would actually "clump" together. My "test" to see if it was giving a reasonable distribution was to look at leading and trailing clumps, and Tom's solution made it clear that it was not giving a distribution that would be reasonable. (It in fact, never comes up with all 5 1's at the end, even when I ran 17 million iterations against it).
I'm not sure why, but I moved on with the other solutions to see if I could get a better result, and I did.
Ironically, part of my simulation is about human error, which isn't "truly random" either. And we can even be quite predictable about it. For instance, any high skill task requires someone with high skill at that task. I can assure you that despite having read recently a great book on brain surgery, I would still have a 100% failure rate should I try to perform brain surgery on you or one of your friends at your choosing.

I'm working to assign particular "elements" to the 1's now, and working to determine how many of these I need to assign to progress my theory. So one may represent skill, another attitude, another concentration, another involuntary reaction, etc. Then when they align... we have big failures. BUt I need to work out "opportunities" which represents the 0's. I'm not sure yet if 25 is enough. I may need to increase it.



Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
>I would still have a 100% failure rate should I try to perform brain surgery on you or one of your friends at your choosing.
Has been done already. Not on me, not on a close friend or family member, but from a brain surgeon.

Tom's first solution has that error of choosing too low range of random numbers, to get a 1 at the right end you have to allows length+1 for the position of stuff, that's all. I had written about that.

Bye, Olaf.


Olaf Doschke Software Engineering
 
Olaf,
Hahahaha... yes, lots of brain surgery done, but not by people who just "read about it".
I hope it turned out good.

Sorry I missed that point about the stuff+1.
I will modify it and try again, as it was the fastest method.


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."[hammer]
 
Scott24x7 said:
as it was the fastest method.

Not entering in racing mode, but mine, Tom's and Olaf's methods are variations of the same method.

In fact, Tom's would be the slowest since it admits uneffective STUFF() calls, that is, STUFF() in which a blue goes over a blue. That adds about 7% to calculations (roughly the number of innocuous calls).

STUFF() to insert seems also to be a tiny bit slower than STUFF() to replace, so starting with a 30-character string would be marginally faster than with a 25-character string.

Finally, using variables instead of constants adds time to each run of the experiment (but using variables is the base of experimenting, right?). Change from variables to constants in code, and the gain of speed may be in the order of one third.
 
>I will modify it and try again, as it was the fastest method.
It's already done by Tom, atlopes and me in variations of Tom's original idea.

If you're after performance you'd not use string operations, STUFF does not operate on the memory of an existing string, also not when used to replace instead of insert.
You'd need to use SYS(2600) with 30 bytes allocated or better create a DLL in C++.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top