Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Weighted Random Number formula? 3

Status
Not open for further replies.

JulieS

MIS
Nov 23, 2000
48
0
0
CA
Hi,

I have a list of 100 employees.

I want to randomly select 5% of the employees per month for a test (there's a test once per month for one year).

To start, each employee has an equal chance of getting selected.

Once an employee is selected, his chance of being randomly selected again drops down to 5% (while other employees remain at 95%). So the random pool is weighted on a 5%:95% ratio of tested:untested.

Every employee must be tested in the given time period.

This is the problem at it's most basic level. Even just a start (on how to assign these weights; how to calculate weighted randoms; how to ensure every employee is selected) would be extremely helpful.

Thanks,
Julie
 
I want to randomly select 5% of the employees per month for a test (there's a test once per month for one year).

Every employee must be tested in the given time period.
What time period are you talking about?
In one year you can't even test all employees by testing 5% per month.

With higher time period like 3 or 5 years you might make the probability higher of testing all employees... but even then you can't guarantee it as long as you force a random selection with your 5% and 95% rules.
 
Actually maybe you need to define what 5% and 95% represent.... since your probability of being selected on the first monthly round is 5%

perhaps you mean to suggest that the probability of retesting someone already tested is approximately 1/20 of testing someone who hasn't been tested.

What happens if someone is tested twice? Is his probability in the next month different than someone who was tested once.
 




Would you not have to randomly assign a number EVERY employee ONCE, and then cycle thru the list in order to test each employee in the given period in that order?

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Thank you for your replies:

Response 1. Sorry for not being clear. You're right, you can't test every employee. The weights and the timeframe are variables (they can change by user). So one user might want to test their X number of employees in Y timeframe at Z weights. There definitely will need to be a check to ensure the values entered are possible. But that's the least of my problems at the moment ..

Response 2. I"m open to suggestions on what makes the most sense for weights. Essentially, if an employee is tested once, his chance of being tested again decreases by some margin. Right now, it doesn't matter what that margin is, as at this point, I'm unsure on simply how to apply a calculation.

Response 3. It needs to be a random test. i.e. the employees will be randomly selected. And once they are selected, they know they're not off the hook, because they could be tested again.
 
I'm not sure which aspect you need help with, but I assume it is primarily picking an employee using a weighting factor. I have defined below a ProbabilityFactor, which will have no meaning other than: an employees probability of being selected is equal to his ProbabilityFactor divided by the sum of ProbabilityFactors of all unchosen employees.

It is probably easier to implement in a spreadsheet, but I don't know how to upload a spreadsheet (I couldn't read yours by the way), so I have done it in vba. You can run the program a few times and see the results come up a little different each time.

If the lines wrap around when you view them in browser, use ctrl-minus to make your font size smaller.
Code:
Option Explicit

Type employee
    ProbabilityFactor As Double        ' Employees prob of being chosen is equal to their PF over sum of all unchosen employees
    elbound As Double        ' employee lower bound... used for converting random number to employee choise
    eubound As Double        ' employee upper bound... used for converting random number to employee choise
    ChosenThisMonth As Boolean        ' Keep track of whether employee has been chosen during the current program
    ename As String
End Type

Sub test()
    Dim employees(1 To 5) As employee
    Dim ectr As Long        ' employee counter
    Dim choosectr As Long        ' count number of choices
    Dim PFsum As Double        ' Sum of ProbabilityFactors for unchosen employees
    Dim randno As Double        ' Randno which will be compared to elbound and eubound to select emplooyee
    Dim ebound As Double        ' temporary variable used to establish elbound and eubound
    Dim NmatchesThisRandno As Double        ' Number of Matches for this Randno - used to double-check only one match for each randno, as designed

    Const ReductionFactor = 0.6        ' How much does employees probability factor decrease (for next month) after chosen this month


    ' Simulate Read employee weights from spreadsheet.
    ' In this case we will create them directly in vba
    employees(1).ProbabilityFactor = 95
    employees(2).ProbabilityFactor = 50
    employees(3).ProbabilityFactor = 90
    employees(4).ProbabilityFactor = 95
    employees(5).ProbabilityFactor = 85

    employees(1).ename = "smith"
    employees(2).ename = "jones"
    employees(3).ename = "skip"
    employees(4).ename = "epete"
    employees(5).ename = "Julie"

    ' Initialize all employees as not yet selected this month
    ' Initialize employee elbound and eubound to negative number
    For ectr = 1 To UBound(employees)
        employees(ectr).ChosenThisMonth = False
        employees(ectr).elbound = -100
        employees(ectr).eubound = -100
    Next ectr


    Const NumberToChoose = 2        ' number of employees required to choose for this month

    For choosectr = 1 To NumberToChoose



        ' Initialize elbounds, eubounds, and PFsum based on unchosen employees
        PFsum = 0
        ebound = 0
        For ectr = 1 To UBound(employees)
            If employees(ectr).ChosenThisMonth = False Then
                employees(ectr).elbound = ebound
                PFsum = PFsum + employees(ectr).ProbabilityFactor
                ebound = ebound + employees(ectr).ProbabilityFactor
                employees(ectr).eubound = ebound
            End If
        Next ectr
        ' Error check:
        If PFsum = 0 Then MsgBox "Error - not enough untested employees available to choose required number"

        ' Choose Random number from 0 to PFsum
        randno = Rnd * PFsum

        ' Find the employee that matches our selected random number
        NmatchesThisRandno = 0
        For ectr = 1 To UBound(employees)
            If randno > employees(ectr).elbound And randno < employees(ectr).eubound Then
                employees(ectr).ChosenThisMonth = True
                employees(ectr).ProbabilityFactor = employees(ectr).ProbabilityFactor * ReductionFactor

                NmatchesThisRandno = NmatchesThisRandno + 1
            End If
        Next ectr
        ' Error check:
        If NmatchesThisRandno <> 1 Then MsgBox "Error, rand number did not match exactly one employee"
    Next choosectr

    ' Output Results  to immediate window
    For ectr = 1 To UBound(employees)
        Debug.Print "Employee " + CStr(ectr) + "(" + employees(ectr).ename + ") Status: " _
                    + Switch(employees(ectr).ChosenThisMonth, "Selected! ", True, "Not Selected! ") + _
                    "Pfactor for next month will be: " + CStr(employees(ectr).ProbabilityFactor)

    Next ectr

    ' Note that at this point, the chosen employees "ProbabilityFactor" should be updated and saved for use next month

End Sub
 
Whoops. I think the last version leaves the possibility for the same person to be chosen twice in one month, yielding less chosen people than desired. That is corrected in bolded code below
Code:
Option Explicit

Type employee
    ProbabilityFactor As Double        ' Employees prob of being chosen is equal to their PF over sum of all unchosen employees
    elbound As Double        ' employee lower bound... used for converting random number to employee choise
    eubound As Double        ' employee upper bound... used for converting random number to employee choise
    ChosenThisMonth As Boolean        ' Keep track of whether employee has been chosen during the current program
    ename As String
End Type

Sub test()
    Dim employees(1 To 5) As employee
    Dim ectr As Long        ' employee counter
    Dim choosectr As Long        ' count number of choices
    Dim PFsum As Double        ' Sum of ProbabilityFactors for unchosen employees
    Dim randno As Double        ' Randno which will be compared to elbound and eubound to select emplooyee
    Dim ebound As Double        ' temporary variable used to establish elbound and eubound
    Dim NmatchesThisRandno As Double        ' Number of Matches for this Randno - used to double-check only one match for each randno, as designed

    Const ReductionFactor = 0.6        ' How much does employees probability factor decrease (for next month) after chosen this month
    Const Unchoosable = -100

    ' Simulate Read employee weights from spreadsheet.
    ' In this case we will create them directly in vba
    employees(1).ProbabilityFactor = 95
    employees(2).ProbabilityFactor = 50
    employees(3).ProbabilityFactor = 90
    employees(4).ProbabilityFactor = 95
    employees(5).ProbabilityFactor = 85

    employees(1).ename = "smith"
    employees(2).ename = "jones"
    employees(3).ename = "skip"
    employees(4).ename = "epete"
    employees(5).ename = "Julie"

    ' Initialize all employees as not yet selected this month
    ' Initialize employee elbound and eubound to negative number
    For ectr = 1 To UBound(employees)
        employees(ectr).ChosenThisMonth = False
        employees(ectr).elbound = Unchoosable
        employees(ectr).eubound = Unchoosable
    Next ectr


    Const NumberToChoose = 2        ' number of employees required to choose for this month

    For choosectr = 1 To NumberToChoose



        ' Initialize elbounds, eubounds, and PFsum based on unchosen employees
        PFsum = 0
        ebound = 0
        For ectr = 1 To UBound(employees)
            If employees(ectr).ChosenThisMonth = False Then
                employees(ectr).elbound = ebound
                PFsum = PFsum + employees(ectr).ProbabilityFactor
                ebound = ebound + employees(ectr).ProbabilityFactor
                employees(ectr).eubound = ebound
           [b] Else
                employees(ectr).elbound = Unchoosable
                employees(ectr).eubound = Unchoosable[/b]
            End If
        Next ectr
        ' Error check:
        If PFsum = 0 Then MsgBox "Error - not enough untested employees available to choose required number"

        ' Choose Random number from 0 to PFsum
        randno = Rnd * PFsum

        ' Find the employee that matches our selected random number
        NmatchesThisRandno = 0
        For ectr = 1 To UBound(employees)
            If randno > employees(ectr).elbound And randno < employees(ectr).eubound Then
                employees(ectr).ChosenThisMonth = True
                employees(ectr).ProbabilityFactor = employees(ectr).ProbabilityFactor * ReductionFactor

                NmatchesThisRandno = NmatchesThisRandno + 1
            End If
        Next ectr
        ' Error check:
        If NmatchesThisRandno <> 1 Then MsgBox "Error, rand number did not match exactly one employee"
    Next choosectr

    ' Output Results  to immediate window
    For ectr = 1 To UBound(employees)
        Debug.Print "Employee " + CStr(ectr) + "(" + employees(ectr).ename + ") Status: " _
                    + Switch(employees(ectr).ChosenThisMonth, "Selected! ", True, "Not Selected! ") + _
                    "Pfactor for next month will be: " + CStr(employees(ectr).ProbabilityFactor)

    Next ectr

    ' Note that at this point, the chosen employees "ProbabilityFactor" should be updated and saved for use next month

End Sub
 
Also, it would be more descriptive to rename the Constant "Unchoosable" as "Unchosen
 
Thank you so much, this is a brilliant solution and exactly what I need.

One other thing that's really stumped me is how to ensure 100% of employees are tested over the given time period? I'm not even sure if that's possible, as new employees could be hired right up to the last test period.

Would be curious to know your thoughts.

Thanks again,
Julie
 
T = Total time period in months (12?)
E = Total Number of employees
EPM = Number of employees required to test per month (E = 0.05*E... rounded up)
M = Existing month number (1..12?)

For any given month we can compute:
C = Number of employees that have been chosen at least once.
U = E-C = number of employees that have not yet been chosen in the period (Unchosen)

Near the beginning of the time period, we keep everything totally random. A given employees probability factor will be:
PF = PF0 * RF^q where PF = probability factor, PF0 = initial (unchosen probability factor), RF = Reduction factor, q = number of times chosen.

As we draw near the end of the period, we may need to start enforcing choices, because sooner or later we run out of slots to test the untested employees (assuming EPM is restricted, which may or may not be a real constraint). That point in time would occur when number of months required to test remaining employees is equal to number of months left to test i.e. U / EPM = T-M or maybe to provide a 1-month margin U / EPM = T-M –1 and at this point in time we want the already-tested employees probability factors to go to zero.

To accomplish an orderly transition from totally random at beginning of period to forced at end, we might modify our formula to be something like:

PF = PF0 for employees not yet tested during the period
PF = PF0 * RF^q *[1 – {U/EPM} / { T – M - 1}] ^p for employees already tested during the period

Hopefully you can see that for {U/EPM} << { T – M – 1}, the quantity in square brackets is approximatley 1, and we are sticking with the same random approach as we started out. That's because there are plenty of slots left to accomplish testing of the untested guys.

For {U/EPM} = { T – M – 1}, the quantity in square brackets becomes 0 and we enforce PF=0 for all previously-tested people and only test the untested people (because we have run out of time for randomness).

p is an exponent that controls how quickly the deviation from randomness creeps into the picture.

A low value of p (0.1, 0.2, 0.5) would keep everything random for as long as possible and only start heavily pushing the untested guys toward the end of the period, and then only if it is needed to accomplish the objective of testing everyone.

A high value of p (2, 4, 10), would push towards testing the untested guys earlier in the period, and avoid waiting until the end where if something changes (like a new employee showing up), you might not have enough slots left to test them.

Of course, if it is not a constraint to test the same number every period, you could just add an extra mandatory test for every new employee so you don't have to worry about that complication.

I could end the discussion there. I should also mention that the choice of RF steers things in a certain direction, for given number of employees and given EPM requirement etc, a lower reduction factor also reduces the probability that you will end up with too many untested employees near the end of the period. Also to further this objective you could make a significant decrease in PF after the first test, and then not so much after remaining tests. That gives a little more complicated factor for tested employees:
PF = PF0 * RF1^q1 *RF2^q2 *(1 – {U/EPM} / { T – M}) ^p
where RF1 is reduction factor based on first test (low number like 0.5) and RF2 is reduction factor for subsequent tests (higher number like 0.9). q1 is 0 or 1 dependeing on whether or not employee has been tested once. q2 counts the number of tests beyond the first test. So the sequence of (q1,q2) as an employee is tested would progress as follows: (q1,q2) = (0,0), (1,0), (1,1), (1,2), (1,3) etc

Hopefully I did not make it more complicated than it needs to be. I would encourage you to select an approach that makes sense to you and don't pay too much attention to what I wrote because I really know nothing about the real drivers of your decisions.
 
Picking the correct probability factor is not intuitive because it is realtive to the amount of people and the amount of previously choosen people. But, if you want to ensure that those previously choosen are choosen at a defined reduced rate then here is how to do it. You would just have to modify the part of the code where the new probability factor is calculated.

Your example was .05 probability for those previously choosen.

To do this I would make the weights simply 0 to 1 (or you could stick with 0 to 100). Those people never choosen would get a weight of 1 (or 100 if you want to modify). Then find the probability factor (weight) for those that have previously been choosen.


N is total people
X have been choosen prior
N-x have not been choosen
P_x = The probability of a person being choosen if choosen prior (for this example .05)
W_(N-x) = the weight (probability factor) for those never choosen and always eqaul 1
W_x = is the weighting factor for those previously choosen.

So Assume
N = 10 total people
X = 3 choosen prior month
N-x = 7 unchoosen prior


Then to find the probability factor (weight) Wx

W_x = P_x(N-x)
--------
1-X*P_x

W_x = .05(7)
---------
1-3*.05
W_x = .4112

PFSUM = 7 * 1 + 3 * .4112 = 8.235

Therefore to verify: the 3 priorchoosen would have a
.4112/8.235 = .05 chance of being choosen

the remaining seven would have a
1/8.235 = .12 chance


So in words
The factor for the previously choosen people is

desired probablity * Number not choosen / (1 - Number Previously Choosen * desired Probability)
 
After I wrote that I realize, I was incorrect. Although that formula works for defining the weights the starting probability I used was wrong. I said p_x was .05, but that is the probability of drawing a person on any draw. p_x should be something smaller than that so that the probability is .05 for drawing a person in a month. The binomial distribution answers the question, given a probability of being choosen on any draw (p_x) what is the probability of being choosen in k draws. For your problem you want .05 probability of being selected once in 5 draws. Actually that would be true if you did not pull the person from the list once choosen in a month. In truth you want a .05 probability of:

being choosen in one draw
or
being choosen on the second draw of 2 draws
or
being choosen on the third draw of 3 draws
or
...
being choosen on the fifth draw of 5 draws

This probability is:
.05 = p + (1-p)(p) + (1-p)(1-p)(p) + (1-p)(1-p)(1-p)(p) + (1-p)(1-P)(1-p)(1-P)(p)
(p is the probability of being choosen on any draw,1-p is the probability of not being drawn)

The value for p_x becomes about .0085 instead of .05 and this gives you a weight of .061.
 
I would maintain that the code is correct as I submitted it within the assumptions I stated. Whether assumptsions are correct or not is a different discussion which only OP can answer.

I have defined probability factor in relative terms. If we know the ratio of probability factors, we know the ratio of probabilities. Knowing or predicting the absolute probability doesn't get you very far if you can't enforce a target for it. And you cannot enforce an arbitrary specified absolute probability for all categories unless you also have control of the number of employees selected per month, which was stated as a constant ("I want to randomly select 5% of the employees per month for a test")
 
Sometimes the way the problem is stated gets in the way of solving the original problem.

The original aim is to make sure all employees get tested eventually, but an employee who has just been tested isn't left knowing he/she is in an exclusion period in which they are guaranteed test-free.

Instead of defining that 5% of all will be tested, but with variable probabilities for previous testees, wouldn't it be vastly simpler to define that 100 (or whatever) test places will be allocated to previously untested individuals (if available), and 10 places to previously tested individuals (if available)?

The procedure is then very simple indeed. In pseudocode:

(1) nu = Count of untested individuals.
(2) nt = Count of tested individuals
(3) If nu < 100, TargetNumber = nu, else target = 100
(4) if nt < 10, TargetNumber = 110 - nt
(5) SELECT top TargetNumber individuals from set ORDERED BY random number WHERE IveBeenTested = false
(6) SELECT top 110-TargetNumber individuals from set ORDERED BY random number WHERE IveBeenTested = true

Joining together the lists from (5) and (6) gives you the list of testees.

(OK, OK, my example doesn't work if you allocate more test-places than you have employees, but I didn't promise error-checking)

Guarantees a complete turn-over in a set time, as the procedure will deplete the untested set by 100 every time a test is run. When untested set is empty, reset and start again...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top