Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Calculating Percentile Values

Status
Not open for further replies.

sirskeezy

Technical User
May 23, 2012
14
0
0
CA
Hi,

I've been searching up how to calculate percentile values all day and I still do not seem to be able to get the values that I want. Can someone please help me?

Heres the overview of my situation:

I have a table called Raw Data and it holds fields including 'operating costs' and 'property size'

The percentile value that I want is the 95th and 5th percentile value of 'operating costs/property size'

First step I did was to create a query called 'STEP 1:FACTOR VARIABLES' that calculates 'operating costs/property size'

Second step I made another query that would use my query 'STEP 1:FACTOR VARIABLES' and tried calculating my 5th percentile with the sql code
SELECT Max([Operating Costs/Property Size]) AS [5th percentile Operating Costs/Property]
FROM (SELECT TOP 95 PERCENT [Operating Costs/Property Size] FROM [STEP 1: FACTOR VARIABLES])
ORDER BY Max([Operating Costs/Property Size]);


I thought that this would get the value that I need but all it does is grab the largest 95% of values and then get the max of it, which is basically just the max value itself....

Would I be able to get what I want using SQL or do I have to do some VBA coding?

 

I am not very good with SQL so I did it in pieces.
Made two queries

qry5th:

SELECT TOP 5 PERCENT
tblOperatingCosts.ID,
[operatingCost]/[propertySize] AS OpCost_Size,
"5th Percentile" AS Percentile
FROM
tblOperatingCosts
ORDER BY
[operatingCost]/[propertySize];


and
qry95th:

SELECT TOP 5 PERCENT
tblOperatingCosts.ID,
[operatingCost]/[propertySize] AS OpCost_Size,
"95th Percentile" AS Percentile
FROM
tblOperatingCosts
ORDER BY
[operatingCost]/[propertySize] DESC;


Then I just unioned them
select *
from
qry5th
union
select *
from
qry95th
order by 2
 
Hey MajP,

The thing is, there isn't a table for operating costs separately so I think that SQL code wouldn't work in my case...
 
I thought you said you have a table of raw data holding operating costs and property size, so I am not sure if I understand.
 
Yes, the table name is called Raw Data and operating costs and property size are 2 fields that are in that table
 
Unless I am missing something, then the solution I posted would work if you change the names.
 
:S still does not seem to be working, I get a error message regarding 'tblOperatingCosts.ID'

btw the query is going against my Raw Data table
 
Assuming you want 5th percentile to be the higher costs/size and 95th to be the lower costs/size I think this should work:

SELECT Min([CostsPerSize]) AS [5th percentile Operating Costs/Property]
FROM (SELECT TOP 5 PERCENT [Operating Costs/Property Size] As CostsPerSize
FROM [STEP 1: FACTOR VARIABLES] ORDER BY 1 Desc)

SELECT Max([CostsPerSize]) AS [95th percentile Operating Costs/Property]
FROM (SELECT TOP 5 PERCENT [Operating Costs/Property Size] As CostsPerSize
FROM [STEP 1: FACTOR VARIABLES] ORDER BY 1 Asc)

 
Sorry I misinterpreted what you were asking.
Also, I did not know there are multiple ways to do percentiles. The following function is supposed to mimic an Access domain function. I choose to use the formulation used in excel see discussion


This allows you to verify your values.

Code:
Public Function DPercentileExcel(expr As String, domain As String, Percentile As Double) As Variant
    Dim strSQL As String
    Dim N As Integer
    Dim nSubk As Double
    Dim vSubk As Variant
    Dim vSubkPlus1 As Variant
    Dim rs As DAO.Recordset
    
    strSQL = "SELECT " & expr & " FROM " & domain
    strSQL = strSQL & " WHERE NOT " & expr & " IS NULL ORDER BY " & expr
    Set rs = CurrentDb.OpenRecordset(strSQL, dbOpenDynaset)
    
    If Not (rs.EOF And rs.BOF) Then
      rs.MoveLast
      rs.MoveFirst
    Else
      Exit Function
    End If
    
    N = rs.RecordCount
    
    nSubk = Percentile / 100 * (N - 1) + 1
    'using NIST nSubK = Percentile / 100 * (N+1)
    
    If nSubk = 1 Then
      DPercentileExcel = rs.Fields(expr)
    ElseIf nSubk = N Then
      rs.MoveLast
      DPercentileExcel = rs.Fields(expr)
    Else
      rs.AbsolutePosition = nSubk - 1     '0 based
      vSubk = rs.Fields(expr)
      rs.AbsolutePosition = nSubk         '0 based
      vSubkPlus1 = rs.Fields(expr)
      'Debug.Print nSubk
      'Debug.Print vSubk
      'Debug.Print vSubkPlus1
      DPercentileExcel = vSubk + (nSubk - Int(nSubk)) * (vSubkPlus1 - vSubk)
    End If
End Function

Just pass the name of the field to use and the table or query

Example
Code:
Public Sub testPercentileExcel()
 Debug.Print DPercentileExcel("fldData", "tblOne", 0)
End Sub


Here is another approach using a more sql solution and a weighted average. In this one you do not encapsulate field/table names in brackets.

Code:
Public Function DPercentile(expr As String, domain As String, Percentile As Double) As Variant
    Dim strSelect As String
    Dim strSelectOpposite As String
    Dim strSQL As String
    Dim OppositePercentile As Double
    
    OppositePercentile = 100 - Percentile
    
    strSelect = "SELECT TOP " & Percentile & " PERCENT [" & expr & "] FROM [" & domain & "]"
    strSelect = strSelect & " WHERE NOT [" & expr & "] IS NULL ORDER BY [" & expr & "]"
    Debug.Print strSelect
    strSelectOpposite = "SELECT TOP " & OppositePercentile & " PERCENT [" & expr & "] FROM [" & domain & "]"
    strSelectOpposite = strSelectOpposite & " WHERE NOT [" & expr & "] IS NULL ORDER BY [" & expr & "] DESC"
    strSQL = "SELECT (Max(x." & expr & ")+Min(y." & expr & "))/2 AS Median FROM (" & strSelect & ") AS x, (" & strSelectOpposite & ") AS y "
    Debug.Print strSQL
    DPercentile = CurrentDb.OpenRecordset(strSQL, dbOpenSnapshot).Fields("Median").Value
End Function
 
Wow these VBA codes look really promising, I'm going to try them out now. Because I am a complete noob at this, do I just copy n paste the first part into a module, then copy and paste the second part underneath it with my table name and field?
 
just drop the code into a module.

The functions can then be called from anywhere (from code, in a calculated control, from another query)

So you pass in the field name, the table/query name, and the percentile you are looking for. The field is the field containing your values.
 
Ok thanks MajP,

I've created a new module and first put in

Public Function DPercentileExcel(expr As String, domain As String, Percentile As Double) As Variant
Dim strSQL As String
Dim N As Integer
Dim nSubk As Double
Dim vSubk As Variant
Dim vSubkPlus1 As Variant
Dim rs As DAO.Recordset

strSQL = "SELECT " & expr & " FROM " & domain
strSQL = strSQL & " WHERE NOT " & expr & " IS NULL ORDER BY " & expr
Set rs = CurrentDb.OpenRecordset(strSQL, dbOpenDynaset)

If Not (rs.EOF And rs.BOF) Then
rs.MoveLast
rs.MoveFirst
Else
Exit Function
End If

N = rs.RecordCount

nSubk = Percentile / 100 * (N - 1) + 1
'using NIST nSubK = Percentile / 100 * (N+1)

If nSubk = 1 Then
DPercentileExcel = rs.Fields(expr)
ElseIf nSubk = N Then
rs.MoveLast
DPercentileExcel = rs.Fields(expr)
Else
rs.AbsolutePosition = nSubk - 1 '0 based
vSubk = rs.Fields(expr)
rs.AbsolutePosition = nSubk '0 based
vSubkPlus1 = rs.Fields(expr)
'Debug.Print nSubk
'Debug.Print vSubk
'Debug.Print vSubkPlus1
DPercentileExcel = vSubk + (nSubk - Int(nSubk)) * (vSubkPlus1 - vSubk)
End If
End Function

then right underneath it i wrote

Public Sub testPercentileExcel()
Debug.Print DPercentileExcel("Operating Size/Property Size", "STEP 1: Factor Variables", 0)
End Sub

After saving this module, I opened up a query in access and pulled out the module in the expression builder.

Now my expression looks like this : DPercentileExcel («expr», «domain», «Percentile»)

What do input in for <<expr>>, <<domain>> and <<percentile>> ?
 
Here is my query that takes operating cost and divides by property size
There are 21 values
Code:
operatingCost  PropertySize  CostperSize

$100.00	      10     10.00
$150.00	      12     12.50
$110.00	      7	     15.71
$250.00	      15     16.67
$130.00	      7	     18.57
$200.00	      10     20.00
$350.00	      15     23.33
$120.00	      5	     24.00
$450.00	      17     26.47
$300.00	      10     30.00
$230.00       7	     32.86  11th item
$100.00	      3	     33.33
$210.00	      6	     35.00
$400.00	      10     40.00
$220.00	      5	     44.00
$310.00	      7	     44.29
$330.00	      7	     47.14
$410.00	      7	     58.57
$430.00	      7	     61.43
$320.00	      5	     64.00
$420.00	      5	     84.00

To demonstrate calling the functions from another query I built a table that holds nothing but percentiles 10,20,...90

tblPercentiles
Code:
percentile
10
20
30
40
50
60
70
80
90

Then I can call the functions from another query
Code:
SELECT 
 tbPercentiles.percentile, 
 DpercentileExcel("costPerSize","qryCostPerSize",[percentile]) AS PercentileValueExcel,  
 Dpercentile("costPerSize","qryCostPerSize",[percentile]) AS PercentileValue
FROM 
 tbPercentiles
ORDER BY 
 tbPercentiles.percentile;

Results:
Code:
percentile  PercentileValueExcel  PercentileValue
10      15.71   15.71
20      18.57   18.57
30      23.33   23.33
40      26.47   26.47
50      32.86   32.86
60      35      35
70      44      44
80      47.14   47.14
90      61.43   61.43
I was expecting the values to be different from the two formulations. If N is odd they produce the same results. If N is even they do not.

As you can see the median is correct since there is 21 values it is the middle rank (10) and a value of 32.86.

By shaving of the 21 first item and making N odd the two formulations give different results. For the 50th (median) values you get:

percentile PercentileValueExcel PercentileValue
50 32.86 31.43

The weighted formulation of 31.43 is what you traditionally would expect because it is the average of the 10 and 11th value.
 
SELECT
tbPercentiles.percentile,
DpercentileExcel("costPerSize","qryCostPerSize",[percentile]) AS PercentileValueExcel,
Dpercentile("costPerSize","qryCostPerSize",[percentile]) AS PercentileValue
FROM
tbPercentiles
ORDER BY
tbPercentiles.percentile;

what is the Dpercentile in ur sql code, when i try to run my query it says 'Dpercentile' is an undefined function?
 
It is the second function I provided.
DPercentileExcel uses the excel formulation
Dpercentile uses the standard weighted average.

You can use one or the other.
 
ooooh ic...

also, is qryCostPerSize the name of the query u made for the operating costs/property size?
 
yes that is the name of the query you see with the 21 records.
 
then i don't understand why when i try to run my query it says that in the SELECT state ment theres a reserved word or argument name that is misspelled or missing, or the punctuation is incorrect :(

the sql i wrote is SELECT [Percentiles].[percentile], DpercentileExcel("operating costs/property size","step 1: factor variables",[percentile]) AS PercentileValueExcel,
FROM Percentiles
ORDER BY [Percentiles].[percentile];
 
at a mininum "[operating costs/property size]","[step 1: factor variables]"
Your naming convention is absolutely horrible and will be problematic. No spaces, no special characters.

OperatingCostsPerPropertySize Step1_FactorVariables
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top