Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Calculate 90th Percentile

Status
Not open for further replies.

shelby55

Technical User
Jun 27, 2003
1,229
CA
Hi

I searched on "percentile" which resulted in 3 threads. One post didn't apply and 2 had responses saying "there are plenty of percentile threads" and advising to search but not indicating links to the threads or advising how to do it.

So could someone please advise how I calculate percentile or point me to the threads where this is discussed?

Specifically I am using Access 2003. I have a table with patient data at the visit level. What I would like to do is calculate the 90th percentile per day, week and month of the patient's length of stay.

All assistance greatly appreciated.
 
Hi Skip

This thread thread767-1270864 describes the differences between Crystal and Excel calculation of percentile. I'd prefer to use Crystal's if it's possible to replicate.

Thanks.
 
Hi Again..I re-read the post and I think that I want whatever is more accurate and it sounds like the Excel algorithm works best.

Which still doesn't help me apply this to a working solution in Access....
 



But its not a straight 9/10. You stated that ou have some logic that eliminates some counts or sums.

What is your logic as it relates to your data?

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 


... and post a concrete example, please.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Hi

Below is a set of sample data. I have the formula for calculating the length of stay (LOS) occurring in the source data. Sometimes the times required to calculate LOS are missing or certain cases aren't to be included in the calculation so these LOS show as zero. The denominator (denom) field is also calculated within the source to flag whether to include in the percentile calculation based on the same criteria as the LOS calculation. There would never be cases where the LOS = 0 but even if there was, if the case is valid then it should be included in the calculation.

[tt]VisitID VisitDate Triage LOS Denom Doctor
M34395 01-May-09 3 0 0 Jones
M92597 01-May-09 4 0 0 Smith
M45297 01-May-09 5 0 0 Jones
M12345 01-May-09 1 3.7 1 Smith
M45589 01-May-09 1 7.1 1 Smith
M34483 01-May-09 1 3.8 1 Jones
M55555 01-May-09 2 5.4 1 Jones
M11111 01-May-09 2 8.1 1 Jones
M22222 01-May-09 2 11.3 1 Jones
M33333 01-May-09 4 2.6 1 Smith
M44444 01-May-09 4 4.0 1 Smith
M66666 01-May-09 3 9.3 1 Smith
M45888 01-May-09 3 5.3 1 Jones
M28793 01-May-09 4 4.7 1 Jones
M79923 01-May-09 5 3.7 1 Smith
M99945 01-May-09 5 4.2 1 Smith
M83884 01-May-09 5 3.7 1 Smith
M54392 01-May-09 4 3.4 1 Jones
M38397 01-May-09 5 5.3 1 Jones[/tt]

Using above as an example and basing it on Excel calculations of percentile, if I only include the cases that have denom = 1 then the 90 percentile is 8.7 hours but if leaving them in then it's 8.3. So I want to exclude where denom = 0.

I've added doctor in from the original because at some point I will need dual groups i.e. 90th percentile per doctor per visit date.

Thanks.
 

I would think that the simplest way to do the analysis, would be to exclude the ZEROS from your resultset, and only have rows that contain valid data.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Hi Skip

Thanks but I'm still not sure what fields etc. to be using for the percentile calculation which is why I posted this thread in the first place.

The thing is that there are many other data elements required for the overall query feeding my workbook that include those cases so I assume I will create a query for those, one for the percentile and then link them together to make one big query.
 

Here's what I did in Excel.

1. Deleted the 3 rows of ZEROS

2. Added a column for GROUP, where you can have a formula that groups, based on your criteria. I did a group on VisitDate and Doctor, so for Jones the 90 pctile is 9.06 and Smith is 7.76
[tt]
G1: Group
G2: =IF(AND(B2=B1,F2=F1),G1,IF(ISNUMBER(G1),G1+1,1))
[/tt]
3. percentile function calculation
[tt]
H2: =PERCENTILE(OFFSET($A$1,MATCH(G2,Group,0),3,COUNTIF(Group,G2),1),0.9)
[/tt]
The OFFSET function returns a RANGE that varys with the specified GROUP.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Sorry, I had a typo in my suggestion (90 instead of 0.9)
the 90 percentile is 8.7
With your sample data I get 9.3:
Code:
SELECT S.VisitDate
, (SELECT Min(LOS) FROM SampleData A
   WHERE (SELECT Count(*) FROM SampleData B WHERE B.LOS<=A.LOS AND B.Denom=1 AND B.VisitDate=A.VisitDate)
    >= 0.90 * (SELECT Count(*) FROM SampleData WHERE Denom=1 AND VisitDate=A.VisitDate)
   AND A.VisitDate=S.VisitDate AND A.Denom=1) AS [90th Percentile]
FROM SampleData AS S
GROUP BY S.VisitDate

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
the 90 percentile is 8.7
Can't happen with your sample data as no row has a LOS=8.7 !
 


9.3 is the only descrete value in the 90th percentile, but any value >= 8.7 would be included, if there were any, which, of course, would perturbate the calculation.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
So, the discrete 90th percentile value for each doctor at each date:
Code:
SELECT S.VisitDate, S.Doctor
, (SELECT Min(LOS) FROM SampleData A
   WHERE (SELECT Count(*) FROM SampleData B WHERE B.LOS<=A.LOS AND B.VisitDate=A.VisitDate AND B.Doctor=A.Doctor)
    >= 0.90 * (SELECT Count(*) FROM SampleData WHERE VisitDate=A.VisitDate AND Doctor=A.Doctor)
    AND A.VisitDate=S.VisitDate AND A.Doctor=S.Doctor) AS [90th Percentile]
FROM SampleData AS S
GROUP BY S.VisitDate, S.Doctor
Result:[tt]
VisitDate Doctor 90th Percentile
01-May-09 Jones 8.1
01-May-09 Smith 9.3[/tt]

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi

Thanks so much. Not sure what Skip means by "perturbate the calculation"?

This gets more interesting because I used the same table to run in Crystal and get 10.3 hours when denom = 1 and 9.3 when using all cases.

On eHow I found this comment:
The percentile need not be an actual data point. An alternative way to solve for the percentile when there are few numbers is to interpolate. For example, for the set {1,2,3,4}, you can define the 90th percentile to be 3.6 or 3.7, depending on what interpolation approach you choose. Interpolation is how Microsoft Excel's percentile function works.

Based on this, that is why I'm getting values in Excel for data points that don't exist. PHV are you saying that in Access the formula can't replicate Excel's calculation?

Thanks.

 
The SQL formula I gave you returns an actual point.

BTW, in my previous post I've ommitted to filter by Denom:
Code:
SELECT S.VisitDate, S.Doctor, (SELECT Min(LOS) FROM SampleData A
   WHERE (SELECT Count(*) FROM SampleData B WHERE B.LOS<=A.LOS AND B.VisitDate=A.VisitDate AND B.Doctor=A.Doctor AND B.Denom=1)
    >= 0.90 * (SELECT Count(*) FROM SampleData WHERE VisitDate=A.VisitDate AND Doctor=A.Doctor AND Denom=1)
    AND A.VisitDate=S.VisitDate AND A.Doctor=S.Doctor AND A.Denom=1) AS [90th Percentile]
FROM SampleData AS S
GROUP BY S.VisitDate, S.Doctor

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi PHV

Thanks very much...but is there a way to calculate in Access to replicate Excel's i.e. interpolate the result so that it's not necessarily a data point? Thanks.
 
As for interpolation, I'm afraid you have to play with VBA.
In a standard code module create the following function:
Code:
Public Function XthPercentile(SQL As String, ColName As String, X As Double) As Variant
Dim Percentile As Double
Dim XthRec As Double
Dim iRec As Long
Dim rst As DAO.Recordset
Set rst = CurrentDb.OpenRecordset(SQL, dbOpenDynaset)
With rst
  If Not (.BOF Or .EOF) Then
    .MoveLast
    If X = 1 Then Percentile = .Fields(ColName)
    .MoveFirst
    If X = 0 Then Percentile = .Fields(ColName)
    If X > 0 And X < 1 Then
      XthRec = 1 + X * (.RecordCount - 1)
      iRec = Int(XthRec)
      .Move iRec - 1
      Percentile = .Fields(ColName)
      XthRec = XthRec - iRec
      If XthRec > 0 Then
        .MoveNext
        Percentile = Percentile + XthRec * (.Fields(ColName) - Percentile)
      End If
    End If
    XthPercentile = Percentile
  End If
  .Close
End With
Set rst = Nothing
End Function
And now the query:
Code:
SELECT A.VisitDate,A.Doctor
, XthPercentile("SELECT LOS FROM SampleData WHERE VisitDate='" & A.VisitDate & "' AND Doctor='" & A.Doctor & "' AND Denom=1 ORDER BY LOS","LOS",0.90) AS [90th Percentile]
FROM SampleData A
GROUP BY A.VisitDate, A.Doctor

If VisitDate isn't a string but a date then replace this:
VisitDate='" & A.VisitDate & "' AND
with this:
VisitDate=#" & A.VisitDate & "# AND

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
shelby55, did you have expected result with the VBA code ?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top