Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Min 1 of Top 10% and Max 1 of Bottom 10%??? 1

Status
Not open for further replies.

tqeonline

MIS
Oct 5, 2009
304
US
Hey guys I have a temp table that i'm creating. Then doing some queries off that table.

The goal is to get the Max and Min of the temp table. Then get the 90% item and the 10% item. I'm using the data for a candlestick chart.

This is what I currently have
Code:
WITH TMP_TABLE AS (SELECT DateAdd(Month, DateDiff(Month, 0, BG_CLOSING_DATE), 0) as 'Close'
		  , DATEDIFF(day,[BG_DETECTION_DATE],[BG_CLOSING_DATE]) as 'Diff'
	FROM [default_bsd_qa_db].[dbo].[BUG]
	WHERE [BG_CLOSING_DATE] IS NOT NULL
	AND [BG_STATUS] = 'Closed'
	AND [BG_CLOSING_DATE] > '8/1/2010' 
	AND [BG_CLOSING_DATE] < '12/31/2010'
	AND [BG_DETECTION_DATE] > '8/1/2010' 
	)
SELECT [Close]
	, MIN([Diff]) as 'Min'
	, MAX([Diff]) as 'Max'
	, MIN(Select Top 10 Percent [DIFF] From TMP_TABLE ORDER BY [DIFF] ASC) as '90%'
	, MAX(Select Top 10 Percent [DIFF] From TMP_TABLE ORDER BY [DIFF] DESC) as '10%'
FROM TMP_TABLE
GROUP BY [Close]
ORDER BY [Close]

But it is throwing errors on my Min and Max of the Top 10 Percent. It's an SQL2005 database so i'm pretty sure it should be able to work.

What am I missing?

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
k well just let me know i will publish something simple if you need it.

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
the table would have four columns.
[ID] = Primary key and index
[BG_STATUS] = varchar(15)
[BG_DETECTION_DATE] = DateTime
[BG_CLOSING_DATE] = DateTime

The results would look like this:
1 CLOSED 1/1/2010 2/3/2010
2 CLOSED 3/3/2010 4/3/2010
3 CLOSED 8/6/2010 9/3/2010
4 CLOSED 10/8/2010 11/3/2010
5 CLOSED 11/1/2010 12/1/2010
6 CLOSED 12/1/2010 12/1/2010

Then all the "Duration" would be is Closing - Detection.



- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Ok, Peter told me to post it. I can do a bit of a history - yesterday when I saw the question I was thinking we can try to do this with the window functions. I was thinking about dense_rank and count and then do some calculation based on them and completely forgot about this very nice WINDOW function (which is not that common) NTILE(). So, when I asked Peter how to simulate top N percent with window function, he immediately pointed me to this function.

Now, your query will become:

Code:
WITH TMP_TABLE AS (SELECT DateAdd(Month, DateDiff(Month, 0, BG_CLOSING_DATE), 0) as 'ClosedOn'
          , DATEDIFF(day,[BG_DETECTION_DATE],[BG_CLOSING_DATE]) as 'Duration'
    FROM [default_bsd_qa_db].[dbo].[BUG]
    WHERE [BG_CLOSING_DATE] IS NOT NULL
    AND [BG_STATUS] = 'Closed'
    AND [BG_CLOSING_DATE] > '8/1/2010' 
    AND [BG_CLOSING_DATE] < '12/31/2010'
    AND [BG_DETECTION_DATE] > '8/1/2010' 
    ),
t1 as (select *, 

NTILE(10) over (partition by [ClosedOn] ORDER BY Duration ) as [PercentGrp] from TMP_TABLE)

select T1.ClosedOn, Max(T1.Duration) as [Max],
MIN(T1.Duration) as [Min],
T2.[90%], T3.[10%]
from T1 
LEFT join (select top 1 ClosedOn, Duration
from T1 where PercentGrp = 10 ORDER BY Duration) T2
ON T1.ClosedOn = T2.ClosedOn
LEFT JOIN (select top 1 ClosedOn, Duration
from T1 where PercentGrp = 1 ORDER BY Duration DESC) T3
on T1.ClosedOn = T3.ClosedOn

I'm not sure if my query now is the best - Peter probably meant something simpler. But I think it should do the job.
Again, this is from the top of my head since you didn't post a simple test case.


PluralSight Learning Library
 
Sorry, I made the same mistake again - just realized. You can not use derived table and TOP. In your case use it as a subquery in the select list (usually don't perform good) or we can do more cte and use row_number() function.

I let you finish it based on the main idea - it should be simple enough.

PluralSight Learning Library
 
it's saying invalid column name '90%' and invalid column name '10%'

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Matt,

I figured this out (took a short break). It's very simple

Code:
WITH TMP_TABLE AS (SELECT DateAdd(Month, DateDiff(Month, 0, BG_CLOSING_DATE), 0) as 'ClosedOn'
          , DATEDIFF(day,[BG_DETECTION_DATE],[BG_CLOSING_DATE]) as 'Duration'
    FROM [default_bsd_qa_db].[dbo].[BUG]
    WHERE [BG_CLOSING_DATE] IS NOT NULL
    AND [BG_STATUS] = 'Closed'
    AND [BG_CLOSING_DATE] > '8/1/2010' 
    AND [BG_CLOSING_DATE] < '12/31/2010'
    AND [BG_DETECTION_DATE] > '8/1/2010' 
    ),
t1 as (select *, 

NTILE(10) over (partition by [ClosedOn] ORDER BY Duration ) as [PercentGrp] from TMP_TABLE)

select T1.ClosedOn, Max(T1.Duration) as [Max],
MIN(T1.Duration) as [Min],
MAX(case when NTILE = 1 then Duration end) as [90%], --  out of 10 % with lowest duration the max
MIN(case when NTILE = 10 then Duration end) as [10%] -- out of 10 percent with highest duration the min
from T1 
GROUP BY T1.ClosedOn

PluralSight Learning Library
 
it said can't find "NTILE" so i changed the CASE WHEN NTILE to CASE WHEN [PrecentGrp] and it worked

Is that the crrect change?

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Yes, it's a correct change. Glad you got it working - it sounds like a very interesting problem, so I'll probably post something about it this weekend.

Are the results correct now?

PluralSight Learning Library
 
Yes the results are correct! This is amazing and fast.

So explain to me, if you have time, the NTILE concept.

We are creating a table
Code:
t1 AS
(
	SELECT *,
	NTILE(10) OVER (PARTITION BY [ClosedOn]	ORDER BY [Duration])AS [PercentGrp] 
	FROM TMP_TABLE
)

Then saying

Code:
MAX (CASE WHEN [PercentGrp]  = 1 THEN [Duration] END) AS [10%]

Is it saying break TMP_TABLE into 10% sections and then the MAX is saying grab the 1st ten percent and give me the MAX?

I don't understand the NTIL, OVER, and PARTITION concepts.

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Have you also tried googling on NTILE() (check BOL first)? I think I read an article about NTILE() before, but it was long time ago and I forgot the details.

Essentially, first we divide everything into 10 groups (so 10%) for each closed date (this is how partition works). So, within each Closed date we created 10 groups based on the Duration - this is the same as 10%, 20%, 30%, etc. based on Duration.

Then after a short unrelated break it dawned on me to use the simple CASE logic.

So, to get max number out of the first 10% we use
MAX(case when GrpPercent = 1 then Duration end)

and to get min from the bottom 10 percent (the highest duration) we use
MIN(case when GrpPercent =10 then Duration end)

Hopefully it's clearer now and as I said, I plan to post a blog based on this problem as it turned out to be quite interesting and I'm not usually using NTILE() function that's why it took me so many bad tries to get to the solution.

------------
Note, that I solved it from the top of my head, but I plan to create a test case for the blog.

PluralSight Learning Library
 
I don't think NTILE() solution is correct.
This query

Code:
USE AdventureWorks2008R2;
GO
;with cte as (SELECT p.FirstName, p.LastName
    ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
    ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
    ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'
    ,NTILE(10) OVER (ORDER BY a.PostalCode) AS 'Quartile',
    
    s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s 
    INNER JOIN Person.Person p 
        ON s.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.Address a 
        ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL 
    AND SalesYTD <> 0)
    
select cte.*, TP.*, LP.* from cte
LEFT join (select top 10 percent FirstName as TPFName, LastName as TPLName, PostalCode as TPPCode  from cte Order by PostalCode) TP 
on cte.FirstName = TP.TPFName  
and cte.LastName = TP.TPLName 
LEFT join (select top 10 percent FirstName as LPFName, LastName as LPLName, PostalCode as LPPCode  from cte Order by PostalCode DESC) LP 
on cte.FirstName = LP.LPFName  
and cte.LastName = LP.LPLName

shows that bottom 10 percent don't correspond to the group 10.

Also, if the number in NTILE is bigger than number of records, the extra groups are not created.

In other words, we can not rely on NTILE() to be the exact equivalent of the TOP N percent.

I'll continue my research and it looks like it may be a useful addition to the language to allow percentile. May be new PERCENTILE() function in the new SQL Server?

PluralSight Learning Library
 
So if there are only 9 records than there is no group 10.

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Right, exactly. The right way will be to use row_number() and count(*) functions instead - that was my original idea when I looked at that problem. For the big result sets NTILE() will be close enough.

PluralSight Learning Library
 
Yea basically the only time it wouldn't work is if there aren't 10 defects in a given set. which might be for a couple hours at the beginning of the month.

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
to solve that issue couldn't we just do a CASE statement.


Like
Code:
ISNULL(MAX(CASE WHEN [PercentGrp] = 10 THEN [Duration] END),ISNULL(MAX(CASE WHEN [PercentGrp] = 9 THEN [Duration] END),ISNULL(MAX(CASE WHEN [PercentGrp] = 8 THEN [Duration] END),0)

AS [90%]

Something along those lines that just worked back to [PercentGrp] = 1

- Matt

"If I must boast, I will boast of the things that show my weakness"

- Windows 2003 Server, 98 SE, XP
- VB.NET, VSTS 2010, ASP.NET, EXCEL VBA, ACCESS, SQL 2008
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top