Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

MIN function question 2

Status
Not open for further replies.

Queryman

Programmer
Nov 4, 2002
243
US
select
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
A.EXACT_DISTANCE as DISTANCE
from
BA_TEMPDB.pharmacy_distance A,
N433M4_ALL_INDIES_GT_10MILES B
where
A.ACCOUNT1 = B.ACCOUNT_NO
This gives me over 52000 rows and multiple instances of account1, I need to select
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
A.EXACT_DISTANCE as DISTANCE
where it selects the closest account2 based on distance. I know I need some kind of derived query where I need to use the min function for distance, but stumped as to how to word the query.
Thanks!

QueryMan

 
I'm not quite clear on the use of each table. Do Account1 and Account2 come from the same table as well as the distance? What is the purpose of the table, N433M4_ALL_INDIES_GT_10MILES?

Given the query you've posted, would the following provide what you need? Is N433M4_ALL_INDIES_GT_10MILES even required to determine the distances?

Select
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
Min(A.EXACT_DISTANCE) as DISTANCE
From BA_TEMPDB.pharmacy_distance A
Inner Join N433M4_ALL_INDIES_GT_10MILES B
ON A.ACCOUNT1 = B.ACCOUNT_NO
Group By A.ACCOUNT1

Wouldn't the following provide the same result?

Select
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
Min(A.EXACT_DISTANCE) as DISTANCE
From BA_TEMPDB.pharmacy_distance A
Group By A.ACCOUNT1 Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
There is a table with distances between account_nos, it has account1, account2 & distance in it. To get account1 was a separate step, account2 was arrived at by eliminating all the matches for account1 where distance was <=10. So now we have a table with ~52k rows, but there are only 896 unique account1, I need to deteremine the closest account2 along with the distance to it for each of the 896 account1.
Thanks much.

QueryMan

 
How about getting the least distance for Account1 (derived table), then getting the associated account 2?

Select SubT1.acct1, Acctb1.Account2, SubT1.mdist
From (select Account1, Min(Distance)
from Acctbl
group by Account1) SubT1(acct1, mdist),
Acctb1
Where SubT1.acct1 = Acctb1.Account1 and
SubT1.mdist = Acctb1.Distance
 
How about this?

SELECT
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
A.EXACT_DISTANCE as DISTANCE
FROM BA_TEMPDB.pharmacy_distance A
INNER Join N433M4_ALL_INDIES_GT_10MILES B
ON A.ACCOUNT1 = B.ACCOUNT_NO
WHERE A.EXACT_DISTANCE =
(SELECT MIN(c.EXACT_DISTANCE)
FROM BA_TEMPDB.pharmacy_distance c
INNER Join N433M4_ALL_INDIES_GT_10MILES d
ON c.ACCOUNT1 = d.ACCOUNT_NO
WHERE c.ACCOUNT1=a.ACCOUNT1) Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
Terry,
I did an explain on this query since it seemed to take a long time to run & this is what I got back. I think the DBA's frown on the product joins
Explanation
1) First, we lock a distinct BA_TEMPDB.&quot;pseudo table&quot; for read on a
RowHash to prevent global deadlock for BA_TEMPDB.c.
2) Next, we lock a distinct TEMPDB.&quot;pseudo table&quot; for read on a
RowHash to prevent global deadlock for TEMPDB.d.
3) We lock BA_TEMPDB.c for read, and we lock TEMPDB.d for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from BA_TEMPDB.c by way of an
all-rows scan with no residual conditions into Spool 2, which
is redistributed by hash code to all AMPs. The size of Spool
2 is estimated with low confidence to be 73,320,048 rows.
The estimated time for this step is 57.15 seconds.
2) We do an all-AMPs RETRIEVE step from BA_TEMPDB.A by way of an
all-rows scan with no residual conditions into Spool 3, which
is redistributed by hash code to all AMPs. Then we do a SORT
to order Spool 3 by row hash. The size of Spool 3 is
estimated with low confidence to be 73,320,048 rows. The
estimated time for this step is 2 minutes and 25 seconds.
5) We do an all-AMPs JOIN step from TEMPDB.d by way of an all-rows
scan with no residual conditions, which is joined to Spool 2 (Last
Use). TEMPDB.d and Spool 2 are joined using a product join, with
a join condition of (&quot;ACCOUNT1 = TEMPDB.d.ACCOUNT_NO&quot;). The
result goes into Spool 4, which is built locally on the AMPs. The
result spool file will not be cached in memory. The size of Spool
4 is estimated with index join confidence to be 29,914,579,584
rows. The estimated time for this step is 19 hours and 28 minutes.
6) We do a SUM step to aggregate from Spool 4 (Last Use) by way of an
all-rows scan, and the grouping identifier in field 1. Aggregate
Intermediate Results are computed locally, then placed in Spool 6.
The aggregate spool file will not be cached in memory. The size
of Spool 6 is estimated with no confidence to be 26,923,121,626 to
29,914,579,584 rows.
7) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
an all-rows scan into Spool 5, which is built locally on the AMPs.
Then we do a SORT to order Spool 5 by row hash. The result spool
file will not be cached in memory. The size of Spool 5 is
estimated with no confidence to be 26,923,121,626 rows.
8) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to Spool 5 (Last Use). Spool 3 and
Spool 5 are joined using an inclusion merge join, with a join
condition of (&quot;(ACCOUNT1 = ACCOUNT1) AND (EXACT_DISTANCE = Field_3)&quot;).
The result goes into Spool 8, which is built locally on the AMPs.
The size of Spool 8 is estimated with index join confidence to be
73,320,048 rows. The estimated time for this step is 1 hour and 9
minutes.
9) We do an all-AMPs JOIN step from TEMPDB.B by way of an all-rows
scan with no residual conditions, which is joined to Spool 8 (Last
Use). TEMPDB.B and Spool 8 are joined using a product join, with
a join condition of (&quot;ACCOUNT1 = TEMPDB.B.ACCOUNT_NO&quot;). The
result goes into Spool 1, which is built locally on the AMPs. The
result spool file will not be cached in memory. The size of Spool
1 is estimated with index join confidence to be 29,914,579,584
rows. The estimated time for this step is 4 hours and 20 minutes.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.


QueryMan

 
Teradata, maybe I should post this question in that forum.

QueryMan

 
Terry,
So much for the explain query part, I took a chance of getting into th eDBA penalty box & ran this even though it said it would take more than a day to run this. It worked perfectly. Am awarding a star to you!

QueryMan

 
This looks like a typical case for the RANK function.
This may be much faster than the correlated subquery:

SELECT
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
A.EXACT_DISTANCE as DISTANCE
FROM BA_TEMPDB.pharmacy_distance A
INNER Join N433M4_ALL_INDIES_GT_10MILES B
ON A.ACCOUNT1 = B.ACCOUNT_NO
QUALIFY
RANK() OVER (PARTITION BY a.ACCOUNT1
ORDER BY A.EXACT_DISTANCE ASC) = 1
;

QUALIFY is an Teradata extension, so in ANSI SQL it's:
SELECT
ORIGINAL_ACCT_NO,
ACCT_MATCH,
DISTANCE
FROM
(
SELECT
A.ACCOUNT1 AS ORIGINAL_ACCT_NO,
A.ACCOUNT2 as ACCT_MATCH,
A.EXACT_DISTANCE as DISTANCE,
RANK() OVER (PARTITION BY a.ACCOUNT1
ORDER BY A.EXACT_DISTANCE ASC) AS rnk
FROM BA_TEMPDB.pharmacy_distance A
INNER Join N433M4_ALL_INDIES_GT_10MILES B
ON A.ACCOUNT1 = B.ACCOUNT_NO
) dt
WHERE rnk = 1
;

Dieter
 
I wasn't sure you were using Teradata, I should have guessed from your name. Since you are you can use the RANK function to get your answer:

Select Account1, Account2, Distance
From AccountTable
Group By 1
Qualify RANK(Distance) = 1 ;

If this gives the longest distance instead of shortest, make the RANK descending RANK(Distance DESC)
 
Dieter, that worked too w/o a product join. I will be able to use these methods in the future if I encounter similar issues.

Thanks to you both, I really appreciate it.


QueryMan

 
I just noticed something, I have a few duplicates of account1 where the distance to account2 are the same in some instances it picks all of those instances because they all fall under the MIN, is there a way to restrict that to just the 1st instance?
Thanks again.

QueryMan

 
Just add another column in the RANK's ORDER BY, which makes it unique.
e.g.;
RANK() OVER (PARTITION BY a.ACCOUNT1
ORDER BY A.EXACT_DISTANCE ASC,
a.ACCOUNT2 ASC) AS rnk

Dieter
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top