Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Westi on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Select top n records per customer? 5

Status
Not open for further replies.

mikeopolo

Technical User
May 25, 2003
54
NZ
I'm sure I've seen an example of this on the forum, but couldn't locate it.

I have:

select ;
s.cardrecordid, ;
S.invoicenumber, ;
S.Date, ;
(S.totallines + S.totaltax) as invtotal, ;
S.Outstandingbalance ;
from sales as S ;
left outer join cards as C ;
on S.cardrecordid = C.cardrecordid ;
order by s.date desc ;
into cursor tmpsales readwrite
index on cardrecordid tag cardid

and I would like to select the 5 most recent sales from each customer. I believe a sub-select clause would do it - how should I change this code?

Regards
Mike
 
Hi baltman,

while it's true the concept of HAVING is to inspect the result after it is built, rushmore can optimize HAVING clauses before building the result and does so. Tamars Select even has no subselect, just a self join.

Unfortunately we have not that detailed insight in what the sql engine does.

If it would start with some record in Orders2, for which there is just 1 record in Orders1 fulfilling the join condition (the latest order of a customer), it could also take that as partial result and stop checking for other records of Orders2. But the result is okay, so the mechanism seems to get the right idea of what we want and seems to analyze which record of Orders2 is needed to limit the result to N records per group, if there are.

Bye, Olaf.
 
Hi. Even with indexes on CustomerID

Tamar's solution takes 10.5+ seconds vs the sql/dbase of .07 seconds for a top 3 using Order records * 10 (9,130 orders).

For Order records * 15 (13,280 orders) the difference widens to 32+ seconds vs .08 seconds.

Code:
CLOSE DATABASES ALL
CLEAR

USE (HOME()+"samples\northwind\orders") IN 0
USE (HOME()+"samples\northwind\Customers") IN 0


SELECT * FROM orders INTO TABLE test_orders
ALTER TABLE test_orders alter COLUMN OrderID n(20)
FOR i = 1 TO 10
 SELECT * FROM orders INTO TABLE temp_orders
 REPLACE ALL OrderDate WITH OrderDate + i, OrderID WITH OrderID * (i+1)
 SELECT test_orders
 APPEND FROM temp_orders
ENDFOR  

SELECT test_orders
INDEX ON CustomerID TAG tagOrder

SELECT Customers
INDEX ON CustomerID TAG tagCust

x = SECONDS()
SELECT Customers.CustomerID, Customers.CompanyName, ;
    Orders.OrderID, Orders.OrderDate ;
    FROM test_Orders Orders JOIN Customers ;
    ON Orders.CustomerID=Customers.CustomerID ;
    ORDER BY 1,2,3 DESC INTO CURSOR curTemp

SELECT * FROM curTemp INTO CURSOR SQLXBASETopItems WHERE .f. READWRITE 

m.cCustID = ""
m.nOrderCnt = 0

SELECT curTemp
SCAN
 IF ALLTRIM(CustomerID)==m.cCustID = .f.
   m.cCustID = CustomerID
   m.nOrderCnt = 0
 ENDIF
 
 IF nOrderCnt < 3
   m.cName=CompanyName
   m.nOrderID = OrderID
   m.dDate=OrderDate 
   m.nOrderCnt = m.nOrderCnt +1
   INSERT INTO SQLXBASETopItems VALUES (m.cCustID, m.cName, m.nOrderID, m.dDate)
  ENDIF 
ENDSCAN
?"SQL and XBASE Solution Takes "+TRANSFORM(SECONDS() - x)
    
    
x = SECONDS()
SELECT Customers.CustomerID, Customers.CompanyName, ;
       Orders1.OrderID, Orders1.OrderDate ;
    FROM test_Orders Orders1 ;
      JOIN test_Orders Orders2 ;
        ON Orders1.CustomerID = Orders2.CustomerID ;
           AND Orders1.OrderID <= Orders2.OrderID ;
      RIGHT JOIN Customers as Customers;
        ON Customers.CustomerID = Orders1.CustomerID ;
    GROUP BY 1, 2, 3, 4 ;
    HAVING COUNT(*) <= 3 ;
    ORDER BY 1, 4 DESC ;
    INTO CURSOR SQLTopItems
?"Original SQL Only Solution Takes "+TRANSFORM(SECONDS() - x)

BROWSE NOWAIT 
SELECT SQLXBASETopItems 
LOCATE
BROWSE NOWAIT
 
Hi Brian,

Okay, you earn it, star awarded for the far better performance.

Still Tamars solution is tricky and interesting. If you put min(order2.OrderDate) and Count(*) in the result it get's clearer, how it works and if you remove count(*) and grouping, it's clear how many virtual records this select is creating and counting and why it scales so badly, even worse if you raise the limit of top items for each customer.

Put these indexes on test_orders:
Code:
SELECT test_orders
Index On CustomerID Tag tagOrder
Index ON OrderDate Tag tagDate && better for Tamars select
Index On CustomerID+Dtos(OrderDate) Tag tagCstDat Descending

Then measure the time my solution takes, by adding this to your code, adjusted to your naming and Northwind instead of Tastrade:
Code:
Local loRecord, lcCustomerID, llExact
llExact = (Set("Exact")="ON")
Set Exact Off
Create Cursor curResult(;
   CustomerID C(6), CompanyName C(40), ;
   OrderID N(20), OrderDate D)
Select Customers
Scan && all customers
   Store Customers.CustomerID To lcCustomerID
   * find last order
   If Seek(m.lcCustomerID,"test_orders","tagCstDat")
      Scatter Fields CustomerID, CompanyName;
         Name loRecord
      Select test_orders
      * scan top 3 orders of the customer
      Scan Next 3 While test_orders.CustomerID = m.lcCustomerID
         Scatter Fields OrderID, OrderDate;
            Name loRecord Additive
         Insert Into curResult From Name m.loRecord
      Endscan
   Endif
Endscan
If m.llExact
   Set Exact On
Endif

Takes about half the time for me.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top