Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Westi on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Select top n records per customer? 5

Status
Not open for further replies.

mikeopolo

Technical User
May 25, 2003
54
NZ
I'm sure I've seen an example of this on the forum, but couldn't locate it.

I have:

select ;
s.cardrecordid, ;
S.invoicenumber, ;
S.Date, ;
(S.totallines + S.totaltax) as invtotal, ;
S.Outstandingbalance ;
from sales as S ;
left outer join cards as C ;
on S.cardrecordid = C.cardrecordid ;
order by s.date desc ;
into cursor tmpsales readwrite
index on cardrecordid tag cardid

and I would like to select the 5 most recent sales from each customer. I believe a sub-select clause would do it - how should I change this code?

Regards
Mike
 
Providing sample data using the below logic would make it easier to help you.

Code:
create table sales (field1 n(3), field2 n (4))
insert into sales values (33, 44)

 
hi,

first one can't see from your select, what identifies a customer? cardrecordid? Sounds more like an id for some credit card. What if a customer bought via creadit card and something else, like paypal?

I think it should be something like this:
Code:
select c.customerid from customers as c;
left join (Select top 5 s.price, s.invoice, s.outstanding from sales as s;
where s.customerid=c.customerid;
order by s.date desc);
into cursor curLast5sales

Bye, Olaf.
 
Thanks for both inputs. I'll take a more detailed look tomorrow, it's late here now.

btw the cardrecordid is what identifies the customer. Shouldn't there be a group statement in there somewhere?

Many thanks
Mike
 
mikeopolo said:
Shouldn't there be a group statement in there somewhere?

No, the subselect is done for each customerid/cardrecordid, so there is no need for a group clause.

Bye, Olaf.
 
Unfortunately, you can't use TOP n in a correlated sub-query. I actually have a piece on this topic coming up in FPA. The solution doesn't use TOP n at all; I learned it from Fabio Lunardon, one of the newest batch of VFP MVPs.

Here's an example using the TasTrade data. I'll let you translate to your data. This finds the 3 most recent orders from each customer:

Code:
SELECT Customers.CustomerID, CompanyName, ;
       Orders1.OrderID, Orders1.OrderDate ;
	FROM Orders Orders1 ;
	  JOIN Orders Orders2 ;
	    ON Orders1.CustomerID = Orders2.CustomerID ;
	       AND Orders1.OrderDate <= Orders2.OrderDate ;
	  RIGHT JOIN Customers ;
	    ON Customers.CustomerID = Orders1.CustomerID ;
	GROUP BY 1, 2, 3, 4 ;
	HAVING COUNT(*) <= 3 ;
	ORDER BY 1, 4 DESC ;
	INTO CURSOR TopItems

Tamar
 
Tamar,

Very nice example. It deserves a star. However, it doesn't scale very well. Although it doesn't look as elegent, perhaps a SQL-XBASE hybrid solution would be better.

Please note I altered the SQL to key off OrderID to better fit my hypothetical data. It doesn't change the original results.

Brian

Code:
CLOSE DATABASES ALL
CLEAR

SELECT * FROM orders INTO TABLE test_orders
ALTER TABLE test_orders alter COLUMN OrderID n(20)
FOR i = 1 TO 10
 SELECT * FROM orders INTO TABLE temp_orders
 REPLACE ALL OrderDate WITH OrderDate + i, OrderID WITH OrderID * (i+1)
 SELECT test_orders
 APPEND FROM temp_orders
ENDFOR  

x = SECONDS()
SELECT Customers.CustomerID, Customers.CompanyName, ;
    Orders.OrderID, Orders.OrderDate ;
    FROM test_Orders Orders JOIN Customers ;
    ON Orders.CustomerID=Customers.CustomerID ;
    ORDER BY 1,2,3 DESC INTO CURSOR curTemp

SELECT * FROM curTemp INTO CURSOR SQLXBASETopItems WHERE .f. READWRITE 

m.cCustID = ""
m.nOrderCnt = 0

SELECT curTemp
SCAN
 IF ALLTRIM(CustomerID)==m.cCustID = .f.
   m.cCustID = CustomerID
   m.nOrderCnt = 0
 ENDIF
 
 IF nOrderCnt < 3
   m.cName=CompanyName
   m.nOrderID = OrderID
   m.dDate=OrderDate 
   m.nOrderCnt = m.nOrderCnt +1
   INSERT INTO SQLXBASETopItems VALUES (m.cCustID, m.cName, m.nOrderID, m.dDate)
  ENDIF 
ENDSCAN
?"SQL and XBASE Solution Takes "+TRANSFORM(SECONDS() - x)
    
    
x = SECONDS()
SELECT Customers.CustomerID, Customers.CompanyName, ;
       Orders1.OrderID, Orders1.OrderDate ;
    FROM test_Orders Orders1 ;
      JOIN test_Orders Orders2 ;
        ON Orders1.CustomerID = Orders2.CustomerID ;
           AND Orders1.OrderID <= Orders2.OrderID ;
      RIGHT JOIN Customers as Customers;
        ON Customers.CustomerID = Orders1.CustomerID ;
    GROUP BY 1, 2, 3, 4 ;
    HAVING COUNT(*) <= 3 ;
    ORDER BY 1, 4 DESC ;
    INTO CURSOR SQLTopItems
?"SQL Only Solution Takes "+TRANSFORM(SECONDS() - x)

BROWSE NOWAIT 
SELECT SQLXBASETopItems 
LOCATE
BROWSE NOWAIT
 
Hi,

Ah, I see Tamar, no TOP clause in subselects. I've stumbled across this already, but I always forget.

Tricky, but just seeing at HAVING Count(*)<n hurts. It shouldn't scale very well, true. This seems not filtering records, but filtering resultsets out of all possible resultsets.

Instad of forcing some number of results for each customer, it seems much easier to limit with a date.

I'd create an index on bintoc(customerid)+dtos(orderdate) descending. that can be seeked for bintoc(cutomerid), then scanned for the next n records of the same customer. Still I think there should be a faster sql only solution.

Bye, Olaf.
 
SQL/XBASE is 2x faster than this, but here's an efficient SQL solution.

Brian

Code:
x = SECONDS()
SELECT Customers.CustomerID, Customers.CompanyName, ;
    Orders.OrderID, Orders.OrderDate ;
    FROM test_Orders Orders JOIN Customers ;
    ON Orders.CustomerID=Customers.CustomerID ;
    GROUP BY 1, 2, 3, 4 ;
    ORDER BY 1, 2, 3 DESC ;
    WHERE ;
	OrderID in ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders) OR ;
	OrderID in ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders WHERE OrderID NOT in  ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders)) OR ;
    OrderID in ;
 	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders WHERE OrderID NOT in  ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders WHERE OrderID NOT in  ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders)) AND OrderID NOT in  ;
	(SELECT MAX(OrderID) as OrderID GROUP BY CustomerID FROM test_Orders)) ;
	INTO CURSOR SQLTopItems
?"NEW SQL Only Solution Takes "+TRANSFORM(SECONDS() - x)
 
Tamar, many thanks for your code

I have two uses for it (preparing 2 child tables for a multi-detail band report), and sadly one instance takes 30 seconds to complete, the other only takes a second or so. Both instances use two tables of only 2000 records approx. So I have reverted to a solution somewhat like Brian's using a second pass with scan/endscan.

Brian, also thanks for your input, stunning work, I'll give that a go too.

Stars to both of you!

If there's more performance to be gained I'm sure between you it will be found!

Regards
Mike
 
One more alternative...

Code:
SELECT DISTINCT CustomerID FROM Customers INTO ARRAY aa

FOR m.ii = 1 TO ALEN(m.aa)
  SELECT TOP 3 Customers.CustomerID, Customers.CompanyName, ;
    Orders.OrderID, Orders.OrderDate ;
    FROM test_Orders Orders JOIN Customers ;
    ON Orders.CustomerID=Customers.CustomerID ;
    ORDER BY 1,2,3 DESC INTO CURSOR curTemp WHERE Customers.CustomerID=m.aa[m.ii]
    
  IF SELECT("curResult") = 0 
   SELECT * FROM  curTemp INTO CURSOR curResult READWRITE 
  ELSE 
   SELECT curResult 
   APPEND FROM DBF("curTemp") 
  ENDIF 
ENDFOR 
[code]
 
Hi again,

Tamar:
Was there a change in the tastrade database with SP1 or something the like? I had to change your select to:

Code:
close databases all
open database (home()+"samples\tastrade\data\tastrade.dbc")
SELECT Customer.Customer_ID, Company_Name, ;
       Orders1.Order_ID, Orders1.Order_Date ;
    FROM Orders Orders1 ;
      JOIN Orders Orders2 ;
        ON Orders1.Customer_ID = Orders2.Customer_ID ;
           AND Orders1.Order_Date <= Orders2.Order_Date ;
      RIGHT JOIN Customer ;
        ON Customer.Customer_ID = Orders1.Customer_ID ;
    GROUP BY 1, 2, 3, 4 ;
    HAVING COUNT(*) <= 5 ;
    ORDER BY 1, 4 DESC ;
    INTO CURSOR TopItems

And though it's only 92 customers and 1079 orders I would have expected a slower run. It's finished in split seconds. Nice.

Bye, Olaf.
 
Hi Brian,

your efficient SQL solution is about as fast as Tamars for me, but it has 2 less records in the result. I haven't analyzed why, but maybe your result is even more exact. Still it's hard to modify the number of top orders for each customer in this version. The original request was the last 5.

Bye, Olaf.
 
Hi,

here's my solution:
Code:
#Define cnOrders 5 && or 3

t0=Seconds()
Select Sum(temp.Number_Of_Orders);
From ( Select ;
Iif(Sum(Iif(IsNull(Orders.customer_id),0,1))<cnOrders,;
    Sum(Iif(IsNull(Orders.customer_id),0,1)),cnOrders) as Number_of_Orders;
From customer Left join Orders;
on Orders.Customer_ID=Customer.Customer_ID;
group by Customer.Customer_ID) temp;
Into Array lnOrders

SELECT Top lnOrders;
    Customer.Customer_ID, Customer.Company_Name, ;
    Orders.Order_ID, Orders.Order_Date ;
    FROM Orders JOIN Customer;
    ON Orders.Customer_ID=Customer.Customer_ID ;
    GROUP BY 1, 2, 3, 4 ;
    ORDER BY 1, 2, 3, 4  DESC;
    into Cursor LastNOrders
? Seconds()-t0

Bye, Olaf.
 
Hi again,

the first select can be simplified to:
Code:
Select Sum(temp.Number_Of_Orders);
From ( Select ;
Iif(Count(*)<cnOrders,;
    Count(*),cnOrders) as Number_of_Orders;
From Orders;
Group by Orders.Customer_ID) temp;
Into Array lnOrders

Bye, Olaf.
 
Sorry,

I verified the result and had some customers with more than the wanted amount of last orders. My assumption was wrong, that the result would be the last records of the orders table, but that can be wrong of course, because a customer may have ordered 20 times in the last year while another one only had 1 order that year.

Bye, Olaf.

ps: verification select was this:
Code:
Select customer_id,Count(*) from LastNOrders;
   group by customer_id having Count(*)>cnOrders
That should of course have an empty result.
 
Hi again,

and noe my xBase solution. And this one works:
Code:
#Define cnOrders 5

Close Databases All
Open Database (Home()+"samples\tastrade\data\tastrade.dbc")

* initialization, don't want to modify the orginal tastrade dbc, so copy orders to cursor:
Select * From orders Into Cursor temp_orders Readwrite
* add an index on customer foreign key plus date
Index On Customer_ID+Dtos(Order_Date) Tag xCstOrdDat Descending
* all this could be done once and permanent

* now it counts:
t0=Seconds()
Local loRecord, lcCustomerID
llExact = (Set("Exact")="ON")
Set Exact Off
Create Cursor curResult(;
   Customer_ID C(6), Company_Name C(40), ;
   Order_ID C(6), Order_Date D)
Select 0
Use Customer In 0
Scan && all customers
   m.lcCustomerID = Customer.Customer_ID
   * find last order
   If Seek(m.lcCustomerID,"temp_orders","xCstOrdDat")
      Scatter Fields Customer_ID, Company_Name;
         Name m.loRecord
      Select temp_orders
      * scan top 5 orders of the customer
      Scan Next cnOrders While temp_orders.Customer_ID = m.lcCustomerID
         Scatter Fields Order_ID, Order_Date;
            Name m.loRecord Additive
         Insert Into curResult From Name m.loRecord
      Endscan
   Endif
Endscan
If llExact
   Set Exact On
Endif
? Seconds()-t0

This is making use of some new VFP9 features like Scatter Additional and Insert From Name.

Bye, Olaf.
 
Hi Mike,

although this code focused on getting the data result, yes I seldom use m. and now didn't do it wholeheartedly. Sorry.

Mike said:
Personally I've taken to using m. on every possible memory variable reference.

According to this all I did wrong was to not use m. on t0 and llExact ;-). Additionally I didn't declare llExact as LOCAL variable.

What's accellerating the whole thing most in my eyes is the new INSERT INTO ... FROM NAME Ojectname. But I didn't investigate it really with COVERAGE.

If we want to discuss variable scopes and secure coding, we should also reject any private variable that may have come from outside with PRIVATE ALL as the first line (which causes the end-of-scope of any private variable trying to "get in"). Then again, if you do that you may never profit from that kind of variable scoping. I personally don't do anyway, but if you use some framework or third party class you should be prepared for anything.

If I'd always tried to post perfect code, I'd not post that much. For example I very rarely also post parameter checks (eg as ASSERTs).

Bye, Olaf.
 
<<TasTrade>>

That was a braino. I used NorthWind, not TasTrade. Same basic idea.

BAltman--did you test scalability for my solution? I haven't, but given how fast Rushmore is, I'll bet it does a pretty good job.

Tamar
 
Hi Tamar,

My 1st post multiplied orders by a factor of 10 before testing. I suspect that before the HAVING is applied the cartesian results quickly get out of hand.

Brian
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top