Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

compare subqueries?: Tracking Customer Loyalty 4

Status
Not open for further replies.

Welshbird

IS-IT--Management
Jul 14, 2000
7,378
DE
I have a table with a customer ID and a date of order.

What I need is to produce a table/report/summat that tracks customers by month.

So, I need to group the dates by month, and then count the customers who are new (date < dates in this month), customers returning (customer appears with date < date this month) and somehow, customers lost (customer appears with date < date this month but not with date in this month).

It seems that I want to use an !=ALL subquery, but I just can't get my head around how best to do this.

Do I need to do some PL/SQL loop to move through each month? It seems I must be able to do some group by based on a to_char(date,'Mon-YY') or something.

I somehow know that this can't be as difficult as it appears to me at the moment, but I don't seem to be able to get any closer to a solution!

My original table has many columns, but the only ones I am aiming to use in this are:
Date, Customer (which is a number).

I'd really appreciate any pointers anyone could give me.



Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
Your date requirements look a bit contradictory, so I'm not exactly sure what you're trying to do there. However, the general technique for this sort of thing is to use a count with a case statement e.g.

select
count(case when start_date = trunc(sysdate) then 1 end) as new,
count(case when start_date < trunc(sysdate) then 1 end) as returning
from table

Not sure if that's exactly what you want but you'd need to make the formulation of the date conditions a bit clearer before I can help further.
 
Thanks for such a prompt reply Dagon.

My dates in the table run from March 2003 to end July 2007.

What I want to end up with is a table like the following:

Date New Returning Lost
March 2003 5 0 0
April 2003 0 4 1
May 2003 2 4 0

So, I want to pick each month, and compare the set of customers in that month to all previous months. Then I can count which are returning and which are new. Customers which appear in previous months and no later have somehow to be counted as lost in the first month in which they do NOT appear.

(Seemed like such a useful report when I suggested it in a meeting! That'll learn me...)

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
I don't know how well this will work on a larger data set since it involves a cartesian join.

Code:
DROP TABLE MONTHS;

create table months (START_date date);

insert into months values ('01-jul-07');
insert into months values ('01-aug-07');
 
drop table customer;
 
create table customer (customer_id number, start_date date);
 
insert into customer values (1, '01-aug-07');
insert into customer values (2, '15-JUL-07');
insert into customer values (3, '01-aug-07');
insert into customer values (3, '01-jul-07');
insert into customer values (4, '05-aug-07'); 
insert into customer values (4, '05-may-07');
 
select 
 to_char(start_date, 'MON') as Month,
 count(case when cur_mth_date is null and prv_mth_date is not null then 1 end) as lost_customers,
 count(case when cur_mth_date is not null and prv_mth_date is NOT null then 1 end) as returned_customers,
 count(case when cur_mth_date is not null and prv_mth_date is null then 1 end) as new_customers
 from
 (
 SELECT c.customer_id,
        m.start_date,
        max(case when trunc(c.start_date, 'MM') = m.start_date then c.start_date end) as cur_mth_date,
		max(case when c.start_date < m.start_date then c.start_date end) as prv_mth_date
 from customer c, months m
 group by c.customer_id, m.start_date
 )
group by to_char(start_date, 'MON')
 
Dagon,

You are a complete star - that seems to do exactly what I needed.

I really appreciate your help with this.



Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
At least... it works as long as there is only one year in the data.

(I have three to four years).

But I'm sure I can work that bit out.

Thanks again.

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
Just include the year as part of the group by:

Code:
select 
 to_char(start_date, 'MON-YYYY') as Month,
 count(case when cur_mth_date is null and prv_mth_date is not null then 1 end) as lost_customers,
 count(case when cur_mth_date is not null and prv_mth_date is NOT null then 1 end) as returned_customers,
 count(case when cur_mth_date is not null and prv_mth_date is null then 1 end) as new_customers
 from
 (
 SELECT c.customer_id,
        m.start_date,
        max(case when trunc(c.start_date, 'MM') = m.start_date then c.start_date end) as cur_mth_date,
		max(case when c.start_date < m.start_date then c.start_date end) as prv_mth_date
 from customer c, months m
 group by c.customer_id, m.start_date
 )
group by to_char(start_date, 'MON-YYYY')
 
Aha!

I've found it:
Code:
select
 to_char(start_date, 'MON') as Month,
 count(case when cur_mth_date is null and prv_mth_date is not null then 1 end) as lost_customers,
 count(case when cur_mth_date is not null and prv_mth_date is NOT null then 1 end) as returned_customers,
 count(case when cur_mth_date is not null and prv_mth_date is null then 1 end) as new_customers
 from
 (
 SELECT c.customer_id,
        m.start_date,
        max(case when trunc(c.start_date) = trunc(m.start_date) then c.start_date end) as cur_mth_date,
        max(case when trunc(c.start_date) < trunc(m.start_date) then c.start_date end) as prv_mth_date
 from customer c, months m
 group by c.customer_id, m.start_date
 )
group by to_char(start_date, 'MON')
I realised that if I trunc all the dates, then tehy all become the 1st of that month, so now I seem to get very lovely figures.

Thanks again Dagon.

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
And yet again - I posted too soon.

I have one wee problem remaining.

If a customer has appeared in several months (he bought in Mar-06, April-06, May-06 and then stopped), I really wanted him to appear just once as a lost customer, but my test implies that he appears as 3 lost customers, as he has been lost from three months.

(I did always think that the lost ones would be the most difficult to calculate)

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
Truncating the dates (without the 'MM' parameter) will remove the time component but not bring them down to the start of the month.

The SQL I wrote would only record the customer as lost once, regardless of how many previous months he had appeared in. It takes the maximum of all the months which have a date less than the current month. If there is a value for that and there is no value in the current month, he is a lost customer.
 
DOH!

Please smack me.

I needed to create a table 'MONTHS' as
Code:
create table months as
(select Date from
(select trunc(date, 'MM') as DATE from customer
group by date))
rather than just trying to use the customer table twice....

If only I stopped rushing and thought about it!

Thank you so much.

Virtual second star (as I can't seem to give you a second one)

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
But...

I still have lost more customers than I ever had.

So, I ran the subquery and dropped the data into Excel to try and see if I could work out what was going on.

Now, I have all possibly months in my MONTHS table, which seems sensible. If I only use the months which have that customer number I'll surely never see them lost.

But, if my customer appears in JAN and FEB, I see him as NEW in JAN, Returning in FEB, and then lost in MAR, APR, MAY...

So, my lost figure is much bigger than it ought to be, as a lost customer is showing as lost in all subsequent months. And possibly I have another another problem; if he then comes back he's now returning. When actually if he returns he was never lost...

(My head hurts now)

So, I see data like this:
1116177981_0781666f94_o.jpg


My customer is lost for three months (and gets counted in all of them) and then he returns in a month, so actually he wasn't really lost at all.

I think I might just go home now, and see if I have a flash of inspiration overnight. It does now look more difficult than I had thought it might be.



Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
But, if my customer appears in JAN and FEB, I see him as NEW in JAN, Returning in FEB, and then lost in MAR, APR, MAY...

That is exactly what my query was designed to do. It provides a snapshot of what the situation was in any one particular month.

This appeared to meet your requirements:

So, I want to pick each month, and compare the set of customers in that month to all previous months

Can you define exactly what you mean by:

a) a new customer. Is this just a customer who has only joined in the current month ? Is it a customer who has only ever joined once, regardless of date (in which case I don't see how you differentiate between him and a "lost" customer) ?

b) a returned customer. Again, is this just someone who has returned this month. If they joined in March and then returned in May, what is the April month supposed to show ? Logically, it should show them as lost because, at that point in time, they did appear to be lost. Are you saying you want to show them as returned even though they didn't even have an entry in April.

c) a lost customer. Is this just a customer who hasn't come in during the current month ? If so, what are all the previous months to show for lost and new customers ? Should they all be zero ? How do you separate them from new customers ?

Without a clearer definition of what you're trying to achieve, I think you'll struggle to make much further progress.
 
Sorry - I've not been very good at explaining my requirements.

I need to know new customers - these are new because they have no previous records ever for a previous date.

Returning customers have an earlier record from any time earlier.

Lost customers are those who had a record at some point and do not have a record at a later date. They are lost in the first month only that they show no record. If they reappear, even in x months time then they were never lost.

The business reason for this is that the thing they buy as a customer may last for on month or for many months, so they are not regular customers necessarily.

If they haven't returned for more than one month prior to the maximum date in the data then they count as lost.

Nasty innit.

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
I'll see if I can get a bit closer to the definition.

Lost Customer

Appears in any one month only of the reporting period.
Does not appear in any subsequent months.
Recorded as "new" in the month they appear but "lost" in the next month ?
Not reported in subsequent months.

New Customer

As lost customer ?

Returned Customer
Appears in at least 2 different months of the reporting period.
Reported as "returned" in the month of their first reappearance ? Or last reappearance ? Or just in the last month of the sample ?

 
I suspect I may have led us all down the garden path by beginning to look at each individual month.

So, I'm trying out another plan: I'm numbering each customer visit, and so in any given months, the count of 1's are the new customers and the count of > 1's are the returning customers.

I'm still pondering the lost customers though. Haven't got a plan for that yet. It may be that the max(visit_count) for the customer occurs at least 3 months ago.

I do appreciate all of your help though.

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
So, new and returning works really well now.

I have created a table as follows:
Code:
Create Table Bucket
(
	RecID 				Number not null,
	CompID 				integer null,
	Customer 		  number not null,
	FirstDate 		date not null,
	NextDate 			date null
);
I'm not sure I need nextdate, but just in case I decide to look at length of time between visits I've included it.

I've populated it using the following:
Code:
ALTER TABLE Bucket ADD (
  CONSTRAINT ox_pk PRIMARY KEY (RecID));

DROP SEQUENCE RecID_seq;

CREATE SEQUENCE RecID_seq;

CREATE OR REPLACE TRIGGER Ident_ox 
BEFORE INSERT ON Bucket
FOR EACH ROW
WHEN (NEW.RecID IS NULL)
BEGIN
  SELECT RecID_seq.NEXTVAL
  INTO   :NEW.RecID
  FROM   dual;
END;
/

CREATE INDEX idx_Comp
ON Bucket (CompID);

CREATE INDEX idx_Customer
ON Bucket (Customer);

insert into Bucket (RecId, Customer, FirstDate, CompID )
    select
        RecID_seq.NEXTVAL,
        Customer, 
        Rx_DATE, 
        row_number() over (partition by Customer order by Rx_DATE)
    from
        MAINDATA;
        
update Bucket a set NextDate =
(select FirstDate from Bucket b
  where a.Customer = b.Customer
    and a.compid+1 = b.compid)
where NextDate is null
  and exists
(select FirstDate from Bucket b
  where a.Customer = b.Customer
    and a.compid+1 = b.compid)
So that all seems to work out ok, and my query for new, returning and lost customers is as follows:
Code:
SELECT
To_Char(firstdate,'Mon-YY') AS MONTH,
Count(CASE WHEN compid = 1 THEN 1 END) AS NEW,
Count(CASE WHEN compid > 1 THEN 1 END) AS Returning,
Count(CASE WHEN FIRSTDATE < '01-May-07' AND NEXTDATE IS NULL THEN 1 END) AS LOST
FROM 
bucket
GROUP BY To_Char(firstdate,'Mon-YY')

(I thought I'd post call this just in case someone else is looking to do a similar thing!)

Light dawned eventually that no next_date is sign of a lost customer!



Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
It's a bit overcomplicated for my tastes. I think you could do the same thing directly from the customer table with a query like:

Code:
select month,
       count(case when rn = 1 then 1 end) as new,
	   count(case when rn > 1 then 1 end) as returning,
	   count(case when start_date  < '01-may-07' and rn = 1 and next_start_date is null then 1 end) as lost
from
(select to_char(start_date, 'Mon-YY') as month,
        customer_id,
        start_date,
		row_number() over (partition by customer_id order by start_date) as rn,
        lead(start_date) over (partition by customer_id order by start_date) as next_start_date
   from customer)
group by month
order by to_date('01-'||month, 'DD-MON-YY')

Note you're also counting a customer as returned each time they return rather than just once.
 
That is SO MUCH more elegant than my solution.

Blimey.

(I know I still have a lot to learn, but I say again. Blimey.)

Dagon, if I could award you a whole host of stars I would.

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
Dagon,

because willif can't, and I agree, herewith another purple pointy thingy. Nice analytics - wish I could do that - sigh.

T

Grinding away at things Oracular
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top