sql for number of new endusers 2

robertsquestion · Sep 27, 2010

Hi,

I'm using a database on a SQL-server that contains the sales from a few dealers to different endusers per period. So each dealer has got his own endusers. Now the goal is to analyse the number of new endusers compared to the previous period. Outcome should be grouped (on row-level) per enduser age-group (0-40, 40 - 60, >60). There are only 2 periods per year. The number of new endusers should be displayed as a total and also split per gender.

An additional issue is that only dealers that have sales in both the current and previous period should be considered.

So let's say we're using table 'T_Sales' with the following fields + example records:
Dealernr / Endusernr / Birthyear / Gender / Period / Sales
100 / 9000 / 1950 / M / 200901 / 200
100 / 9200 / 1960 / M / 200901 / 180
100 / 9000 / 1950 / M / 200902 / 250
100 / 9300 / 1940 / M / 200902 / 420 (*)
100 / 9400 / 1960 / M / 200902 / 350 (*)
100 / 9500 / 1985 / F / 200902 / 180 (*)
200 / 8500 / 1920 / M / 200902 / 260
200 / 8600 / 1945 / M / 200902 / 300

I've marked the records that contain new endusers with a (*). The endusers from dealer 200 are not considered because there are no sales for this dealer in previous period (200901).

So the result would be:
Age-group descr. / Total / Male / Female
Age-group 0 - 40 / 1 / 0 / 1
Age-group 40 - 60 / 1 / 1 / 0
Age-group >60 / 1 / 1 / 0

Now I'm quite an amateur when it comes to SQL.. I can do this with 6 different queries and finally in the MS Access frontend with a crosstab-query. But I was hoping that maybe someone out there knows a more clever SQL-solution. As the table is quite big, the performance of my current solution is quite bad..

To make things worse, I need the same thing for the 'disappeared' endusers (so endusers with sales in periode 200901 and no sales in periode 200902 for dealers that have sales in both periods). But hopefully I can figure this one out myself if someone knows the solution for the new endusers.

I hope someone can help me out!

Thanks,
Robert
The Netherlands

markros · Sep 27, 2010

Try something like:

Code:

;with Ages as (select 1970 as MinAge, 2009 as MaxAge, 'Age-group 0-40' as [Age-group descr.]
union all
select 1950 as MinAge, 1969 as MaxAge, 'Age-group 40-60' 
union all 1800 as MinAge, 1949 as MaxAge, 'Age-group > 60'),
Info as (select T.DealerNr, Ages.[Age-group descr.],
sum(case when Period = '200902' then 1 else 0 end) as Total,
sum(case when Period = '200902' and Gender = 'M' else 0 end) as Male,
sum(case when Period = '200902' and Gender = 'F' else 0 end) as Female
from Table T inner join Ages on
T.BirthYear between Ages.MinAge and Ages.MaxAge
group by T.DealerNr, Ages.[Age-group descr.]
having sum(case when Period ='200901' then 1 else 0 end) >= 1
 AND SUM(CASE WHEN Period ='200902' THEN 1 ELSE 0 END) >= 1)

select [Age-group descr.],sum(Total) as Total, sum(Male) as Male, sum(Female) as Female
from Info group by [Age-group descr.] -- to remove DealerNr group

From the top of my head - not tested.

PluralSight Learning Library

gmmastros · Sep 27, 2010

If I understand correctly, you just recently upgraded your database from Microsoft Access to Microsoft SQL Server, Right? When this happens, some times people are discouraged because the performance isn't any better than it was with Access, which is contrary to popular opinion. The problem is... Access will sometimes do nice this for you automatically, but SQL Server doesn't. Specifically, SQL Server will not automatically create indexes for you. Indexes are used to speed up queries. Be careful that you do not create too many indexes because it can negatively impact inserts, updates, and deletes.

Why do I mention indexes? Because you said "As the table is quite big, the performance of my current solution is quite bad." I'm reasonably certain that you can improve performance by writing better queries. Sometimes this requires adding additional indexes to your table(s).

Before I begin showing you a solution to your problem, it's important for you to understand aggregates. Simply put, an aggregate is a function that is applied to a group of data, for example, SUM, MIN, MAX, COUNT, AVG, etc... All of these functions work on a group of rows specified in the Group By clause. Another important thing to understand is that each of these functions ignore NULLs. For example, if you have 3 rows in your group, and one row contains NULL in the Blah column, then COUNT(Blah) = 2 and Count(*) = 3. We can use this to our advantage by carefully controlling the 'thing' that we are counting.

The first thing I thought about when I read your question is filtering for the dealers we care about. So, we first write a query that returns only the dealer number. Like this:

Code:

	Select	Dealernr
	From	T_Sales
	Where	Period In (200902, 200901)
	Group By Dealernr
	Having Count(Distinct Period) = 2

Next, I thought about getting just the end users for those dealers. So, we need to account for all the data, for just for the dealers returned in the previous query. How do we do this? Basically, we can do an inner join between your table and the results we get from the previous query. There are a couple ways to do this. In the query I show you below, I use a common table expression (CTE).

Code:

;With Dealers As
(
	Select	Dealernr
	From	T_Sales
	Where	Period In (200902, 200901)
	Group By Dealernr
	Having Count(Distinct Period) = 2
)
Select  T.*
From    T_Sales T
        Inner Join Dealers
          On T.dealernr = Dealers.dealernr

Notice that the query shown above returns all the data from the sales table but only for dealers that had sales in both periods. This is an important concept to understand. You see, we are filtering our data based on the results of a query. Make use you understand this concept before moving on because it's a "trick" that you will likely use numerous times.

Next, let's write a query that filters out our end users based on whether they are recurring, new, or disappeared. This is where we will use the NULL count trick I mentioned earlier.

** The following query assumes that each end user will appear in the sales table once for each period.

Code:

;With Dealers As
(
	Select	Dealernr
	From	T_Sales
	Where	Period In (200902, 200901)
	Group By Dealernr
	Having Count(Distinct Period) = 2
)
Select  T.dealernr, endusernr
From    T_Sales T
        Inner Join Dealers
          On T.dealernr = Dealers.dealernr
Group By T.dealernr, endusernr
--To get dealers in both periods
--Having Count(*) > 1

-- To get New dealers
Having Count(Case When Period = 200901 Then 1 End) = 0

-- To get disappearing dealers
--Having Count(Case When Period = 200902 Then 1 End) = 0

The next step is to cross tab your data. Again, there are various ways to do this. Some people prefer the pivot operator, but I can't seem to commit the syntax to memory, so I usually end up using the NULL Count method, like this...

Code:

;With Dealers As
(
  Select   Dealernr
  From	   T_Sales
  Where	   Period In (200902, 200901)
  Group By Dealernr
  Having   Count(Distinct Period) = 2
),
EndUsers As(
Select T.dealernr, endusernr
From   T_Sales T
       Inner Join Dealers
         On T.dealernr = Dealers.dealernr
Group By T.dealernr, endusernr
--To get dealers in both periods
--Having Count(*) > 1

-- To get New dealers
Having Count(Case When Period = 200901 Then 1 End) = 0

-- To get disappearing dealers
--Having Count(Case When Period = 200902 Then 1 End) = 0
)
Select Case When Year(GetDate()) - BirthYear < 41 Then 'Age Group 0 - 40'
            When Year(GetDate()) - BirthYear < 61 Then 'Age Group 41 - 60'
            Else 'Age Group > 60' End,
       Count(*) As Total,
       Count(Case When Gender = 'M' Then 1 End) As 'Male',
       Count(Case When Gender = 'F' Then 1 End) As 'Female'
From   T_Sales T
       Inner Join EndUsers
         On T.EndUserNr = EndUsers.EndUserNr
Group By Case When Year(GetDate()) - BirthYear < 41 Then 'Age Group 0 - 40'
            When Year(GetDate()) - BirthYear < 61 Then 'Age Group 41 - 60'
            Else 'Age Group > 60' End

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

robertsquestion · Sep 27, 2010

Hi George and Markros,

Thanks a lot for your quick responses! I will try your solutions and let you know the results.

George, I'm not really disappointed by the performance of the database that is running via SQL-server now. It's quite an improvement compared to running it via MS Access. But I hoped (considering your excellent answer to my previous question) that the query setup could be a bit more sophisticated which will probably help a lot for improving the performance. And looking at both your answers, I think this will be a lot better than my solution..

So thanks again George and Markros, your help is very much appreciated!

Regards,
Robert

robertsquestion · Sep 28, 2010

Hi George/Markros,

Sorry, I'm really a newbee, can't get it working..
As I didn't manage to use your code, I've copied a very simple CTE to an SQL view (just to get any CTE working):

;with MyCTE(x)
as
(select x='hello')
select x from MyCTE

When I use the 'verify-button', I get the message:
'Unable to parse query text'
When I use the 'run-button', I get the message:
Line 1: incorrect syntax near ';'.

(also tried it without the ';' as first character of the code, but that also gives the 'Unable to parse-error')

I really would like to learn about this subject, but obviously need some more help.. Hope someone can give me a hint here!

Thanks,
Robert

gmmastros · Sep 28, 2010

Run this:

Select @@Version

Post the output here. I suspect you are using SQL2000, but would like to make sure before I advise further.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

robertsquestion · Sep 28, 2010

Hi George,

You guessed right, I'm using Microsoft SQL Server 2000..
Sorry, should have mentioned that. So I guess that I cannot use the CTE functionality now?

Thanks again,
Robert

gmmastros · Sep 28, 2010

Nope. CTE was first introduced in SQL2005. No big deal. Truth is, the only REAL benefit of CTE is when you are using recursion. Otherwise, it's mostly a formatting thing. We can easily re-write the query use a derived table approach. Like this:

Code:

Select Case When Year(GetDate()) - BirthYear < 41 Then 'Age Group 0 - 40'
            When Year(GetDate()) - BirthYear < 61 Then 'Age Group 41 - 60'
            Else 'Age Group > 60' End,
       Count(*) As Total,
       Count(Case When Gender = 'M' Then 1 End) As 'Male',
       Count(Case When Gender = 'F' Then 1 End) As 'Female'
From   T_Sales T
       Inner Join 
         (
         Select T.dealernr, endusernr
         From   T_Sales T
                Inner Join (
                  Select   Dealernr
                  From       T_Sales
                  Where       Period In (200902, 200901)
                  Group By Dealernr
                  Having   Count(Distinct Period) = 2
                  ) As Dealers
                  On T.dealernr = Dealers.dealernr
         Group By T.dealernr, endusernr
         --To get dealers in both periods
         --Having Count(*) > 1
         
         -- To get New dealers
         Having Count(Case When Period = 200901 Then 1 End) = 0
         
         -- To get disappearing dealers
         --Having Count(Case When Period = 200902 Then 1 End) = 0
         ) As EndUsers
         On T.EndUserNr = EndUsers.EndUserNr
Group By Case When Year(GetDate()) - BirthYear < 41 Then 'Age Group 0 - 40'
            When Year(GetDate()) - BirthYear < 61 Then 'Age Group 41 - 60'
            Else 'Age Group > 60' End

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

robertsquestion · Sep 28, 2010

Hi George,

I was getting worried now, but reading your last reply I'm already feeling a lot better! Thanks for adjusting the code, I will give it another try.

Regards,
Robert

gmmastros · Sep 28, 2010

I'm a little confused about something. Since you are just now upgrading to SQL Server, why did you pick a database engine that is 10 years old? There are 3 newer versions you could have picked (SQL 2005, SQL 2008, and SQL 2008 R2). There are free versions of each of these database engines.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

robertsquestion · Sep 29, 2010

Hi George,

The SQL server it's running on was already in use for other applications, but I haven't used it before. So I'm not really the owner of the SQL-environment. I've already asked the responsible person to check for these free versions, so maybe we can upgrade.

I've tested the adjusted code (so the last one, without using CTE and running on SQL Server 2000..). The good news is: I get some outcome!
The bad news is:

1. when I ran it for the first time, I got the message: "Timeout Expired". So I tried it again, but got the same error. After trying another time, I suddenly did get a result.

2. The layout I get out is exactly what I need. But the data (so number of new customers for male/female/total) doesn't look correct. Total number of new customers is around 13 times the 'real' total number of new customers. My guess is that is has to do with the fact that in the background we have data on article-level, so maybe it's counting endusers more than once. But in T_Sales I already grouped on "Dealerrnr / Endusernr / Birthyear / Gender / Period / Sales (sum). So I've got one line per dealer/endusernr/period.
But I will split your code in parts, maybe I can find this one myself..

Thanks for your support.
Robert

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

sql for number of new endusers 2

robertsquestion

Technical User

markros

Programmer

gmmastros

Programmer

robertsquestion

Technical User

robertsquestion

Technical User

gmmastros

Programmer

robertsquestion

Technical User

gmmastros

Programmer

robertsquestion

Technical User

gmmastros

Programmer

robertsquestion

Technical User

Similar threads

Part and Inventory Search

Sponsor