Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Counting non-exclusives in a query

Status
Not open for further replies.

tyemm

Technical User
Feb 28, 2005
66
US
I will try to describe briefly this challenge:
A doctor (MD) can be hired at one or more of four facilities, A,B,C & D, during any month between 2006 and the present. Thus the table MD contains the doctor ID field (MD) and four Date fields. When the doctor is hired by a facility, he must go to Employee Health one time. If he gets hired by as many as four facilities, he still only has to go to Employee Health the first time.
Employee Health wants to know how many visits they had in each month over this period, and of course never count the same doctor twice.
I have queried for the "exclusives" for a facility in a month (hire date B in range, and nulls in A, C & D), and queried the non-exclusives (hire date B in month range, but some (any) other date for that MD in either A, C or D.
Perhaps this is more of a mathematical puzzle, but I would sure like to solve it cleanly, such that my weighted totals for each facility for each month add up to the total number of new hires (a known quantity!).
Thank you!
Tom
 
so create a query that normalizes your data:

Code:
SELECT MD, A_Date As StartDate, "A Info" As Location FROM TableName WHERE A_Date is not null
UNION
SELECT MD, B_Date, "B Info" FROM TableName WHERE B_Date is not null
SELECT MD, C_Date, "C Info" FROM TableName WHERE C_Date is not null
UNION
SELECT MD, D_Date, "D Info" FROM TableName WHERE D_Date is not null

This will give you:
[tt]
MD StartDate Location
1 1/1/2006 A
1 2/15/2006 C
1 3/18/2006 D
2 5/15/2006 A
3 2/16/2008 C
[/tt]

now that this data is normalized it will be much easier to get the information you are after; you'll be using the normalized query as the source for your report now.

But in order to help you more I'll need to know where do you store information about when the MD took the Employee Health course?




Leslie

In an open world there's no need for windows and gates
 
Wow, thank you.
That got me far down the road...
Using my own nomenclature, I have this code:
SELECT DRName, [LIJ Appt Date] As [Start Date], "LIJ" As Facility From [2006 New Hires EHS NSLIJHS]
Where [LIJ Appt Date] is not null
UNION
SELECT DRName, [NSUH Appt Date] as [Start Date], "NSUH" as Facility From [2006 New Hires EHS NSLIJHS]
Where [NSUH Appt Date] is not null
UNION
SELECT DRName, [GC Appt Date] as [Start Date], "GC" as Facility from [2006 New Hires EHS NSLIJHS]
Where [GC Appt Date] is not null
UNION
SELECT DRName, [PV Appt Date] as [Start Date], "PV" as Facility from [2006 New Hires EHS NSLIJHS]
Where [PV Appt Date] is not null;

This gives me what you said it would, a normalized "table" where each appt date appears once, paired with the doctor and the facility. If a doctor was hired at all four facilities, his name appears four times:
DRName Start Date Facility
ACR 2/5/2008 GC
ACR 3/4/2008 LIJ
ACR 3/4/2008 NSUH
ACR 3/4/2008 PV

Now this person obviously got hired at the GC facility in February, so I need to count this record as a "GC Hire Employee Health" record. But how to do this?
And what if the dates were reversed, i.e. I had to divvy up initial hire dates between multiple facilities so that this one doesn't automatically get credited to the LIJ facility, etc.?

Thank you for the start!
Tom
 
Glad it helped! One suggestion, you don't have to have the aliases in the subsequent queries in the UNION, just the first one. In fact, in some other SQL flavors, that syntax doesn't work at all.

So you basically want a report that shows:
[tt]
Location Feb Mar Apr
GC
LIJ
NSUH
PV[/tt]

with the number of new hires in each month but only counting a dr for the first one, right?

Ok this should point you in the right direction. You'll need to take this query and see about converting it to a cross tab query. There's a wizard that will help with it, but it's not my strong point.

Code:
N.Facility, Month([N.Start Date]), Count(*)
FROM qryNormal N
INNER JOIN (SELECT DrName, Min([Start Date]) As [Start Date] FROM qryNormal) As A ON N.DrName = A.DrName and A.[Start Date] = N.[Start Date]
GROUP BY N.Facility, Month(N.[Start Date])



Leslie

In an open world there's no need for windows and gates
 
Thank you Lespaul (the other one is past 90 and still playing out, I believe)--

I am trying to make sense of your reply, and still have an error in my work up of your suggestion (I named the earlier result as [EHS Normalized]:

SELECT [EHS Normalized].Facility, Month([EHS Normalized].[Start Date]), Count(*)
FROM [EHS Normalized] INNER JOIN (SELECT [EHS Normalized].DrName, Min([Start Date]) As [Start Date] FROM [EHS Normalized] As A ON [EHS Normalized].DrName =
A.DrName and A.[Start Date] = [EHS Normalized].[Start Date])
GROUP BY [EHS Normalized].Facility, Month([EHS Normalized].[Start Date])
;

Still a syntax error here somewhere, but worse is that I only half-understand it!
 
Code:
SELECT D.Facility, Month(D.[Start Date]), Count(*)
FROM [EHS Normalized] AS D INNER JOIN (
SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName
) AS A ON D.DrName = A.DrName AND D.[Start Date] = A.MinDate
GROUP BY D.Facility, Month(D.[Start Date])

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
That was a big help.
With a slight modification, to give me different years (over a more than one-year period of months), I displayed the dates as follows:
Facility Date Total
GC 2/5/2008 3
GC 3/4/2008 2
GC 4/1/2008 4
LIJ 2/3/2006 1
LIJ 2/7/2006 30
LIJ 4/4/2006 8
NSUH 4/4/2006 15
This is just a sampling, of coure, but the sample highlights two more issues:
a) I need the total for a facility during a particular month of a given year, not, say, all in February of any year; and
b) I need a way to distribute counts of dates that appear for more than one facility: in the above example, 8 new hires at LIJ were in April 2006, and 15 for NSUH in April 2006. Now I know that there is a high probability that all or most of the 8 at one facility are double counted at the other. How can I exclude double counts, effectively "distincting" them?

in appreciation,
Tom
P.S. existing sql is:
SELECT D.Facility, ([D].[Start Date]) AS [Date], Count(*) AS Total
FROM [EHS Normalized] AS D INNER JOIN [SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName
]. AS A ON (D.[Start Date] = A.MinDate) AND (D.DRName = A.DrName)
GROUP BY D.Facility, ([D].[Start Date]);
 
by using this query in your join:
Code:
SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName
you are already getting rid of the duplicates for each dr. If you just run this query you will only get each dr one time with the first date that they joined a facility. for example, using your data:
[tt]
DRName Start Date Facility
ACR 2/5/2008 GC
ACR 3/4/2008 LIJ
ACR 3/4/2008 NSUH
ACR 3/4/2008 PV[/tt]

the above query will only return ACR at GC on 2/5/2008.

So when you take the results of that sub-query and join into the normalization query, you will only get the ONE record for dr ACR. Now, a caveat to that, if a dr starts for the first time at multiple locations, you may have an issue. If you have data like:
[tt]
DRName Start Date Facility
DCR 2/5/2008 GC
DCR 2/5/2008 LIJ
DCR 2/5/2008 NSUH
DCR 3/4/2008 PV[/tt]

well now the start date matches for 3 locations, which location would you want to pick in this situation?


So, to get the year info, try this:
Code:
SELECT D.Facility, Format("mm/yyyy", ([D].[Start Date])) AS [Start Month], Count(*) AS Total
FROM [EHS Normalized] AS D INNER JOIN [SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName
]. AS A ON (D.[Start Date] = A.MinDate) AND (D.DRName = A.DrName)
GROUP BY D.Facility, Format("mm/yyyy", ([D].[Start Date]));

Leslie
 
I have incorporated the format syntax, but it's not working for me:

sql of
SELECT D.Facility, Format("mm/yyyy", ([D].[Start Date])) AS [Date], Count(*) AS Total
FROM [EHS Normalized] AS D INNER JOIN (SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName)
AS A ON (D.[Start Date] = A.MinDate) AND (D.DRName = A.DrName)
GROUP BY D.Facility, Format("mm/yyyy", ([D].[Start Date]));

gives me
Facility Date Total
GC mm/yyyy 72
LIJ mm/yyyy 351
NSUH mm/yyyy 258
PV mm/yyyy 163

The code looks OK to me, but not the result!
And to be sure, I do have many instances where the earlier date is shared by more than one facility, so I need a technique to apportion them (add on a random amount of hours/seconds to identical day/year dates?).

Tnx
 
ok if there are records that look like :
[tt]DRName Start Date Facility
DCR 2/5/2008 GC
DCR 2/5/2008 LIJ
DCR 2/5/2008 NSUH
DCR 3/4/2008 PV[/tt]
then you'll need to change the query to this:
Code:
SELECT [b]MIN(D.Facility)[/b], Format("mm/yyyy", ([D].[Start Date])) AS [Date], Count(*) AS Total
FROM [EHS Normalized] AS D INNER JOIN (SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName)
 AS A ON (D.[Start Date] = A.MinDate) AND (D.DRName = A.DrName)
[b]GROUP BY Format("mm/yyyy", ([D].[Start Date]))[/b];

so I changed the SELECT and the GROUP BY...see if that gives you the correct numbers.

Leslie

 
Use this syntax of the Format function:
Format([date field], "mm/yyyy")


Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Pretty good, if not pretty!

However I still am double counting records such as what is above:
ok if there are records that look like :
DRName Start Date Facility
DCR 2/5/2008 GC
DCR 2/5/2008 LIJ
DCR 2/5/2008 NSUH
DCR 3/4/2008 PV

This provider has to be counted only one time, in one of those three facilities, and I don't care if it is at random.

Tnx
 
and this query with the MIN on Facility doesn't get the the correct info?

Code:
SELECT MIN(D.Facility), Format(A.MinDate , "mm/yyyy"), count(*)
FROM [EHS Normalized] D
INNER JOIN
(SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName) AS A ON D.DrName = A.DrName AND D.[Start Date] = A.MinDate
GROUP BY Format(A.MinDate , "mm/yyyy")
 
My original table has about 740 records, i.e. doctors, with one or more start dates at one or as many as four facilities. The query gives 844 as a count, so I believe the extra 144 are double-counts, or "cross-affiliated" individuals who began at two or more facilities on the same day, that day being earlier than any other start date they may also have. The exampe above is perfect to display this I believe. I'm still working on it!
 
have you checked the original UNION query and made sure that there aren't any weird records? I see that you are using IS NOT NULL...could there be records that contain a blank string and are being included in the [EHS Normalized]?

Sometimes you just have to look at the records the query is returning to spot the weirdness...as far as I can tell, it should be doing what you want.

What if you run this query:
Code:
SELECT *
FROM [EHS Normalized] D
INNER JOIN
(SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName) AS A ON D.DrName = A.DrName AND D.[Start Date] = A.MinDate

The records that you see when you do that are the ones that are being counted. Look through those records and see if you can find any weirdness!

Leslie
 
Thanks, Leslie
What those 104 semi-duplicate records are is not weirdness, but merely cases where a doctor was hired at two or more facilities on the same date, thus appearing in that query as:
D.DrName Start Date Facility A.DrName MinDate
AEMD 10/10/2006 LIJ AEMD 10/10/2006
AEMD 10/10/2006 NSUH AEMD 10/10/2006

I need to either flip a coin (add some random time/date snippet?), or ideally determine how many of each case (e.g., here LIJ/NSUH identical hire dates, but elsewhere perhaps LIJ/NSUH/GC identical hire dates) and split the difference. Perhaps this cannot be done directly, but only by inspection/eye-ball?
 
No, by adding MIN to the outside query you should only get one of those records. Maybe this (I change the source of the date and changed the MIN to a FIRST):
Code:
SELECT FIRST(D.Facility), Format(D.[Start Date], "mm/yyyy"), count(*)
FROM [EHS Normalized] D
INNER JOIN
(SELECT DrName, Min([Start Date]) As MinDate FROM [EHS Normalized] GROUP BY DrName) AS A ON D.DrName = A.DrName AND D.[Start Date] = A.MinDate
GROUP BY Format(D.[Start Date], "mm/yyyy")



Leslie

In an open world there's no need for windows and gates
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top