Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Brain-twister with aggregates 1

Status
Not open for further replies.

OsakaWebbie

Programmer
Feb 11, 2003
628
JP
Can someone help me figure out how to do this without it being really slow? I have a table "attendance" to show what days a person is present at something, with these fields:
PersonID -> points to person table
EventID -> points to event table
AttendDate: date of attendance
I need to do a series of queries to find out how many people had their first date of attendance in each of several months (as well as various other criteria involving joins to other tables, but none of those are aggregate-related).

This query is one of a series of working ones I had previously (before I had to worry about the "first" limitation):
Code:
SELECT COUNT(DISTINCT person.PersonID) FROM person
INNER JOIN percat ON person.PersonID=percat.PersonID
INNER JOIN attendance ON person.PersonID=attendance.PersonID
WHERE CategoryID IN (6,53)
AND EventID=2
AND AttendDate >= CONCAT(DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00') - INTERVAL 1 MONTH,'%Y-%m-'),'01')
AND AttendDate <= LAST_DAY(DATE(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00')) - INTERVAL 1 MONTH)
But now I need to limit the results to just the first date being in the target month. Adding a MIN function around the two instances of AttendDate is an "invalid use of group function", and adding "GROUP BY EventID" doesn't help - I really don't know how to get the MIN(AttendDate) and the COUNT(PersonID) to exist in the same query. The only way I know how to do it is this:
Code:
SELECT COUNT(DISTINCT person.PersonID) FROM person
INNER JOIN percat ON person.PersonID=percat.PersonID
WHERE CategoryID IN (6,53)
AND person.PersonID IN (
  SELECT attendance.PersonID FROM attendance
  WHERE EventID=2
  GROUP BY attendance.PersonID
  HAVING MIN(AttendDate) >= CONCAT(DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00') - INTERVAL 1 MONTH,'%Y-%m-'),'01')
  AND MIN(AttendDate) <= LAST_DAY(DATE(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00')) - INTERVAL 1 MONTH)
)
But I'm concerned that this will be slow - in a single 4x8 chart I'm building that will be shown every time my boss logs in, this kind of query will be performed for every cell (eight different months and four variations in the other criteria like the JOIN and EventID), resulting in 32 queries. And the number of attendance records to sift through is 80,000 and growing.
 
This might be quicker

Code:
SELECT COUNT(DISTINCT person.PersonID) 
FROM person
INNER JOIN percat ON person.PersonID=percat.PersonID and CategoryID IN (6,53)
inner join (
	SELECT attendance.PersonID FROM attendance  
	WHERE EventID=2  
	GROUP BY attendance.PersonID  
	HAVING MIN(AttendDate) >= CONCAT(DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00') - INTERVAL 1 MONTH,'%Y-%m-'),'01')  
	AND MIN(AttendDate) <= LAST_DAY(DATE(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+9:00')) - INTERVAL 1 MONTH)
	) E on person.PersonID = e.PersonID

Ian
 
Wow, what a difference! I had a row of eight queries to test, and my way, each query took about 8-10 seconds. Your way, the whole row filled so fast I didn't get a chance to blink!

If you have extra energy, if you can tell me why it's so different even though both use subqueries, I'm all ears. If I understand the mechanism, I might be able to clean up some other queries that are similarly slow.
 
I am not an expert but I think your query will have to return all results for the IN() query for every row of data.

If you join it as a subquery it only executes the query for that particular record.

Ian
 
Okay, thanks - I'll try to figure out how to apply that thought in other (differently structured) queries in my code. Ciao!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top