Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need help with SQL Max() command 2

Status
Not open for further replies.

donnitadk

Technical User
Jun 30, 2010
2
US
I have a database that contains contractor's licenses. Each license can have multiple renewals but for reporting purposes I am only interested in the current (newest) renewal for each license. The very simplified query below gives me the data I am looking for the license number and the newest annual renewal number for that record

SELECT license.licensenumber, MAX(licenseannualrenewal.licenseannualnumber)
FROM logosdb.dbo.license INNER JOIN logosdb.dbo.licenseannualrenewal ON license.licenseid = licenseannualrenewal.licenseid
GROUP BY license.licensenumber

My problem is I need to include other fields and other tables but when I add them it starts pulling in the multiple annual renewals again instead of just the newest renewal. How do I make sure it only looks at the MAX(licenseannualrenewal.licenseannualnumber) no matter what else I select? THANKS!
 

Code:
SELECT license.*, MaxStuff.MaxAnnualNumber
FROM   logosdb.dbo.license 
       INNER JOIN 
         (
         Select licenseid,
                MAX(licenseannualrenewal.licenseannualnumber) As MaxAnnualNumber
         From   logosdb.dbo.licenseannualrenewal
         Group By licenseId
         ) As MaxStuff
         ON license.licenseid = MaxStuff.licenseid


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George I Like your solution, but being a newbe to SQL. Why did you chose to keep the INNER JOIN?

could it have been done like this?

Code:
SELECT 
	License.*
FROM logosdb.dbo.license
	Where In
		(
		Select MAX(LicenseAnnualNumber) From LicenseAnnualRenewal
		Where  license.licenseid = licenseannualrenewal.licenseid
		)

I only ask because you have a LOT more experience than I and your examples/solutions have been a great asset!!

Thanks, John

Thanks

John Fuhrman
 
John,

Your query is not correct. There are syntax errors. That being said, it could have been written like this:

Code:
SELECT 
    License.*,
    (Select MAX(LicenseAnnualNumber) 
     From LicenseAnnualRenewal
     Where  license.licenseid = licenseannualrenewal.licenseid
     ) As MaxLicenseNumber
FROM logosdb.dbo.license

This is the classic subquery approach. As I mentioned in another thread earlier this week, the 2 queries are not the same. There are slight diffferences.

The subquery method:
1. Will return exactly one row in the output for each row in the main table (license).
2. Will error if the subquery returns multiple rows. In this case, it isn't possible because we have a max without a group by, so you are guaranteed to get either one row, or no rows (if the child table doesn't have any matches on licenseid).

The derived table method:
1. Uses a Join: Theoretically, joins can cause multiple rows in the output. But in this case, that cannot happen because we have a scalar value (licenseid) joining to the derived table, so again, there will be 0 or 1 row that matches each row in the parent table.
2. Since this uses a join, you may get less rows in the output than there are rows in the parent table. This would only happen if there were no rows in the child table for the join condition (licenseid). This behavior can be changed by using a left join to the derived table, is which case you would get NULL returned. This would be the same for both queries.

I created 2 similar queries, but using tables in my database so that I could test this. My main table has 9,482 rows and the child table has 287,305. The execution plans are the same. The IO statistics are the same. Everything is the same.

Code:
Select Student.*,
       (Select Max(CalendarDate) 
        From Calendar 
        Where Student.CurrentSchoolId = Calendar.SchoolId
       ) As LastDayOfSchool
From   Student

Select Student.*,
       LastDayOfSchool
From   Student
       Left Join (
         Select SchoolId, 
                Max(CalendarDate) As LastDayOfSchool
         From   Calendar
         Group By SchoolId
         ) As LastDay
       On Student.CurrentSchoolId = LastDay.SchoolId

Now, to answer your question...

There are several reasons why I prefer the derived table method:

1. It's what I am used to based on my experiences with older SQL database engines. I *think* that earlier versions of SQL Server would NOT create the same execution plan, and the derived table method was faster.

2. In my opinion, it is easier to tune the query. Look at the query I posted (from my database). You can see that it uses the schoolid (in the select and group by clauses) and it also uses the CalendarDate column (in the max aggregate). If I think about this single query, it is easy to optimize it for performance by adding a multi-column index on SchoolId and CalendarDate.

3. In my opinion, it is easier to write the query. You see, Usually, when I write a query like this, I start from the inside out. In this case, I wrote the query that was to become the derived table. Once I had it written, i squished the F5 button and made sure that it ran without any errors. I then write the outer query, which was also relatively simple (for me) and then tested again.

When you use the subquery method, you cannot run it by itself because the where clause is forming the join to the outer query. This means I need to write the entire query before I can test it. For a (relatively) simple 2 table query, this isn't so bad, but it's not uncommon for me to have queries that involve a dozen tables, which makes things a lot more complicated.

I'm a very big fan of the KISS principle. In case you've never heard of it ([!]K[/!]eep [!]I[/!]t [!]S[/!]imple, [!]S[/!]tupid). In my opinion, the derived table method is simpler to understand and allows me to build up the query in smaller chunks which also allows me to keep it simpler.

I should also mention that the query could just as easily have been written using a common table expression approach. In fact, it would be just like derived table method, but the syntax would be a little different.

Code:
;with LastDay As
(
    Select   SchoolId, Max(CalendarDate) As LastDayOfSchool
    From     Calendar
    Group By SchoolId
)
Select Student.*,
       LastDay.LastDayOfSchool
From   Student
       Left Join LastDay
         On Student.CurrentSchoolId = LastDay.SchoolId

Notice that the only difference between this query and the derived table query is the syntax. Most notably, the query for the derived table was removed from the outer query and pushed to the top. Many people prefer this method. I don't, but mostly because I got really used to the derived table method. Also note that the execution plan is the same, the IO is the same, and performance is the same.

All things considered, I prefer to stick with the method that I am used to, and one that is easier for me to write the queries. May sound lame, but it works for me.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Other threads from earlier this week that talk about the subquery. Some interesting reading:

thread183-1625514
thread183-1625535

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks so much for the detailed explanation George!!

I will have to do some experimenting with one of my DB's and learn how to manipulate derived table queries. Right now I am "upsizing" many Access DB's so I do not have any databases with over a dozen tables, but I do plan on merging as many of these into a single DB with a common UI so learning (much) more about DB normalization/design and the "tricks" to properly abstract to underlying tables so that programming is easier is one of my primary goals this year.

This was the query I had been messing with when I posted the question to you.

And I sincerly hope this discussion helps the original poster (donnitadk) as well.

Code:
Select  top 100 percent
	Tracking_ID, UPS_Prefix, UPS_AcctNum, UPS_ShipmentType, 
	UPS_ParselNumber, BoxNumber, FileNumPrefix, FileNumber, 
	TrackingDate, Reship, BoxNumberOriginal, EmployeeID, MachineName
From dbo.View_Parsed
	Where CONVERT( varchar(10), TrackingDate,101) In 
		(Select Max(CONVERT( varchar(10), TrackingDate,101))
		 from dbo.View_Parsed as b
		 Where dbo.View_Parsed.FileNumber = b.FileNumber)
Order by TrackingDate desc
[blue]
The view is for seperating the UPS tracking number into its verious componants and peeling the prefix off our internal file numbers for validation and reporting.[/blue]

This does return the results expected but it slower than what I could use in production.

The main table has 6,192,262 rows.




Thanks

John Fuhrman
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top