Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Other fields from MAX() record don't match 1

Status
Not open for further replies.

BongoB1

Programmer
Jul 26, 2001
26
0
0
US
I am trying to run a single SELECT statement pulling data from three tables (PERSONAL, EMPLOYEE, SALARY) to return each employee's name, address, start date, etc. and their most recent salary amount. There will be only one personal and employee record for each employee though they will have numerous historical salary records. I figured that if I grouped by SSN (key field across all of the tables) and used the MAX(SALARY.PAYDATE) in the SELECT, I would get data from the most recent (highest paydate) SALARY record. The results seemed to be a mixed bag. All of the records reflect the correct (most recent) pay date values. However, the salary amounts are sometimes correct, but usually not. In some cases, it seems to be a salary amount from earlier in the individual employee's history.

I know that I could do this with a couple of SELECT statement, etc, but I thought I could do this in one step. Will the MAX() function work the way I want or does it only return the correct value for the field to which it has been applied and anything else from that table might or might not be from that record?

Thanks

Bob
 
What is your VFP version?

Here is the idea:

select Emp.*, Sal.*, Per.* from Employee Emp ;
inner join Salary Sal on Emp.SSN = Sal.SSN ;
inner join Personal Per on Emp.SSN = Per.SSN ;
where Sal.PK IN ;
(SELECT CTOBIN(RIGHT(MAX(DTOS(PayDate)+BINTOC(PK)),4)) ;
FROM Salary GROUP BY SSN)
-----------------------------------------------
The idea is taken from (in Russian)
 
Sorry, I'm using VFP 6.0 so I don't think this is going to work. I can still get the results I need by doing a couple of SELECTS, but I just thought the MAX() function would work.

Thanks

Bob
 
Yes, this is going to work in VFP6 as well. Instead of BINTOC/CTOBIN you may use PADR and INT(VAL( functions.

Instead of INNER JOIN you can move joins into where expression.
 
Let me explain why MAX() doesn't work here, so you won't run into this again. When a query includes GROUP BY, VFP processes it in two stages. In stage 1, all the joins and the WHERE clause are executed, creating an intermediate result; in your case, it would have one record for each combination of employee data, personal data and salary history data.

After that, VFP takes all the records that exactly match in the list of fields specified in GROUP BY and consolidates them into a single record. As it does so, it applies any of the aggregate functions specified (CNT(), SUM(), AVG(), MIN() or MAX()). As long as every field in the field list is either included in the GROUP BY list or uses one of those functions, you get exactly what you expect.

In VFP 7 and earlier, however, you're allowed to list additional fields. When you do so, VFP chooses the value "randomly" from among the records in the group. It turns out that randomly here means from the last record in the group, but that's an arbitrary choice.

In VFP 8 and later, such queries are prohibited to prevent exactly the problem you're running into.

You can solve your problem with a correlated subquery, along these lines:
Code:
SELECT <the fields you need> ;
  FROM Personal ;
    JOIN Employee ;
      ON Personal.SSN = Employee.SSN ;
    JOIN Salary ;
      ON Employee.SSN = Salary.SNN ;
  WHERE Salary.SalDate = ;
    (SELECT MAX(SalDate) FROM Salary SalB ;
       WHERE SalB.SSN = Employee.Salary) ;
  INTO CURSOR MaxSalary
Tamar



 
Tamar,

One word of caution with the above solution - if you have multiple records with the same MAX date it will bring all these records.

I actually like the select I posted. This idea originates from Igor Korolev and in my opinion it's an ingenious one.
 
I think there's a mistake in your query. You need to have matching expressions on both sides of the WHERE, but something like that will certainly work and I've taught that solution many times. (In fact, I think it's in one of the white papers posted on my website.)

Good point on my solution.

Tamar
 
I took a quick look on the posted query again and I don't see an error, but I admit I haven't tried, wrote it quickly from the top of my head.

Would you please give a link to your white paper?

Thanks.
 
Hello ilyad, Hello Tamar,

Code:
CTOBIN(RIGHT(MAX(DTOS(PayDate)+BINTOC(PK)),4))

This is a tricky expression, as it finally ends up as a PK only (CTOBIN(BINTOC(PK))==PK), but the inner MAX() in that expression causes it to be the max PK of the record(s) with the max PayDate. I like it as an ingenious way to get at the PK of the record with Max(somefield) without a further subselect.

What's bad from the theretical point of view is, that you assume the max PK is the latest record. It's giving some meaning to the PK, which is bad. Normally you should adress such a problem by making it a PayDateTime or adding an additional field PayNo. It could be the case several payments are not input in chronoligcal order. But that's a very theoretical problem.

You are right, that Tamar's solution will give you several records, if there are several records with the same Paydate, but that's just not the normal case especially for salaries. Tamar's select has just the error in the subselects Where clause, comparing SSN with Salary. There might be very few people having such a high salary ;-)

Bye, Olaf.
 
Olaf,

:) Yes, it would be nice to have Salary = SSN :)
 
Thanks for catching my typo, Olaf.

Ilya-it looks like I don't actually have that particular example on my site, though I talk about the problem in this paper:
Guess I ought to post some of my older papers. All I need is time to organize them.

Tamar
 
Tamar,

Thanks a lot for the link to your paper. I'll study it in details tomorrow, but I have to admit, I didn't know about nested joins. I always use JOIN with ON condition following right away.
 
Thanks to everyone for their input. It has certainly given me some new ways to think about my future queries. This is one of the great benefits from Tek-Tips, different points of view and experience.

Thanks again,

Bob
 
Ilya - I almost always use the sequential notation myself. I find it easier to read and write. The only time I use the nested is when the data is really hierarchical and even then, I'm more likely to use the sequential form.

Hope there are other things for you to learn in the paper.

Tamar
 
I certainly will, thanks a lot again. Haven't time today to look very closely, will read on Sunday.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top