Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

GROUP BY clause is missing or invalid

Status
Not open for further replies.

dantheinfoman

Programmer
May 5, 2015
131
US
Screen_Shot_06-16-15_at_04.02_PM_f4axrr.png


Hi All,

I've scoured the web, but everything is pretty vague about fixing these old SQL statements from VFP 7 to the new strict/compliant VFP 9 guidelines.

Here's one such example (of many many statements I'll have to fix):
Code:
SET ENGINEBEHAVIOR 70
	DO SYCOMBAN WITH 'tktimcrd','TKTIMCRD','TK_ARCH'        &&02/26/01vb to make it work w/SYCOMBAN
	SELECT tktimcrd.*, SUM(tktimcrd.HOURS) AS SUM_HOURS,SUM(tktimcrd.flatamount) AS SUM_flat;
		FROM tktimcrd ;
		WHER (work_date BETWEEN m.D_BeginDate AND m.D_EndDate) ;
		GROUP BY tktimcrd.Employee, tktimcrd.JOB, tktimcrd.work_date, tktimcrd.pay_type ;
		ORDER BY tktimcrd.Employee, tktimcrd.JOB, tktimcrd.work_date, tktimcrd.pay_type ;
		INTO CURSOR QUERY
	SELECT QUERY
	SET RELA OFF INTO PY_EMPLO
	SET RELATION TO Employee INTO PY_EMPLO ADDI
	SET RELA OFF INTO SYJOBCST
	SET RELATION TO JOB INTO SYJOBCST ADDI
	m.JoinFile = .T.                          &&VB
	SET ENGINEBEHAVIOR 90

I've read that all the fields that are non-aggregate (I think that means all fields that are not calculations, such as SUM, COUNT etc) must be re-listed on the GROUP BY statement.

Does that mean I should just say GROUP BY tktimcrd.* and get it over with? What happens if I remove the Group By clause completely? I don't really understand what the GROUP BY statement does, especially if all fields must be mentioned in the group, it seems like they don't do much of anything.

Thanks!
Dan
 
That's the 'strict' part about VFP9. If you look at the VFP help, "Columns referenced in the SQL SELECT statement list, except for aggregate expressions, must be included in the GROUP BY clause. "
So you can't GROUP BY tktimcrd.* since you need to name the fields in the order you want them grouped.
And yes, you're right about the non-aggregate part. You can't group by groups, which is what an aggregate function does.
A group basically groups items together. Like for instance, all the cities for a state. All time cards for an employee, etc. In your example above, if you don't use a group you will get the total hours for each record (each time card) instead of total hours for the employee.


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
Hi Dave,

So do I need to look up every field in the 'tktimcrd' table and list them all? That would be a massive GROUP BY clause. It sounds like you're saying I could tell the system to group by employee, but it also sounds like you're saying I should tell the system to group by employee,hours,time_in,time_out,city,state,zip,department . . .etc until every field is exhausted. I'm confused but I sure appreciate the info. Can you expound further or correct my SQL statement as an example?

Thanks, sorry I'm not getting it just yet.

Dan
 
Putting more fields in the group by solves the technical problem of the group by requirements, but it creates smaller groups! It creates groups you might not want!
Think bout extremes: If you group by all fields, then every single record becomes it's own group, unless you have double records. That's not what you want, do you?

The normal solution is to take out fields from the result, because they are the culprit, not too less fields you group by.

Let's take it from a small example, a table banktransactions:

[pre]accountno transactionamount transactiondate
000001234 1100 02/01/2015
000001234 -50 02/11/2015
000001234 -150 02/21/2015
000001234 -25 02/26/2015[/pre]

Code:
Select accountno, Min(transactionamount), transactiondate Group By accountno
This would error, because transactiondate is not aggregated.

Which date should be taken from these three dates? You'll say the date of the Min(transactionamount). Seems the natural choice of what "makes sense", but that's not how things work.
All these records are one group and the result has to have one record per group, so you need to specify a rule for the row to pick, and the "rules" are the aggreagtion functions. Sum(transactiondate) won't work, you can't sum dates. You could pick min(), max(), but none of them corrresponds to the row we'd want. You can't solve this with a simple group by, so you have to delay the solution of that part of the problem to later, ie a consecutive query or subquery.

Foxpro 7 still allowed this query and picked some arbitrary row value. Since you SET ENGINEBEHAVIOUR 70 there must be something really out of whack with your query, as it still gives that error with the more forgiving version of the older query engine.

So now let's see what works.

1. Add the transactiondate to the group by:
Code:
Select accountno, Min(transactionamount), transactiondate Group By accountno,transactiondate

This runs and has all records as result. Why? What happend? Well, a group now isn't an account, a group is an account at each date. As each date is different, every record is its own group. If you want the Min(transactionamount) independent of any date, then you can't group by transactiondate, you have to skip the transactiondate from the result and only take accountno and Min(transactionamount):

2. Remove the transactiondate from the field list:
Code:
Select accountno, Min(transactionamount) Group By accountno

This runs and has one record with -150. To get the date you now query the transactiondate where transactionamount=-150 and get the 02/21/2015.

In your case you can only query the fields you group by besides the sums:
[pre]SELECT tktimcrd.Employee, tktimcrd.JOB, tktimcrd.work_date, tktimcrd.pay_type, SUM(tktimcrd.HOURS) AS SUM_HOURS,SUM(tktimcrd.flatamount) AS SUM_flat...[/pre]

Bye, Olaf.




 
To give a quick summary:

If a SELECT has a GROUP BY clause, then the only columns that can appear in the result set are aggregate functions and fields referenced in the GROUP BY clause. An aggregate function is one based on SUM(), COUNT(), MIN(), MAX(), etc.

Or, as the VFP Help file puts it:

VFP Help said:
The GROUP BY clause specifies one or more columns used to group rows returned by the query. Columns referenced in the SQL SELECT statement list, except for aggregate expressions, must be included in the GROUP BY clause.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Well, and the passage "must be included in the GROUP BY clause" leads people to think the easy solution is to extend the group by clause. In most any case you won't want to do that to get more groups. That's why I took the time to give an example and show what rather to do.

Bye, Olaf.
 
Wow,

Those explanations are top notch. I thank you Olaf, Mike and Dave. Unfortunately, for now, my folks have opted to use SET ENGINEBEHAVIOR 70 before these statements for the time being.

Does that detract from any speed the reports/macros could have or make it erroneous? If they're okay with the results of these queries through the years and consider them accurate, is there anything wrong with using SET ENGINEBEHAVIOR 70 for any reports/macros that have the GROUP BY error, or will fixing them give me more speed (especially with big tables)??

Thanks again you brilliant, brilliant people!!

Dan
 
I strongly advice against using SET ENGINEBEHAVIOR 70 in your code. The reason is that although you may get your select statements to run, you may very well end up with the wrong data!!!!
 
SET ENGINEBEHAVIOR 70 does not detract from the speed, not does it necessarily lead to incorrect results. What it does is to relax the requirement for the SQL code to conform to standard behaviour. It is generally not recommended, although in some limited cases it could be a good short-term fix, provided you are aware of the consequences.

This is particularly relevant to a GROUP BY query. To give an example, suppose you wanted to know the total order amount per customer. You want the result set to show, in each case, the customer's ID and his name. You could do this:

[tt]SELECT Customer_ID, Customer_Name, SUM(Order_Amt) FROM Orders GROUP BY Customer_ID[/tt]

Now, that does not conform to the rule that says the columns in the result set must be either an aggregate function or the field in the GROUP BY. In this case, Customer_Name doesn't meet that requirement. Nevertheless, the query will give the correct result. So, to make it run, you would do SET ENGINEBEHAVIOR 70 - although that's really a kludge.

In other cases, the consequences are more serious. For example, consider this query:

[tt]SELECT Customer_ID, Country, SUM(Order_Amt) FROM Orders GROUP BY Country[/tt]

With SET ENGINEBEHAVIOR 70, that would be a legal statement, but the results would be at best misleading. You would get the correct orders by country, but what would you get for the Customer ID? In fact, it would be an arbitrary customer ID for the country in question, which is arguably a valid result but a meaningless one. There are many other examples where the results are even more misleading.

Does that help at all?

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I agree with the others on the SET ENGINEBEHAVIOR 70 scenario. Be sure and run some tests, check your results manually, before relying on the queries. I have this in use in an app right now forcing some results I need, but you will definitely get bogus or misleading results with GROUP BY unless you are very careful.


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
A general rule of thumb for when it's safe to use SET ENGINEBEHAVIOR 70 in a grouped query is the following:

If the query involves parent/child tables, and the aggregate functions are used only on fields of the child table, and the fields that are neither grouped nor use aggregates are from the parent table, it should be safe. For example, extending Mike's example, it's safe to do something like:

Code:
SELECT CustomerID, CustomerName, SUM(Price*Qty) ;
  FROM Customer ;
    JOIN Order ;
      ON Customer.CustomerID = Order.CustomerID

You can extend that rule to queries involving a single table where the ungrouped, unaggregated fields are uniquely determined by other fields that listed in the GROUP BY clause. That's the case for Mike's first example. It makes the assumption that you've brought the customer name into the Orders record, even though normalization says you shouldn't.

Finally, the biggest risk of data being wrong with ENGINEBEHAVIOR 70 is when you're using MIN() and MAX(). With the other aggregate functions, you simply get an arbitrary value for the ungrouped, unaggregated fields. (In VFP, it's not actually arbitrary. It always comes from the last record in the group.) However, when you're using MIN() or MAX(), there's an implication that the value comes from the record that provides the minimum or maximum value; this is very misleading. I've written about this case:
Tamar
Tamar
 
Quries can be well tested in time, if the app comes from an earlier VFP version and worked ok. But if you have the time and budget, better correct what you inherited and never add new queries with old engine behaviour. The qzestion comes up, if things really went right or were just unimportant or ignored or overlooked.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top