Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Ongoing 7-day sum or average

Status
Not open for further replies.

Jan Flikweert

Programmer
Mar 20, 2022
85
NL
Hi all,

Of course there are different ways to calculate a ongoing 7-day sum/average.

I never had a way to use a query. Now I found the way:
Code:
***First create a cursor with unique date value

SELECT distinct dt FROM riooldata WHERE EMPTY(dt)=.f. INTO CURSOR unique_data READWRITE

***Then create cursor with ongoin 7-days sum

SELECT unique_data.dt,SUM(riooldata.rnaml)/100000000000 AS sum_rnaml, 100 AS indx FROM covid19!riooldata;
[COLOR=#CC0000]JOIN unique_data ON riooldata.dt BETWEEN unique_data.dt-6 AND unique_data.dt[/color] WHERE unique_data.dt>=DATE(YEAR(DATE()),1,1) GROUP BY 1 ORDER BY 1 INTO CURSOR Sum_7days READWRITE

Kind regards,

Jan Flikweert
 
I think you just wanted to post this as a tip, right.

As you have the where clause
Code:
WHERE unique_data.dt>=DATE(YEAR(DATE()),1,1)
And the range
Code:
BETWEEN unique_data.dt-6 AND unique_data.dt

I'd only select those dates into unique_data and maybe call it unique_dates:
Code:
SELECT distinct dt FROM riooldata WHERE WHERE dt>=DATE(YEAR(DATE())-1,12,26) INTO CURSOR unique_dates READWRITE

Okay, you'd still want to sum data in week ranges that each end in this year. Which means the sums should be between any of the unique_dates and these dates +6 rather than -6, so the between clause has to change:

Code:
SELECT unique_data.dt,SUM(riooldata.rnaml)/100000000000 AS sum_rnaml, 100 AS indx FROM covid19!riooldata;
JOIN unique_data ON riooldata.dt BETWEEN unique_data.dt AND unique_data.dt[highlight #FCE94F]+[/highlight]6 GROUP BY 1 ORDER BY 1 INTO CURSOR Sum_7days READWRITE
One "error" remains, the ranges starting less than 6 days before the max dt in your data will not be 7 day sums and if you want that the maximum start of a week has to be max(dt)-6.

I wonder why you divide by 100000000000, an average would be divided by 7, but I think this has another reason than averaging. That could be corrected by first determining the max(rioooldata.dt) as maxdate and make maxdate-6 the maximum dt in unique_dates and maybe then also call it thisyearsweekrangestartdates.

You could also redefine the ranges have to start in this year and won't have to take data from the last 6 days of the previous year into account. And in the longer term, if you'd want to caluclate for past years the yearsweekrangestartdates data has to end on december 25th. So for a year nYear the year data range would be WHERE dt>=DATE(nYear-1,12,26) and dt<=DATE(nYear,12,25).

Chriss
 
Criss,

Indeed it is a tip.

I have a small facebook group about COVID19 in our city/province. Eery day I publish statistics. In that case there is no future. So the ongoing( better advancing) sum/average concerns in this case the last 7 days.

Of course there is the direct way of distinct date. But in this case I need a JOIN to calculate every record with the last 7 days.

And ofcourse the where clause can be refined.

The division by 100000000000 is because every record contains such a large value. So summing this makes no sense and causes an error.
This division by 100000000000 has nothing to do with the range of calculation.

That all does not matter.

The main part of the tip is the JOIN CLAUSE: [highlight #CC0000]JOIN unique_data ON riooldata.dt BETWEEN unique_data.dt-6 AND unique_data.dt[/highlight]

Kind regards,

Jan Flikweert
 
Indeed it is a tip.

Jan, you can convert your post into a tip by clicking the Edit button (in the bottom right corner). You can then set the "thread type" to "Helpful tip" (just above the subject line).

At least, I think this is true. I have not way of testing it other than by actually starting a new thread.

This is not all that important, and perhaps not worth worrying about. However, making it a tip rather than a question might help other people searching for this information.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I see, I'd still recommend the modifications I explained. But that join is a neat trick. (It doesn't highlight well in red, though).

And btw, if the high sum value is a concern, you would do SUM(riooldata.rnaml/100000000000), not SUM(riooldata.rnaml)/100000000000.

That it works to divide after summing proves the numeric range isn't a problem for the summing in itself. Even if rnaml is integer data type, VFP always calculates in the double float range.

I didn't expect data from 2023 in your table, but if you'd want to go back to year 2020 or 2021 you'd have a max date for that years.

Here's an example of data and how I'd work on it using the join:
Code:
CREATE CURSOR exampledata (dt Date, datavalue i)
FOR i=0 TO 13 
   INSERT INTO exampledata VALUES (DATE()-i,i+1)
ENDFOR

SELECT dt FROM exampledata WHERE dt<=DATE()-6 INTO Cursor startdates

SELECT startdates.dt,SUM(exampledata.datavalue) AS sum_datavalues FROM exampledata;
JOIN startdates ON exampledata.dt BETWEEN startdates.dt AND startdates.dt+6 ;
GROUP BY 1 ORDER BY 1;
INTO CURSOR Sum_7days READWRITE

In my case I don't need distinct in the query creating the startdates cursor, as all dates are unique in exampledata already.

If I'd not limit the start dates to DATE()-6 I'd get results for weeks starting later and not having data of a whole week. So the sums at the end would look like declining. They just become lower as they only sum 6 or less days. You can of course do that, but have to know these sums look rigged. While they are not literally rigges you could only make estimates for weeks that end in the future and for which you don't yet have data for all their days yet. So the max startdate for full week sums depends on which weeks you can fully cover.

Chriss
 
And just to extend this to the idea of averages. If you don't limit the dates you can also cover the case of the last 6 weeks in the Sum_7days result not covering full 7 days by averaging. And in that case don't just divide by 7 but by COUNT:

Code:
CREATE CURSOR exampledata (dt Date, datavalue i)
FOR i=0 TO 13 
   INSERT INTO exampledata VALUES (DATE()-i,i+1)
ENDFOR

SELECT dt FROM exampledata INTO Cursor startdates readwrite && not limited to only cover full weeks

SELECT startdates.dt,SUM(exampledata.datavalue)/COUNT(*) AS avg_datavalues1, AVG(exampledata.datavalue*1.0) as avg_datavalues2 FROM exampledata;
JOIN startdates ON exampledata.dt BETWEEN startdates.dt AND startdates.dt+6 ;
GROUP BY 1 ORDER BY 1;
INTO CURSOR Sum_7days READWRITE

And, indeed, we see that AVG() takes this into account, too.

I realized you didn't want to average your data, but it would make sense to make the last ranges, which don't cover full weeks, comparable to the full weeks before them. The last average will be for one day only, of course. All this could become more complicated if you have multiple records for the same date, though. I do rely on that not being the case.

To complete the explanation, I used the factor 1.0 when using AVG() as without that, the avg_datavalues2 indeed becomes integer rounded down (not up, as would be mathematically correct), even though I just said VFP always calculates in float. That's funny, as SUM() doesn't force the result type of integer. I get 4 decimal places in a browse of Sum_7days. Just another of the quirks in which SQL determines the result structure and the data types of its fields.

Chriss
 
Just one linguistic thing: Ongoing would be called rolling or running sum or total in English.

And in other SQL dialects you have window functions for this, which get rid of needing such a join but can take a window of data in the sense of a range, the weeks you define by the first column grouped by and joining all data that's within a weeks range, becomes a moving window with 7 days length. The way this works doesn't need a self join, the window will be reading from the one table in a sort order you need to define for the window functions to know what concepts like NEXT or PREVIOUS, and LAG or LEAD mean.

And before that was introduced in T-SQL (the MS SQL Server SQL dialect you had to use another concept: Partitions). None of that (PARTITION, LAG, LEAD) is available in VFP, even not in the most advanced VFP SQL engine, unfortunately. So your tip - I think - is the best way of doing that in VFP SQL regarding the windows.

There is one step of optimization possible in VFP9, as you can join a query you name, so you could get rid of generating unique_data or startdates:
Code:
SELECT startdates.dt,SUM(exampledata.datavalue)/COUNT(*) AS avg_datavalues1, AVG(exampledata.datavalue*1.0) as avg_datavalues2 ;
FROM exampledata;
JOIN (SELECT dt FROM exampledata) as startdates ON exampledata.dt BETWEEN startdates.dt AND startdates.dt+6 ;
GROUP BY 1 ORDER BY 1;
INTO CURSOR Avg_7days READWRITE
(Corrected result name to Avg_7days).

It does not become more effective in the end, but it becomes one query which always is a nice thing to have to be able to define a view without the need to make it a view on a view.

Here's a nice article that does cover partitions but not window functions: I actually looked for something that covers that, too, but mention it, as it hints on yet another nice solution that is a table designed to compute such rolling totals easier. The paragraph "Designing a Table to Track a Rolling Sum" is about that, but it's NOT giving it away, just says "We will let you mull over that one!"

Chriss
 
Chriss,

Technical it is clear.

Statistical a remark.

I always use a given number of days. Ending today and divide that by one is statistical not good.

There are several reasons possible. In this case they take measurements during a week spread in whole our country. Taking one day is not representative and must be avoided.

Kind regards,

Jan Flikweert
 
Just one linguistic thing: Ongoing would be called rolling or running sum or total in English.

Actually, the most common term is moving, as in "moving 7-day average". And we are seeing a lot of them nowadays, as health authorities around the world publish their latest Covid data.

I recall once working with exponentially-weighted moving averages, where the more recent figures have a higher weight than earlier ones. They are used in forecasting, I believe, but I don't remember the details.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike, Chriss,

Indeed there are several kinds of averages possible. And like Chriss showed in his example you can work with the first day of seven.

In case of health care there are several points of view. How long does it take before somone gets in hospital, some one dies, how long stays it in the sewage system. On which day test people does the public health care work en when is the sewage system measured.

Due to the lingual case hard to dicuss.

Kind regards,

Jan Flikweert
 
Mike, thank's for the "moving" alternative.

In regard of covid statistics I know that most recent official numbers often are subject to be corrected and so you only build statistics up to a point where you are sure all reported numbers have been account for. Which would not only be a reason to never consider the last date a single day week average, it would also be yet another reason to limit the max end date of the weeks to only cover full weeks, and only put data into the tables you use statistically, that's not subject to change anymore, ie only data for days that has been fixed as final data.

Sites like I used also to monitor Covid are sometimes explicitly stating such "subject to correction" results.

Depending on the topic that can have different rules. Of course I only talked about the mere technical perspective.

Chriss
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top