Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Array calc using count of distinct events in denominator

Status
Not open for further replies.

Sideline

Technical User
Aug 15, 2002
30
NZ
Dear All,

This has me stumped and I would very much appreciate any suggestions.

Dataset is 220K rows. An small subset follows.

Using an array calc with criterions Day = 1 and Hour = 1 summed on Elapsed Time returns 81 with an average of 11.57 for 7 events of which there are 5 distinct dates.

The function =SUM(IF(D2:D14<>"",1/COUNTIF(D2:D14,D2:D14))) returns the number of distinct dates in the dataset ie 7.

Is it possible to combine these two methods to create an average of elapsed time by day and hour using a denominator of distinct dates for only that combination of dimensions.

I have attempted to calculate an item in a pivot table using a different denominator without success.

many thanks

day Hour Date Elapsed Time
1 0 5/03/2009 25
1 0 5/03/2009 24
1 1 5/03/2009 23
1 1 7/03/2009 11
1 0 7/03/2009 5
1 1 8/03/2009 2
1 1 8/03/2009 8
1 1 9/03/2009 4
1 1 9/03/2009 17
1 0 9/03/2009 22
1 1 10/03/2009 16
1 2 13/03/2009 12
2 1 14/03/2009 15


 
If I understood well the problem, you need to work with sum of elapsed times for given date, day and hour.
In this case I would use pivot table with external data (anyway, 220k of rows is quite a lot for array calculations), in the underlying query it is possible to sum times.

combo
 


I do not understand your data.

How is Day, Hour and Date related, especially the LAST row of your example.

What RESULTS do you expect to get from the given example?

Pleas answer both questions clearly, concisely and completely.

Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
SkipVought,

thanks for the quick response.

each row in the actual dataset has a date time stamp which is converted using the weekday and hour functions. Hence the data set has many event instances for each weekday (1- 7) and each hour (0 - 23). in the hypothetical data set above the last event took place on a Tuesday between 1 and 2 AM for the hypothetical date of 14/03/09. This event last 15 units of time.

The outcome I am looking for will be an 24 (hour) by 7(weekday) array with an average elapsed time using a denominator of distinct dates for only that combination of dimensions.

So from the data set above, be it very limited sample, the combination of dimensions weekday 1, hour 1, will give a result of 16.2 (81 (total elapsed time) divided by 5 (distinct dates for only that combination of dimensions)).

trust that clearly answers your questions.
 


So from the data set above, be it very limited sample, the combination of dimensions weekday 1, hour 1, will give a result of 16.2 (81 (total elapsed time) divided by 5 (distinct dates for only that combination of dimensions)).
I see 7 distince dates, the average being 11.57.

What am I missing?
My Pivot...
[tt]
Average of Elapsed Time Hour
day 0 1 2
1 19 11.57 12
2 15.00
[/tt]


Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
SkipVought,

you are correct if the denominator is the number of dates for that combination of hour and weekday; which is 7.

However for the example above, day 1 hour 1, there are 7 records, but only 5 distinct dates.

There are two records for each of 08/03/09 and 09/03/09. Thus the denominator is 5.

thx


 



Then you FIRST need to Summarize your data to the date level.

Then do the Pivot.

I'd use MS Query. faq68-5829.

Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
thx skipvought,

I have spent too much time in the forrest.

Aggregating the data by hour, weekday and date would solve the problem. Will do in the database and let you know how I get on.

cheers
 
One last comment.

It would be cool to be able to do this within the workbook without having two distinct datasets which would blow the file size out.

thx
 




You can query your database in Excel using Data > Import External Data > New database query...

Once you insert a QueryTable, all you have to do is Refresh.

Oh, yes you need an ODBC Driver configured for your database.

Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Skipcought,

all of that is already in place for the original dataset.

However, one limitation of running an aggregation at the database level is not having the capacity to modify the denominator to include other dimensions from the dataset.

ie from the example dataset below Weekday = 1, Hour = 1, there are 7 events with 4 distinct dates. Adding in the dimension Class = A gives 5 events with 3 distinct dates.

This could be run using a different aggregation at in the db but if I wish to modify the denominator, there are approx 15 dimensions which could be used, it would be nice to be able to do this in the workbook instead of have change the database.

One solution is to use offsets in combination with =SUM(IF(D2:D14<>"",1/COUNTIF(D2:D14,D2:D14))), sorting the dataset by the dimensions used in the denominator.

Or to write a routine to loop through the whole dataset producing the required array.

Doing it at the worksheet level would seem to be the easiest solution. But life wasn't meant to be easy.

Weekday Hour Date Elapsed Time Class
1 0 5/03/2009 25 A
1 0 5/03/2009 24 A
1 1 5/03/2009 23 A
1 1 7/03/2009 11 A
1 0 7/03/2009 5 B
1 1 8/03/2009 2 A
1 1 8/03/2009 8 A
1 1 8/03/2009 4 A
1 1 8/03/2009 17 B
1 0 9/03/2009 22 A
1 1 10/03/2009 16 B
1 2 13/03/2009 12 A
2 1 14/03/2009 15 B
 


I don't believe that you understand. It is NOT all done in your database. You need further aggregations...
[tt]
Select Weekday, Hour, Date, Class, SUM([Elapsed Time]) as ET
From [YourDatabase]
Group by Weekday, Hour, Date, Class
[/tt]
Furthermore, you could probably do a Transform (crosstab) and skip the PivotTable step...
[tt]
Transform Sum(ET)/Count(*)

Select Weekday, Class

From
(
Select Weekday, Hour, Date, Class, SUM([Elapsed Time]) as ET
From [YourDatabase]
Group by Weekday, Hour, Date, Class

)
Group By Weekday, Class

Pivot Hour
[/tt]
My test results on transform
[tt]
Weekday Class 0 1 2
1 A 35.5 16 12
1 B 5 16.5
2 B 15
[/tt]

Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 



sorry, reposting my results...
[tt]
Weekday Class 0 1 2
1 A 35.5 16 12
1 B 5 16.5
2 B 15
[/tt]


Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Skipvought,

apologies for mis-spelling your nick. I tried to use your suggested sql without success. The sub-query would not work.

The good news is another forum suggested the use of

{= SUM(IF(FREQUENCY(IF((A2:A14=1)*(B2:B14=1),C2:C14),C2:C14)>0,1))}

This allows for a change in the criteria used to include other dimensions.

But I would still like to make this work in access.

TRANSFORM Sum(test_Data.[Elapsed_Time])/Count(*) AS [SumOfElapsed Time]
SELECT test_Data.Hour, test_Data.Class
FROM test_Data
GROUP BY test_Data.Hour, test_Data.Class
PIVOT test_Data.Weekday;

returns

Weekday Hour Date Class SumOfElapsed Time
1 0 5/03/2009 A 49
1 0 7/03/2009 B 5
1 0 9/03/2009 A 22
1 1 5/03/2009 A 23
1 1 7/03/2009 A 11
1 1 8/03/2009 A 14
1 1 8/03/2009 B 17
1 1 10/03/2009 B 16
1 2 13/03/2009 A 12
2 1 14/03/2009 B 15

but trying to include your sub query

From
(
Select test_Data.Weekday, test_Data.Hour, test_Data.Date, test_Data.Class, SUM(test_Data.[Elapsed Time]) as ET
From test_Data
Group by test_Data.Weekday, test_Data.Hour, test_Data.Date, test_Data.Class
)

throws an error

thanks for your help

 



WHERE did you perform the query? Please explain IN DETAIL.



Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Skipvought,

in ms query

Transform Sum(test_Data.Elapsed_Time)/Count(*) as ET
Select test_Data.Weekday, test_Data.Class
From
(
Select test_Data.Weekday, test_Data.Hour, test_Data.Date, test_Data.Class, SUM(test_Data.[Elapsed Time]) as ET
From test_Data test_Data
Group by test_Data.Weekday, test_Data.Hour, test_Data.Date, test_Data.Class
)
Group By test_Data.Weekday, test_Data.Class
Pivot test_Data.Hour

throws

'Could not add the table '('.

thx
 



Exactly where is test_Data?

How many rows does the inner query return?

Skip,
[sup][glasses]Don't let the Diatribe...
talk you to death![tongue][/sup][sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
test_data is in MS Access

stand alone inner query returns 10 rows

but when the inner query is placed between ( and ) it throws the error above.

thx
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top