Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Calculate Average of (only) Distinct Values in Excel 4

Status
Not open for further replies.

TomYC

Technical User
Dec 11, 2008
191
US
piece of cake in Access, but in Excel I have (obviously output from a non-relational database!):
TAT PR Case #
41.0 PR-11942
41.0 PR-11942
41.0 PR-11942
28.0 PR-11641
26.0 PR-11669
What should the obvious Average TAT be? 31.7 of course, as a weighted average is not desired!
 
If you are setting up a query to pull this data from access, you could create another query (that you could hide the output from) and average those results.
Alternatively you could make a new column that only shows a value in the TAT column if it is unique, then have an average of that column (AVERAGE does not include blanks)

Or you could use an Advanced Filter to generate a unique list of values from the TAT column and average that.

There would also be a very unwieldy way to do all of this with a single formula that I do not recommend.

Which would you like? (I'm not going to write out the solution to each one before you choose)
 
If you just want to add a column to show the average of the "PR Case#" on that row, well that's really easy:

[tt]
TAT PR Case # PR Case# Avg

41 PR-11942 =AVERAGEIF(B:B, B2, A:A)
41 PR-11942 =AVERAGEIF(B:B, B3, A:A)
41 PR-11942 =AVERAGEIF(B:B, B4, A:A)
28 PR-11641 =AVERAGEIF(B:B, B5, A:A)
26 PR-11669 =AVERAGEIF(B:B, B6, A:A)
[/tt]

But I'm thinking you want a list of unique PR Case #s on a separate sheet with the average for each.

A few questions:

Is this something you'll have to do often, or are you just looking for a quick way to get it done right now?

What version of Excel are you using? (I ask because starting in Excel 2007 there's a built in "Remove Duplicates" button that would make getting a list of unique Case #s a breeze.)

[tt][blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
Thank you both for these replies--
I guess my ultimate goal is always to find a way to do it so that, when I'm gone, a relative novice will be able to run the report and get the correct answer without any additional data processing. So in general, I try to find a formula, however unwieldy, as long as it works. Second choice is to add hidden columns or even hidden worksheets.
But I definitely want a result that gives one average of all TATs that are not duplicates as per column B. So I definitely do not want the answer to give 35.4 in my example!
 
Are you comfortable with Macros? When I read, "when I'm gone, a relative novice will be able to run the report and get the correct answer without any additional data processing," I think macro. In fact I write macros all the time to allow others to go from point A to point B with a button click.



[tt][blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
I haven't created many macros, I suppose because I'd read somewhere (perhaps about Access) that it was better to go the VBA route. A macro triggered by an On Open event would be relatively transparent (opaque, actually) to the user, would it not?
 
try this
Code:
=SUMPRODUCT((A$1:A5<>A$2:A6)*A$2:A6/SUM(--(A$1:A5<>A$2:A6)))
Assumes that "TAT" is in A1 and all data follows.
you would need to make sure the "A$1:A5" array started on "TAT" and reached down to the second to last data point, and that the "A$2:A6" had all the data points.

All is necessary for the formula to work!

(very unwieldy)
 
I'm going to chew into this one--it looks interesting.
Thank you very much!
 
Oh, duh, to make things easier, you could add dynamic ranges

Create a new named range, use this as the source (I named mine comp1)
Code:
=OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A)-1,1)
And another (comp2)
Code:
=OFFSET(Sheet1!$A$1,1,0,COUNTA(Sheet1!$A:$A)-1,1)

And your formula would then be
Code:
=SUMPRODUCT((comp1<>comp2)*comp2/SUM(--(comp1<>comp2)))
 
[small]oooo[/small]ooooOOOOoooo[small]ooooh[/small]

Well I feel silly. I completely missed the last line of the OP, "What should the obvious Average TAT be? 31.7 of course, as a weighted average is not desired!".

I thought you were looking to call out individual Case#s and show an average of each.

Now that I understand your goal, I'd just make one small adjustment to Gruuuu's very nice formula:

[tab][COLOR=blue white]=SUMPRODUCT(([red]B$1:B5<>B$2:B6[/red])*A$2:A6/SUM(--([red]B$1:B5<>B$2:B6[/red])))[/color]

Notice that I changed the parts in Red from the TAT column to Case# column. Otherwise it would return an incorrect result if any two successive Case#s had the same TAT. For an example, change the TAT for PR-11641 from 28.0 to 41.0. The average of unique Case #'s should become 36, but if we key off of changes in column A then we'd get 33.5 (because it lumps that 41 in with all of the ones above).

Also, just to state it explicitly, such a formula depends on the data being sorted by  PR Case #.

[tt][blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
Star for higgins. Good catches! For some reason I was thinking that the Case # did not correlate to the TAT. Which is dumb; OP clearly stated it came from a DB pull.
 
Thanks for the formula. I've been trying various ways to do DistinctCount and Sum or Average of distinct records in Excel.
My best solution was to add a column C (call it DCount) with the formula =if(A$1=A$2,0,1) where column A has the Case#. This identifies Distinct Case#.

Then I use the Tools, Conditional Sum. This wizard is easy and pretty self explanatory.
I set it to sum the column B (TAT in this thread) if DCOUNT=1.
It creates the formula
{=SUM(IF($C$2:$C$10=1,$B$2:$B$10,0))}

You can get the Average by dividing by the sum of DCOUNT.

You do have to sort by column A (PR CASE#)



 
I thank you all, and throw in a couple of stars.
In the end, I defined three named ranges, per anotherhiggins example and use the following:

=SUMPRODUCT((comp1<>comp2)*comp3/SUM(--(comp1<>comp2)))

I wish I didn't have to remember to sort on that column, but I understand why, and if wishes were horses...
 


I haven't created many macros, I suppose because I'd read somewhere (perhaps about Access) that it was better to go the VBA route.
Actually "macro" in Excel IS VBA, where a "macro" in Access is not.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Just a thought: If you do go the macro-route, what behaviour do you want to pursue if you get two lines with the same case number, but different values for TAT?
 
I'm thinking that might be apples and oranges, but in my specific case, TAT referred to Turn Around Time, so it would be constant for a particular case number!!
 
I've mulled this over and I realized that there is a clean formula in here for Distinct Count that I haven't seen on this site yet. Taking a piece of Gruuuu's formula

=SUMPRODUCT(--(comp1<>comp2))

or without his range formulas

=SUMPRODUCT(--(A$1:A100<>A$2:A101))

of course A has to be sorted ascending

It also works as

{=SUM(--(A$1:A100<>A$2:A101))}
where you use CTRL+SHIFT+ENTER to create an array

This will come in handy for use in other formulas and when a distinct count is required by users at the bottom of the column.

Thanks Gruuu

C
 
No problem!

But... I honestly see very limited use of this in most environments. Normally this is something I would handle with a SQL query. Or a Pivot Table entry. Or an Advanced Filter.

Like I mentioned in my first reply, this is not an ideal solution because it requires sorted data, and isn't easily accessible to most users (it's a little complex).

Granted, the complexity makes it more FUN, but less PORTABLE.
 
Yep, there are a lot of easier ways to get "the answer". However, this is a good way of putting the answer at the bottom of an excel list where many of my co-workers seem to need it. In reality, they really don't need excel but they won't let go, even though they don't really know how to use it.

But this isn't a forum for complaining about co-workers....sorry! ;)

C
 
Perhaps as a final word on this thread, and with thanks to all who participated, this thought for why an Excel Formula:
so that, long after I am gone from this report, a replacement user, who will most probably not know sql or even much of Excel, but who can follow simple instructions ("dump the data, and sort on column B, ascending order!') can replicate the results. I have Excel reports and Access databases scattered all over this institution, and many are in use years after I've not touched them, and in some cases, current users would not know whom to ask nor would they know how to ask it!
T.Y.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top