Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Sumproduct formula - determine if records consitent 8

Status
Not open for further replies.

IngDep

MIS
Feb 7, 2008
89
US
Have a 35,000+ row Excel worksheet with the data arranged similar to the following;

ProcCd---CostPerSvc-----CostPerSvcConsistent?
12345----10-------------Yes
12345----10-------------Yes
12345----10-------------Yes
24680----8--------------No
24680----8--------------No
24680----6--------------No

If I recall, the use of a sumproduct formula will allow me to input "Yes" or "No" in column C if the Cost Per Svc figure in column b is not consistent.

The use of the following IF statement is not working properly; =If(A5<>A6,"Yes",If(AND(A5=A6,B5=B6),"Yes","No"))

Any insight as to how I need to use a sumproduct to accomplish this?

 
I'd use this in C5:
[tab][COLOR=blue white]=B5 = SUMIF(A:A, A5, B:B) / COUNTIF(A:A, A5)[/color]
B5 obviously just refers to the cell to the left. It compares that value to the average of column B for each record where Column A is the same as the current row.


[tt]_____
[blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
Did input the formula in cell C5 but was not successful.

Received "FALSE" when the cost per service figure was consistent across a given procedure code.

 
Dang. As soon as I pressed submit it occurred to me that if the CostPerSvc for 24680 were 6, 7 and 8, then 7 would return TRUE because it is the average.

- -

I got called away from my desk and now that I'm back I see that you've already awarded me a star. I appreciate that and will now try to earn it....

Try this instead:
[tab][COLOR=blue white]=SUMPRODUCT(($A$2:$A$50=A2) * ($B$2:$B$50=B2)) = COUNTIF(A:A,A2)[/color]
As written, this would go in C2. The SumProduct part counts how many times the current row's combination of Column A's value & Column B's value occur, then compares that number to how many times the current row's Column A value occurs.

NOTE: In the SumProduct formula, each section (column A & column B) must contain the same number of rows.

[tt]_____
[blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
-> Received "FALSE" when the cost per service figure was consistent across a given procedure code.

As I said, the original formula did have a weakness, but that doesn't make sense.

Off the top of my head, I suspect one of two culprits, either of which could also cause problems for the new formula:
1) You have values with hidden decimal places. So 10 & 10.2 would both display in the cell as "10", but would average out to 10.1, not 10.

2) You have leading or trailing spaces in the ProcCd

[tt]_____
[blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
Did try the latest suggestion. Receive 0 in all of the cells.

Any clue as to the culprit/resolution?

It appears that the worksheet is not calculating.

Also, did check the formatting of the column with the Cost Per Service figure and formatted as number with two decimal places.
 




Please post one of the formulas that you copied to another cell.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
=SUMPRODUCT(($B$6:$B$12878=B6)*($G$6:$G$12878=G6)=COUNTIF(B:B,B6))

Upon reviewing the worksheet, I do indeed have cost per service figures that have numbers that are different than what is displayed. For example, displayed is "10.86" but when I select the cell, "10.864161113" is displayed in the formula bar.

How do I resolve?

 

[tt]
YOURS: V
=SUMPRODUCT(($B$6:$B$12878=B6)*($G$6:$G$12878=G6)=COUNTIF(B:B,B6))

Should be: V
=SUMPRODUCT(($B$6:$B$12878=B6)*($G$6:$G$12878=G6))=COUNTIF(B:B,B6)
[/tt]
Your LEFT PARENTHESES is in the wrong place.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
did make the correction but still receive "TRUE" OR "FALSE."

It appears to be the hidden decimal precision issue.

Any ideas as to a resolution?
 



TRUE or FALSE is what this formula returns. (equivalent to YES or NO, don't you think?)

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
TRUE, YES would be synonymous to "TRUE."

Additional context: The data is sorted by a name field that I did not indicate in the initial posting.

Specifically, the order of the sort is Name (Ascending) and then procedure code (ascending).

Consequently, do I have to resort the data, in ascending order, by procedure code and not regard the Name field to
get the sumproduct function to work properly?



 
You can simplify Skip's formula by looking for prices that aren't equal to the current one:
=SUMPRODUCT(($B$6:$B$12878=B6)*($G$6:$G$12878<>G6))=0

If all the matching Procedure Codes have the same price, then the SUMPRODUCT returns 0--so the formula returns TRUE.

Brad
 



I can't take credit for the formula. John provided it. But nice option. Lots of ways to skin a cat.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Will the function work if there is just one record for a particular procedure code?
 
Also, what modifications is needed in the formula so that "FALSE" will appear in Column C only for the "offending" record - the record with the inconsistent cost per service?
 
-> Will the function work if there is just one record for a particular procedure code?
Go crazy and try it for yourself. Playing around with these things is a great way to learn. But to answer your question - if there is a single record either formula (mine or byundt's) will return TRUE.

-> Also, what modifications is needed in the formula so that "FALSE" will appear in Column C only for the "offending" record - the record with the inconsistent cost per service?
What if there are two instances of the "procedure code" and they have different "cost per service"? How would you know which is the "offending" record? If you don't have an answer for that question, I would advise against trying to do that, because your results will be inconsistent.

[tt]_____
[blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
With 35,000 records, there is probably a substantial recalc time if any data is changed in columns B or G because every SUMPRODUCT and COUNTIF needs to troll through all 35,000 rows. That's the reason why I suggested an alternative SUMPRODUCT without need for a COUNTIF.

If you are willing to sort your data first by Prod Code and then by CostPerService, then you can eliminate this lengthy recalc time using two auxiliary columns. Recalc time using these auxiliary columns will be instantaneous because any change in columns B or G forces only a handful of formulas to recalc, and none of those formulas involve more than two rows of data.

The first auxiliary column copies the CostPerService top down (from the record with the same Prod Code above). The second auxiliary column copies the CostPerService bottom up. Since the data is sorted by CostPerService, the first column will contain the minimum cost while the second column contains the maximum cost.
=IF(B2=B1,H1,G2) top down (formula goes in cell H2)
=IF(B3=B2,I3,G2) bottom up (formula goes in cell I2)

You can now compare these two auxiliary columns to determine Consistency:
=IF(H2=I2,"","Not Consistent")
I like to make the exceptions stand out, either using color or text instead of a blank (empty string).

If you want to identify the exceptions, then you could use a variation on John's original formula. It compares the CostPerService to the MODE of costs for that ProductCode. I needed to do that test twice because MODE returns #N/A error value if there is a tie for the mode. This is an array formula, so remember to CTRL + Shift + Enter.
=IF(J2="","",IF(ISNA(MODE(IF(B$2:B$40000=B2,G$2:G$40000,""))),"Exception", IF(G2<>MODE(IF(B$2:B$40000=B2,G$2:G$40000,"")),"Exception","")))
Even though this formula looks computationally intensive, it is executed only in those ProductCodes that have been identified as having consistency problems.

Brad
 
Thanks for the insight.

I will test.

Definitely, I am encountering issues related to the recalc time. Will use the auxiliary columns.

The comment regarding the determination of the "offending" record is noted. I had incorrectly assumed that just because a particular cost per service figure is prevalent for a given procedure code, it does not imply that the particular cost per service figure is "non-offending.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top