This FAQ addresses the use of the 'Eliminate Duplicate Rows' checkbox within Cognos Impromptu. It is maintained by griffindm. Suggestions for additions and corrections can be addressed to him via the 'Send a Comment' link below the FAQ. All suggestions are intended for the current release of Impromptu, unless otherwise noted.
Background
When report writing, the author will occasionally come across situations where data in the report appears to contain duplications. When this occurs, it is
very important for the report writer to ascertain
why this is happening in order to be sure that the proper steps are taken to avoid the problem, without other issues appearing as the report is used.
Two Root Causes: First Case
When this situation happens, it usually has one of two root causes. I will cover the one most users are trying to eliminate first. Let's consider some test data:
Code:
Voucher Date Vendor State Amt
---------- -------- -------- ------- -------
12345 01/01/02 ABC MD 100.00
12346 01/01/02 DEF VA 250.00
12347 01/01/02 GHI DC 500.00
While most of this data is from a single fact table, the
state is from a child address table of a parent Vend table. Let's look at what that table might look like.
Code:
Vendor Addr1 Addr2 City State Postal
-------- ------- ------- ------- ------ ------
ABC Main St Laurel MD 20723
ABC 2nd Ave Easton MD 22790
DEF 42nd St #5A Fairfax VA 24213
GHI West Rd. Wash DC 25643
Note that vendor ABC has
two addresses. When this data is pulled into your report, the data for ABC would appear
twice. This could happen for all rows, if they all share the same many-to-one relationship, or for none, if all only have one
now. That could change tomorrow if a vendor gets another valid address.
In this case using the 'Eliminate Duplicate Rows' checkbox in the filter of the report may be valid as long as the remaining data for the duplicated row is unique. It will give you the correct data in the above case
as long as any additional rows for a vendor are from the same state. A
better solution would be to determine the correct, or default address used, and only include that in your report. This is often recognizable by another row column that designates it. Failing that, you could use a Max function on the state to ensure you only get a single row returned.
Second Case: When NOT to use it
Another common situtation that gives apparent duplicate rows is
NOT appropriate for this type of solution. This happens when the report does not include the
key values of the fact table. Based on the limited data you see, rows that are in fact different appear as duplicates. Consider sample data again:
Code:
Voucher Ln Date Vendor Desc Amt
---------- -- -------- -------- ------- -------
12345 1 01/01/02 ABC Rental Car 100.00
12345 2 01/01/02 ABC Hotel 100.00
12346 1 01/01/02 GHI Travel 500.00
In this scenario, the first two lines share a common voucher number, but have different descriptions and line numbers.
If your report
query (not necessarily just the output) did not include either the line number or the descriptions, then Impromptu would see the first two lines as duplicates and would suppress one of them. This is
NOT what you want.
When creating a report that may contain apparent duplicates, I always recommend turning off the option to 'Retrieve only the required rows' (found under menu option Report | Query | Access', proving the numeric validity of the results returned from
just the fact table data only. Then after confirming this, add the other dimensional lookups while checking that the row count and report totals do not change.
What Impromptu Does When Suppressing Duplicates
When you check the box for this, Impromptu will include a
distinct clause in the resulting SQL. This results in the elimination of data that are identical
in the data returned. That's all it does. This process can also slow the report down significantly if there are a lot or rows and columns in the report. Impromptu has no more idea than you do as to what is truly unique to the row within the application holding the data.
While the information given here may provide a good rule to follow, the bottom line is understanding your data thoroughly is the best aid to being an effective report writer.