Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Compression was disabled for data set 1

Status
Not open for further replies.

iren

Technical User
Mar 8, 2005
106
US
Hi everybody. I have a problem and I am nit sure if it is a memory issue or time issue.
I run the following query against DB2 in data warehouse (through SAS environment):
LIBNAME out '/uti';
LIBNAME DW DB2 UID=***
pwd=***
ds=DWQ;
PROC SQL;
CREATE TABLE out.ppl_with_diag AS
(SELECT c.mem_prof_key
FROM DW.ICD_DIAG_GRP t
INNER JOIN DW.MEDICAL_SERVICE S ON S.ICD_DIAG_GRP_KEY=t.ICD_DIAG_GRP_KEY
INNER JOIN DW.MEDICAL_CLAIM C ON c.clm_mem_key=S.clm_mem_key
INNER JOIN DW.CLAIM_STD_PRODUCT_PROFILE P ON c.STD_PRODUCT_KEY=P.STD_PRODUCT_KEY
WHERE (C.RECEIVED_DT GE '01Jan2006'd AND C.RECEIVED_DT LE '31Dec2006'd ) AND
(substr(t.icd_diag1_cd,1,3)="595" Or substr(t.icd_diag1_cd,1,3)="599" OR substr(t.icd_diag1_cd,1,4)="5950" OR
substr(t.icd_diag1_cd,1,4)="5952" OR substr(t.icd_diag1_cd,1,4)="5953" OR substr(t.icd_diag1_cd,1,4)="5958" OR
substr(t.icd_diag1_cd,1,4)="5959" OR substr(t.icd_diag1_cd,1,4)="5990" OR substr(t.icd_diag1_cd,1,5)="59589" OR substr(t.icd_diag1_cd,1,5)="59651"
OR substr(t.icd_diag1_cd,1,5)="59654" ) AND t.icd_DIAG1_DESC ne "DW Created" OR
(substr(t.icd_diag2_cd,1,3)="595" Or substr(t.icd_diag2_cd,1,3)="599" OR substr(t.icd_diag2_cd,1,4)="5950" OR
substr(t.icd_diag2_cd,1,4)="5952" OR substr(t.icd_diag2_cd,1,4)="5953" OR substr(t.icd_diag2_cd,1,4)="5958" OR
substr(t.icd_diag2_cd,1,4)="5959" OR substr(t.icd_diag2_cd,1,4)="5990" OR substr(t.icd_diag2_cd,1,5)="59589" OR substr(t.icd_diag2_cd,1,5)="59651"
OR substr(t.icd_diag2_cd,1,5)="59654" ) AND t.icd_DIAG2_DESC ne "DW Created" OR
(substr(t.icd_diag3_cd,1,3)="595" Or substr(t.icd_diag3_cd,1,3)="599" OR substr(t.icd_diag3_cd,1,4)="5950" OR
substr(t.icd_diag3_cd,1,4)="5952" OR substr(t.icd_diag3_cd,1,4)="5953" OR substr(t.icd_diag3_cd,1,4)="5958" OR
substr(t.icd_diag3_cd,1,4)="5959" OR substr(t.icd_diag3_cd,1,4)="5990" OR substr(t.icd_diag3_cd,1,5)="59589" OR substr(t.icd_diag3_cd,1,5)="59651"
OR substr(t.icd_diag3_cd,1,5)="59654" ) AND t.icd_DIAG3_DESC ne "DW Created" AND P.PROD_LEVEL3_CD="HB"
);quit;

And I got the following message in the log:
WARNING: Compression was disabled for data set OUT. PPL_WITH_DIAG because compression overhead would increase the size of the data set.
=== COMMANDS PASSED TO SYNCSORT: ===
********************************************
It seems to me that my 4-way join of tables all on the DB2 side looks like I didn’t have to cross the boundary hardly at all. However, my log indicates that compression was disabled. Totally confused.
How can I overcome the problem? Is there any way to optimize this query?
Thank you in advance!

Iren

 
This message is perfectly normal, and your query will run normally.


The issue here is that the data contained on field c.mem_prof_key would not benefit from data compression, which is by default used by syncsort on this situation, and hence he is advising you that he (syncsort) will not attempt to create the output table with compression active.

As for optimizing the query, its impossible to say without knowing all the indexes of the tables in question, and also with what is the cluster ratio of each index.


As for the query it self, if you are using V7 or greater, then the following would be possible, but just for clarity of the SQL. No performance is possible using substr.

(substr(t.icd_diag1_cd,1,3) in ("595","599")
OR substr(t.icd_diag1_cd,1,4) in ("5950","5952","5953","5958","5959","5990")
OR substr(t.icd_diag1_cd,1,5) in("59589","59651","59654"))
AND t.icd_DIAG1_DESC ne "DW Created"
OR (substr(t.icd_diag2_cd,1,3) in ("595","599")
OR substr(t.icd_diag2_cd,1,4) in ("5950","5952","5953","5958","5959","5990")
OR substr(t.icd_diag2_cd,1,5) in ("59589","59651","59654"))
AND t.icd_DIAG2_DESC ne "DW Created"
OR (substr(t.icd_diag3_cd,1,3) ("595","599")
OR substr(t.icd_diag3_cd,1,4) in ("5950","5952","5953","5958","5959","5990"))
OR substr(t.icd_diag3_cd,1,5) in ("59589","59651","59654")
AND t.icd_DIAG3_DESC ne "DW Created" AND P.PROD_LEVEL3_CD="HB"
the above fields in BOLD are not required on the SQL, as other bits of it include those conditions.
e.g. substr(t.icd_diag1_cd,1,3) = "595" will automatically include the
substr(t.icd_diag1_cd,1,4) = "5950"

As substr should be avoided if possible, specially if the above fields are part of an index, the above could be rewritten as follows.
( t.icd_diag1_cd like "595%"
OR t.icd_diag1_cd like "599"
OR t.icd_diag1_cd like "59651%"
OR t.icd_diag1_cd like "59654%")
AND t.icd_DIAG1_DESC ne "DW Created"
OR (t.icd_diag2_cd like "595%"
OR t.icd_diag2_cd like "599%"
OR t.icd_diag2_cd like "59651%"
OR t.icd_diag2_cd like "59654%")
AND t.icd_DIAG2_DESC ne "DW Created"
OR (t.icd_diag3_cd like "595%"
OR t.icd_diag3_cd like "599%"
OR t.icd_diag3_cd like "59651%"
or t.icd_diag3_cd like "59654%")
AND t.icd_DIAG3_DESC ne "DW Created" AND P.PROD_LEVEL3_CD="HB"






However if the icd_diagxx fields are the second field of a index, and if the fields mentioned on the joins are the first one of the index, then it would be better to have several sql's with union all between them, instead of a single sql.


Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top