
data step or proc sql for performance time


sap1958 (Technical User) · Oct 22, 2009
data lib.NewTable (keep=ID Provider);
    set table1
        table2
        table3;       /* concatenates (appends) the three tables */
    if status = '3';  /* subsetting IFs: keep only matching rows */
    if code = '2';
run;

compared to

proc sql;
    create table lib.NewTable as
    select a.ID, a.Provider      /* a. qualifies columns present in more than one table */
    from table1 a
        left outer join table2 b
            on a.ID = b.ID
        left outer join table3 c
            on a.ID = c.ID
    where status = '3'
      and code = '2';
quit;

What I want to do is include ID and Provider from all tables regardless. I am trying to reduce processing time, as table1, table2 and table3 have thousands of records each. If proc sql is not the answer in this case, I am looking for ways to reduce processing time with the data step method (my current method).
 
Thousands of records isn't so bad; I don't think you will see much difference either way, unless you are sitting on >really< slow CPUs and I/O. You'd need more complex joins and lookups, with hundreds of thousands or millions of records crossed with thousands, before you see a real difference. Then things like SET with KEY= in a data step, or hash objects, become interesting.
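For illustration only, a minimal hash-object lookup sketch, using two hypothetical datasets (bigtable and lookup, not from the thread): the small table is loaded into memory once and probed while reading the large one, with no sorting needed on either side.

data matched;
    if _n_ = 1 then do;
        if 0 then set lookup;                /* puts Provider into the PDV */
        declare hash h(dataset: 'lookup');   /* load the small table into memory */
        h.defineKey('ID');
        h.defineData('Provider');
        h.defineDone();
    end;
    set bigtable;
    if h.find() = 0;                         /* keep only rows whose ID is in lookup */
run;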

In your example you are trying to do two different things, I think: the data step simply appends the 3 tables and keeps the records where status and code match the defined values - no merge here. To speed that up you can put a WHERE= option on each single table: "status eq '3' and code eq '2'" (sketch below). It can help a bit - subsetting at the source is always good - but the gain is not significant if the tables are not wide.
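A sketch of that suggestion against the tables from the question, assuming status and code exist in all three:

data lib.NewTable (keep=ID Provider);
    set table1 (where=(status eq '3' and code eq '2'))
        table2 (where=(status eq '3' and code eq '2'))
        table3 (where=(status eq '3' and code eq '2'));
run;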

The SQL is a join, something completely different, so it would be interesting to know which table each variable comes from. Or was a union the intention, since all tables have the same layout, as the data step implies? In the latter case the system has to read each table completely; once you limit the I/O to the referenced variables (KEEP= in the data step, or the variable list in SQL), I see no performance difference. Via SAS/SHARE I don't know - it's too long since I used that. Locally, with a plain append, the limit should pretty much be set physically by CPU and disk/storage.
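If a union was indeed the intention, the SQL counterpart of the data step would look roughly like this - UNION ALL rather than UNION, so duplicates are kept just as SET keeps them:

proc sql;
    create table lib.NewTable as
    select ID, Provider from table1 where status = '3' and code = '2'
    union all
    select ID, Provider from table2 where status = '3' and code = '2'
    union all
    select ID, Provider from table3 where status = '3' and code = '2';
quit;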
 
I have to agree here. When it comes to creating tables/datasets, SQL or Base SAS is more or less the same; the difference shows when you are extracting hundreds of thousands or millions of rows and/or merging (joining), because you don't have to pre-sort for a SQL join, whereas for a data step MERGE you do.
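To illustrate the pre-sort point with two hypothetical tables, big and small (not from the thread):

/* Data step MERGE: both inputs must be sorted by the BY variable first */
proc sort data=big;   by ID; run;
proc sort data=small; by ID; run;

data merged;
    merge big (in=a) small (in=b);
    by ID;
    if a;                     /* keep all rows from big, matched or not */
run;

/* PROC SQL join: no pre-sorting required */
proc sql;
    create table joined as
    select a.*
    from big a left join small b
        on a.ID = b.ID;
quit;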
 
