Biguidance
Instructor
Imagine you want a list of the products bought by a particular group of customers, for instance those who live in the US.
If your database has a somewhat reasonable structure, the transaction data and the customer’s country will be kept separately.
So there are 2 different tables in your SAS data warehouse, let’s say sales_2012_02 and customerinfo.
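To try the code in this post, you need something to run it on. The sketch below builds two tiny versions of these tables. Only the table names, customerid and Country come from the post; the product column and all the values are made up for illustration.

```sas
/* Hypothetical sample data: product and all values are assumptions */
DATA customerinfo;
    LENGTH customerid 8 Country $2;
    INPUT customerid Country $;
    DATALINES;
1 US
2 BE
3 US
4 NL
;
RUN;

DATA sales_2012_02;
    LENGTH customerid 8 product $10;
    INPUT customerid product $;
    DATALINES;
1 laptop
2 phone
1 mouse
4 cable
;
RUN;
```

Customer 3 lives in the US but has no sales, which makes it easy to see how an approach treats customers without transactions.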
To get to the customers you want, you could write regular SAS code that sorts both tables, merges them and keeps the customers you need. The example below does not use formats or hash tables, which often perform better than sorting and merging:
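/* Sort the sales by customer id */
PROC SORT DATA=sales_2012_02 OUT=neededsalestable;
    BY customerid;
RUN;
/* Keep only the US customers and sort them by customer id */
PROC SORT DATA=customerinfo (WHERE=(Country = "US"))
          OUT=neededcustomerstable;
    BY customerid;
RUN;
/* Keep the sales rows whose customer appears in the US customer list */
DATA dataIwant;
    MERGE neededsalestable (IN=hassales)
          neededcustomerstable (IN=neededcustomers KEEP=customerid);
    BY customerid;
    IF hassales AND neededcustomers THEN OUTPUT;
RUN;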
This works, but it is not pretty, and the two PROC SORT steps might take a bit of computing time.
You can do all of this in one easy PROC SQL step.
PROC SQL;
    CREATE TABLE dataIwant AS
    SELECT *
    FROM sales_2012_02
    WHERE customerid IN (
        SELECT customerid FROM customerinfo WHERE Country = "US");
QUIT;
Whether one runs faster than the other depends on your system configuration (which we will not discuss here), but the PROC SQL version is certainly easier to read.
This technique is called SQL subsetting (the inner SELECT is a subquery), and you should use it wisely.
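Hash tables were mentioned earlier as an alternative that often performs better than sorting and merging. For comparison, here is a minimal sketch of the same subsetting done with a DATA step hash object. It assumes customerid has the same type in both tables. The output name dataIwant_hash and the object name uscustomers are made up for illustration.

```sas
/* Sketch only: dataIwant_hash and uscustomers are hypothetical names */
DATA dataIwant_hash;
    /* Load the US customer ids into an in-memory hash table once */
    IF _N_ = 1 THEN DO;
        DECLARE HASH uscustomers(DATASET: 'customerinfo(WHERE=(Country = "US"))');
        uscustomers.DEFINEKEY('customerid');
        uscustomers.DEFINEDONE();
    END;
    /* Read the sales rows in their existing order; no sorting is needed */
    SET sales_2012_02;
    /* CHECK() returns 0 when the current customerid is found in the hash */
    IF uscustomers.CHECK() = 0;
RUN;
```

As with PROC SQL, neither table has to be sorted first. The trade-off is that the list of US customer ids has to fit in memory.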