Hi everyone,
I have a tricky problem with table lookups and the implicit retention of variables read from input datasets (i.e. in the program data vector). We have a base table with several million records and >1,000 variables; the lookup tables are also in the region of several hundred thousand observations and several hundred variables.
Selected variables have to be updated, so the lookup tables are read by key and only the variables involved are kept. After basic _IORC_ error handling, an include file (>100,000 lines of code) is executed that picks the correct assignment / calculation in a matrix-style selection process. These include files are maintained by a different part of the organization.
That's where the problem arises: as soon as the include file references a new variable that is not explicitly kept from its lookup table, a previously read value of a same-named variable from another lookup table may silently be used instead. Here is a very simple example:
[tt]/*-- base table w/keys */
data base;
x = 1; output;
x = 2; output;
x = 3; output;
run;
/*-- variable lookup 1 */
data tree1 (index=(x));
x = 1; a1 = .; a2=1; output; /*-- missings allowed! */
x = 2; a1 = 1; a2=1; output;
x = 3; a1 = .; a2=1; output;
run;
/*-- variable lookup 2 */
data tree2 (index=(x));
x = 1; a2 = .; output;
x = 2; a2 = 2; output;
x = 3; a2 = 2; output;
run;
/*-- data enrichment */
data new1 (keep = x a1 b1)
     new2 (keep = x a2 b2);
  set base;
  /*-- lookup #1 */
  set tree1 (keep = x a1) key=x / unique;
  if _iorc_ ne 0 then
    _error_ = 0;
  /*-- simulate business logic include file #1 */
  b1 = a2;
  /*-- lookup #2 */
  set tree2 (keep = x a2) key=x / unique;
  if _iorc_ ne 0 then
    _error_ = 0;
  /*-- simulate business logic include file #2 */
  b2 = a2;
run;[/tt]
The resulting table work.new1 is the problem: for variable A2 I would expect all values to be 1, yet because the value read from the second lookup table is implicitly retained in the program data vector, that table's A2 is picked up instead, "delayed" by one observation:
[tt]OBS X A1 A2
1 1 . .
2 2 1 .
3 3 . 2[/tt]
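For completeness, here is a defensive variant of the enrichment step (a sketch only, on the toy tables above): explicitly resetting every lookup-sourced variable with CALL MISSING before each keyed read. A stale value then surfaces as a missing instead of silently propagating, though with our real tables the variable lists would have to be generated rather than hardcoded:
[tt]/*-- defensive enrichment (sketch) */
data new1 (keep = x a1 b1)
     new2 (keep = x a2 b2);
  set base;
  /*-- reset ALL lookup variables before include #1;
       a stale a2 now shows up as missing, not as a wrong value */
  call missing(a1, a2);
  set tree1 (keep = x a1) key=x / unique;
  if _iorc_ ne 0 then
    _error_ = 0;
  b1 = a2; /*-- consistently missing instead of "delayed" */
  /*-- reset before include #2 */
  call missing(a2);
  set tree2 (keep = x a2) key=x / unique;
  if _iorc_ ne 0 then
    _error_ = 0;
  b2 = a2;
run;[/tt]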
Does anyone have an idea how best to identify this kind of issue? We are talking about a huge number of data steps in which it could be lurking, grown over a number of years...
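For what it's worth, one partial check I sketched (WORK, TREE1 and TREE2 stand in for the real library and tables): querying DICTIONARY.COLUMNS for variable names that occur in more than one lookup table, since only those can collide in the PDV:
[tt]proc sql;
  create table name_collisions as
    select upcase(name) as name,
           count(distinct memname) as n_tables
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('TREE1', 'TREE2')
      and upcase(name) ne 'X'  /*-- the key is shared on purpose */
    group by 1
    having n_tables > 1;
quit;[/tt]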
Many thanks! Cheers, Matthias