Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Taking differences at each row

Status
Not open for further replies.

proximo20

Technical User
Sep 29, 2009
22
US
Hi my data looks like this

E 100
C 110
C 112
C 111
E 101

I need to calculate the difference between C's and E's for each row.

So it should look like this.

E 100
C 110 10
C 112 12
C 111 11
E 101 10

My tables are much longer but my problem is simply this. I need calculate the differences between rows for each type(C and E). I appreciate if anyone can give me an idea or advice.
 
Thanks dunnes. You see my problem is a little more complicated than this.

For example for the C on the 4th row it is not the previous row but the last "E" that I should take the difference from and the number of consecutive C's and E's change all the time.

 

i think it will work, all you need to do is ensure that the lag value stays at the letter E value until it finds the next letter E and then update it.

Apologies i dont have time to write the step as im having a few issues with data myself....

Dunnes
 
Thanks again Dunnes. I thought of that and I thought several other methods but I am not at a level that I can apply the things that I can think of. I greatly appreciate it if someone could send me an algorithm that I can apply to my problem.
 
Here you go:

Code:
data x;
rec = 'E'; num = 100; output;
rec = 'C'; num = 110; output;
rec = 'C'; num = 112; output;
rec = 'C'; num = 111; output;
rec = 'E'; num = 101; output;
run;


data x2;
set x;
retain prev_e;
prev_rec = lag(rec);
prev_num = lag(num);

if prev_rec = 'E' then prev_e = prev_num;
if REC = 'E' THEN difference = abs(num - prev_num);
else difference = abs(num - prev_e);
run;
 
Thanks Mdieckman. I get this result from your code.

diff
E 100 10
C 110 2
C 112 1
C 111 10
E 101

But I was trying to get
diff
E 100
C 110 10=>110-100
C 112 12=>112-100
C 111 11=>111-100
E 101 10=>abs(101-111)
 
Sorry I tried the exact code that you have written and it does what I asked for but when I apply it to my data it gives the results that I just wrote above.

I don't understand the only difference is that you have written "output".

The file that I am using is more than 1GB which is full of such sequences, so I can't write output on each line.
 
Did you alter the code any? Can you post your code and the column names? Is that what your actual data looks like or did you alter it for example purposes?
 
Oh and the output in datastep x merely outputs what's in the PDV to the dataset.
 
Yes I have altered it

My table looks like this

Type Price

A 14500
B 13900
B 13950
A 14550
B 14000
B .
B .
A .
A
A
B
A
.
.

So I changed your code into

data temp1;
set temp2;
retain prev_e;
prev_type = lag(type);
prev_price = lag(price);

if prev_type = 'A' then prev_e= prev_price;
if TYPE = 'A' THEN spread = abs(price - prev_price);
else spread = abs(price-prev_e);

run;
 
BTW in some files the first type is A and in some files it is B.
 
OK I see where the problem is

Diff
A 14966
B 14884 82 (right)
A 14964 80 (true)
A 14962 78 (ok)
B 14885 77 (still fine)
B 14887 2 (here is the PRoblem, this should have been 75)
B 14888 1 (should have been 74)
A 14961 73 (it is fine again)

You see the problem starts when there is more than one consecutive B or C in the first data.
 
actually the problem occurs when there are two consecutive E.

Because of this line

if REC = 'E' THEN difference = abs(num - prev_num);

It only calculates the difference between two E lines instead subtracting E's value from the last C.

So I added one more retain line, in short it looks like this.

data temp1;

set temp2;
retain prev_e;
retain prev_c;
prev_type = lag(type);
prev_price = lag(price);

if prev_type = 'A' then prev_e= prev_price;
if prev_type= 'B' then prev_c=prev_price;
if TYPE = 'A' THEN spread = abs(price - prev_c);
if type='B' then spread = abs(price-prev_e);

run;

And it works fine now. I would not think of this without your help Mdieckman. Thanks a lot.
 
Sweet - glad you got it fixed! Was just checking back to see if you replied.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top