Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Running a Loop that Counts Observations

Status
Not open for further replies.

marksnape

Programmer
Aug 4, 2008
5
GB
Hi,

I need to code a loop, my data looks (simplified) like this:

Obs Type Time
1 QUOTE 12:01:00
2 TRADE 12:02:00
3 QUOTE 12:03:00
4 QUOTE 12:04:00
5 QUOTE 12:06:00
6 TRADE 12:18:00
7 QUOTE 12:23:00
8 TRADE 12:28:00
9 QUOTE 12:29:00
10 QUOTE 12:30:00
11 TRADE 12:32:00
12 QUOTE 12:36:00
13 TRADE 12:42:00

Basically, what I want to do is count the trades that took place in the last half an hour. So using obs 13 as an example, I want to write a loop that counts the trades (and not the quotes) between 12:12:00 and 12:42:00. The output will count obs 6,8 & 11 so the result will be a new variable (say Trade_Count) with a value of 3.

Is a loop even the way to go? I admit that I am a novice, but hopefully with some help I can eventually begin to contribute to the forum. I have three large SAS books on route from Amazon so I have a fun time of becoming a self-taught pro ahead!

Mark
 
You could use PROC SQL's grouping functions to get this result. You can also use proc freq. If you want to use a loop and the DATA STEP you need to first sort this data by type & time. You keep only the TRADE types. You then select (using the IF statement) only those obs that fall between your start and end time, then you count.

ex using SQL
Code:
proc sql;
create table trade_counts as
select type,count(time)
from your_data
where type="TRADE" and ('12:12:00't lt time le '12:42:00't)
group by type;
quit;

ex using datastep
Code:
proc sort
   data = your_data
   out  = dat_sorted;
   by type time;
   where type = 'TRADE';
run;
data results;
  set dat_sorted end=last;
  if _n_ = 1 then ctr=0;
  if '12:12:00't lt time le '12:42:00't then
    ctr+1;
  if last then
   put 'Number of trades: ' ctr;
run;

Of course you need to use the ideas presented above and adapt the method that works for you.
Klaz
 
I think I can adapt the second code to acheive what I want. Thanks Klaz
 
Okay, I havn't quite got there. Basically, i was really trying to simplify my task. I actually have 30 millions rows of data. What I can do is take out all the trades, so we can forget about QUOTES. For every new trade, I need to count the number of trades that occured in the previous half an hour. Any ideas?
 
Hi Mark,

You can use a correlated subquery and reflexive (self) join. Minus the jargon, this means querying your dataset from each row of your dataset to get the counts. Proc SQL I believe is your best bet for this.

Regarding the 30M rows, you might want to make sure that you have indexed your database.

Code:
data test;
input Obs Type$ Time :time8.;
format time time8.;
cards;
1 QUOTE 12:01:00
2 TRADE 12:02:00
3 QUOTE 12:03:00
4 QUOTE 12:04:00
5 QUOTE 12:06:00
6 TRADE 12:18:00
7 QUOTE 12:23:00
8 TRADE 12:28:00
9 QUOTE 12:29:00
10 QUOTE 12:30:00
11 TRADE 12:32:00
12 QUOTE 12:36:00
13 TRADE 12:42:00
;
run;


proc sql;
select (select count(time) 
        from test 
        where (out.time-'00:30:00't < time < out.time)
        and type ='TRADE'), time
from test as out
where type ='TRADE';
quit;
 
I think the following would be worth considering, it should be quite efficient, given number of obs ...

data dummy;
set test(where=(upcase(type)="TRADE" and time()-time<="00:30"t));
run;

data _null_;
set dummy nobs=last30tr;
put "Number of trades in last 30 mins is ..." last30tr;
stop;
run;

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top