Reading raw data file

Status
Not open for further replies.

123sush

IS-IT--Management
Aug 20, 2008
7
US
I am new to SAS and I need to generate a report from input data giving the name of each car, its rating and the year. The report should show the average rating of each car in a particular year. The data look like

car1 car2 car3 car4 car5 car6 .......car30
1990 3 2
1991 4 4 and so on....
1992 5 5
1993 6 6
1994 4 7
1996 6 8
1997 9 9

I need to find the mean % of each car in each year. For example, the figure for car1 in 1991 is (4-3)/4 * 100. Adding all the mean % values and dividing by the total number gives the average.

Question: do I have to define 30 variables in Var with Infile?

How do I store the variables, or is there an array function in SAS?


 
Hi Sush,
OK, it seems that you are REALLY new to SAS there. :)
Yes, you need a different variable for each field on the input file. You can define them individually; however, as you're looking at Car1, Car2 .. Carn, you can take a shortcut to avoid listing each one individually.
You CAN build arrays, but they aren't stored in the datasets and are generally only used for very specific purposes.

Code:
filename yourfl '<path>';

data dset1;
  infile yourfl recfm=v lrecl=2056 dsd dlm='09'x missover firstobs = 2;

  length year 5
         car1-car30 4;

  input year
        car1-car30;
run;
In this I've assumed that your data is TAB delimited text ('09'x means "09 hex", which is the ASCII code for a TAB).
FIRSTOBS=2 tells it to ignore the first record (which I assume is the header row). If you don't have a header row, you can leave this bit out.
RECFM=V tells it that the data has variable length records (as opposed to fixed length records).
LRECL=2056 tells it that the longest record it's going to encounter is 2056 bytes. I tend to include this by default these days. I'm not sure what the default record length is that SAS uses, but if you exceed that length, the data just gets chopped off, which can be very hard to track down.
The DSD option tells SAS that adjacent delimiters mean that there's a missing value. Without this, it would treat consecutive delimiters as a single delimiter, so the values after a missing field would shift into the wrong variables.
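
To see what DSD does with adjacent delimiters, here is a minimal sketch. The inline data and the dataset name dsd_demo are made up for illustration, and commas stand in for the TAB delimiter so that the empty field is visible on screen.

```sas
/* Hypothetical inline data: the 1990 record has an empty second field. */
data dsd_demo;
  /* DSD: two adjacent delimiters mean a missing value.           */
  /* MISSOVER: a short record leaves the remaining vars missing.  */
  infile datalines dsd dlm=',' missover;
  input year car1 car2 car3;
  datalines;
1990,3,,2
1991,4,4,5
;
run;

proc print data=dsd_demo;
run;
```

With DSD, the 1990 record should read as car1=3, car2 missing, car3=2. If you take DSD off the INFILE statement, the two commas collapse into one delimiter, so the 2 lands in car2 and car3 is left missing.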

This is a pretty important fundamental of SAS, so I'd recommend checking out the documentation on it if you're planning to be doing a reasonable amount of this kind of work, as it'll give you a pretty good overview of the concepts.
Overview of SAS Datastep Processing:-

Overview of reading RAW data.

And finally, the INFILE statement. There are a lot of options available for the INFILE statement, which this section goes into:-



Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks Chris. I am a COBOL developer and have just started with SAS. I wrote the code below to read the input file through an array.
*********************************************
data car_price;
infile 'C:\SAS\annual_car_price.txt';
Array SY{10,24} Y1S1-Y1S24 Y2S1-Y2S24 Y3S1-Y3S24 Y4S1-
Y4S24 Y5S1-Y5S24 Y6S1-Y6S24
Y7S1-Y7S24 Y8S1-Y8S24 Y9S1-Y9S24 Y10S1-Y10S24;
input Y1S1-Y1S24 Y2S1-Y2S24 Y3S1-Y3S24 Y4S1-Y4S24 Y5S1-Y5S24 Y6S1-Y6S24 Y7S1-Y7S24 Y8S1-Y8S24/
Y9S1-Y9S24 Y10S1-Y10S24;
RUN;

proc print data=car_price;
run;
******************************************************

This prints the output to the output window. Now I am thinking of creating a new variable for the average, applying the formula using these array variables, and printing it on the report.

Thanks for giving me the links :)
 
No worries. The array there is pretty redundant. We tend to use arrays in SAS only when we want to loop through a list of variables, as the array then allows us to set the suffix via a do loop.
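
As a rough illustration of that kind of loop, here is a small sketch. The dataset names, the n_rated variable and the three-car sample data are all invented for the example; with the real file you would use car1-car30 and a dimension of 30.

```sas
/* Hypothetical sample data: three cars, one missing rating. */
data ratings;
  input year car1-car3;
  datalines;
1990 3 . 2
1991 4 4 5
;
run;

data counted;
  set ratings;
  array car{3} car1-car3;      /* car{i} refers to car1, car2, car3 */
  n_rated = 0;
  do i = 1 to dim(car);        /* the loop index supplies the suffix */
    if not missing(car{i}) then n_rated = n_rated + 1;
  end;
  drop i;                      /* the index is not wanted in the output */
run;

proc print data=counted;
run;
```

The array itself is not written to the output dataset; only the underlying variables (car1-car3) and n_rated are.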
The / at the end of the first line of input shouldn't be necessary. Did you put that in just to break it up here in the forum?
SAS doesn't need any help determining that a statement runs over multiple lines; it assumes everything up to the next ; is one statement.
I think you'll find SAS a much simpler language than Cobol :).

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks Chris, I am using / to read the next line. I removed it and it still worked.
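
For reference, a small sketch of what the / does in an INPUT statement. The data and variable names here are invented. The / is a line pointer control that moves to the next record of the input file; it has nothing to do with how the statement is laid out in the program.

```sas
/* Hypothetical data: one observation spread over two input records. */
data two_lines;
  input a b / c d;     /* read a and b, move to the next record, read c and d */
  datalines;
1 2
3 4
;
run;

proc print data=two_lines;
run;
```

This gives a single observation with a=1, b=2, c=3, d=4. Writing the statement as "input a b c d;" gives the same result here, because by default (FLOWOVER) list input carries on to the next record when it runs out of values on the current one, and SAS writes a note to the log saying it went to a new line. That is presumably why removing the / made no difference.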

I am a little confused about how to apply the formula for the percentage rating of car1 in 1991:
= (rating of car1 in 1991 - rating of car1 in 1990) * 100 / rating of car1 in 1990

How should I write that? I was using a DO loop, but then I have to define so many variables, and I have to do this for all the years and for all 30 cars.
I think you are right that the array is redundant in this case. I should go for a simple solution.

The output should look like


Percentage Average Rating OF CARS

year car1 car2 car3 car4 ........

1991 4 5 4 5
1992 5 3 3 4.....
.
.
. and so on..
.
1997
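
One possible way to write that calculation, as a sketch only. It assumes the formula above (change from the previous row, divided by the previous row's rating), treats the previous row as the previous year even if a year is missing from the file, and uses made-up sample data with three cars. The names ratings, pct_change, prev and pct1-pct3 are all invented for the example; for the real file the array dimensions would be 30.

```sas
/* Hypothetical sample data: three cars, three years. */
data ratings;
  input year car1-car3;
  datalines;
1990 3 2 4
1991 4 4 5
1992 5 5 5
;
run;

data pct_change;
  set ratings;
  array car{3}  car1-car3;
  array prev{3} _temporary_;   /* temporary arrays keep their values across rows */
  array pct{3}  pct1-pct3;     /* new variables: % change per car */
  do i = 1 to dim(car);
    /* leave pct missing if either rating is missing or the divisor is zero */
    if not missing(prev{i}) and prev{i} ne 0 and not missing(car{i}) then
      pct{i} = (car{i} - prev{i}) * 100 / prev{i};
    prev{i} = car{i};          /* remember this row for the next one */
  end;
  if _n_ > 1;                  /* the first year has nothing to compare with */
  drop i;
run;

proc print data=pct_change noobs;
  var year pct1-pct3;
run;

/* Average of the yearly % changes for each car. */
proc means data=pct_change mean;
  var pct1-pct3;
run;
```

With this sample, car1 in 1991 comes out as (4 - 3) * 100 / 3, roughly 33.3. The array is what saves defining a separate statement for each of the 30 cars, and PROC MEANS takes care of adding up the yearly values and dividing by the count.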
 