Regression help

Status
Not open for further replies.

ScottSer

Programmer
May 25, 2008
1
US
I am a relatively new SAS user and somewhat rusty with statistics, so I apologize if this question seems ignorant; I did not know enough to find a solution myself. I have a data file with about 14 variables per record and about 40,000 records. The problem is that 10 of the variables are character variables. The only way I know to handle this is to take each distinct value of each character variable and code it as a 1/0 variable, which would produce about 500 variables.

First of all, it doesn't appear easy to manipulate the data into this form: Excel is the only tool I know that seems suited to it, and it is borderline ridiculous to try with such a large sample. Is there some way that SAS can automatically treat each distinct character value as a 1/0 variable? I would imagine that either way I would need to list all of the variables in the model code, correct? Is there an easier way than Excel to format the data how I want it using SAS? I think SAS can handle this much data and more, correct? I have the learner's edition (I believe).

Please let me know what I should do and whether there is anything else I should consider. Again, I have only moderate coding experience. Thanks a ton.
 
Hi Scott,
It sounds like a sticky problem, but I'd never suggest using Excel over SAS (though that's mostly personal prejudice).
I'm thinking that Proc Transpose MIGHT possibly be useful to you here.
Code:
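data dset1;
  set dset;

  cnt=1;  /* constant that becomes the 1 in each indicator column */
run;

/* BY-group processing requires the data to be sorted by the key */
proc sort data=dset1;
  by recid;
run;

proc transpose data=dset1 out=dset2 prefix=var1_;
  by recid;   /* one output row per record */
  var cnt;    /* value to place in the new columns */
  id var1;    /* each distinct value of VAR1 names a column */
run;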
Used this way, each Proc Transpose step handles one character variable, so you'll need to write a separate Proc Transpose for each variable you want to recode like this, then join the results together at the end.
In the above code, RECID is whatever the primary key is on your dataset, and VAR1 is your character variable. CNT just puts a 1 into the field where necessary. The Prefix= option prefixes the resulting variable names with "VAR1_", and the ID statement tells SAS to complete each new column name with the value found in that variable. Records that do not have a given value get a missing value (.) in that column rather than 0, so set those to 0 before modelling. NB - Make sure that the values in VAR1 can be used in SAS variable names (no special characters, spaces, etc.).
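As a rough sketch of the "join the results together at the end" step, the example below transposes two character variables and merges the results by the key, filling the remaining cells with 0. The dataset names (WORK.DSET, WORK.T1, WORK.T2, WORK.DUMMIES), the second variable VAR2, and the sample values are made up for illustration; only RECID, VAR1 and CNT come from the code above.

```sas
/* Hypothetical sample data: a key plus two character variables */
data work.dset;
  input recid var1 $ var2 $;
  datalines;
1 red small
2 blue large
3 red large
4 green small
;
run;

/* Add the constant that will become the 1 in each indicator column */
data work.dset1;
  set work.dset;
  cnt = 1;
run;

proc sort data=work.dset1;
  by recid;
run;

/* One transpose per character variable */
proc transpose data=work.dset1 out=work.t1(drop=_name_) prefix=var1_;
  by recid;
  var cnt;
  id var1;
run;

proc transpose data=work.dset1 out=work.t2(drop=_name_) prefix=var2_;
  by recid;
  var cnt;
  id var2;
run;

/* Join the pieces by the key and turn missing values into 0 */
data work.dummies;
  merge work.t1 work.t2;
  by recid;
  array d{*} var1_: var2_:;   /* all indicator columns, matched by prefix */
  do i = 1 to dim(d);
    if d{i} = . then d{i} = 0;
  end;
  drop i;
run;
```

With ten character variables the transpose step would be repeated ten times (or wrapped in a macro), and each output dataset added to the MERGE statement.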

I hope that this helps.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Hi
For regression with character variables, don't use proc reg, which has no CLASS statement.
You can use proc glm or proc genmod instead.
e.g.

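proc glm data = user.blah;
  /* placeholder names: list your character variables here */
  class charvar1 charvar2;
  /* response = character and numeric predictors, separated by spaces */
  model response = charvar1 charvar2 numvar1 numvar2;
  /* write the predicted values to a new dataset as PRED */
  output out = user.blah1 p=pred;
run;
quit;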

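When using character variables, make the level with the most observations the base (bottom) level, i.e. the reference level, which by default in proc glm is the last level in sorted order. This does not change the fitted values, but it makes the estimates for the other levels more stable and easier to interpret.
The p=pred option lets you see the predicted value in your output data set.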
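Here is a small self-contained sketch of that pattern, in case it helps to see it run end to end. The dataset WORK.DEMO, its variables (REGION, SIZE, Y) and the data values are invented for illustration. The SOLUTION option on the MODEL statement asks proc glm to print the parameter estimates, where the last level of REGION in sorted order (West here) is the base level with an estimate of zero.

```sas
/* Hypothetical data: one character predictor, one numeric predictor */
data work.demo;
  input region $ size y;
  datalines;
East 1 10
East 2 12
East 3 14
West 1 15
West 2 18
West 3 20
North 1 9
North 2 11
;
run;

proc glm data=work.demo;
  class region;                       /* character variable treated as categorical */
  model y = region size / solution;   /* SOLUTION prints the parameter estimates */
  output out=work.demo_pred p=pred;   /* predicted values stored in PRED */
run;
quit;

/* Inspect the predictions alongside the original data */
proc print data=work.demo_pred;
run;
```

No 1/0 columns need to be built by hand here: the CLASS statement generates the indicator coding internally.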
 