import large data

Status
Not open for further replies.

zack30

Technical User
Apr 18, 2007
10
DE

Hi all,
I'm new to SAS.
I would like to know how to work with large datasets,
that is, importing txt files with more than 10,000 variables.

Thanks for your help

Z
 
Check this out, all the info you'll need:-

Read it, have a play. That should get you firmly on the right path, let us know if you have any problems.
SAS will have no problem reading a file of 10k records. I regularly work with datasets on the order of 20 million records, and I know of people who laugh at how puny my datasets are. :)

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 

The problem is that mydata ends up with only 10001 variables, when I know there are more.
I am importing from a tab-delimited txt file.

here is what appears in the log:
"...
NOTE: The data set LIB.mydata has 112 observations and 10001 variables.
...
"

Thanks

Z
 
Are you using the Learning edition of SAS? I believe that is restricted to 10,000 records...

Try putting this at the top of the program, in case a limit has been set:-
Code:
options obs=MAX;
If that doesn't work, and you're not using the learning edition, check the data that has been read in, make sure it was read in correctly.
Post your log file as well, that might have a clue.
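
As a quick check before changing anything, the current setting can be printed to the log. This is only a minimal sketch: PROC OPTIONS with OPTION=OBS reports the value of the OBS= system option, and the OPTIONS statement then removes any cap for the rest of the session.

```sas
/* Print the current value of the OBS= system option to the log */
proc options option=obs;
run;

/* Remove any cap on the number of observations processed */
options obs=max;
```

If the log already shows OBS=MAX (or its numeric equivalent), the option is not what is limiting the import.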

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 

Hello there,
Thanks for your replies,
Here is the log I get.


Number of names found is less than number of variables found.
Number of names found is less than number of variables found.
784 /**********************************************************************
785 * PRODUCT: SAS
786 * VERSION: 9.1
787 * CREATOR: External File Interface
788 * DATE: 17APR07
789 * DESC: Generated SAS Datastep Code
790 * TEMPLATE SOURCE: (None Specified.)
791 ***********************************************************************/
792 data NBLIB.data ;
793 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
794 infile
794! 'G:\Studies\data.txt'
794! delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;
795 informat PGX_ID $15. ;
796 informat SNP_A_1909444 $13. ;
797 informat SNP_A_4303947 $13. ;
798 informat SNP_A_1886933 $13. ;
799 informat SNP_A_2236359 $13. ;
800 informat SNP_A_2205441 $13. ;
801 informat SNP_A_2116190 $13. ;
802 informat SNP_A_4291020 $13. ;
803 informat SNP_A_1902458 $13. ;
804 informat SNP_A_2131660 $13. ;
805 informat SNP_A_2109914 $13. ;
806 informat SNP_A_2291997 $13. ;
807 informat SNP_A_4277872 $13. ;
808 informat SNP_A_4221087 $13. ;
809 informat SNP_A_2118217 $13. ;
810 informat SNP_A_1866065 $13. ;
811 informat SNP_A_2288244 $13. ;
....
11714 format AAA_ID $15. ;
11715 format SNP_A_1909444 $13. ;
11716 format SNP_A_4303947 $13. ;
11717 format SNP_A_1886933 $13. ;
11718 format SNP_A_2236359 $13. ;
11719 format SNP_A_2205441 $13. ;
11720 format SNP_A_2116190 $13. ;
11721 format SNP_A_4291020 $13. ;
...

21711 format VAR9998 $2. ;
21712 format VAR9999 $2. ;
21713 format VAR1E4 $2. ;
21714 format VAR1E4 $2. ;
21715 format VAR1E4 $2. ;
21716 format VAR1E4 $2. ;
21717 format VAR1E4 $2. ;
21718 format VAR1E4 $2. ;
21719 format VAR1E4 $2. ;
....
22633 input
22634 PGX_ID $
22635 SNP_A_1909444 $
22636 SNP_A_4303947 $
22637 SNP_A_1886933 $
22638 SNP_A_2236359 $
22639 SNP_A_2205441 $
22640 SNP_A_2116190 $
22641 SNP_A_4291020 $
....
24972 SNP_A_1806342 $
24973 SNP_A_4287996 $
24974 SNP_A_2238081 $
24975 VAR2342 $
24976 VAR2343 $
24977 VAR2344 $
24978 VAR2345 $
24979 VAR2346 $
24980 VAR2347 $
24981 VAR2348 $
24982 VAR2349 $
...
32630 VAR9997 $
32631 VAR9998 $
32632 VAR9999 $
32633 VAR1E4 $
32634 VAR1E4 $
32635 VAR1E4 $
...

33131 VAR1E4 $
33132 VAR1E4 $
33133 VAR11E3 $
33134 VAR11E3 $
33135 VAR11E3 $
33136 VAR11E3 $
...

33551 VAR11E3 $
33552 VAR11E3 $
33553 ;
33554 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
33555 run;

NOTE: The infile
'G:\Studies\data' is:

File
Name=G:\Studies\data.txt,
RECFM=V,LRECL=32767

NOTE: 112 records were read from the infile
'G:\Studies\data.txt'.
The minimum record length was 32767.
The maximum record length was 32767.
One or more lines were truncated.
NOTE: The data set NBLIB.data has 112 observations and 10001 variables.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: DATA statement used (Total process time):
real time 5.07 seconds
cpu time 2.07 seconds


112 rows created in NBLIB.data from
G:\Studies\data.txt.



NOTE: NBLIB.data was successfully created.
NOTE: Import Cancelled.
 
Sorry, I'd completely misunderstood. I thought you were dealing with 10,000 records, not 10,000 variables.
Looking at the log, it appears you generated that code using the import process in SAS, and that might be the problem.
The first thing I can think of is to rewrite the input step manually, like this:-
Code:
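data NBLIB.data;
  /* Tab-delimited file; firstobs=2 skips the header row */
  infile 'G:\Studies\data.txt' delimiter='09'x
         MISSOVER DSD lrecl=32767 firstobs=2 ;

  /* Change var20000 to however many variables you actually need. */
  /* Declare the variables as 13-character strings before reading. */
  length var1-var20000 $13;
  informat var1-var20000 $13.;
  input var1-var20000;

run;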
And see how that works.
Also, if you've retained the code that was generated (which is always a good idea), open it up and check that it has all the variables you would expect. It might be that the code generator can't handle that many variables at once.
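
One more thing worth checking in the posted log: the minimum and maximum record lengths are both 32767 and it says "One or more lines were truncated", which suggests every record is longer than the lrecl=32767 buffer. A possible variation of the step is sketched below. The value lrecl=1000000 is an assumption, not something from the thread; it would need to be at least the length of the longest line, and the maximum allowed depends on the SAS version and host, so check the INFILE documentation for your release.

```sas
data NBLIB.data;
  /* lrecl=1000000 is an assumed value chosen to exceed the record length; */
  /* adjust it to the real line length and to what your release allows     */
  infile 'G:\Studies\data.txt' delimiter='09'x
         missover dsd lrecl=1000000 firstobs=2;

  /* Declare the variables as 13-character strings, then read them in order */
  length var1-var20000 $13;
  input var1-var20000;
run;
```

If the "truncated" note disappears from the log, the record length was at least part of the problem.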




Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Hi Chris,
Thanks for your reply,
Yes I am working with variable numbers up to 500 000.

So I tried your program and it works, but I need the variable names. Is there a way of asking the input step to get the variable names from the first row of the file?

Something like GETNAMES=YES, but in the input step,

instead of input var1-var70000?

Thanks again
Z
 
First off, at this point I think it's worth contacting SAS Support. You've effectively found a limit in one of the processes, and it might be worth seeing if they have a workaround.
The alternative method is going to be long and drawn out, and involves using SAS to write the code and then %include it.
I'll have to have a play to get that working, and as it's 7:50pm and I'm still at work, it's going to have to wait, I'm afraid.
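
For reference, the "write code, then %include it" idea could look roughly like the sketch below. It is untested against the real file and rests on several assumptions: the fileref gen is a hypothetical name, lrecl=1000000 is an assumed value that must cover the whole header line, every column has a non-blank header that is already a valid SAS name of at most 32 characters, and every value fits in 13 characters.

```sas
/* Temporary file that will hold the generated INPUT statement */
/* ("gen" is a hypothetical fileref)                           */
filename gen temp;

/* Step 1: read only the header row and write one INPUT statement to gen */
data _null_;
  infile 'G:\Studies\data.txt' delimiter='09'x missover dsd
         lrecl=1000000 obs=1;
  file gen;
  length name $32;

  put 'input';
  input name $ @;            /* trailing @ holds the header line */
  do while (name ne ' ');    /* MISSOVER returns a blank at end of line */
    /* :$13. reads each field as character, up to 13 characters */
    put '  ' name ':$13.';
    input name $ @;
  end;
  put ';';
  stop;                      /* stop before a second iteration writes more */
run;

/* Step 2: read the data rows using the generated INPUT statement */
data NBLIB.data;
  infile 'G:\Studies\data.txt' delimiter='09'x missover dsd
         lrecl=1000000 firstobs=2;
  %include gen;
run;
```

The design choice here is to generate only the INPUT statement, with the informat attached to each name via the colon modifier, so no separate LENGTH or INFORMAT statements have to be written. A blank header cell would end the name loop early, so the number of generated names should be checked against the number of columns expected.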


Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks for your help. I did contact SAS Support and am waiting on their input.


cheers

Z
 