Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Skip over variables when importing flat file into sas

Status
Not open for further replies.

sap1958

Technical User
Oct 22, 2009
138
US
I wrote the following code to extract a flat file into sas
data lib.test2;
length Rank 3 Code1 $3 Drug $25 Manufacturer $25 Sales 8 PerChg 8;
infile 'c:\Test2.txt' DLM=' ()';
input Rank Code1 Drug Manufacturer Sales PerChg ;
run;

Here is a sample of what the data looks like in the flat file

3 Plavix Bristol-Myers Squibb Company 4,223,124 11.2%
4 Advair Diskus GlaxoSmithKline 3,653,410 2.3%
5 (1) Seroquel AstraZeneca Pharmaceuticals 3,117,591 7.2%
6 (6) Abilify Bristol-Myers Squibb Company 3,083,351 30.0%
7 Singulair Merck & Co., Inc. 3,027,378 4.5%
The reason I have a variable called Code1 is because some of the entries have a number with parenthesis. As an example two of the entries have such a sequence. I have about 200 records to import. Is there a way to skip the numbers with a parenthesis. All I really want is
Rank
Drug
Manufacturer
Sales
Percentage

 
I tend to provide lengthy solutions, yet this makes sure you can handle these optional cases. Principle problems with you data are an optional variable placed on the data line, plus you have no delimiter between drug and manufacturer; I assume the drug is a single word, but that seems to not be the case - happy fixing that one :)

You could do sequenced inputs with reading the line bit by bit, here I prefer sucking the whole line in and cutting everything around the manufacturer name. Being lazy I coded the cutting as macro to be able to reuse it on different variables.

Code:
%*--  cut string OUT from side CUTFROM of string IN with delimiter DLM.  ;
%macro mCutStr (in, out, cutFrom=L, dlm=" ");
  %if %upcase("&CUTFROM.") eq "L" %then %do;
    &OUT. = scan (&IN., 1, &DLM.);
    &IN. = left (substr (&IN., length(&OUT.)+1));
  %end; %else %do;
    &OUT. = scan (&IN., -1, &DLM.);
    &IN. = left (substr (&IN., 1, length(&IN.)-length(&OUT.)-1));
  %end;
%mend;

%*--  cut number OUT from side CUTFROM of string IN with delimiter DLM.  ;
%*--  use TEMP as a buffer variable.                                     ;
%macro mCutNum (in, out, temp, cutFrom=L, dlm=" ");
  %if %upcase("&CUTFROM.") eq "L" %then %do;
    &TEMP. = scan (&IN., 1, &DLM.);
    &IN. = left (substr (&IN., length(&TEMP.)+1));
    &OUT. = input (&TEMP., BEST.);
  %end; %else %do;
    &TEMP. = scan (&IN., -1, &DLM.);
    &IN. = left (substr (&IN., 1, length(&IN.)-length(&TEMP.)-1));
	&TEMP. = compress (&TEMP., ',');
    &OUT. = input (&TEMP., BEST.);
  %end;
%mend;

data WORK.test (keep = rank drug manufacturer sales percentage);
  infile 'c:\temp\test.txt'
         length=linelen;
  length invar        $500
         dummyStr     $10
         rank         8
         drug         $20
         manufacturer $50
         sales        8
         percentage   8;
  /*--  read the whole line into one string.  */
  input invar $varying. linelen;
  /*--  read the rank (first word on string)  */
  %mCutNum (invar, rank, dummyStr);
  /*--  drop an optional parenthesis number  */
  if invar =: '(' then
    invar = left (substr (invar, index(invar, ')')+1));
  /*--  get the drug (one (!) word on the left)  */
  %mCutStr (invar, drug);
  /*--  drop pct sign on the right  */
  invar = substr (invar, 1, length (trim(invar))-1);
  /*--  cut percentage and sales from the right  */
  %mCutNum (invar, percentage, dummyStr, cutfrom=R);
  %mCutNum (invar, sales, dummyStr, cutfrom=R);
  /*--  what is left: manufacturer incl. opt. spaces.  */
  manufacturer = invar;
run;
 
This solution is good if your flat file is under unix. You can remove all the characters with the parenteses as follows:

If you are using vi editor then in your flatfile type the following command in command mode

:%s/ (.*)//g

This would take out all the characters and make your input file clean and ready to read into SAS.

Regards,
Swetha
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top