I am writing a SAS program that must read in an Excel file. My team has Perl code that converts the Excel file to a CSV file that our SAS programs are able to read. Now, I've done this in the past with no problems. However, this time around the program must read in an Excel file with cell values that may contain new-line characters.
Although I still consider myself relatively new to SAS, I believe I have figured out the issue I am having. When an INPUT statement is called, an entire line is read into a buffer. It determines the end of each line by a new-line character. Since some of the cells in the Excel input file may contain new-line characters, an observation is split. When this occurs, the left-part of the observation is read in, but the right-part is read in on the next iteration as a seperate observation. Let me illustrate with example data, where '?' will represent a new-line character:
Example Excel File
============
Field 1 Field 2 Field 3 Field 4
-------------------------------------------------------------
415 Smith, John G troublesome?field orange
416 White, Bob M another value blue
417 Rogers, Bill J a third record green
If there is no new-line character in Field 3, normally the INPUT statement would read in all of the first obersvation ('415' through 'orange' values). However, since this Excel file may have a new-line character in the field, what happens is the INPUT statement reads in only part of the first observation ('415' through 'troublesome'). Then for the second iteration, the INPUT statement continues reading values from 'field' as if it has already reached the second line.
Is there any way I can read in entire observations whether there is a new-line character in a field or not, without having to modify the Perl code to remove the character? There has to be a way to let SAS know how many fields are in each observation and/or to ignore certain new-line characters. Any help will be very much appreciated!!! My current code is below, in case you would like to read/use it. Thanks, everyone
data file_data; infile csv_file dlm=&delimiter_cmd dsd truncover missover firstobs=2;
keep record_date_seconds record_date record_date_formatted;
input id:10. employee_name:$8. employee_id:$10. job_title:$50. location:$50.
location:$50. reporting_manager:$50. ticket_number:$15. record_date_seconds;
record_date = mdy(01,01,1900) + record_date_seconds - 2;
record_date_formatted = put(record_date, worddate18.);
run;
*****************************************************************;
Although I still consider myself relatively new to SAS, I believe I have figured out the issue I am having. When an INPUT statement is called, an entire line is read into a buffer. It determines the end of each line by a new-line character. Since some of the cells in the Excel input file may contain new-line characters, an observation is split. When this occurs, the left-part of the observation is read in, but the right-part is read in on the next iteration as a seperate observation. Let me illustrate with example data, where '?' will represent a new-line character:
Example Excel File
============
Field 1 Field 2 Field 3 Field 4
-------------------------------------------------------------
415 Smith, John G troublesome?field orange
416 White, Bob M another value blue
417 Rogers, Bill J a third record green
If there is no new-line character in Field 3, normally the INPUT statement would read in all of the first obersvation ('415' through 'orange' values). However, since this Excel file may have a new-line character in the field, what happens is the INPUT statement reads in only part of the first observation ('415' through 'troublesome'). Then for the second iteration, the INPUT statement continues reading values from 'field' as if it has already reached the second line.
Is there any way I can read in entire observations whether there is a new-line character in a field or not, without having to modify the Perl code to remove the character? There has to be a way to let SAS know how many fields are in each observation and/or to ignore certain new-line characters. Any help will be very much appreciated!!! My current code is below, in case you would like to read/use it. Thanks, everyone
data file_data; infile csv_file dlm=&delimiter_cmd dsd truncover missover firstobs=2;
keep record_date_seconds record_date record_date_formatted;
input id:10. employee_name:$8. employee_id:$10. job_title:$50. location:$50.
location:$50. reporting_manager:$50. ticket_number:$15. record_date_seconds;
record_date = mdy(01,01,1900) + record_date_seconds - 2;
record_date_formatted = put(record_date, worddate18.);
run;
*****************************************************************;