Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Is this a Unix problem? 1

Status
Not open for further replies.

PetrOtahal

Technical User
Dec 3, 2007
12
AU
Having a problem with the data step.

The following code works on my windows machine running SAS 9.1:

data CleanM;
set Clean;
if pathol ne "Burnie" then delete;
run;

But will not work on a Unix machine running SAS 9.1.

The only difference is that the windows machine has a random subsample of the data on the Unix machine (due to the large size of the data file).

The pathol variable is a character variable of length 13 and contains missing values. I can get the following to work on the both machines:

data CleanM;
set Clean;
if pathol ne "" then delete;
run;

This code gives me all the observations where pathol has a missing value.

I have done a proc freq on the Unix machine and there are over 100 000 observations (out of 600 000) where pathol = "Burnie".

Any help greatly appreciated.

Petr

 
Hi Petr, I've certainly not come across this before. Have you tried trim and upcase just incase SAS is being a little particular?!

Code:
data CleanM; 
    set Clean; 
    if trim(pathol) ne "Burnie" then delete;
run;

or

Code:
data CleanM; 
    set Clean; 
    if upcase(pathol) ne "Burnie" then delete;
run;

or both

Code:
data CleanM; 
    set Clean; 
    if trim(upcase(pathol)) ne "Burnie" then delete;
run;

Cheers

Nick
 
If I remember correctly unix handles trailing whitespace different than windows. Use the TRIM() function and you should see the results that you want.

FYI,
Even on a Windows OS you should always use the trim() function if you're using that value as a match.

Klaz
 
Nick and Klaz,

Thanks for the prompt replies.

Unfortunately neither the trim nor upcase functions worked to solve the problem.

I have found another way using a different variable (numeric) to do the subsetting.

Thanks again.
Petr
 
Another method that might help is this (and it's a nifty little trick if you've never seen it before).
Code:
data CleanM;
    set Clean;
    if upcase(pathol) ne: "BURNIE" then delete;
run;

The colon translates as "starts with", so it doesn't matter if there's white space or other words after BURNIE, it'll still get picked.

I've been hit with some funny issues before like this when trying to drop records with no data in a certain field. The usual "IS NULL" and "= ''" and "trim(field) = ' '" didn't work.
It turned out that there were ASCII NULL characters in the field (ASCII value 00). Version 8 of SAS didn't use them, so the code worked fine, but when we upgraded to version 9 we started getting the problems because they were now recognised as characters. It took me ages to track the problem down. I ended up converting the data field to HEX numbers ro view the contents before I found the issue. The characters didn't show up in any other way. This was also working on UNIX, so maybe you've got some special characters in the field that you can't see.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks ChrisW75,

the colon solved the problem, don't need the upcase() command.

The following works.
Code:
data CleanM;
    set Clean;
    if pathol ne: "Burnie" then delete;
run;

Thanks again,
Petr
 
Hmmmm. IT seems then that there is something after the word "Burnie" that isn't getting removed by the TRIM function, but isn't displayed.
Try this:-
Code:
  hex_pathol = put(pathol,HEX20.);
Every character gets turned into a 2 digit number. You can look up the values here:-
I reckon you might see something like this for Burnie...
4275726E696500
Everything up to the 00 makes up Burnie, the 00 would be your problem. Another one you might see there is 09 or 0A or 0D (Tab, Line Feed or Carriage Return).

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Here is what I see:

4275726E69650D2020

so it looks like there is (null carriage return space space) after "Burnie".


 
Nice.
Well, at least we have an explanation of why it doesn't work on Unix then. I would suggest working out a way of stripping those characters from the field so that you know what you're working with further down the line. Messy data like this can cause issues with joins and filters (as you've seen).
The original data source would ideally be where the data should be cleansed. I sometimes see this sort of thing in databases where data has been entered manually and it has trapped extra information when the user has hit enter, rather than just moving on to the next field.
To clean the field in SAS you can use the ASCII format in the same manner as the HEX format above to convert 0D to it's ascii value, then use compress to strip it out of the field.
something like this I think.
Code:
  cr = put('0D'x,ascii2.);
  clean_pathol = trim(compress(pathol,cr));
[code]
That should give you a version of the field without the extra rubbish at the end.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
The following also works to remove the undesirable characters:
Code:
 clean_pathol=compress(pathol,,'s');

 
That's a new one to me. Cheers.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top