Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Find part of field value 1

Status
Not open for further replies.

Kayo

Programmer
Aug 13, 2001
28
US
1. I want the frequency of a field with broken values:

Field = UIC(Unit Identification Code)
Value = W1234
Lentth = 5
Rep = Alpha

2. I need to print all UICs that are not correct format or missing characters e.g., _1234, W12_4, W_ _34 etc.

Thanks in advance.
 
Hi Kayo,

don't know if UIC is any standard - do you have a description of how this code value must look? Is it alpha + numeric*n ? Once you have that algorithm it's not so hard at all.

Cheers,
Matthias
 
Alpha/Numeric
The UIC value identifies Army units in the field. Each Army unit has its own unique UIC e.g., W1234, W4889, W9876 all UICs begin with a "W" and are 5 characters long.

 
Hi Kayo,

you could try this quick&dirty datastep:

Code:
data uic_data;
  length uic $10;
  uic = 'W1111';  output;
  uic = 'W222';   output;
  uic = 'W0123';  output;
  uic = 'W01234'; output;
  uic = 'W1_11';  output;
  uic = 'w3333';  output;
  uic = 'A4444';  output;
  uic = 'WO123';  output;
run;

data uic_ok
     uic_bad (keep = uic err_msg);
  set uic_data; /*--  contains UIC  */
  length err_msg $50
         i       8
         ;

  /*--  check leading W  */
  if substr (uic, 1, 1) eq 'w' then do
    err_msg = 'lowercase leading W'; 
    output uic_bad;
  end; else if substr (uic, 1, 1) ne 'W' then do
    err_msg = 'wrong leading character'; 
    output uic_bad;

  /*--  check length  */
  end; else if length (uic) ne 5 then do
    err_msg = 'UIC must be length 5'; 
    output uic_bad;

  end; else do

    /*--  must all be digits  */
	err_msg = '';
    do i = 2 to 5; 
      if indexc (substr (uic, i, 1), '0123456789') eq 0 then do;
	    err_msg = 'invalid numeric';
	  end;
	end;
	if err_msg ne ''
	  then output uic_bad;
	  else output uic_ok;
  end;

run;

What this doesn't check is of course if units - though right format - really exist, e.g. are W0000 and W9999 okay?

Cheers,
Matthias
 
I think that I have a method that you may like. From your question I understand that you want to know how many values have blanks in them (therefore not valid) and what their (invalid) freqs.

Ok lets see if this works.

data test(keep=uic);
set your_current_data_set;
*** UIC CAN ONLY BE 5 IN LENGTH ***;
uic = trim(uic); *THIS IS JUST FOR XTRA WHITESPACE;
pos = index(uic,' ');
if 0 lt pos lt 6 then
output;
run;
*** NOW DO THE PROC FREQ ***;
proc freq data = test;
tables uic;
run;
*** THIS CODE WILL ONLY REPORT THE UIC THAT HAVE MISSING VALUES. THIS ALSO ASSUMES THAT YOU MEAN MISSING AS WHITESPACE (BLANK). IF YOU HAVE UNDERSCORES IN THE MISSING PLACE JUST SUBSTITUTE THAT IN THE INDEX FUNCTION. YOUR VALIDATION ALGORITHMS CAN BE INSERTED INTO THE TEST DATASET TOO (THESE ARE THE ALGORITHMS THAT MAKE SURE THAT THE UIC MATCHES THE PROTOCOL THAT WAS SETUP. IE. MUST START WITH A CAPITAL 'W' CONTINUE WITH A NUMBER ETC....);

I hope this example helps.
klaz2002
 
Sorry, it has taken me so long to get back w/you folks. Thanks for all the help. I tried all the examples. Klaz2002, your code produced exactly what I was looking for.

Again, thanks to you all.

V/r,

Kayo{/b}
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top