Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Audit File - Finding and listing longest character values

Status
Not open for further replies.

btmonroe

Programmer
Jun 25, 2007
5
US
I have been trying to develop a macro to help me take in a file and for each charater field, find it's formatted length, max length, and a sample of up to 5 of the longest records. I have a macro to find largest value and formated length. However, my thought is to assign a length variable for each character field, and then select the top 5 for each variable. Any help in looping through the variables and assigning it would be great.

T
 
Hi btmonroe.

There is a table called sashelp.Vcolumn

Which lists EVERY Column in EVERY dataset. so depending on how much data you have in SAS this might be something to look at.

Shenn
 
I was able to piece together a couple of macros that we use here and create exactly what I was looking for. It may be a little sloppy, but happy it works.

%macro audit_max(title,saspath,sasname,saspath2);

%let data=&saspath..&sasname;


title;footnote;run;
ods listing close;
ods html body="&saspath2\&sasname._max_audit.xls";
ods select attributes position;
proc contents data=&data position noprint out=varlist(keep=name varnum type length) ;
run;

footnote;
title;
proc sort data=varlist;by varnum;run;
data _null_;set varlist end=done;
call symput('var'||left(_n_),name);
call symput('type'||left(_n_),type);
call symput('length'||left(_n_),length);
if done then call symput('num',_n_);
run;

/*create varlist*/
%macro varlist();
%do i=1 %to #
&&var&i
%end;
%mend;
run;

/*make the log go away*/
filename mylog "&saspath2\&sasname._max_audit.log";
proc printto log=mylog;
run;


/*create indicators*/
%macro doit(i);
%do i=1 % to #
data &&var&i (keep=&&var&i ar_seqnum l_m);
set &data;
l_m=length(&&var&i);
run;

proc sort data=&&var&i;
by descending l_m;
run;

data &&var&i;
set &&var&i;
by descending l_m;
if _n_ le 5;
run;

data &&var&i ;
format Longest $100.;
set &&var&i;
longest = &&var&i;
variable="&&var&i";
type=left("&&type&i");
len_m=&&length&i;
run;



%end;
run;
%mend;

%doit()
data all;
set %varlist;
if type = '2';
run;

data _null_;
set all;
call symput('num_obs',_freq_);
run;




proc print data=all noobs label split='*';
var variable len_m l_m longest ar_seqnum;
label
variable='Field*Name'
len_m='Formatted*Field*Length'
l_m='Maximum*Field*Length'
longest='5 Longest*Char Values'
run;

run;


ods html close;
ods listing;
run;
title;footnote;run;
 
I wrote an audit macro myself a short while ago, it might suit you better, it audits all numeric and character variables and produces an HTML report. There's an option to output to PDF instead.
There's example code at the start that can be submtited directly after you've run the macro definition, which will give you an idea of what it does. It works out the columns for you, you just need to give it the dataset name.
Let me know how it goes for you...

I'm pretty proud of this macro, one of the features is that it turns off all the logging for the duration of the macro so that the only messages that appear in the log are the messages that I want to go in there, at the end the options are then returned to what they were before.

Code:
********************************************************************;
* audit.sas  - Dataset Audit Macro.                                *;
*  Macro for auditing a SAS dataset.  Produces a report in a       *;
* format suitable for HTML output.                                 *;
*                                                                  *;
* Date.       User    Comments                                     *;
* 30/08/2006  ChrisW  Initial Version.                             *;
* 27/10/2006  ChrisW  Fixed 2 issues. 1st for datasets with no     *;
*                     labels. 2nd Warning with long quoted string. *;
* 05/06/2007  ChrisW  Added extra columns, lengths of variables.   *;
*                     Added option to switch off Character or      *;
*                     Number report.                               *;
*                                                                  *;
********************************************************************;

/**********
How to Run the macro.

libname test 'C:\tempsas';

title 'JN000000 - Bodgit and Legit';

ods listing close;
ods html path='C:\'  (URL=NONE)
         file='Audit_report.htm'
         style=styles.pshtml;

%audit2(SASHELP.dmgmsg);

ods html close;
ods listing;
 ***********/


%macro audit2(indset,X=50,debug=0,charrep=Y,numrep=Y,format=htm);
/* Get current Options settings prior to changing them for the macro.    */
proc sql noprint;
    select setting
      INTO :options separated by ' '
    from dictionary.options
    where optname in('NOTES','SOURCE','SOURCE2','MLOGIC','MPRINT','SYMBOLGEN'
                     ,'DATE','NUMBER','QUOTELENMAX')
    ;
quit;

%if &debug = 0 %then 
  %do;
     options nomprint nosymbolgen nomlogic nonotes nosource nosource2 nodate nonumber noquotelenmax;
  %end;
%else
  %do;
    options mprint symbolgen mlogic notes source source2 nodate nonumber noquotelenmax;
  %end;
/* turn off proc title, represses the "Frequency Procedure" title from Proc Freq */
ods noproctitle;

data _null_;
  length today $5;
  today = put(today(),5.);
  call symput('AUDNUM',today);
run;


/* In case job has been killed partway through and restarted, drop the report */
/* tables if they exist.                                                      */
/* If this isnt done, the proc append steps will fail.                        */
%if %sysfunc(exist(ALLCHAR_MSG_&AUDNUM)) %then %do;
  proc sql;
    drop table allchar_msg_&AUDNUM;
  quit;
%end;

%if %sysfunc(exist(ALLNUM_MSG_&AUDNUM)) %then %do;
  proc sql;
    drop table allnum_msg_&AUDNUM;
  quit;
%end;

/* Set up rundate for report*/
%let rundt = %sysfunc(date(),date9.);

/* Get Dataset name and libname fields */
%let dset = %UPCASE(%scan(&INDSET,-1,'.'));
%if %UPCASE(%scan(&INDSET,1,'.')) = &DSET %then %let libname = WORK;
%else %let libname = %UPCASE(%scan(&INDSET,1,'.'));

%put *****************************************;
%put * Unique Value count threshold = &X;
%put * Library  = &LIBNAME;
%put * Dataset  = &DSET;
%put * Run Date of Report = &RUNDT;
%put * Tempfile suffix = &AUDNUM;
%put *****************************************;

%put NB: If you cancel this run before the macro completes, submit the following;
%put line to restore your options to their previous state:-;
%put options &options %STR(;);
%put ;

/*Create Formats*/
proc format;
  value $pop
    ' ' = 'Missing'
    other = 'Populated'
    ;

  value pop
    . = 'Missing'
    other = 'Populated'
    ;
quit;

%local CHARS NUMS LABEL;

/* Get list of Character Variables ... */
proc sql noprint;
  select UPCASE(name)
    INTO :nums separated by ' '
  from dictionary.columns
  where LIBNAME = UPCASE("&LIBNAME")
    and MEMNAME = UPCASE("&DSET")
    and type    = 'num'
    and name   ^= 'cwpsgroup000'
    ;

/* ... and numeric variables           */
  select UPCASE(name)
    INTO :chars separated by ' '
  from dictionary.columns
  where LIBNAME = UPCASE("&LIBNAME")
    and MEMNAME = UPCASE("&DSET")
    and type    = "char"
    and name   ^= 'cwpsgroup000'
    ;

quit;

title2 "Data Audit on &libname..&dset run on &RUNDT";
%put Analysing Character Variables...;


%if "&CHARS" ne "" %then 
%do;
  /* Analyse the Character Variables */
  %let i = 1;
  %let var = %scan(&chars,&i);
  
  /* loop through all the character variables ... */
  %do %until (&var = );
  
    %put -- &VAR;
    proc sql noprint;
      select count(distinct &VAR)
            ,n(&VAR)
            ,nmiss(&VAR)
            ,max(length(&VAR))
      into :distinct_values, :pop, :miss, :ALENGTH
      from &libname..&dset
      ;
  
      select label
            ,length
        INTO :label, :VLENGTH
      from dictionary.columns
      where LIBNAME = UPCASE("&LIBNAME")
      and MEMNAME = UPCASE("&DSET")
      and UPCASE(NAME)    = UPCASE("&VAR")
    ;

    quit;
  
    /* Write out message to the log file */
    data tmp_msg_&AUDNUM;
      length variable $32 values 8 type $10 label $50 miss pop 8;
      variable = "&VAR";
      type     = 'Character';
      Values   = &distinct_values;
      pop      = &pop;
      miss     = &miss;
      label    = "&label";
      vlength  = &VLENGTH;
      alength  = &ALENGTH;
    run;


    %if &CHARREP = Y %then
    %do;
        /* If there's <= the threshold set, do a simple freq */
        %if %eval(&distinct_values) LE %eval(&X) %then
        %do;
            proc freq data=&libname..&dset;
              table &VAR /missing;
              title3 "Character Variable: &VAR";
            run;
        %end;
        /* otherwise just check the population rate... */
        %else
        %do;
            proc freq data=&libname..&dset;
                table &VAR /missing;
                format &VAR $pop.;
                title3 "Character Variable: &VAR";
            run;
        %end;
    %end;

    /* Increment the counter variable, get next macro var to work with */
    %let i = %eval(&i + 1);
    %let VAR = %scan(&chars,&i);
  
    proc append base=allchar_msg_&AUDNUM data=tmp_msg_&AUDNUM;
    run;
  %end;
%END;
  
/* Get work copy of the dataset, add in a mock group variable */
data WORK.audit_numvars_&AUDNUM;
  set &libname..&dset;
  cwpsgroup000 = 1;
run;


%put Analysing Numeric Variables...;

%if format = pdf %then 
%do;
    goptions ftext='HELVETICA'
            ctext=BLACK
            htext=1
            htitle=2
            device=pdfc
            hsize=22cm
            vsize=20cm
            ;
%end;
%else
%do;
    goptions ftext='HELVETICA'
             ctext=BLACK
             htext=1
             htitle=2
             device=html
             hsize=15cm
             vsize=15cm
             ;
%end;
axis1 label=none value=none major=none;
symbol color = salmon;

%if "&NUMS" ne "" %then
%do;
  /* Loop through the numeric variables */
  %let i = 1;
  %let var = %scan(&nums,&i);
  
  %do %until (&var = );
    %put -- &VAR;
    proc sql noprint;
      select count(distinct &VAR)
            ,n(&VAR)
            ,nmiss(&VAR)
            ,min(&VAR)
            ,max(&VAR)
            ,median(&VAR)
      into :distinct_values, :pop, :miss, :min, :max, :median
      from WORK.audit_numvars_&AUDNUM
      ;
  
      select label
        INTO :label
      from dictionary.columns
      where LIBNAME = UPCASE("&LIBNAME")
        and MEMNAME = UPCASE("&DSET")
        and UPCASE(NAME)    = UPCASE("&VAR")
      ;
    quit;
    
      /* Write out message to the log file */
      data tmp_msg_&AUDNUM;
        length variable $32 values 8 type $10 label $50 miss pop 8;
        variable = "&VAR";
        type     = 'Numeric';
        Values   = &distinct_values;
        pop      = &pop;
        miss     = &miss;
        min      = &min;
        max      = &max;
        median   = &median;
        label    = "&label";
      run;
  
      %if  &NUMREP = Y %then
      %do;
        %if &FORMAT = pdf %then
        %do;
            ods pdf startpage=NOW;
            ODS layout start rows=2;
            ODS region; 
        %end; 
        proc boxplot data=WORK.audit_numvars_&AUDNUM;
            plot &VAR*cwpsgroup000 / cframe   = vligb 
                            boxstyle = schematic
                            cboxes   = dagr
                            cboxfill = ywh
                            haxis=axis1
                            idcolor = red
                            ;
            inset nobs='Populated Records'
                mean='Mean'
                    / header = 'Overall Stats' position=NE cfill=ywh;
  
            title3 h=2 "Numeric Variable: &VAR";
  
        run;
        quit;
        %if &FORMAT = pdf %then
        %do;
           ODS layout end;
        %end; 
      %end;
  
  
     %let i = %eval(&i + 1);
     %let VAR = %scan(&nums,&i);

     proc append base=allnum_msg_&AUDNUM data=tmp_msg_&AUDNUM;
     run;

  %end;
%END;

%if %sysfunc(exist(ALLCHAR_MSG_&AUDNUM)) %then %do;
  proc report data=allchar_msg_&AUDNUM nowd split='*';
    columns variable label type Values pop miss vlength alength;

    define variable / display 'Variable*Name';
    define label    / display 'Label';
    define type     / display 'Variable*Type';
    define Values   / display 'Number of*Distinct*Values';
    define pop      / display 'Number of*Populated*Records';
    define miss     / display 'Number of*Unpopulated*Records';
    define vlength  / display 'Length of*Variable';
    define alength  / display 'Maximum*Length of Values'; 
    title3 'Overview of Character Variables';
  run;
%end;

%if %sysfunc(exist(ALLNUM_MSG_&AUDNUM)) %then %do;
  proc report data=allnum_msg_&AUDNUM nowd split='*';
    columns variable label type Values pop miss min max median;

    define variable / display 'Variable*Name';
    define label    / display 'Label';
    define type     / display 'Variable*Type';
    define Values   / display 'Number of*Distinct*Values';
    define pop      / display 'Number of*Populated*Records';
    define miss     / display 'Number of*Unpopulated*Records';
    define min      / display 'Minimum*Value';
    define max      / display 'Maximum*Value';
    define median   / display 'Median*Value';
    title3 'Overview of Numeric Variables';
  run;
%end;

%if %sysfunc(exist(ALLCHAR_MSG_&AUDNUM)) %then %do;
  proc sql;
    drop table allchar_msg_&AUDNUM;
  quit;
%end;

%if %sysfunc(exist(ALLNUM_MSG_&AUDNUM)) %then %do;
  proc sql;
    drop table allnum_msg_&AUDNUM;
    drop table audit_numvars_&AUDNUM;
  quit;
%end;

%if %sysfunc(exist(tmp_msg_&AUDNUM)) %then %do;
  proc sql;
    drop table tmp_msg_&AUDNUM;
  quit;
%end;

%put ;
%put *****************************************;
%put Done!;
%put *****************************************;
%put ;
%put ;

options &OPTIONS;

%mend audit2;


Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Thanks, I will try and incorporate some of these concepts in mine as well.
 
Chris,
Thanks for your post and I am trying o adjust and add my audit to it. I am having a problem adding my code to it. I would either like to add my display of those longest values to the end of your program or create another html file off of your that has my information. I would like to set something up in the paramaters like mine that points the file to a specific location. We usually output many files within a program and like the idea of making the path and file name a paramater. I am not sure how you are using the path and file name as the output device, but would like to incorporate the filename and path into your paramaters.

Any help would be appreciated.

Thanks.
 
The way the reports are output is controlled outside of this macro. If you look at the example code, just before hte macro definition, you'll see an ODS statement which specifies the output destination for the reports.
Code:
*How to Run the macro. *;

libname test 'C:\tempsas';

title 'JN000000 - Bodgit and Legit';

ods listing close;
ods html path='C:\'  (URL=NONE)
         file='Audit_report.htm'
         style=styles.pshtml;

%audit2(SASHELP.dmgmsg);

ods html close;
ods listing;
This means that you can change the output destination to use alternate ods destinations, such as PDF, without having to change the macro. If you use PDF, I've added in some code there to handle some quirks of PDF, so use the switch in the macro call to tell it that you are writing out to PDF ie
%audit2(SASHELP.dmgmsg,format=pdf);

Enjoy.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top