
Searching for control chars

Status
Not open for further replies.

grega

Programmer
Feb 2, 2000
932
GB
Cross posted from the Perl forum in case anyone has any ideas here.

I have a tab-delimited file: 16 fields, 1,000,000 records.

I want to scan the file and report every line that contains control characters. By control characters I basically mean any character other than [a-z], [A-Z], [0-9] and the punctuation available on a standard keyboard, e.g. (`¬!"£$%^&*-=_+[]{};'#:@~,./<>?\|). I probably want to ignore carriage returns too.

Hope this makes sense. I'm happy to use any tool available, but I'm best with awk, sed or Perl. I figure all it needs is the right regular expression.

Any advice much appreciated.

Greg.
 
Hi,
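if your editor knows how to enter control characters, you can put them straight into a bracket expression:

awk '/[^A-^H^K^L^N-^Z]/ { printf("Control character at line %d\n", NR) }' filename

Each ^X here stands for the single character Ctrl-X, not a caret followed by a letter. In vi, ^A is entered as <Ctrl-V> <Ctrl-A>, ^Z as <Ctrl-V> <Ctrl-Z>, and so on. The ranges skip ^I (TAB), ^J (line feed) and ^M (carriage return), because every line of a tab-delimited file contains tabs.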
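The same check can also be written without typing raw control characters into the script. This is a minimal sketch, assuming an awk that supports POSIX character classes such as [[:cntrl:]] (some older awks, such as mawk 1.3.3, may not); find_ctrl is just a hypothetical name for the wrapper function:

```shell
# find_ctrl FILE...: report each line that contains a control character,
# ignoring the TAB delimiters and a trailing carriage return.
# With no FILE arguments it reads standard input.
find_ctrl() {
    LC_ALL=C awk '
    {
        line = $0
        sub(/\r$/, "", line)    # ignore a DOS-style CR at the end of the line
        gsub(/\t/, "", line)    # tabs separate the fields, so they are allowed
        if (line ~ /[[:cntrl:]]/)
            printf("Control character at line %d\n", NR)
    }' "$@"
}
```

Usage: find_ctrl filename. In the C locale [[:cntrl:]] covers codes 0-31 and 127 only, so bytes in the 128-159 range are not flagged by this version.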

Alternatively, you could generate the awk script, with one rule per unprintable character, from a small C or Perl program:

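#!/usr/bin/perl
# Print one awk rule per control character: codes 1-31 and 128-159.
use strict;
use warnings;

for my $x (1 .. 255) {
    next if $x == 9 || $x == 10 || $x == 13;    # skip TAB, line feed and carriage return
    if ($x < 32 || ($x > 127 && $x < 160)) {
        printf("/%c/ { printf(\"Control character %o found on line %%d\\n\", NR) }\n", $x, $x);
    }
}

Redirect the output to a file and run it with awk -f; it is probably safest to run it under LC_ALL=C so that the raw bytes 128-159 are treated as single characters. Each match reports the character code in octal. TAB, line feed and carriage return are skipped in the loop: the file is tab-delimited, and a raw line feed would split a rule across two lines.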
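If you would rather not generate and maintain a second script, a single awk program can do the same per-character check by looking up each character's numeric code. This is a minimal sketch, assuming a byte-oriented C locale; report_ctrl is a hypothetical name, and it reports the same ranges (1-31 and 128-159, in octal) while skipping TAB and CR:

```shell
# report_ctrl FILE...: print the octal code and line number of every control
# character (codes 1-31 and 128-159), skipping TAB and carriage return.
# With no FILE arguments it reads standard input.
report_ctrl() {
    LC_ALL=C awk '
    BEGIN {
        # Map each single-byte character to its numeric code.
        for (i = 1; i < 256; i++)
            code[sprintf("%c", i)] = i
    }
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            c = (ch in code) ? code[ch] : 0    # a byte missing from the table, e.g. NUL
            if (c == 9 || c == 13)
                continue                       # TAB is the delimiter; ignore CR
            if (c < 32 || (c > 127 && c < 160))
                printf("Control character %o found on line %d\n", c, NR)
        }
    }' "$@"
}
```

Usage: report_ctrl filename. Looping over every character is slower than a single regex match, which may matter on a million records, but it tells you exactly which character was found and needs no generated script.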



 