
Need to grep for null characters in an ASCII file.

Status
Not open for further replies.

galger

MIS
Jan 16, 2002
79
US

I have a file with null characters before a bit of text I am grepping for. It is not showing up in my results.

The null characters are hiding what's in front of them. How can I grep for them?

Line with Null characters ============================
0000000 D S A 0 0 0 9 2 6 L 7 8 0 2 3 1
0000020 6 2 N N N N
0000040 N N L N N N N N N N 0
0000060 0 0 0 1 6 0 0 1 6 0 0 1 6 0 0 0
0000100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0000120 0 0 0 0 0 0 0 0 \0 \0 \0 \0 \0 \0 \0
0000140 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000160 \0 \0 \0 D U P M E R I C A E U
0000200 R O P E A N S U C T N F U
0000220 N D D R A S S A ( A U T
0000240 O ) ( X F X X H O X X )
0000260
*
0000340 N Y
0000360
*
0000760 X \n
0000765
======================================================

Given two equally predictive theories, choose the simpler.
 

If you pass the input through:

cat file | tr -d '\000' | yourgrep

then the nulls will be eliminated. The nulls confuse grep because a null byte is used to terminate strings.

gene
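A minimal sketch of the tr -d approach above, assuming a POSIX shell with tr, grep and mktemp available; the helper name search_without_nuls and the sample bytes (loosely modelled on the od -c dump, not the real file) are hypothetical:

```shell
#!/bin/sh
# search_without_nuls FILE PATTERN (hypothetical helper name)
# Deletes every NUL byte (octal 000) from FILE, then greps the cleaned
# stream, so grep only ever sees ordinary text.
search_without_nuls() {
    tr -d '\000' < "$1" | grep -- "$2"
}

# Demo on a made-up sample: three NULs just before " DUPMERICA".
sample=$(mktemp)
printf 'DSA0009\000\000\000 DUPMERICA\n' > "$sample"
search_without_nuls "$sample" 'DUPM'   # prints: DSA0009 DUPMERICA
rm -f "$sample"
```

Because the NULs are deleted rather than translated, a pattern can no longer refer to them; the text on either side simply becomes adjacent.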

 
What if I wanted to search for the null characters?

 
FYI: cat file | tr -d '\000' | yourgrep did not work.

 
I should add that when I open this file in vi, I do not see the \0 \0 in the file.

I only see them when I run od -c on the file.
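vi hides the NULs, but they can be confirmed without reading od -c output by eye. A minimal sketch, assuming a POSIX shell with tr, wc, od and awk; the helper names and sample data are hypothetical. It counts the NUL bytes and prints their 0-based byte offsets; the -v flag stops od from collapsing repeated lines into a '*' as it did in the dump above:

```shell
#!/bin/sh
# count_nuls FILE: number of NUL bytes in FILE (hypothetical helper name).
count_nuls() {
    # tr -cd keeps only the bytes in the set (here just NUL); wc -c counts them.
    # The $(( )) strips any leading blanks that some wc versions print.
    echo $(( $(tr -cd '\000' < "$1" | wc -c) ))
}

# nul_offsets FILE: 0-based byte offsets of every NUL, one per line.
nul_offsets() {
    # od -An: no address column; -v: do not collapse repeated lines to '*';
    # -tu1: print each byte as an unsigned decimal number.
    od -An -v -tu1 "$1" |
        awk 'BEGIN { n = 0 }
             { for (i = 1; i <= NF; i++) { if ($i == 0) print n; n++ } }'
}

sample=$(mktemp)
printf 'DSA0009\000\000\000 DUPMERICA\n' > "$sample"
count_nuls "$sample"    # prints: 3
nul_offsets "$sample"   # prints 7, 8 and 9 on separate lines
rm -f "$sample"
```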

 
> Need to grep for null Characters in an acsii file.
Which means it isn't an ascii file at all.
It's just a file which happens to contain bits of text in some places.

It looks like the data file for some program. Does that program have a "print" function to display its own database?

Most tools aimed at text files (like grep) get very confused by non-printable characters, and especially by '\0' which is often used to mark the end of a string.

It's only low-level file functions like 'od' which allow you to see exactly what the file contains.

You could try using the 'strings' utility to extract just the printable text.
Eg.
[tt]strings -a myfile | grep whatever[/tt]

--
 
[highlight #FF99FF]>Need to grep for null Characters in an acsii file.
Which means it isn't an ascii file at all.
[/highlight]

Erm, this is not so, null IS an ASCII character - it is ASCII 0
 
Hi

taupirho said:
Erm, this is not so, null IS an ASCII character - it is ASCII 0
Then ELF executables are also ASCII files? Just because any file is composed of bytes, and ASCII has a character for each value storable in a byte?

galger, this probably will not help, but you can force [tt]grep[/tt] to search binary files too, at least with my GNU [tt]grep[/tt]:
man grep said:
-a, --text
Process a binary file as if it were text; this is
equivalent to the --binary-files=text option.

Feherke.
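A minimal sketch of the -a option quoted above, assuming GNU grep (BSD grep also accepts -a); the helper name and sample data are hypothetical. The matching line still contains the raw NUL bytes, so here they are translated to '@' purely so they show up on a terminal:

```shell
#!/bin/sh
# grep_as_text FILE PATTERN (hypothetical helper name)
# -a / --text makes grep treat a file containing NULs as text instead of
# only reporting that a binary file matches.
grep_as_text() {
    # Replace the NULs in grep's output with '@' so they are visible.
    grep -a -- "$2" "$1" | tr '\000' '@'
}

sample=$(mktemp)
printf 'DSA0009\000\000\000 DUPMERICA\n' > "$sample"
grep_as_text "$sample" 'DUPM'   # prints: DSA0009@@@ DUPMERICA
grep -a -c 'DUPM' "$sample"     # prints: 1
rm -f "$sample"
```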
 
Actually, you are wrong again: regular ASCII DOES NOT have a character for each value storable in a byte. It only has characters for byte values 0 (NUL) to 127 (DEL, Delete/Rub-out, often shown as a cross-hatch box). I guess ELF files also contain non-ASCII values; that's what makes them binary files. They use ALL 8 bits of a byte, not just the 7 bits that ASCII uses.
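The 0 to 127 range can be checked directly. A minimal sketch, assuming a POSIX shell with tr and wc; the helper name and sample bytes are hypothetical. It deletes every byte in the 7-bit range (octal 000 to 177) and counts what survives, which is exactly the bytes that use the 8th bit:

```shell
#!/bin/sh
# count_non_ascii FILE (hypothetical helper name)
# Prints how many bytes in FILE fall outside 7-bit ASCII (values 128..255).
count_non_ascii() {
    # LC_ALL=C makes tr work on raw bytes; '\000-\177' is the range 0..127.
    echo $(( $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) ))
}

text=$(mktemp)
high=$(mktemp)
printf 'NUL is ASCII too: \000\n' > "$text"
printf 'caf\351 \200\377\n' > "$high"   # three made-up bytes above 127
count_non_ascii "$text"   # prints: 0
count_non_ascii "$high"   # prints: 3
rm -f "$text" "$high"
```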
 
Probably a better term to describe the files that you would normally search using grep is "ASCII text".

In DOS days I recall referring to the bytes from 128 to 255 as high or extended ASCII, although those terms seem to have fallen out of use.

Annihilannic.
 
Hi

Oops, no coffee this morning. [morning] You are right, taupirho; I was actually thinking of extended ASCII. (In my early PC days, every piece of documentation I read used the word ASCII to mean extended ASCII, and as you can see, I sometimes mix up the terms even now.)

That said, I still believe that a file containing control characters other than CR, LF and TAB is binary.

Feherke.
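The rule of thumb above (control characters other than CR, LF and TAB mean binary) can be turned into a byte count. A minimal sketch, assuming a POSIX shell with tr and wc; the helper names and sample files are hypothetical, and the rule itself is a heuristic rather than a formal definition:

```shell
#!/bin/sh
# count_odd_controls FILE (hypothetical helper name)
# Counts control bytes other than TAB (\011), LF (\012) and CR (\015),
# i.e. \000-\010, \013, \014, \016-\037 and DEL (\177).
count_odd_controls() {
    # tr -cd deletes everything NOT in the set, leaving only the odd controls.
    echo $(( $(LC_ALL=C tr -cd '\000-\010\013\014\016-\037\177' < "$1" | wc -c) ))
}

# looks_binary FILE: succeeds (exit status 0) if FILE has any such byte.
looks_binary() {
    [ "$(count_odd_controls "$1")" -gt 0 ]
}

dos=$(mktemp)
nuls=$(mktemp)
printf 'col1\tcol2\r\n' > "$dos"                    # TAB, CR, LF only
printf 'DSA0009\000\000\000 DUPMERICA\n' > "$nuls"  # three NULs
looks_binary "$dos"  && echo "dos: binary"  || echo "dos: text"    # prints: dos: text
looks_binary "$nuls" && echo "nuls: binary" || echo "nuls: text"   # prints: nuls: binary
rm -f "$dos" "$nuls"
```

For a stricter view in which CR does not belong in a text file either, add \015 to the tr set.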
 
In fact, in UNIX text files, IMHO it is only LF and TAB...

Do you need to grep for, e.g., "\0\0\0 DUPM" in the file, as in the original post?

Use tr to translate the \0 characters into some readable character that is not otherwise used in your file, then grep for the translated string:

tr '\0' '@' </path/to/your/file | grep "@@@ DUPM"

or, though you should watch out for the UUoC (Useless Use of Cat) police, who regularly patrol this website:

cat /path/to/your/file | tr '\0' '@' | grep "@@@ DUPM"

The second form also shows that the tr+grep filter works on the output of any process:

your_process_that_produces_the_output | tr '\0' '@' | grep "@@@ DUPM"


HTH,

p5wizard
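A slightly more defensive version of the tr '\0' '@' approach, as a sketch assuming a POSIX shell with tr, wc, grep and mktemp; the function name, the sample file and the choice of '@' are illustrative. It first checks that the placeholder really is unused in the file, because otherwise a real '@' could not be told apart from a translated NUL:

```shell
#!/bin/sh
# grep_with_nul_placeholder FILE PATTERN (hypothetical helper name)
# PATTERN refers to each NUL as '@', e.g. '@@@ DUPM'.
grep_with_nul_placeholder() {
    file=$1
    pattern=$2
    # Refuse to run (exit status 2) if '@' already occurs in the file.
    if [ "$(( $(tr -cd '@' < "$file" | wc -c) ))" -ne 0 ]; then
        echo "placeholder '@' already occurs in $file" >&2
        return 2
    fi
    # Translate every NUL into '@', then grep the now NUL-free text.
    tr '\0' '@' < "$file" | grep -- "$pattern"
}

sample=$(mktemp)
printf 'DSA0009\000\000\000 DUPMERICA\nno nulls here\n' > "$sample"
grep_with_nul_placeholder "$sample" '@@@ DUPM'   # prints: DSA0009@@@ DUPMERICA
rm -f "$sample"
```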
 