
Trying to read a file 2

Status
Not open for further replies.

frangac

Technical User
Feb 8, 2004
163
ZA
Hi All, PHV

I am stuck with the following; if there is a better solution, please post it.

What I am trying to achieve is to read a binary file with "od":

0000000 4837 7470 722b 3a30 3964 7331 3753 4654
0000020 3030 3030 6161 6161 6161 6161 6161 6161
0000040 6161 6161 6161 6161 6161 3781 0000 0000
0000060 0123 0108 92aa aa20 2020 2020 2020 2020
0000100 2020 2020 2020 2020 0604 1513 5516 0123
0000120 2621 96aa aaaa aaaa aaaa aaaa aaaa aaaa
0000140 aaaa aa00 0004 2020 2053 5350 0000 26ac
0000160 0000 0037 8100 0000 0001 2301 0892 aaaa
0000200 2020 2020 2020 2020 2020 2020 2020 2020
0000220 2006 0415 1355 4201 2326 2196 aaaa aaaa
0000240 aaaa aaaa aaaa aaaa aaaa aaaa 0000 0420
0000260 2020 5353 5000 0026 ac00 0000 5476 7470
0000300 722b 3a30 3964 7331 3753 4654 3030 3030
0000320 3030 3032 3030 3030 3030 3030 3030 3030
0000340 3030 3030 3030 3030 3230 3030 3030 3030
0000360 3030 3030 3030 3030 3030 3030 3032 3030
0000400 3030 3030 3030 3030 3030 3030 3030 3030
0000420 3030 3230 3030 3030 3030 3030 3030 3030
0000440 3030 3030

Result
=======
H 7 tpr+:09 ds17 VAL 0000
aaaaaaaa aaaaaa aaaa aaaa
7 80 xxxxxxxxxxaaaa 20060415135516 0123262196aaaaaaaa 000004 007599 S SP 000026 ac000000
7 80 0218552997aaaa 20060415214645 0860007249aaaaaaaa 000066 009703 S SP 000026 ac000000
T v tpr+:09 df17 VAL


Script

awk 'BEGIN{

HEADER=0;
RECORDS=-1;
TRAILER=0;
String1="";
cmd = "od -x FILENAME";

while (cmd | getline >0)
for (i=1; i<=NF; ++i)
if(length($i)>4)
{
continue
}
else
{
if (HEADER < 21)
{
HEADER=HEADER+1;
#cmd1=hex2ascii(substr($i,1,2)) " | cut -f2 -d\" \""
#cmd2=hex2ascii(substr($i,3,2))
#cmd1 | getline b ; close(cmd1);
#print b

}
else if (RECORDS<38)
#else if (RECORDS<24)
{
RECORDS=RECORDS+1;
#if (RECORDS < 23 )
if (RECORDS < 37 )
{
#RECORDS=RECORDS+1;
printf "%s", $i
}
else
{
if(substr($i,1,2)==37)
{
print ""
printf "%s", $i
RECORDS=0;
}
else
{
TRAILER=1;
RECORDS=38;
#RECORDS=25;
print""
print $i
#exit 99;
}
}
}
else
{
print $i
}
}
close (cmd)
}


###################################################
function hex2ascii(hex_string)
###################################################

{

#Convert any size hex string to 8-bit ASCII

result_ascii = "";
module = length(hex_string) % 2
if ( module==1 )
hex_string = "0" hex_string   # pad odd-length input with a leading zero

strcol = length(hex_string) - 1;

while (strcol > 0)

{

#Capture binary 8-bit chunks starting from right to left
chunk = substr(hex_string, strcol, 2);

#Convert hex to decimal
dec = hex2uint(chunk);

#Reject non-printable bytes (below hex 20 = decimal 32); compare the
#decoded number, since comparing the string chunk with 20 is a string test
if ( dec < 32 ){
print hex_string" ""is not a valid Hexa value to Convert to ASCII or is not printable\n"
exit 1
}

#Convert decimal to ASCII character
digit = sprintf("%c", dec);

#Concatenate and resume looping
result_ascii = digit result_ascii;

strcol = strcol - 2;

}

return(result_ascii);
}


###################################################
function hex2uint(hex_string){
###################################################

# Convert hex string to unsigned integer

result = 0;
power = 0;
MSD = substr (hex_string, 1, 1);

for (strpos = length(hex_string); strpos > 0; strpos--)
{
digit = substr(hex_string, strpos, 1);
if (match(digit, /[a-fA-F]/))
{
gsub(/[aA]/, "10", digit);
gsub(/[bB]/, "11", digit);
gsub(/[cC]/, "12", digit);
gsub(/[dD]/, "13", digit);
gsub(/[eE]/, "14", digit);
gsub(/[fF]/, "15", digit);
}
result = result + digit*(16^power);   # ^ is the POSIX exponent operator; ** is a gawk extension
power++;
}
return (result);
}'

My script seems to be going nowhere.

Please help.
Many Thanks
Chris
 
Hi All,

Any help on this?

Many Thanks
Chris
 
Do you want to get the readable text out of a binary file? If so, try the command:

strings /path/to/binfile


HTH,

p5wizard
 
Hi p5wizard,

Thanks, but that is not what I need. As the result above shows, I have to convert certain bytes to ASCII and leave the others as they are.
Can you assist?


Many Thanks
Chris
 
Hi All, PHV

If my explanation is not clear enough, please let me know.

Thanks
Chris
 
So, what exactly do you want to do?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Hi PHV,

Thanks for responding. I know that you can help.

Here goes:

What I am trying to achieve is, first, to read a binary file with "od":

0000000 4837 7470 722b 3a30 3964 7331 3753 4654
0000020 3030 3030 6161 6161 6161 6161 6161 6161
0000040 6161 6161 6161 6161 6161 3781 0000 0000
0000060 0123 0108 92aa aa20 2020 2020 2020 2020
0000100 2020 2020 2020 2020 0604 1513 5516 0123
0000120 2621 96aa aaaa aaaa aaaa aaaa aaaa aaaa
0000140 aaaa aa00 0004 2020 2053 5350 0000 26ac
0000160 0000 0037 8100 0000 0001 2301 0892 aaaa
0000200 2020 2020 2020 2020 2020 2020 2020 2020
0000220 2006 0415 1355 4201 2326 2196 aaaa aaaa
0000240 aaaa aaaa aaaa aaaa aaaa aaaa 0000 0420
0000260 2020 5353 5000 0026 ac00 0000 5476 7470
0000300 722b 3a30 3964 7331 3753 4654 3030 3030
0000320 3030 3032 3030 3030 3030 3030 3030 3030
0000340 3030 3030 3030 3030 3230 3030 3030 3030
0000360 3030 3030 3030 3030 3030 3030 3032 3030
0000400 3030 3030 3030 3030 3030 3030 3030 3030
0000420 3030 3230 3030 3030 3030 3030 3030 3030
0000440 3030 3030



Then start reading the fields (each a group of four hex digits, i.e. two bytes), beginning at 4837 7470 ..., for the first 21 fields, ignoring the first column (the offsets 0000000, 0000020, 0000040, ...).
The first 21 fields make up the HEADER, which should come out like this:

H 7 tpr+:09 ds17 VAL 0000
aaaaaaaa aaaaaa aaaa aaaa

4837 converted to ASCII:
H=48
7=37

7470
74=t
70=p

722b
72=r
2b=+ ...etc
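The byte-by-byte conversion above can be done without the gsub() chain in hex2uint. A minimal sketch, assuming a POSIX awk (so no gawk-only strtonum()); the shell function name hex_to_ascii is made up for illustration. It looks each hex digit up with index() and turns each two-digit pair into a character with sprintf("%c", ...).

```shell
# Hypothetical helper: convert whitespace-separated hex words (as in the
# dump above, e.g. "4837 7470 722b") to ASCII, two hex digits per byte.
hex_to_ascii() {
  awk '
  # Portable hex-to-decimal: look each digit up in a string of hex digits.
  function hex2dec(h,    i, n) {
      h = tolower(h)
      n = 0
      for (i = 1; i <= length(h); i++)
          n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
      return n
  }
  {
      out = ""
      for (f = 1; f <= NF; f++)                  # every hex word on the line
          for (p = 1; p < length($f); p += 2)    # every two-digit byte in it
              out = out sprintf("%c", hex2dec(substr($f, p, 2)))
      print out
  }'
}
```

For example, echo '4837 7470 722b' | hex_to_ascii prints H7tpr+. Non-printable bytes are emitted as raw characters, so they would need to be filtered out first if that matters.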

Once the header is done, read the records, starting from 3781 (or 0037 81), for the next 37 fields:
Result
=======

3781
37=7
81=81 (no conversion)
0000= do nothing
0000= do nothing
0123= 0123
0108= 0108
92aa= 92aa
aa20 2020 2020 2020 2020 = these become spaces after converting to ASCII, up to
0604 = 200604 (year, month)
1513 = 1513 (hour, minute)
etc.

7 80 0123010892aaaa 20060415135516 0123262196aaaaaaaa 000004 007599 S SP 000026 ac000000

The next record starts at the 37th field, either as 3781 or as 0037 8100, where the 00 belongs to the previous record:
7 80 0218552997aaaa 20060415214645 0860007249aaaaaaaa 000066 009703 S SP 000026 ac000000
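Because the conversion depends on where a byte sits in the record rather than on its value (0123 stays 0123 even though 23 is the printable "#"), one option is a table-driven layout. This is a hypothetical sketch: format_record and its offset:length:mode layout string are made up here, and the real byte layout has to come from the file's specification. The example layout only reproduces the first rules listed above (37 becomes 7, 81 stays, 0000 0000 is skipped, 0123 and 0108 stay).

```shell
# Hypothetical sketch: format one record from hex bytes on stdin using a
# made-up layout of "offset:length:mode" entries (offsets are 0-based bytes).
#   a = convert the bytes to ASCII, x = keep them as hex, s = skip them
format_record() {
  awk -v layout="$1" '
  function hex2dec(h,    i, n) {
      h = tolower(h); n = 0
      for (i = 1; i <= length(h); i++)
          n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
      return n
  }
  { for (f = 1; f <= NF; f++) b[nb++] = $f }     # collect bytes, e.g. "37" "81"
  END {
      ne = split(layout, spec, " ")
      out = ""
      for (e = 1; e <= ne; e++) {
          split(spec[e], part, ":")    # part[1]=offset part[2]=length part[3]=mode
          if (part[3] == "s") continue
          field = ""
          for (k = part[1]; k < part[1] + part[2]; k++)
              field = field (part[3] == "a" ? sprintf("%c", hex2dec(b[k])) : b[k])
          out = (out == "" ? field : out " " field)
      }
      print out
  }'
}
```

With the first ten bytes of the record from the dump, echo '37 81 00 00 00 00 01 23 01 08' | format_record '0:1:a 1:1:x 2:4:s 6:4:x' prints 7 81 01230108.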

Once all the 3780 records are finished and the next field is 5476 (54 is "T" in ASCII, which marks the trailer), read until the end of the file; the result should look like this:

Result
=======
T v tpr+:09 df17 VAL

FINAL RESULT


H 7 tpr+:09 ds17 VAL 0000
aaaaaaaa aaaaaa aaaa aaaa
7 80 0123010892aaaa 20060415135516 0123262196aaaaaaaa 000004 007599 S SP 000026 ac000000
7 80 0218552997aaaa 20060415214645 0860007249aaaaaaaa 000066 009703 S SP 000026 ac000000
T v tpr+:09 df17 VAL


I hope you understand. Looking forward to any questions that you may have.

Many Thanks
Chris

 
IMO you are making things harder for yourself by converting a binary file to a hex dump and then trying to process the dump. Why not just read it into a struct in C and process it that way, especially since the data lengths seem to be fixed?

Annihilannic.
 
Is there nobody at your location who can write a dedicated C program?
awk is not well suited for binary file handling ...
PS: Sorry, I don't understand the rules for transforming the input into the final result :~/

Hope This Helps, PH.
 
Hi Annihilannic,

I know that this can be done with C or even Perl, but I would like to learn how awk can do it. For me this is the best way of learning.

Thanks Again
Chris
 
Hi PHV,

Thanks Again,

Would you like to give it a go with awk? If so, please reply! I will then try to explain the final result to you in a different way, or just leave awk and attempt it in C.

Thanks
Chris
 
Perhaps a framework like this would make it simpler?

Code:
awk '
BEGIN { i=0 }
{
        for (j=2; j<=NF; j++) {
                a[i++]=substr($j,1,2)
                a[i++]=substr($j,3,2)
        }
}
END {
        for (k=0; k<i; k++) {
                if (a[k] == "48" && a[k+1] == "37") {
                        # process header and increment k
                }
                if (a[k] == "37" && a[k+1] == "81") {
                        # process record and increment k
                }
                # etc
        }
}
'

Annihilannic.
 
Hi Annihilannic,

Thanks. Where would I place the while (cmd | getline > 0): before BEGIN or after END?

Thanks Again
Chris
 
You don't need to; just place the input filename after the awk script, or pipe it in via standard input. The file will be read automatically and each line processed by the piece of code between the plain set of { } braces. For example:

Code:
od -x FILENAME | awk ' ... '

You should only ever have to use getline to skip lines of input or to read from another command or file.
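To illustrate the point about getline: in this sketch (count_words is a made-up name) awk reads od's output straight from the pipe, the main { } block runs once per line, and $1 is the offset column, so the data words start at $2. The -v option is an addition here; it stops od from collapsing repeated lines into a single "*".

```shell
# Hypothetical helper: count the 16-bit hex words od prints for stdin.
# No getline is needed; awk runs the main block for every input line.
count_words() {
  od -v -x | awk '
  {
      # $1 is the offset column; the data words are $2 .. $NF.
      # The final line holds only the end offset, so the loop skips it.
      for (j = 2; j <= NF; j++) n++
  }
  END { print n + 0 }'
}
```

For example, printf 'H7tp' | count_words prints 2.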

Annihilannic.
 
Hi Annihilannic,

Thanks for your framework. I am progressing, but I have noticed that the test if (a[k] == "37" && a[k+1] == "81") in the for loop is too loose: within a record, one byte can be "37" and the next byte "80", but that does not mean we should start the next record. What I was thinking is that in the for loop, if "37" is found and the next byte is "80", we count 46 bytes, which will be the first record; then, if the 47th byte is "37" and the following byte is "80", count another 46 bytes, and so on until we reach the trailer of the file, which is "5476".


For example:


awk '
BEGIN { i=0 }
{
        for (j=2; j<=NF; j++) {
                a[i++]=substr($j,1,2)
                a[i++]=substr($j,3,2)
        }
}
END {
        for (k=0; k<i; k++) {
                if (a[k] == "48" && a[k+1] == "37") {
                        # process header and increment k
                }
                if (a[k] == "37" && a[k+1] == "81") {
                        # count the next 46 bytes, regardless of a[k] == "48" && a[k+1] == "37",
                        # and then ask whether the 47th byte starts another record
                        # (a[k] == "37" && a[k+1] == "81"); if so, count the next 46 bytes, and so on
                        # process record and increment k
                }
                # etc
        }
}
'

Thanks once again
Chris
 
Like I said in the comments... "process record and increment k"... so if you expect that section to be 46 bytes long, you would add 46 to k. Then the next time it arrives at the if statement it is already pointing at where it *expects* the next record to be.

You may need to increment k by (recordlength - 1) because the for loop will automatically increment k by 1. Or you could take out the k++ and handle all of the increments of k explicitly.
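A minimal sketch of the second option (handling every increment of k explicitly), using a while loop instead of for. Assumptions: the bytes come from od -An -v -t x1 (one byte per field, in file order), and the header and record lengths are fixed. 42 and 73 bytes are what the sample dump at the top of this thread works out to (the header is 21 four-digit fields; the first record starts at byte 42 and the second at octal 0000160 + 3 = byte 115), so substitute the real lengths. split_blocks, HDR_LEN and REC_LEN are made-up names.

```shell
# Hypothetical sketch: walk the byte array with explicit jumps so that a
# "37" byte inside a record is never tested as a record start.
split_blocks() {
  od -An -v -t x1 | awk -v HDR_LEN=42 -v REC_LEN=73 '
  { for (j = 1; j <= NF; j++) a[n++] = $j }      # one byte per array element
  END {
      k = 0
      while (k < n) {
          if (k == 0 && a[k] == "48") {                 # header at start of file
              print "header at byte " k
              k += HDR_LEN
          } else if (a[k] == "37") {                    # record start
              print "record at byte " k
              k += REC_LEN
          } else if (a[k] == "54" && a[k+1] == "76") {  # "Tv" trailer
              print "trailer at byte " k
              k = n                                     # trailer runs to end of file
          } else {
              print "unexpected byte " a[k] " at " k
              k++
          }
      }
  }'
}
```

Because each branch jumps over the whole block it consumed, the byte after a record is always the next record's first byte. If an "unexpected byte" line ever appears, the assumed lengths are wrong for that file.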

Annihilannic.
 
Incidentally, are you aware that on a little-endian architecture like the PC, od -x will dump the data with the least significant byte first? e.g. if you echo 12 | od -x the first 16-bit integer printed will be 0x3231 rather than the 0x3132 that you might expect? You may be better off using od -t x1.

On Linux on a PC:

$ echo 12 | od -x
0000000 3231 000a
0000003
$ echo 12 | od -t x1
0000000 31 32 0a
0000003

On Solaris/SPARC:

$ echo 12 | od -x
0000000 3132 0a00
0000003
$ echo 12 | od -t x1
0000000 31 32 0a
0000003
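Building on the od -t x1 suggestion, here is a sketch of a byte-per-line dump that gives the same order on a PC and on SPARC (dump_bytes is a made-up name). Two extra POSIX od options help when a script parses the output: -An drops the offset column, and -v stops od from replacing a run of identical lines with a single "*", which would otherwise silently lose bytes (e.g. consecutive lines that are entirely 2020 or 3030).

```shell
# Hypothetical helper: print every byte of stdin as two hex digits, one per line.
#  -An    no offset column, so every field is a data byte
#  -v     print all lines; do not collapse duplicates into "*"
#  -t x1  one byte per field, so there is no little-/big-endian word swap
dump_bytes() {
  od -An -v -t x1 | awk '{ for (j = 1; j <= NF; j++) print $j }'
}
```

For example, printf '12' | dump_bytes prints 31 and then 32, on either architecture.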

Annihilannic.
 
Something like:

Code:
awk '
BEGIN { i=0 }
{
        for (j=2; j<=NF; j++) {
                a[i++]=substr($j,1,2)
                a[i++]=substr($j,3,2)
        }
}
END {
        for (k=0; k<i; k++) {
                print k,a[k]
                if (a[k] == "48" && a[k+1] == "37") {
                        # process header and increment k
                        print "found a header"
                        k+=41
                }
                if (a[k] == "37" && a[k+1] == "81") {
                        # process record and increment k
                        print "found a record"
                        k+=72
                }
                if (a[k] == "54" && a[k+1] == "76") {
                        # process trailer and increment k
                        print "found a trailer"
                        k+=103
                }
        }
}
' inputfile

Annihilannic.
 



Hi Annihilannic,

Thanks for all your responses.
Below is the problem I was referring to, which happens even when I increment k by 46:





48618 ---------> here we should add 46 which = 48664
37 BYTE
48664 BODY ---------> 48664 (good)
7 81 0123588230aaaa 20060524095128 3780226aaaaaaaaaaa 00000001 000044 S 00 7aca4e07 V A 1 31
48665 K
81 BYTE ------> here is the problem, it's reading it all over again
48666 K
01 BYTE
48667 KK
23 BYTE
48668 K
58 BYTE
48669 K
82 BYTE
48670 K
30 BYTE
48671 K
aa BYTE
48672 K
aa BYTE
48673 K
20 BYTE
48674 K
06 BYTE
48675 K
05 BYTE
48676 K
24 BYTE
48677 K
09 BYTE
48678 K
51 BYTE
48679 K
28 BYTE
48680 K
37 BYTE -------> this belongs to the current record, so with the if statement it thinks it's the start of another record.
48726 BODY
01 is not a valid Hexa value to Convert to ASCII or is not printable


What am I doing wrong?

Many Thanks
Chris
 
There must be a bug in your code... but... without seeing your code there's no way we can identify the bug!

Annihilannic.
 