How to decode such characters for a string or file content

Status
Not open for further replies.

micotao

Programmer
Jun 16, 2004
Hi, All:
Suppose I have the following file content (or a string), and I need to translate certain character sequences in it:
This is a decoding process.
We have to do this.\\1aWe need \\7b CHANGE \\7d.
The characters that need decoding are:
'\\1a' ----> '\n'
'\\7b' ----> '{'
'\\7d' ----> '}'
...
...
We need one common decoding routine that handles all of these mappings.
I tried to do this with arrays and sed, but my first attempt failed and I can't get the output I want.
Code:
set -A map_from '\\1a'
set -A map_to '\n'
echo "This is${map_from[0]}That is." | sed "s/${map_from[0]}/${map_to[0]}/g"
But the result output is this:
This isnThat is

The intended output should be like this:
This is
That is
because \\1a should be translated to \n, which means a newline.

Please help with this issue. How can I decode these characters properly?
Thank you very much!




 
A starting point:
awk '
BEGIN{
split("\\\\\\\\1a,\\\\\\\\7b,\\\\\\\\7d",from,",")
split("\\n,{,}",to,",")
n=3
}
{ for(i=1;i<=n;++i) gsub(from[i],to[i])
gsub("\\\\n","\n");print
}
' /path/to/input > output
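
For anyone puzzled by the number of backslashes in the snippet above: when a pattern is passed to gsub as a string, awk processes it twice, once as a string literal and once as a regular expression, and each pass halves the backslashes. The following is only a small sketch of that layering; the sample inputs are made up, and it assumes an awk that has gsub (nawk, gawk, mawk or similar).

```shell
# One literal backslash in the data: the awk string "\\\\" becomes \\ after
# string parsing, and the regex \\ then matches a single backslash.
one_bs=$(printf '%s\n' 'a\b' | awk '{ gsub("\\\\", "/"); print }')
printf '%s\n' "$one_bs"     # a/b

# Two literal backslashes followed by 1a, as in the thread's data:
# eight backslashes in the string, four in the regex, two matched.
two_bs=$(printf '%s\n' 'x\\1ay' | awk '{ gsub("\\\\\\\\1a", "-"); print }')
printf '%s\n' "$two_bs"     # x-y

# A regex literal skips the string-parsing pass, so four are enough.
re_lit=$(printf '%s\n' 'x\\1ay' | awk '{ gsub(/\\\\1a/, "-"); print }')
printf '%s\n' "$re_lit"     # x-y
```

The awk programs here are in single quotes, so the shell itself passes every backslash through untouched.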

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
Hi, PHV:
Thanks for your valuable advice.
I am still trying to understand your code.
While experimenting, I tried the following:
Code:
# echo "This is\\\\\\\\1aThat is" | tr '\\1a' "\n"
This is

1aThat is

Can you tell me why I get this output? I want the '\\1a' ----> "\n" transformation, so that I get the following:
This is
That is
Instead, the 1a is always left in place.
Can you help explain this strange output?
Many thanks!
 
Hi, PHV:
This is what I did to test your code. I wrote a ksh script taken directly from your code, but I got the following error output:

+ awk { split("\\\\\\\\1a,\\\\\\\\7b,\\\\\\\\7d",from,",");split("\\n,{,}",to,","); N=3 }{ for (i=1;i<=N;++i) gsub(from[i],to[i]);gsub("\\\\n","\n"); print } ./inputfile > ./outputfile
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1


Can you please help to figure out what the problem is? Or did I input something wrong when coding?
Thanks very much!
 
Mico, that echo | tr works for me on Debian.
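
A note on the tr attempt above: tr translates individual characters from one set to another; it does not match multi-character strings. So '\\1a' is read as a set of characters rather than as a sequence to replace, and how the single "\n" is paired with that set differs between tr implementations, which may be why the result varies from system to system. A small sketch of the per-character behaviour (the sample text is made up for illustration):

```shell
# tr maps 1 -> X and a -> Y independently; the order "1a" in the set
# does not mean "the string 1a".
per_char=$(printf 'a1 1a aa\n' | tr '1a' 'XY')
printf '%s\n' "$per_char"   # YX XY YY
```

For replacing a whole sequence such as \\1a, a string-oriented tool like sed or awk is the better fit.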

Your topic inspired me to have another go at some awk scripts I began in late 2003 for decoding/encoding files with <from>-<to> char mappings.
They just wouldn't work before, but now they finally do. There is still a problem mapping the '&' char back, though; no idea why.

Only later did I notice that you want something like name obfuscation rather than byte exchanging. I'll post it anyway.
I don't feel like cleaning out the old commented-out code, so it's in an online dir ( xmb.ath.cx/code/char_enc/ / char_enc.tgz )

enc.sh - small sh frontend
enc5.awk - the simple enc/decoder
crypto2.awk - unique char pair generator out of /dev/urandom

. Mac for productivity
.. Linux for development
... Windows for solitaire
 
Mico, running it with nawk or some other awk newer than your plain 'awk' will probably fix that.
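
The syntax errors are likely because the oldest awk implementations do not provide gsub at all. A rough sketch of how a script could look for a usable awk; the function name pick_awk and the candidate list are only assumptions for illustration, so adjust them to what is installed on your system:

```shell
# Print the name of the first awk on PATH whose gsub() actually works.
pick_awk() {
  for candidate in nawk gawk mawk awk; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       [ "$(printf 'a\n' | "$candidate" '{ gsub("a", "b"); print }' 2>/dev/null)" = "b" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no suitable awk found
}

AWK=$(pick_awk)
printf 'using: %s\n' "$AWK"
```

The rest of the script can then call "$AWK" instead of a hard-coded awk.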

My char encoder could be modified, with small changes, to map words instead of single chars, so that it works like PH showed.
 
Hi, xmb:
Thank you very much. I will take time to investigate your code and raise comments soon.

Hi, PHV:
Thanks a lot. I used nawk, and it works. Thanks to xmb for the reminder too.

 
Hiya mico =)

A last comment about my char encoder ( char_enc.tgz )
It works with gawk / the original awk, probably nawk, not with mawk.

[file=map_file] [awk=awk] [do_enc=1] enc.sh [files]

$ awk=gawk sh enc.sh
plz encrypt this & kthx
¸³Ü­%
éWW[tÀªW[
thanks
W[
­À

$ awk=gawk do_dec=1 sh enc.sh
enc and decrypt this in a pipe -> & <- wrong
enc and decrypt this in a pipe -> ª <- wrong
 
Hi, xmb:
Can you explain how gsub (which behaves the same way as sub) and split are used?
In particular, I don't know how many times, and in what way, the backslash '\' gets processed.
I am really confused by backslashes in strings and in file content.
Thanks very much in advance!
 
Hi, xmb:
Thanks for your help.
I have a related question about how backslash escapes affect a string:
How should I understand these two commands, which have different numbers of backslashes but give the same output?

under KSH
Code:
# printf "%s" "This is\nThat is"
This is\nThat is

# printf "%s" "This is\\nThat is"
This is\nThat is

 
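You are printing the string with the %s format, which does not interpret backslash escapes in its argument, so \n is printed literally instead of as a newline.
Both commands print the same text because, inside double quotes, the shell leaves a backslash alone unless it is followed by $, `, ", \, or <newline>: \n is passed through unchanged, while \\n is reduced to \n before printf ever sees it.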
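
To see the difference in how printf treats its argument, compare %s with %b, which does expand backslash escapes in the argument. This is only a minimal sketch; the variable names are made up for illustration.

```shell
s='This is\nThat is'              # single quotes: backslash and n stay as two characters

as_text=$(printf '%s' "$s")       # %s prints the argument as-is
as_escape=$(printf '%b' "$s")     # %b expands \n into a real newline

printf '[%s]\n' "$as_text"        # [This is\nThat is]
printf '[%s]\n' "$as_escape"      # prints on two lines
```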

The bash man page is clearer than the ksh man page, although both use the same rule for quoting.
There are three quoting mechanisms: the escape character, single
quotes, and double quotes.

A non-quoted backslash (\) is the escape character. It preserves the
literal value of the next character that follows, with the exception
of <newline>. If a \<newline> pair appears, and the backslash is not
itself quoted, the \<newline> is treated as a line continuation (that
is, it is removed from the input stream and effectively ignored).

Enclosing characters in single quotes preserves the literal value of
each character within the quotes. A single quote may not occur
between single quotes, even when preceded by a backslash.

Enclosing characters in double quotes preserves the literal value of
all characters within the quotes, with the exception of $, `, and \.
The characters $ and ` retain their special meaning within double
quotes. The backslash retains its special meaning only when followed
by one of the following characters: $, `, ", \, or <newline>. A double
quote may be quoted within double quotes by preceding it with a
backslash.
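
The double-quote and single-quote rules quoted above can be checked directly in the shell. A small sketch, with variable names chosen only for illustration:

```shell
one="This is\nThat is"      # \n is not special in double quotes: backslash kept
two="This is\\nThat is"     # \\ collapses to a single backslash
single='This is\\nThat is'  # single quotes keep both backslashes

if [ "$one" = "$two" ]; then
  printf 'double-quoted forms are identical: %s\n' "$one"
fi
printf 'single-quoted form: %s (%d characters)\n' "$single" "${#single}"
```

So the two double-quoted strings hold exactly the same 16 characters, while the single-quoted one has 17.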
To go back to the original question, you should consider using a sed script, e.g...
Code:
$ cat script1.sed
s/\\\\1a/\
/g
s/\\\\7b/\{/g
s/\\\\7d/\}/g

$ echo 'This is\\1aThat is' | sed -f script1.sed
This is
That is
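
The same three substitutions can also be given inline with -e, which may be handy for a quick test. A minimal sketch, assuming the data contains two literal backslashes before each code as in the example above; the function name decode is made up for illustration.

```shell
# Each pattern \\\\ matches two literal backslashes; the first replacement
# is a backslash followed by a real newline, which sed turns into a newline.
decode() {
  sed -e 's/\\\\1a/\
/g' -e 's/\\\\7b/{/g' -e 's/\\\\7d/}/g'
}

result=$(printf '%s\n' 'We have to do this.\\1aWe need \\7b CHANGE \\7d.' | decode)
printf '%s\n' "$result"
```

Using printf '%s\n' with single quotes keeps the shell and echo from altering the backslashes before sed sees them.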
 