Tek-Tips is the largest IT community on the Internet today!


Change texts in a range using sed

Status
Not open for further replies.

huangwason

Programmer
Oct 10, 2006
21
DE
Hello, guys. I am working on a script that changes the content of a specified element in an XML file. For example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TS>
<TS version="2.0" language="en_GB">
<context>
<name>ezono::ccapp::bup::BiosUpgrade</name>
<message>
<source>Failed to update BIOS.</source>
<translation type="unfinished">huang </translation>
</message>
</context>
<context>
<name>ezono::ccapp::bup::BiosUpgrade</name>
<source>Failed to update BIOS.</source>
<translation> here is a test
</translation>
</context>
</TS>

I want to convert the text between the <translation> tags from lower case to upper case.

I intend to use sed:

sed -e '/<translation*>/ s/\(.*\)/\U\1/' -e '/<\/translation>/,$ s/\(.*\)/\U\1/' input.xml

Some problems:
1) It also converts the tag itself to upper case (<TRANSLATION>). How can I restrict the change to the text between the tags?
2) I use sed's pattern-range addressing here:
sed '/start/,/stop/ s/regular expression/replacement string/'
The "start" pattern I used is "<translation*>", which matches a plain <translation> tag but not a tag with attributes such as <translation type="unfinished">.
3) The script above is line-oriented: it works only if <translation> and </translation> are on the same line, but sometimes the opening and closing tags are on different lines.
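
To illustrate point 2, here is a minimal sketch. The two sample lines and the alternative pattern `<translation[^>]*>` are illustrative assumptions, not part of the script above. In a regular expression, `n*` means "zero or more n", so `<translation*>` never gets past the space before an attribute.

```shell
# Hypothetical sample lines, one per form of the opening tag.
sample='<translation>plain</translation>
<translation type="unfinished">with attribute</translation>'

# Original pattern: only the attribute-free opening tag matches.
count_original=$(printf '%s\n' "$sample" | grep -c '<translation*>')

# Alternative pattern (an assumption): "[^>]*" skips over any attributes.
count_alternative=$(printf '%s\n' "$sample" | grep -c '<translation[^>]*>')

echo "original: $count_original, alternative: $count_alternative"
# prints: original: 1, alternative: 2
```

This only addresses the matching problem; points 1 and 3 still apply to any line-oriented sed approach.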

How can I solve this? Is sed really the right tool for this problem?
 
A starting point:
Code:
awk -F'>' 'BEGIN{RS="<"}/^translation/{$2=">"toupper($2)}NR>1{printf "<%s",$0}' input.xml

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Thank you very much, PHV, it works fine. You use the "toupper" function to convert lower case to upper case. I have some ASCII codes in the XML file, and these should be preserved when the text is converted to upper case. Can you suggest how to preserve them in this case?
 
Yes, feherke, they are character entities, such as:

<translation > THE MARKED PATIENT RECORD &APOS;%1&APOS; WAS NOT READABLE. </translation>

In this case, &apos; should not have been changed to &APOS;.

 
Hi

Then, instead of applying toupper() to the whole $2 field:
1) split() it up on "&"
2) apply toupper() to the first piece
3) loop over the rest of the pieces:
   a) find the index() of ";"
   b) copy the substr() before the ";" unmodified
   c) apply toupper() to the substr() after the ";"
At least in my approach.
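
A minimal sketch of these steps, under some assumptions: the names `upper_keep_entities` and `upper_keep` are hypothetical, the function is applied to every input line rather than only to the <translation> records, and a stray "&" with no following ";" is simply upper-cased along with the rest.

```shell
# upper_keep_entities: hypothetical helper name, not from the thread.
upper_keep_entities() {
  awk '
  function upper_keep(s,    n, parts, i, p, out) {
    n = split(s, parts, "&")          # parts[1] precedes the first "&"
    out = toupper(parts[1])
    for (i = 2; i <= n; i++) {
      p = index(parts[i], ";")        # position that ends the entity name
      if (p == 0) {                   # stray "&" with no ";": upper-case it all
        out = out "&" toupper(parts[i])
      } else {                        # keep the entity, upper-case what follows
        out = out "&" substr(parts[i], 1, p) toupper(substr(parts[i], p + 1))
      }
    }
    return out
  }
  { print upper_keep($0) }
  '
}

printf '%s\n' "the record &apos;%1&apos; was not readable." | upper_keep_entities
# prints: THE RECORD &apos;%1&apos; WAS NOT READABLE.
```

To combine this with the record-splitting one-liner above, the `upper_keep` function could presumably take the place of the `toupper($2)` call.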

Feherke.
 
Code:
awk -F'>' 'BEGIN{RS="<"}
/^translation/{$2=">"toupper($2);while(match($2,/&[A-Z]*;/)){x=substr($2,RSTART,RLENGTH);gsub(x,"\\"tolower(x),$2)}}
NR>1{printf "<%s",$0}
' input.xml

Hope This Helps, PH.
 
Thanks, guys. I tried PHV's script; it works for most of my XML files, but not for entries like this:

<message>
<location filename="../code/a_mes/ccapp_mes_qt_measurementapp.cpp" line="1824"/>
<source>Patients starting...</source>
<translation>Abriendo el &quot; Registro de Pacientes&quot; ...</translation>
</message>




 
Hi

It works for me if I replace [A-Z] with [[:upper:]].

But I would not do it that way. It converts the whole string to upper case, then converts all character entities to lower case. But character entities are case-sensitive. For example, in the case of a-umlaut, &Auml; is Ä and &auml; is ä. After the two transformations, both will be lower case.
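
A small sketch of that loss. The function name `two_step` and the sample input are hypothetical; the function upper-cases the line and then lower-cases every entity, using the [A-Z] range from the script above.

```shell
# two_step: hypothetical name for the "upper-case all, then lower-case entities" approach.
two_step() {
  awk '{
    s = toupper($0)
    out = ""
    # Consume s entity by entity, lower-casing each entity found.
    while (match(s, /&[A-Z]+;/)) {
      out = out substr(s, 1, RSTART - 1) tolower(substr(s, RSTART, RLENGTH))
      s = substr(s, RSTART + RLENGTH)
    }
    print out s
  }'
}

printf '%s\n' '&Auml;pfel und &auml;pfel' | two_step
# prints: &auml;PFEL UND &auml;PFEL
```

The original distinction between &Auml; and &auml; cannot be recovered from this output.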


Feherke.
 