

How can I do this

Status
Not open for further replies.

hmehta

IS-IT--Management
Jun 5, 2002
27
US
I have text of this format throughout the file

<audio src="../prompts/say_name_you_want_to.wav">Please say the skill code </audio>

What I need to do is create a file that maps each .wav file to the text above, so the mapping in this case would be:

say_name_you_want_to.wav Please say the skill code

Essentially, wherever I see a .wav file, I need to put the corresponding text, which would be between
<audio.....> and </audio>

How can I do this for all files under one dir?

 

Hi hmehta,

There are numerous problems with doing this in awk. The main issues are that there is often no guarantee that this type of markup uses upper and lower case consistently, the <audio...>/</audio> tags may be split over multiple lines, and special characters may cause confusion if they happen to occur inside text areas and strings. So be warned that this is a dubious proposition at best, which is why I would rather do it in a language like Perl, where there are numerous HTML and XML parsing tools to draw on.

Having said that, I still would like to take a crack at it.

The following is a sketch of how I would try it if I were constrained to do this in awk. (I would much prefer to do it in Perl.) Please bear in mind that I wrote this at home, without access to awk -- not even an awk manual -- so nothing is tested, and even the syntax should be double- and triple-checked. Try to extract the idea, and rewrite it your own way.


/<audio src=.*wav.>/{
line=$0;

# In case <audio>...</audio> crosses many lines,
# append the lines together.
while (match(/<\/audio>/, $0) == 0)
{
getline;
line=line $0;
}

# Get the wave.
match(/audio src=.[^>]*/, line);
wave=substr(line,RSTART+12,RLENGTH-12);

# Simplify: reduce the 'line' to what is essential.
line=substr(line, RSTART+RLENGTH+1);

# Get the text.
match(/.*<\/audio>/, line);
text=substr(line,RSTART+1,RLENGTH-8);

printf("wav: %s text: %s\n",wave, text);
}


I hope this gives you the starting point to achieve your goal, and that you are able to straighten out whatever mistakes I have made in syntax, etc.

Good luck,
Grant.
 
Hi hmehta,

I am at work and had a free moment, so I quickly checked the code I suggested in my previous post. As promised, there were some mistakes (the arguments to match() were in the wrong order, and two of the substr() offsets were off by one), so I made some changes and came up with the following:

% cat wavetext.awk
#!/usr/bin/awk -f
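/<audio src=.*wav.>/ {
    line = $0;

    # In case <audio>...</audio> crosses many lines, append the lines
    # together; stop at end of input so an unclosed tag cannot loop forever.
    while (match($0, /<\/audio>/) == 0)
    {
        if ((getline) <= 0)
            break;
        line = line $0;
    }

    # Get the wave: skip the 11 characters of 'audio src="'
    # and drop the closing quote.
    match(line, /audio src=.[^>]*/);
    wave = substr(line, RSTART+11, RLENGTH-12);

    # Simplify: reduce 'line' to what follows the opening tag.
    line = substr(line, RSTART+RLENGTH+1);

    # Get the text: everything before the 8 characters of </audio>.
    match(line, /.*<\/audio>/);
    text = substr(line, RSTART, RLENGTH-8);

    printf("wav: %s text: %s\n", wave, text);
}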


I created a simple file, to test:
% cat mywave

<audio src="../prompts/say_name_you_want_to.wav">Please say the skill code </audio>
<audio src="../mydir/something_else.wav">Jitterbug</audio>


I ran it and the result is:
wav: ../prompts/say_name_you_want_to.wav text: Please say the skill code
wav: ../mydir/something_else.wav text: Jitterbug
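
Going back to the original request (every file under one directory, with just the .wav file name rather than its path), a wrapper along the following lines might do. This is only a minimal sketch: the function name wav_map and its directory argument are made up for illustration, it assumes a POSIX shell and awk, it joins continuation lines with a single space, and it still assumes one simple <audio src="...wav"> tag per line.

```shell
#!/bin/sh
# wav_map DIR  (hypothetical helper name, not from the thread)
# Prints "file.wav text" for each <audio src="...wav">text</audio>
# found in the regular files directly under DIR.
wav_map() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        awk '
        /<audio src=.*wav.>/ {
            line = $0
            # Join following lines until the closing tag or end of file.
            while (match(line, /<\/audio>/) == 0) {
                if ((getline nextline) <= 0) break
                line = line " " nextline
            }
            # Skip the 11 characters of audio src=" and drop the closing quote.
            match(line, /audio src=.[^>]*/)
            wave = substr(line, RSTART + 11, RLENGTH - 12)
            rest = substr(line, RSTART + RLENGTH + 1)
            sub(/.*\//, "", wave)               # keep only the file name
            # The text is whatever precedes the closing tag.
            if (match(rest, /<\/audio>/)) rest = substr(rest, 1, RSTART - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", rest)   # trim surrounding blanks
            print wave, rest
        }' "$f"
    done
}
```

It would be called as something like wav_map /path/to/dir > mapping.txt, where the directory and output file names are placeholders.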


It passed an easy test. That's nice, but it would definitely fail tougher tests, for instance if there were multiple <audio src=...> tags on the same line. It might also have trouble if there were other tags on the same line, and so on. As long as you have a very simple file with no surprises, this might work for you, but I can't vouch for it if used on a file that presents any complications.
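
For the case of several <audio src=...> tags on one line, one possible approach is to consume the line one tag pair at a time instead of matching once per line. The following is a hedged sketch, not a full parser: the function name wav_pairs is invented for illustration, and it assumes lowercase tags, a double-quoted src value, and that each pair opens and closes on the same line.

```shell
#!/bin/sh
# wav_pairs [FILE...]  (hypothetical helper name, not from the thread)
# Prints one "wav: ... text: ..." line per <audio src="...">text</audio>
# pair, even when several pairs share a line. Reads stdin if no FILE.
wav_pairs() {
    awk '
    {
        rest = $0
        # Each pass of the loop consumes one opening tag and its text.
        while (match(rest, /<audio src="[^"]*"[^>]*>/)) {
            tag  = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
            # Pull the quoted src value out of the opening tag:
            # skip the 5 characters of src=" and drop the closing quote.
            match(tag, /src="[^"]*"/)
            wave = substr(tag, RSTART + 5, RLENGTH - 6)
            # The text runs up to the nearest closing tag; give up on
            # this line if the closing tag is not on it.
            if (match(rest, /<\/audio>/) == 0) break
            text = substr(rest, 1, RSTART - 1)
            rest = substr(rest, RSTART + RLENGTH)
            printf("wav: %s text: %s\n", wave, text)
        }
    }' "$@"
}
```

Because the loop only looks for the next <audio ...> opening tag, unrelated tags elsewhere on the line are simply skipped over. Tags split across lines would still need the line-joining loop from the script above.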

Good luck,
Grant.
 