Tek-Tips is the largest IT community on the Internet today!


Hard time writing SED or Regular Expression to isolate string

Status
Not open for further replies.

nauliv

Technical User
Feb 7, 2006
40
US
Hello Folks,

I'm trying to parse an HTML file to extract a city name, using bash scripting on Linux. The tags around that city name are always formatted the same way. Here is an example with LOS ANGELES:


<....bunch of HTML code.....>
<tr><td align="right"> Prefix</td><td><b>240</b></td></tr>
<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>
<tr><td align=right>State</td><td><b>California</b></td></tr>
<....bunch of HTML code.....>


Basically, I want to isolate the string between

City</td><td><b>

and

</b></td></tr>
<tr><td align=right>State
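sed works one line at a time, so the second delimiter only has to match up to `</b>` on the City line. The `<tr><td align=right>State` part is on the next line and never reaches the same pattern space. Here is a minimal single-substitution sketch that captures the text between the two delimiters. It assumes the City row always looks like the sample above, and `extract_city` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: print only the text between
# ">City</td><td><b>" and "</b>", without the trailing blanks.
extract_city() {
    # -n suppresses automatic printing; the trailing p flag prints only
    # lines on which the substitution actually matched.
    # \([^<]*[^< ]\) captures the name up to its last non-blank character,
    # and " *" absorbs any blanks before </b>.
    sed -n 's/.*>City<\/td><td><b>\([^<]*[^< ]\) *<\/b>.*/\1/p' "$@"
}

# Usage on the sample rows from the question (prints LOS ANGELES):
extract_city <<'EOF'
<tr><td align="right"> Prefix</td><td><b>240</b></td></tr>
<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>
<tr><td align=right>State</td><td><b>California</b></td></tr>
EOF
```

Because the match requires the literal `>City<`, this pattern does not need a separate `/City/` address.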

I have banged my head trying to do a two-step job:
* remove everything before the city name
* remove everything after the city name
so that I end up with just what I need... but I am sooooo getting lost in sed and regular expressions.... :-(


Any help or hints would be greatly appreciated! :)
 
Try this:

Code:
sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }' inputfile

Annihilannic.
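Here is the same command with each piece commented. It is wrapped in a hypothetical function, `city_from_html`, and fed the snippet from the question through a here-document in place of `inputfile`:

```shell
#!/bin/bash
# Annihilannic's one-liner, wrapped in a hypothetical function.
city_from_html() {
    # -n            : suppress sed's automatic printing of every line
    # /City/{ ... } : run the block only on lines containing "City"
    # s/.*<b>//     : delete everything up to and including the last <b>
    # s/ *<\/b>.*// : delete the blanks just before the first </b>,
    #                 the </b> itself and everything after it
    # p             : print what is left, i.e. the city name
    sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }' "$@"
}

# Prints LOS ANGELES for the sample from the question.
city_from_html <<'EOF'
<tr><td align="right"> Prefix</td><td><b>240</b></td></tr>
<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>
<tr><td align=right>State</td><td><b>California</b></td></tr>
EOF
```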
 
Hello Annihilannic,


Thanks so much for your fast response. ALMOST THERE! There are just some extra lines after the city name.

Here is the result:

LOS ANGELES
<A href="ZipCityPhone.asp?90071">90071</A>
<A href="ZipCityPhone.asp?90013">90013
<A href="ZipCityPhone.asp?90014">90014
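The extra lines come from the address `/City/`. It is a plain substring match, so the links to `ZipCityPhone.asp` also qualify. Those lines contain no `<b>` or `</b>`, so both substitutions leave them unchanged and `p` prints them as they are. The following small reproduction uses two rows modelled on the output above. The rows are assumptions, not the real page:

```shell
#!/bin/bash
# Same command as in the thread, fed a City row plus one ZipCityPhone link.
out=$(printf '%s\n' \
    '<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>' \
    '<A href="ZipCityPhone.asp?90071">90071</A>' |
    sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }')

# Two lines come out: the city name, then the untouched link,
# because "ZipCityPhone" contains the substring "City".
printf '%s\n' "$out"
```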

You can see the source of the following URL:


My script is:

#!/bin/bash
callerid=2139743211
curl -s -m 3 -A Mozilla/4.0 | sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }'

Thanks again!
 
Just widen the "City" match so it doesn't also match ZipCityPhone, i.e. include the greater-than and less-than signs on either side of it.

Annihilannic.
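With the signs included, the address becomes `/>City</`. This matches the table cell `<td align=right>City</td>` but not `ZipCityPhone.asp`. The sketch below uses a few sample rows modelled on the output posted earlier in the thread. These rows are assumptions, not the real page:

```shell
#!/bin/bash
# The address now requires ">" before and "<" after "City",
# so only the table cell <td ...>City</td> qualifies.
out=$(printf '%s\n' \
    '<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>' \
    '<A href="ZipCityPhone.asp?90071">90071</A>' \
    '<A href="ZipCityPhone.asp?90013">90013' |
    sed -n '/>City</{ s/.*<b>//; s/ *<\/b>.*//; p; }')
printf '%s\n' "$out"    # only LOS ANGELES
```

In the script from the earlier post, only the sed address needs to change. The curl part of the pipeline stays as it was.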
 