Hard time writing SED or Regular Expression to isolate string

nauliv · Oct 14, 2008

Hello Folks,

I'm trying to parse an HTML file to extract a city name, using bash scripting on Linux. The tags around that city name are always formatted the same way. Here is an example with LOS ANGELES:

<....bunch of HTML code.....>
<tr><td align="right"> Prefix</td><td>240</td></tr>
<tr><td align=right>City</td><td>LOS ANGELES </td></tr>
<tr><td align=right>State</td><td>California</td></tr>
<....bunch of HTML code.....>

Basically, I am trying to isolate the string between

City</td><td>

and

</td></tr>
<tr><td align=right>State

I have banged my head trying to do it as a two-step job:
* remove everything before the city name
* remove everything after the city name
That way I end up with just what I need... but I keep getting lost in the sed and regular expression syntax. :-(
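The two-step plan above maps directly onto two sed substitutions. The following is a minimal sketch, run against the three sample rows from the question, so it assumes the live page uses exactly that markup with no extra tags inside the city cell. Because sed works one line at a time, the end marker is shortened to the `</td>` on the same line rather than the two-line string quoted above. The address `>City<` selects only the row whose label cell is exactly "City".

```bash
#!/bin/bash
# Sample rows copied from the question; the surrounding HTML is omitted.
html='<tr><td align="right"> Prefix</td><td>240</td></tr>
<tr><td align=right>City</td><td>LOS ANGELES </td></tr>
<tr><td align=right>State</td><td>California</td></tr>'

# -n        : print nothing unless the p command says so.
# />City</  : only the row whose label cell is exactly "City".
# 1st s     : remove everything before the city name (up to "City</td><td>").
# 2nd s     : remove trailing spaces plus "</td>" and everything after it.
city=$(printf '%s\n' "$html" |
  sed -n '/>City</{ s/.*>City<\/td><td>//; s/ *<\/td>.*//; p; }')

echo "$city"   # LOS ANGELES
```

The second substitution also drops the trailing space after ANGELES, which a literal "string between the markers" would otherwise keep.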

Any help or hints would be greatly appreciated !

Annihilannic · Oct 14, 2008

Try this:

Code:

sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }' inputfile

Annihilannic.
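For readers decoding the one-liner, here is the same command spread over several lines with a comment per step. The sample input line is hypothetical: it wraps the city in `<b>...</b>`, which is the markup the substitutions expect but which does not appear in the excerpt quoted in the question.

```bash
#!/bin/bash
# Hypothetical input: assumes the city cell is wrapped in <b>...</b>.
line='<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>'

city=$(printf '%s\n' "$line" | sed -n '
  /City/ {
    # delete everything up to and including the opening <b>
    s/.*<b>//
    # delete spaces before </b>, then </b> and everything after it
    s/ *<\/b>.*//
    # print the pattern space (-n suppresses automatic printing)
    p
  }')

echo "$city"   # LOS ANGELES
```

Because `.*` is greedy, the first substitution strips up to the last `<b>` on the line; that is fine as long as the City row contains only one bold cell.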

nauliv · Oct 14, 2008

Hello Annihilannic,

Thanks so much for your fast response. ALMOST THERE! There are just some extra lines of HTML after the city name.

Here is the result:

LOS ANGELES
<A href="ZipCityPhone.asp?90071">90071</A>
<A href="ZipCityPhone.asp?90013">90013
<A href="ZipCityPhone.asp?90014">90014
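The extra lines come from the address `/City/`: it is a plain substring match, so any line containing `ZipCityPhone` is selected too, and since those lines have no `<b>` tag, neither substitution changes them before `p` prints them. A small reproduction, using a hypothetical two-line excerpt modelled on the output above (the `<b>` markup is assumed):

```bash
#!/bin/bash
# Hypothetical excerpt: the City row (with assumed <b> tags) plus one link line.
page='<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>
<A href="ZipCityPhone.asp?90071">90071</A>'

# The original command: the address /City/ also matches "ZipCityPhone".
out=$(printf '%s\n' "$page" | sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }')

printf '%s\n' "$out"
# LOS ANGELES
# <A href="ZipCityPhone.asp?90071">90071</A>
```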

You can see the page source at the following URL:

http://www.melissadata.com/lookups/phonelocation.asp?number=2139743211

My script is:

#!/bin/bash
callerid=2139743211
curl -s -m 3 -A Mozilla/4.0 \
  "http://www.melissadata.com/lookups/phonelocation.asp?number=${callerid}" |
  sed -n '/City/{ s/.*<b>//; s/ *<\/b>.*//; p; }'

Thanks again !

Annihilannic · Oct 14, 2008

Just widen the "City" match so it doesn't also match ZipCityPhone, i.e. include the greater-than/less-than signs on either side of it (/>City</ instead of /City/).
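Applied to the command, that means changing only the address, from `/City/` to `/>City</`. A sketch using the same kind of hypothetical excerpt (the `<b>` tags are assumed): the link line no longer qualifies because `City` there is preceded by `Zip`, not `>`.

```bash
#!/bin/bash
# Hypothetical excerpt: the City row (with assumed <b> tags) plus one link line.
page='<tr><td align=right>City</td><td><b>LOS ANGELES </b></td></tr>
<A href="ZipCityPhone.asp?90071">90071</A>'

# Widened address: "City" must be the entire text between two tags.
city=$(printf '%s\n' "$page" | sed -n '/>City</{ s/.*<b>//; s/ *<\/b>.*//; p; }')

echo "$city"   # LOS ANGELES
```

In the curl script, only the address in the sed stage needs to change; the substitutions stay the same.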

Annihilannic.


