Here comes a question from a RegEx newbie. I have the following HTML fragment that I am trying to fix:
There is a stray </a> tag in the fragment (in the last cell of the Private Area Lighting row). I am trying to capture the entire contents of the table cell (<td>) that contains this problem. My end goal is to make this generic for any two tags. Since RegEx is coupled most closely with Perl, I decided to ask the question here.
My first idea was to use the following expression:
which almost did the trick, but only by luck, since it would not work for tag names longer than one character. My current attempt is along the lines of:
but this is not correct.
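A small sketch of why the first expression only works by luck, written in Python purely for illustration (the pattern is the one from the post, unchanged; the two sample cells are made up). Inside square brackets, `<a.*?>` is read as a set of single characters, so the class rejects any cell text that happens to contain one of them:

```python
import re

# The first expression from the post, unchanged.
first_try = re.compile(r"(<td[^>]*>[^<a.*?>]*?)(</a>)")

# [^<a.*?>] is a character class: it rejects the single characters
# '<', 'a', '.', '*', '?' and '>', not the sequence "<a ...>".
print(bool(first_try.search("<td>1SC1L</a>")))   # cell text has no 'a': matches
print(bool(first_try.search("<td>Demand</a>")))  # cell text contains 'a': no match
```

The broken cell in the fragment contains only "1SC1L" and " (xls)", which is probably why the expression appeared to work there.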
The logic I am after is:
Capture the contents of (and including) the <td> element in which there is no opening <a> tag, but there is a closing </a> tag.
I want to make sure the match never runs into another <td> element. Does anyone have any ideas as to how I can do this?
Nick Ruiz
Associate Integrator
PPLSolutions IT Billing and Transactions
Code:
<table cellpadding="3" cellspacing="0" bordercolor="#CCCCCC" border="1">
<tr align="Center" bgcolor="#CCCCCC">
<td valign="top" class="tablefont" colspan="2"><b>Service Classification for 2006</b></td>
<td valign="top" class="tablefont" width="29%"><b>EDI Load Profile Code</b></td>
<tr>
<td valign="top" class="tablefont" width="31%">SC-1, SC1B</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc1std_06.xls">Standard Service</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC1, 2SC1</td></tr>
<tr>
<td valign="top" class="tablefont" width="31%">SC-1C</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc1c_06.xls">Optional Large Time of Use</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC1C, 2SC1C </td></tr>
<tr>
<td valign="top" class="tablefont" rowspan="2" width="31%">SC-2</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc2nd_06.xls">Non-Demand</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC2, 2SC2 </td></tr>
<tr>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc2dem_06.xls">Demand</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">2SC2D, 3SC2D, 1SC2D</td></tr>
<tr>
<td valign="top" class="tablefont" rowspan="4" width="31%">SC-3</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc3sec_06.xls">Secondary</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC3</td></tr>
<tr>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc3pri_06.xls">Primary</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">2SC3</td></tr>
<tr>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc3sub_06.xls">Subtransmission</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">3SC3</td></tr>
<tr>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/sc3tra_06.xls">Transmission</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">4SC3</td></tr>
<tr>
<td valign="top" class="tablefont" width="31%">Private Area Lighting</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/pal_06.xls">Private Area Lighting</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC1L</a> (xls)</td></tr>
<tr>
<td valign="top" class="tablefont" width="31%">Traffic Signals</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/traffic_06.xls">Traffic Signals</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC4L</td></tr>
<tr>
<td valign="top" class="tablefont" width="31%">Street Lighting</td>
<td valign="top" class="tablefont" width="40%"><a href="../../non_html/stlght_06.xls">Street Lighting</a> (xls)</td>
<td valign="top" class="tablefont" width="29%">1SC2L, 1SC3L, 1SC5L, 1SC6L</td></tr>
</table>
First expression:
Code:
(<td[^>]*>[^<a.*?>]*?)(</a>)
Current expression:
Code:
<(td)[^>]*>.*?(?!(<td[^>]*>)|(<a[^>]*>))(</a>).*?</td>
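One likely reason the second expression fails is that the negative lookahead is tested only once, at the position just before `</a>`, so the preceding `.*?` is still free to walk past an `<a ...>` or into another `<td>`. A common workaround is to repeat the lookahead before every single character. The following is a minimal sketch of that idea in Python, not a definitive solution: the function name `stray_close_cells`, its parameters, and the shortened sample cells are assumptions made for illustration. The pattern itself should carry over to Perl with the /s and /i modifiers.

```python
import re

# Hypothetical three-cell sample modelled on the fragment: only the middle
# cell has a closing </a> without an opening <a ...>.
html = (
    '<td width="40%"><a href="pal_06.xls">Private Area Lighting</a> (xls)</td>\n'
    '<td width="29%">1SC1L</a> (xls)</td></tr>\n'
    '<td width="31%">Traffic Signals</td>'
)

def stray_close_cells(text, cell="td", tag="a"):
    """Return each <cell>...</cell> span that has </tag> with no <tag ...> before it."""
    cell, tag = re.escape(cell), re.escape(tag)
    pattern = re.compile(
        rf"<{cell}\b[^>]*>"                   # opening cell tag
        rf"(?:(?!</?{cell}\b|<{tag}\b).)*?"   # any text, but never a cell boundary or an opening tag
        rf"</{tag}\s*>"                       # the stray closing tag
        rf"(?:(?!</?{cell}\b).)*"             # remainder of the same cell
        rf"</{cell}\s*>",                     # closing cell tag
        re.IGNORECASE | re.DOTALL,
    )
    return [m.group(0) for m in pattern.finditer(text)]

print(stray_close_cells(html))
```

Passing other names, for example `cell="li", tag="b"`, reuses the same pattern for a different pair of tags. This sketch will not flag a cell that has a balanced `<a>...</a>` pair followed by an extra `</a>`, and for badly malformed markup an HTML parser is usually a safer tool than a regular expression.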