RegEx query returning too much data

Status
Not open for further replies.

MadJock

Programmer
May 25, 2001
Hi,

I have the following HTML:
Code:
<tr>
<th class="TableCaption" align="center" valign="top" bgcolor="#f2f2f2" width="20%">
										Time CET
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										CH&gt;AT
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										CH&gt;DE
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										CH&gt;FR
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										CH&gt;IT
									</th>
<th class="TableCaption" bgcolor="#e0e0e0"></th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										AT&gt;CH
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										DE&gt;CH
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										FR&gt;CH
									</th>
<th class="TableCaption" align="center" valign="bottom" bgcolor="#f2f2f2" width="10%">
										IT&gt;CH
									</th>
</tr>

Against this HTML, I use the following RegEx:
Code:
Regex borderRegex = new Regex("<th.*>(?<data>[\\S\\s\\w\\w&gt;\\w{1}\\S\\s]*?){1}</th>", RegexOptions.Multiline);

The intent is to return a match for each table header (where there is text in the header). However, it's not quite right.

The first five matches are as expected. However, the sixth match includes the empty table header as well as the next one with text, i.e.:

Code:
[b]"<th class=\"TableCaption\" bgcolor=\"#e0e0e0\"></th>[/b]\n<th class=\"TableCaption\" align=\"center\" valign=\"bottom\" bgcolor=\"#f2f2f2\" width=\"10%\">\n\t\t\t\t\t\t\t\t\t\tAT&gt;CH\n\t\t\t\t\t\t\t\t\t</th>"

From my expression, I understand why the bold area is being returned but I don't want it to be!

I'd be really grateful for any suggestions.

Thanks,

Graeme

"Just because you're paranoid, don't mean they're not after you"
 
Keep the pattern simple and try not to over-construct it, though that is sometimes difficult. The key is that the text content of a header never contains an opening angle bracket. So, to match only th elements with non-empty text, you can use a pattern like this:
Code:
Regex borderRegex = new Regex(@"<th.*>(?<data>[^<]+?)</th>");
Since the [^<] character class matches line breaks as well (a plain dot excludes them unless RegexOptions.Singleline is set), the pattern above needs no extra options, and you can drop RegexOptions.Multiline, which only changes how ^ and $ behave.
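
As a rough check of the suggestion above, here is a minimal sketch that runs the pattern against a shortened copy of the table row. The class name HeaderRegexDemo, the method HeaderTexts and the abbreviated Html string are made up for illustration and are not from the original post; only the pattern itself is taken from the reply.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HeaderRegexDemo
{
    // Abbreviated, hypothetical sample: two headers with text around the empty one.
    public const string Html =
        "<th class=\"TableCaption\" width=\"10%\">\n\t\tCH&gt;IT\n\t</th>\n" +
        "<th class=\"TableCaption\" bgcolor=\"#e0e0e0\"></th>\n" +
        "<th class=\"TableCaption\" width=\"10%\">\n\t\tAT&gt;CH\n\t</th>\n";

    // Returns the trimmed text of every <th> that contains text.
    public static string[] HeaderTexts(string html)
    {
        // [^<]+? needs at least one character that is not '<', so the empty
        // header cannot match, and a match can never run into the next tag.
        var regex = new Regex(@"<th.*>(?<data>[^<]+?)</th>");
        return regex.Matches(html)
                    .Cast<Match>()
                    .Select(m => m.Groups["data"].Value.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string text in HeaderTexts(Html))
            Console.WriteLine(text);
    }
}
```

On this sample the empty header is skipped and only the two headers with text are returned. The captured text is still HTML-encoded (for example CH&gt;IT), so decode it separately if you need the literal > character.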
 
Thanks tsuji

"Just because you're paranoid, don't mean they're not after you"
 