help with regex asp

Status
Not open for further replies.

guestAsh

Technical User
Feb 27, 2004
65
GB
Hiya,

I've created a small ASP app that brings back some page details. I'm using MSXML2.ServerXMLHTTP to scrape the page and then a regex to find the information I need. The problem I'm having is finding the header of the page, which is in the H1 tag.

I'm using:

Code:
.Pattern = "h1(.*)h1"

This brings back the header of the page fine. However, some older pages have a header such as the following:

Code:
 <h1><span class="redheading">foo</span><br>foo foo foo</h1>

The regex can't match it and brings back nothing. I have a feeling it may be the < or the ". Any ideas?

Ta

Ash
 
Would line breaks affect it?

The actual HTML appears like this:

Code:
 <h1><span class="redheading">foo</span>
<br>foo foo foo</h1>

It's on two lines. I'll keep on testing!
 
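>Would line breaks affect it?
Yes. In VBScript's RegExp the dot matches any character except a newline, so .* cannot span the line break.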
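
A minimal sketch of the difference, assuming a hypothetical `html` variable that stands in for the scraped markup (the variable names are illustrative, not from the original app). The same pattern is tested against the heading on one line and then split over two:

```vbscript
Dim re, html, onOneLine, onTwoLines
Set re = New RegExp
re.Pattern = "h1(.*)h1"

' Heading on a single line: the dot can run from the first h1 to the second.
html = "<h1><span class=""redheading"">foo</span><br>foo foo foo</h1>"
onOneLine = re.Test(html)     ' True

' Same heading split over two lines: the dot cannot cross the line break,
' so no match is found.
html = "<h1><span class=""redheading"">foo</span>" & vbCrLf & _
       "<br>foo foo foo</h1>"
onTwoLines = re.Test(html)    ' False
```

So the < and the " are not the problem here; the line break inside the tag is.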

>.Pattern = "h1(.*)h1"
I don't like that pattern, but if it serves the original purpose, the minimal change that handles this case is:
[tt] .Pattern = "h1([\s\S]*)h1"[/tt]
I would at least make it non-greedy, though, since a designer who has used one h1 tag may be tempted to add another to the page, even though that practice is discouraged. A greedy match would then run from the first h1 to the last one.
[tt] .Pattern = "h1([\s\S]*?)h1"[/tt]
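
For completeness, a sketch of how the non-greedy pattern might be used to pull out the heading. Anchoring the pattern on the full `<h1 ...>` and `</h1>` tags is an assumption on my part, not something proposed above; with the bare "h1" pattern the captured text also picks up the stray > and </ around the heading. The `html` and `heading` names are hypothetical.

```vbscript
Dim re, matches, html, heading

' Sample markup from the thread, split over two lines.
html = "<h1><span class=""redheading"">foo</span>" & vbCrLf & _
       "<br>foo foo foo</h1>"

Set re = New RegExp
re.IgnoreCase = True      ' also match <H1>
re.Global = False         ' only the first heading is needed
' [\s\S] matches any character, line breaks included; *? is non-greedy,
' so the match stops at the first closing tag rather than the last.
re.Pattern = "<h1[^>]*>([\s\S]*?)</h1>"

heading = ""
Set matches = re.Execute(html)
If matches.Count > 0 Then
    ' SubMatches(0) holds the text captured by the parentheses,
    ' i.e. everything between the opening and closing h1 tags.
    heading = matches(0).SubMatches(0)
End If
```

The captured value still contains the inner span and br tags, so it may need a second pass to strip markup if only the plain text is wanted.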
 