Hiya,
I've created a small ASP app that retrieves some details from a page. I'm using MSXML2.ServerXMLHTTP to fetch the page and then a regular expression to extract the information I need. The problem I'm having is getting the page heading, which is in the H1 tag.
I'm using this pattern:
Code:
.Pattern = "h1(.*)h1"
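Here is a quick check of what that pattern actually captures. It's a sketch in Python rather than VBScript, assuming Python's re module behaves like VBScript's RegExp for this pattern (case-sensitive by default, and "." does not cross a line break). The sample strings and the capture helper are made up for illustration.

```python
import re

# The pattern from the post, used literally. As with VBScript's RegExp
# defaults, matching is case-sensitive and "." does not match a newline.
ORIGINAL = re.compile(r"h1(.*)h1")


def capture(html):
    """Return group 1 of the first match, or None if nothing matches."""
    m = ORIGINAL.search(html)
    return m.group(1) if m else None


simple = "<h1>Welcome</h1>"
older = '<h1><span class="redheading">foo</span><br>foo foo foo</h1>'
split = '<h1><span class="redheading">foo</span><br>\nfoo foo foo</h1>'
upper = "<H1>Welcome</H1>"

print(capture(simple))  # >Welcome</
print(capture(older))   # ><span class="redheading">foo</span><br>foo foo foo</
print(capture(split))   # None  (line break inside the heading)
print(capture(upper))   # None  (uppercase tag)
```

Neither < nor " is a special character in a regular expression, so on a single line the older markup still matches. If the real pages differ from the sample, a line break inside the heading or an uppercase <H1> is a likelier cause. The " only matters inside a VBScript string literal, where it has to be doubled, e.g. "class=""redheading""".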
This returns the page heading fine. However, some older pages have a heading like the following:
Code:
<h1><span class="redheading">foo</span><br>foo foo foo</h1>
The regex doesn't match it and returns nothing. I have a feeling it may be the < or the " characters. Any ideas?
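A more tolerant approach, sketched in Python with a hypothetical helper named h1_text: match the opening <h1 ...> tag case-insensitively (allowing attributes), let the content span line breaks with [\s\S]*?, then strip inner tags such as <span> and <br>. In ASP the same pattern string should work as .Pattern = "<h1[^>]*>([\s\S]*?)</h1>" with .IgnoreCase = True, assuming VBScript 5.5 or later for the non-greedy *?. Regex is fragile for HTML in general, so treat this as a workaround for these particular pages.

```python
import re

# Hypothetical helper patterns. [\s\S] matches any character including a
# newline, so the heading may span lines; *? stops at the first </h1>.
H1 = re.compile(r"<h1[^>]*>([\s\S]*?)</h1>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")
SPACE = re.compile(r"\s+")


def h1_text(html):
    """Return the visible text of the first H1, or "" if there is none."""
    m = H1.search(html)
    if not m:
        return ""
    inner = TAG.sub(" ", m.group(1))      # replace inner tags with spaces
    return SPACE.sub(" ", inner).strip()  # collapse runs of whitespace


older = '<h1><span class="redheading">foo</span><br>\nfoo foo foo</h1>'
print(h1_text(older))                             # foo foo foo foo
print(h1_text('<H1 class="title">Welcome</H1>'))  # Welcome
```

Replacing tags with a space (rather than deleting them) keeps "foo" and the text after <br> from running together.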
Ta
Ash