Matching exactly zero occurrences using Regular expression ...

Status
Not open for further replies.

Rayz66

Programmer
Sep 3, 2002
30
GB
Could someone help me out with a strange regular expression I'm trying to write?

I'm trying to scan through an HTML page and convert image tags into links. The problem is that the page may be scanned more than once, so I don't want to convert image tags that were already converted in a previous run.

So I reckon that I want to match an image tag, but only if it IS NOT enclosed by a
Code:
<a href/>
tag.

Code:
<img src="my.jpg"/>

should match, but

Code:
<a href="xxxxx"><img src="my.jpg"/></a>

should not match.

Any ideas how I go about this?
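
For reference, the requirement above can be written as a regex with a negative lookbehind. This is only a minimal sketch: the class and method names are made up for illustration, it assumes the opening link tag sits immediately before the image tag (optionally separated by a little whitespace), and the 200 and 10 character limits are arbitrary, chosen because Java needs lookbehind to have a bounded length.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the thread.
class UnlinkedImgRegex {
    // Negative lookbehind: reject an <img ...> that directly follows an opening <a ...> tag.
    // The {0,200} and {0,10} bounds are arbitrary; Java requires bounded-length lookbehind.
    static final Pattern UNLINKED_IMG = Pattern.compile(
            "(?<!<a\\s[^>]{0,200}>\\s{0,10})<img\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // True if the text contains at least one <img> tag not directly preceded by <a ...>.
    static boolean hasUnlinkedImg(String html) {
        return UNLINKED_IMG.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(hasUnlinkedImg("<img src=\"my.jpg\"/>"));                        // true
        System.out.println(hasUnlinkedImg("<a href=\"xxxxx\"><img src=\"my.jpg\"/></a>")); // false
    }
}
```

The sketch would still match an image that has other markup between the opening link tag and the image, so it is not a general HTML parser.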
 
If the <a href> tag is always on the same line as the img tag, and you are reading the file with the BufferedReader.readLine() method, you could do something like

Code:
// where "line" is your String from readLine()
int href = line.indexOf("href");
int img = line.indexOf("img");
if (href == -1 && img != -1) {
  // Got an img but no href
}

If they are on separate lines you could do something similar but have two flags for "got href" and "got img", and act accordingly.
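
One way the two-flag idea might look is sketched below. The class and method names are hypothetical, and resetting the "got href" flag when a closing </a> is seen is an assumption, since the post does not say when the flag should be cleared. It uses the same plain "href" and "img" substrings as the snippet above, and it ignores the order of tags within a single line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not taken from the thread.
class TwoFlagScanner {
    // Counts lines that contain "img" while no "href" is pending.
    static int countUnlinkedImgLines(BufferedReader reader) throws IOException {
        boolean gotHref = false; // "href" was seen and no "</a>" has followed yet
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.indexOf("href") != -1) {
                gotHref = true;
            }
            boolean gotImg = line.indexOf("img") != -1;
            if (gotImg && !gotHref) {
                count++; // got an img with no enclosing href
            }
            if (line.indexOf("</a>") != -1) {
                gotHref = false; // assumption: the link ends here
            }
        }
        return count;
    }

    // Convenience wrapper for text that is already in memory.
    static int countUnlinkedImgLines(String html) {
        try (BufferedReader reader = new BufferedReader(new StringReader(html))) {
            return countUnlinkedImgLines(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"xxxxx\">\n<img src=\"my.jpg\"/>\n</a>\n<img src=\"other.jpg\"/>\n";
        System.out.println(countUnlinkedImgLines(html)); // 1
    }
}
```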

Obviously this will fall over if anything on the line contains the names "img" or "href" which are not tags.
Well, something to work with anyway ...
 
If you're worried about it catching an "img" or "href" that's not in a tag, you could take it one step further and search for:

Code:
int href = line.indexOf("<href");
int img = line.indexOf("<img");

Granted, that doesn't guarantee it will be a tag, but you can be pretty sure it will be most of the time.

-kaht
 
Yep, I think this would be easier than using regex.

Thanks chaps!
 
boy do I feel silly.....

Code:
int href = line.indexOf("<href");

should be

Code:
int href = line.indexOf("<a href");

but I'm sure you figured that out by now....
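
Put together with the corrected search string, a per-line conversion that is safe to run more than once might look roughly like this. It is only a sketch: the class name, method name and the target parameter are hypothetical (the thread never says what the link should point to), and it handles just the first img tag on a line.

```java
// Hypothetical helper for illustration; not taken from the thread.
class ImgLinker {
    // Wraps the first <img ...> on the line in a link, unless the line already has "<a href".
    static String linkImage(String line, String target) {
        int href = line.indexOf("<a href");
        int img = line.indexOf("<img");
        if (href != -1 || img == -1) {
            return line; // already linked on a previous run, or nothing to convert
        }
        int end = line.indexOf('>', img);
        if (end == -1) {
            return line; // tag is not closed on this line; leave it untouched
        }
        String tag = line.substring(img, end + 1);
        return line.substring(0, img)
                + "<a href=\"" + target + "\">" + tag + "</a>"
                + line.substring(end + 1);
    }

    public static void main(String[] args) {
        String once = linkImage("<p><img src=\"my.jpg\"/></p>", "xxxxx");
        String twice = linkImage(once, "xxxxx");
        System.out.println(once);               // <p><a href="xxxxx"><img src="my.jpg"/></a></p>
        System.out.println(once.equals(twice)); // true
    }
}
```

The second call returns its input unchanged because the line now contains "<a href", which is the repeat-scan behaviour asked for at the top of the thread.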

-kaht
 