Regular Expression HELL!!!

KristianW · Sep 9, 2004

Hi all.

I know this isn't a JS specific question, but I have posted this in the ColdFusion Forum for a while and the only response I received was advice to try this board if I was unsuccessful there. So here I am...

I'm patching a website for a company, and part of this site uses RegExs to convert hyperlinks to become relative to the calling page. However, there seems to be a flaw in the RE used, as it sometimes appends a URL to the end of the current URL. For example, after the REreplace function runs, some links are good, and others look like:

http://abc.com/here/index.cfm?\http://def.com/there.cfm

This obviously will not work. So my plan is to create another RE that searches for this occurrence (as well as www or mailto links being appended, as this also happens on occasion). So I have created the following code (all one line):

Code:

<cfset PageContent=REreplacenocase(PageContent,"(href="")([^\?]*)\?([^(http|www|mailto|"")]*)((http|www|mailto)[^""]+)","\1\4","ALL")>

The first section of the RE ((href="")([^\?]*)\?) should match any hyperlink up to the querystring. The ([^(http|www|mailto|"")]*) section is designed to get zero or more of ANY character, unless it is http, mailto, www, or the closing quote for the hyperlink.
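For what it's worth, a bracket expression in most regex dialects is a set of single characters, not a list of words, so the [^(http|www|mailto|"")] piece described above may not behave as designed. Here is a minimal sketch of that behaviour, assuming Python's re module as a stand-in for the ColdFusion engine (the helper name and sample words are made up for illustration):

```python
import re

# The bracket expression from the post, written with a single quote character
# (the doubled "" in the CFML source is just an escaped ").
NEGATED_CLASS = r'[^(http|www|mailto|")]'

# Inside [...] the parentheses and pipes are literal characters, so the class
# excludes this set of single characters rather than the three words.
excluded_chars = sorted(set('(http|www|mailto|")'))


def leading_run(text: str) -> str:
    """Return the prefix of text that NEGATED_CLASS* consumes (hypothetical helper)."""
    return re.match(NEGATED_CLASS + '*', text).group()


print(excluded_chars)
print(repr(leading_run('something')))   # stops at the first excluded character
print(repr(leading_run('index.cfm/')))  # starts with an excluded character
```

If this carries over to ColdFusion, any text containing one of the letters h, t, p, w, m, a, i, l or o ends the "zero or more" run early, even when no http, www or mailto follows.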

The next section ((http|www|mailto)[^""]+) then searches for data starting with http, mailto, or www, and then any more info up until the closing ". The problem is that this only seems to work some of the time. For example:

http://abc.com/dir1?index.cfm/http://efg.com

gets resolved to

http://efg.com

which is great. However, the following link is not recognised (I'm assuming this, as it is not being changed)

http://abc.com/dir1?index.cfm/something/http://efg...

From what I can tell, there is nothing wrong with my RE (but then again my eyes are losing focus right now...). Shouldn't the ([^(http|www|mailto|"")]*) section return EVERYTHING other than what is in the square brackets? I.e., it should catch anything, or nothing, and then keep going, as long as it isn't http, www, mailto or the closing ".
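One way to check this is to run the pattern against sample links outside the site. The sketch below is only an approximation: it uses Python's re module rather than ColdFusion, the two sample links are made up in the style of the ones in this post, and the LAZY pattern is a hypothetical alternative (a lazy quantifier in place of the negated bracket expression), not a tested CFML fix.

```python
import re

# Pattern from the post, with CFML's doubled "" written as a single ".
ORIGINAL = r'(href=")([^\?]*)\?([^(http|www|mailto|")]*)((http|www|mailto)[^"]+)'

# Hypothetical alternative: a lazy [^"]*? skips any text inside the attribute
# until http, www or mailto begins.
LAZY = r'(href=")[^"?]*\?[^"]*?((?:http|www|mailto)[^"]+)'


def fix_original(html: str) -> str:
    """Apply the original pattern, keeping href=" plus the trailing URL."""
    return re.sub(ORIGINAL, r'\1\4', html, flags=re.IGNORECASE)


def fix_lazy(html: str) -> str:
    """Apply the lazy alternative, keeping href=" plus the trailing URL."""
    return re.sub(LAZY, r'\1\2', html, flags=re.IGNORECASE)


# Made-up sample links modeled on the ones described in the post.
short_link = r'<a href="http://abc.com/here/index.cfm?\http://def.com/there.cfm">'
long_link = '<a href="http://abc.com/dir1?index.cfm/something/http://efg.com">'

for link in (short_link, long_link):
    print(fix_original(link))
    print(fix_lazy(link))
```

In this sketch the original pattern rewrites the short link but leaves the long one untouched, because the text after the ? begins with a character that the bracket expression excludes. The lazy version rewrites both, though it would also rewrite a legitimate query string that happens to contain http, www or mailto.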

Can anyone help me with this? I just can't understand why it's not happening....

Thanks,
K.

chessbot · Sep 9, 2004

I know this isn't a JS specific question, but I have posted this in the ColdFusion Forum for a while and the only response I received was advice to try this board if I was unsuccessful there. So here I am...

Sorry to be redundant, but if you don't get anything here, you can try the Perl board...

--Chessbot

cLFlaVA · Sep 9, 2004

I suck royally at reg ex's. That's why I use this site:

http://www.regexlib.com/Search.aspx?k=url

*cLFlaVA
----------------------------
Ham and Eggs walks into a bar and asks, "Can I have a beer please?"
The bartender replies, "I'm sorry, we don't serve breakfast."

Westbury · Sep 9, 2004

Fantastic site!!! I didn't use reg ex's much cos it took me ages to figure them out, but now I might use them a bit more.

If it ain't broke, redesign it!
