Complex Regular Expression perhaps involving look behind?

Status
Not open for further replies.

jbailey268 (Programmer, US), May 25, 2005
Has anyone proficient in regular expressions seen this one?

In a string that contains, or may contain, ordinal words like 14TH, 23RD, 31ST and 2ND, we need to strip every occurrence of TH, RD, ST and ND, but only if the immediately preceding character is a digit.
Example:
125 THUNDER AVE AND 14TH STREET.
The TH in THUNDER must not be touched, but the TH after 14 must be removed (leaving the 14 alone), giving the string 125 THUNDER AVE AND 14 STREET

Someone suggested a lookbehind or lookaround construct.

I could write a user-defined function with many lines of code to do this, but a regular expression would be a better fit since I will have many iterations and a regular expression would save time.

Any help would be greatly appreciated.
Thank you
 
A very simple pattern would be:
[tt]
\dST|\dND|\dRD|\dTH
[/tt]

Don't forget to set the IgnoreCase option to true.

Then, in a MatchEvaluator, simply:

[tt]return m.ToString.Substring(0, m.ToString.Length-2)[/tt]

where [tt]m[/tt] is the match

It is possible to produce a more clever pattern, but then making sense of it can be difficult. There is no need for either lookbehind or lookahead.

[vampire][bat]
 
Actually, a better MatchEvaluator:

[tt]return m.ToString.Substring(0, 1)[/tt]

as we only want the first character of the match (the digit) to be returned.


[vampire][bat]
 
The problem is with "124 5TH Avenue": it became "124 Avenue".
The 5 disappeared as well as the TH. Where did the 5 go? I need the 5, just not the TH.

I wrote this code - is this right?

Dim padd1 As String = "124 5TH Avenue" ' an example
Dim maskStripped As Regex = New Regex("\dST|\dND|\dRD|\dTH")
Dim replacement As String = ""
padd1 = maskStripped.Replace(padd1, replacement)

' padd1 becomes "124 Avenue", no good.

That is why someone suggested I look into LOOKBEHIND.
Remember: we remove the suffix after a digit, but we DO NOT remove the digit.
-----

a string like
125 FIRST AND BYRD STREET AT 5TH AVENUE
should become
125 FIRST AND BYRD STREET AT 5 AVENUE

I'm not simply stripping TH or RD from the end of a string, so I don't see how m.ToString.Substring(0, m.ToString.Length-2), as someone else suggested, would help.
 
Try this... it's not a RegEx solution, but it should work.


Private Function Cleanse(ByVal sInput As String) As String
    ' " "c is a Char literal, so this also compiles with Option Strict On
    Dim sVals As String() = sInput.Split(" "c)
    Dim sRet As String = ""
    For Each sVal As String In sVals
        sVal = sVal.Trim
        If sVal.Length >= 3 Then
            ' Last two characters, lower-cased for comparison
            Dim s2 As String = sVal.Substring(sVal.Length - 2).ToLower
            If s2 = "st" OrElse s2 = "rd" OrElse s2 = "th" OrElse s2 = "nd" Then
                ' Strip the suffix only when everything before it is a number
                If IsNumeric(sVal.Substring(0, sVal.Length - 2).Trim) Then
                    sVal = sVal.Substring(0, sVal.Length - 2).Trim
                End If
            End If
        End If
        sRet &= sVal & " "
    Next
    Return sRet.Trim
End Function

Senior Software Developer
 
Another thought: you may also want to handle a space between the number and the two-character suffix, as in "3 rd".

If the above works, then add a simple test to the function that removes a standalone two-character suffix token such as your "rd" when it follows a number.
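A minimal sketch of that idea, assuming the input is split on spaces as in the function above: a token like "14TH" loses its suffix, and a standalone "ST", "ND", "RD" or "TH" is dropped only when the previous kept token is all digits. The names `OrdinalTokens`, `CleanseTokens` and `IsAllDigits` are hypothetical, and a stricter digits-only check (`Char.IsDigit`) is swapped in for `IsNumeric`.

```vbnet
Imports System
Imports System.Collections.Generic

Module OrdinalTokens
    ' Hypothetical list of the ordinal suffixes to strip
    Private ReadOnly Suffixes As String() = {"ST", "ND", "RD", "TH"}

    ' True only for a non-empty string made entirely of digits
    Private Function IsAllDigits(ByVal s As String) As Boolean
        If s.Length = 0 Then Return False
        For Each c As Char In s
            If Not Char.IsDigit(c) Then Return False
        Next
        Return True
    End Function

    Public Function CleanseTokens(ByVal sInput As String) As String
        Dim kept As New List(Of String)
        For Each token As String In sInput.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            Dim upper As String = token.ToUpperInvariant()
            If upper.Length >= 3 _
               AndAlso Array.IndexOf(Suffixes, upper.Substring(upper.Length - 2)) >= 0 _
               AndAlso IsAllDigits(token.Substring(0, token.Length - 2)) Then
                ' Joined form, e.g. "14TH" -> "14"
                kept.Add(token.Substring(0, token.Length - 2))
            ElseIf Array.IndexOf(Suffixes, upper) >= 0 _
                   AndAlso kept.Count > 0 _
                   AndAlso IsAllDigits(kept(kept.Count - 1)) Then
                ' Spaced form, e.g. the "RD" in "3 RD": drop the token
                Continue For
            Else
                kept.Add(token)
            End If
        Next
        Return String.Join(" ", kept.ToArray())
    End Function

    Sub Main()
        Console.WriteLine(CleanseTokens("125 FIRST AND BYRD STREET AT 3 RD AVENUE"))
    End Sub
End Module
```

Like any rule that accepts a space before the suffix, this also turns "5 ST JAMES PL" into "5 JAMES PL", so it is worth checking against real addresses first.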

Senior Software Developer
 
This should do what you want:

Code:
Imports System.Text.RegularExpressions

'...

Dim resultString as string
resultString = Regex.Replace(originalString, _
   "(?<number>\d{1,})(?<suffix>ST|ND|RD|TH)", "${number}")
 
Actually better still:

Code:
Imports System.Text.RegularExpressions

'...

Dim resultString as string
resultString = Regex.Replace(originalString, _
   "(?i)(?<number>\d{1,})(?<suffix>ST|ND|RD|TH)", "${number}")
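To see what the named-group version does, here is a small console sketch (the `NamedGroupDemo` module and `StripSuffix` function are illustrative names) that runs the last pattern over a few inputs. The inline `(?i)` has the same effect as passing `RegexOptions.IgnoreCase`, so "5th" is handled as well as "5TH", and "THUNDER" is untouched because no digit precedes its TH.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    ' Pattern from the post above: group "number" is kept, group "suffix" is dropped
    Public Const Pattern As String = "(?i)(?<number>\d{1,})(?<suffix>ST|ND|RD|TH)"

    Public Function StripSuffix(ByVal originalString As String) As String
        ' ${number} writes back only the digits that were matched
        Return Regex.Replace(originalString, Pattern, "${number}")
    End Function

    Sub Main()
        For Each s As String In New String() {"125 THUNDER AVE AND 14TH STREET", "124 5th Avenue", "21st and 2nd"}
            Console.WriteLine(StripSuffix(s))
        Next
    End Sub
End Module
```

Nothing after the suffix is checked, so a run like "10THS" would also become "10S".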
 
SiriusBlackOp,

I'm not sure that matching "3 RD" was ever a requirement, but this should do it:

Code:
Imports System.Text.RegularExpressions

'...

Dim resultString as string
resultString = Regex.Replace(originalString, _
   "(?i)(?<number>\d{1,}) {0,1}(?<suffix>ST|ND|RD|TH)(?=\s|$)", _
   "${number}")
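The optional space and the `(?=\s|$)` lookahead change which strings match. This sketch (module name illustrative) compares the posted pattern with a `\b` variation that is not from the thread; `\b` only requires the suffix to end at a word boundary, so punctuation may follow it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OptionalSpaceDemo
    ' Pattern from the post above: optional space, suffix, then whitespace or end of string
    Public Const Posted As String = "(?i)(?<number>\d{1,}) {0,1}(?<suffix>ST|ND|RD|TH)(?=\s|$)"
    ' Assumed variation (not from the thread): end the suffix at a word boundary instead
    Public Const WordBoundary As String = "(?i)(?<number>\d+) ?(?<suffix>ST|ND|RD|TH)\b"

    Public Function Strip(ByVal input As String, ByVal pattern As String) As String
        Return Regex.Replace(input, pattern, "${number}")
    End Function

    Sub Main()
        Dim samples As String() = {"AT 3 RD AVENUE", "AT 5TH.", "5 ST JAMES PL"}
        For Each s As String In samples
            Console.WriteLine("{0} | {1} | {2}", s, Strip(s, Posted), Strip(s, WordBoundary))
        Next
    End Sub
End Module
```

With the posted pattern, "AT 5TH." is left alone because a period is neither whitespace nor the end of the string, while the `\b` variation gives "AT 5.". Both turn "5 ST JAMES PL" into "5 JAMES PL", so allowing a space before the suffix can swallow a real word such as ST (Saint).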
 
If you do want to use a look behind assertion, yet another way is this:

Code:
Imports System.Text.RegularExpressions

'...

Dim resultString as string
resultString = Regex.Replace(originalString, _
   "(?i)(?<=\d{1}) {0,1}(ST|ND|RD|TH)(?=\s|$)", _
   "")
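With the lookbehind, the digit is only checked, never consumed, so it is not part of the match and the replacement can be the empty string. This sketch (module name illustrative) prints each match to show that it holds only the suffix, plus the optional space, and then strips the OP's second example. `(?<=\d{1})` is equivalent to `(?<=\d)`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo
    ' Pattern from the post above; the digit is asserted by (?<=\d{1}), not captured
    Public Const Pattern As String = "(?i)(?<=\d{1}) {0,1}(ST|ND|RD|TH)(?=\s|$)"

    Public Function Strip(ByVal input As String) As String
        ' Nothing needs to be written back, so the replacement is empty
        Return Regex.Replace(input, Pattern, "")
    End Function

    Sub Main()
        Dim input As String = "125 FIRST AND BYRD STREET AT 5TH AVENUE"
        For Each m As Match In Regex.Matches(input, Pattern)
            ' Each match contains only the suffix, never the digit
            Console.WriteLine("'{0}' at index {1}", m.Value, m.Index)
        Next
        Console.WriteLine(Strip(input))
    End Sub
End Module
```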
 
Thanks for all your help, and so fast too!
Regular expressions are not for the faint-hearted; you must be gurus.

This will work for me. Thanks again.

I'm looking at a book by Jeffrey Friedl about regular expressions; it's informative, but it has few VB .NET examples.

Great Forum.
 
My solution works perfectly and covers all suggested scenarios.

Not only did I provide the correct pattern to match, I also explained how to use it. I did not say to use RegEx.Replace, I said to use a MatchEvaluator and I provided the return string from the Evaluator - in fact I reposted with an improved return string from the Evaluator.

Your variation of my solution will not work.

So please do not say that my solution does not work when you have not even tried it.

And if you get the impression that I am annoyed, then you are correct.

As I said before, you do not need to use either LookBehind or LookAhead in such a SIMPLE scenario. My solution was thoroughly tested before posting.


[vampire][bat]
 
Code:
        Dim padd1 As String = "124 5TH Avenue" ' an example
        Dim maskStripped As Regex = New Regex("(\d)(ST|ND|RD|TH)")

        padd1 = maskStripped.Replace(padd1, "$1")
        Debug.Print(padd1)
 
earthandfire,

Sorry, I misunderstood your example, just as I think you have misunderstood ours if you think it was a variation of yours. We were just using Replace, with no MatchEvaluator in sight.

For those who didn't get what earthandfire was talking about either, here is a fuller example:

Code:
Imports System.Text.RegularExpressions

'...

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    TextBox1.Text = RemoveNumberSuffix("125 THUNDER AVE AND 14TH STREET")
End Sub

Private Function RemoveNumberSuffix(ByVal input As String) As String
    ' Each match is one digit plus a two-letter suffix, e.g. "4TH"
    Return Regex.Replace(input, "\dST|\dND|\dRD|\dTH", New MatchEvaluator(AddressOf Evaluator), RegexOptions.IgnoreCase)
End Function

Private Function Evaluator(ByVal m As Match) As String
    ' Return only the digit, dropping the suffix
    Return m.ToString.Substring(0, 1)
End Function
 
Aptitude, not at all: all the other WORKING suggestions are fine, and I fully support a variety of solutions to any question posted.

There is very rarely only one valid answer to a problem.

SiriusBlackOp chose to offer a non-RegEx suggestion. I've not tested it but see no reason to assume that it doesn't work - but it is a lot more long-winded.

Your suggestions, again I've not tested them, and again see no reason to assume that they won't work. They use a more complicated pattern than the one I suggested.

WinblowsME's pattern is by far the simplest and essentially involves using two Groups, retaining the contents of the first.

My argument is with the OP only, who, after NOT implementing my suggestion, said that it was wrong and didn't work. He/she was also blinkered by the requirement for a LookBehind / LookAhead solution.

As an aside, I tend not to use the basic RegEx.Replace because the vast majority of scenarios with which I work require far more complex handling and thus a MatchEvaluator. Additionally (especially if you are new to using Regular Expressions), I think that handling the replacement in an Evaluator is clearer.
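On VB 2008 or later, the evaluator does not need a separate named function, because a lambda converts to `MatchEvaluator` directly. A sketch of the earlier `RemoveNumberSuffix` example written that way (the `EvaluatorDemo` module name is illustrative):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EvaluatorDemo
    Public Function RemoveNumberSuffix(ByVal input As String) As String
        ' Same pattern and options as the fuller example; the lambda plays the
        ' role of Evaluator and keeps only the digit at the start of each match
        Return Regex.Replace(input, "\dST|\dND|\dRD|\dTH", _
                             Function(m As Match) m.Value.Substring(0, 1), _
                             RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(RemoveNumberSuffix("125 THUNDER AVE AND 14TH STREET"))
    End Sub
End Module
```

Since every match is exactly three characters, `m.Value.Substring(0, 1)` and `m.Value.Substring(0, m.Value.Length - 2)` return the same digit, which is why both evaluators posted earlier behave identically.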


[vampire][bat]
 