Regular expression (or otherwise) to NOT count "if" as "horrified"

RebLazer

I am using the following function to count occurrences of a given word or phrase in a larger string:[tt]

Function occurrences(ByVal searchString As String, ByVal fullString As String) As Integer
    ' Upper-case both strings so the comparison ignores case.
    Dim upperSearchString As String = searchString.ToUpper()
    Dim upperFullString As String = fullString.ToUpper()
    Dim iStart As Integer = 0
    Dim iCount As Integer = 0
    Dim iStrLen As Integer = upperFullString.Length
    Dim iPos As Integer

    While iStart <= iStrLen
        ' Find the next hit at or after iStart; -1 means there are no more.
        iPos = upperFullString.IndexOf(upperSearchString, iStart)
        If (iPos > -1) Then
            iCount = iCount + 1
            ' Resume just past this hit so matches do not overlap.
            iStart = iPos + upperSearchString.Length
        Else
            Exit While
        End If
    End While
    Return iCount
End Function[/tt]

I want it to count words no matter where they appear in a sentence (first word, mid-sentence, or last word). And it is doing that well.

But it is also (currently) counting words within a word. For example, if I am searching for the word "if", it will count "horrified" as an occurrence of the word "if"!

Is there some sort of modification that can be made to the above function to avoid it counting words like "if/horrified" in error? Would the use of a regular expression be appropriate here in addition to my function?


Thank you very much!
Lazer
 
Try this. You will see the word "is" appears 3 times, but the word "isaac" also appears. It only counts "is".

Code:
    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        MessageBox.Show(FindWords("Hello, my name is isaac, yes it is.  It is.", "is").ToString())
    End Sub

    Public Function FindWords(ByVal SearchString As String, ByVal Word As String) As Integer
        'Number of occurrences found
        Dim numOccurrences As Integer = 0

        'Get rid of commas and periods (other punctuation is left in place)
        SearchString = SearchString.Replace(",", "")
        SearchString = SearchString.Replace(".", "")

        'Split on spaces so each array element holds one word
        Dim StringArray() As String = SearchString.Split(" "c)

        'Exact, case-sensitive comparison against each word
        For Each candidate As String In StringArray
            If candidate = Word Then
                numOccurrences = numOccurrences + 1
            End If
        Next

        Return numOccurrences
    End Function
 
Or with a regular expression:
------------------------------------------------
Imports System.Text.RegularExpressions

Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
    MessageBox.Show(CountSubStr("Hello, my name is isaac, yes it is. It is.", "is").ToString)
End Sub

Private Function CountSubStr(ByVal InputText As String, ByVal CountStr As String) As Integer
    'Match the word with a leading space and a trailing non-word character
    Dim MyRegEx As New Regex(" " & CountStr & "\W")
    Dim Mc As MatchCollection = MyRegEx.Matches(InputText)
    CountSubStr = Mc.Count
End Function
-------------------------------------------------

Sunaj
'The gap between theory and practice is not as wide in theory as it is in practice'
 
RiverGuy and Sunaj,

Thanks very much to both of you for your help. I really like the elegance of that RegEx, Sunaj.

But... neither works exactly as I need it to.

I used this string:
IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS

with the search string: "IS"

The RegEx counted 4 and the function counted 5. It should be 6...

Also, I need this to be able to count words or phrases. When I searched that same string for "IS YOUR" (which appears once), both the RegEx and the function counted 0.


Thank you both so much!
Lazer
 
...my original function (posted at the top of this thread) counts 8 for "IS" and 1 for "IS YOUR". So it counts phrases properly, but it also counts the IS's in Chris and Isaac...
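
One way the original IndexOf loop could be kept while skipping hits inside longer words is to look at the characters on either side of each hit. The following is only a sketch under assumptions: the name WholeWordOccurrences is made up for illustration, and "word character" is taken to mean a letter or digit, which may not fit every need.

```vbnet
Imports System

Module BoundaryCheckSketch
    ' Hypothetical variant of the thread's occurrences() function: a hit only
    ' counts when the characters on both sides are not letters or digits.
    Function WholeWordOccurrences(ByVal searchString As String, ByVal fullString As String) As Integer
        If String.IsNullOrEmpty(searchString) Then Return 0

        Dim count As Integer = 0
        Dim pos As Integer = fullString.IndexOf(searchString, StringComparison.OrdinalIgnoreCase)

        While pos > -1
            Dim endPos As Integer = pos + searchString.Length
            Dim startsClean As Boolean = (pos = 0) OrElse Not Char.IsLetterOrDigit(fullString(pos - 1))
            Dim endsClean As Boolean = (endPos = fullString.Length) OrElse Not Char.IsLetterOrDigit(fullString(endPos))

            If startsClean AndAlso endsClean Then count += 1

            ' Resume just past this hit, as the original function does.
            pos = fullString.IndexOf(searchString, endPos, StringComparison.OrdinalIgnoreCase)
        End While

        Return count
    End Function

    Sub Main()
        Dim sample As String = "IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
        Console.WriteLine(WholeWordOccurrences("IS", sample))       ' 6
        Console.WriteLine(WholeWordOccurrences("IS YOUR", sample))  ' 1
    End Sub
End Module
```

The hits inside CHRIS and ISAAC are rejected because a letter sits directly before or after them, while phrases such as "IS YOUR" still go through the same IndexOf search.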
 
Oh... the reason mine counted one too few is that it would have needed a leading space, I think. See what happens when you pad every string with a leading space.
 
RiverGuy, I tried padding the beginning of the string with a space (i.e. " IS YOUR..."), but it didn't help.

Lazer
 
Did you try padding Sunaj's too? A workaround would be to pad it with a word at the beginning that you know would never be searched for, such as pnbtgyt or something like that. I know there has to be a fix. I'll see if I can find something.
 
Sunaj,

I've been trying to figure out why your RegEx isn't exactly working. I slightly modified it like this:[tt]
Dim MyRegEx As New Regex("(\s|^)" & CountStr & "\W|$")[/tt]

For the string
"IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
it now returns the proper count of 6, i.e. it counts the first and last "IS"s, too.

That is good.

But there is still a problem with my version of the RegEx. It counts 2 consecutive occurrences of "IS" only once. So for the following string:
"IS IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
it still counts 6!

But if you add one more "IS":
"IS IS IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
then it counts 7!!!

...And so it goes: the counter only goes up for every odd-numbered occurrence of "IS"...

This is very strange!

Can someone explain this?

- Deb

 
In case this helps expedite solving the problem, here is the little test application I've been using (below is the entire contents of the .vb file). The RegExes are in red, bold font.

Lazer[tt]



Imports System.Text.RegularExpressions
Public Class Form1

Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

Public Sub New()
MyBase.New()

'This call is required by the Windows Form Designer.
InitializeComponent()

'Add any initialization after the InitializeComponent() call

End Sub

'Form overrides dispose to clean up the component list.
Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
If disposing Then
If Not (components Is Nothing) Then
components.Dispose()
End If
End If
MyBase.Dispose(disposing)
End Sub

'Required by the Windows Form Designer
Private components As System.ComponentModel.IContainer

'NOTE: The following procedure is required by the Windows Form Designer
'It can be modified using the Windows Form Designer.
'Do not modify it using the code editor.
Friend WithEvents Button1 As System.Windows.Forms.Button
Friend WithEvents TextBox1 As System.Windows.Forms.TextBox
Friend WithEvents TextBox2 As System.Windows.Forms.TextBox
Friend WithEvents Label1 As System.Windows.Forms.Label
Friend WithEvents RadioButton1 As System.Windows.Forms.RadioButton
Friend WithEvents RadioButton2 As System.Windows.Forms.RadioButton
Friend WithEvents RadioButton3 As System.Windows.Forms.RadioButton
<System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
Me.Button1 = New System.Windows.Forms.Button
Me.TextBox1 = New System.Windows.Forms.TextBox
Me.TextBox2 = New System.Windows.Forms.TextBox
Me.Label1 = New System.Windows.Forms.Label
Me.RadioButton1 = New System.Windows.Forms.RadioButton
Me.RadioButton2 = New System.Windows.Forms.RadioButton
Me.RadioButton3 = New System.Windows.Forms.RadioButton
Me.SuspendLayout()
'
'Button1
'
Me.Button1.Location = New System.Drawing.Point(176, 384)
Me.Button1.Name = "Button1"
Me.Button1.Size = New System.Drawing.Size(320, 136)
Me.Button1.TabIndex = 0
Me.Button1.Text = "Count"
'
'TextBox1
'
Me.TextBox1.Location = New System.Drawing.Point(64, 40)
Me.TextBox1.Name = "TextBox1"
Me.TextBox1.TabIndex = 1
Me.TextBox1.Text = "IS"
'
'TextBox2
'
Me.TextBox2.Location = New System.Drawing.Point(64, 96)
Me.TextBox2.Multiline = True
Me.TextBox2.Name = "TextBox2"
Me.TextBox2.Size = New System.Drawing.Size(496, 176)
Me.TextBox2.TabIndex = 2
Me.TextBox2.Text = "IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! I" & _
"S"
'
'Label1
'
Me.Label1.Location = New System.Drawing.Point(568, 424)
Me.Label1.Name = "Label1"
Me.Label1.TabIndex = 3
Me.Label1.Text = "(Total)"
'
'RadioButton1
'
Me.RadioButton1.Checked = True
Me.RadioButton1.Location = New System.Drawing.Point(248, 304)
Me.RadioButton1.Name = "RadioButton1"
Me.RadioButton1.Size = New System.Drawing.Size(104, 48)
Me.RadioButton1.TabIndex = 5
Me.RadioButton1.TabStop = True
Me.RadioButton1.Text = "Deb18's (modified) RegEx"
'
'RadioButton2
'
Me.RadioButton2.Location = New System.Drawing.Point(368, 312)
Me.RadioButton2.Name = "RadioButton2"
Me.RadioButton2.Size = New System.Drawing.Size(104, 40)
Me.RadioButton2.TabIndex = 6
Me.RadioButton2.Text = "RiverGuy's function"
'
'RadioButton3
'
Me.RadioButton3.Location = New System.Drawing.Point(496, 312)
Me.RadioButton3.Name = "RadioButton3"
Me.RadioButton3.Size = New System.Drawing.Size(136, 24)
Me.RadioButton3.TabIndex = 7
Me.RadioButton3.Text = "My original function"
'
'Form1
'
Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
Me.ClientSize = New System.Drawing.Size(712, 581)
Me.Controls.Add(Me.RadioButton3)
Me.Controls.Add(Me.RadioButton2)
Me.Controls.Add(Me.RadioButton1)
Me.Controls.Add(Me.Label1)
Me.Controls.Add(Me.TextBox2)
Me.Controls.Add(Me.TextBox1)
Me.Controls.Add(Me.Button1)
Me.Name = "Form1"
Me.Text = "Form1"
Me.ResumeLayout(False)

End Sub

#End Region

Private Function CountSubStr(ByVal InputText As String, ByVal CountStr As String) As Integer
'Dim MyRegEx As New Regex(" " & CountStr & "\W") 'Sunaj
Dim MyRegEx As New Regex("(\s|^)" & CountStr & "\W|$") 'Deb18

Dim Mc As MatchCollection = MyRegEx.Matches(InputText)
CountSubStr = Mc.Count
End Function

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
If RadioButton1.Checked Then
Label1.Text = CountSubStr(TextBox2.Text, TextBox1.Text)
Else
If RadioButton3.Checked Then
Label1.Text = occurrences(TextBox1.Text, TextBox2.Text)
Else
Label1.Text = FindWords(TextBox2.Text, TextBox1.Text)
End If
End If
End Sub

Public Function FindWords(ByVal SearchString As String, ByVal Word As String)
'Integer for number of occurences found
Dim numOccurences As Integer
numOccurences = 0

'Ignore case
'SearchString = SearchString.ToUpper
'Word = Word.ToUpper

'Get rid of commas
SearchString = SearchString.Replace(",", "")
SearchString = SearchString.Replace(".", "")

'Assign array elements to each word in the string
Dim StringArray() As String
StringArray = SearchString.Split(" ")

Dim i As Integer
Do Until i = StringArray.Length()
If StringArray(i) = Word Then
numOccurences = numOccurences + 1
End If
i = i + 1
Loop

FindWords = numOccurences
End Function

Function occurrences(ByVal searchString As String, ByVal fullString As String) As Integer
Dim upperSearchString As String = searchString.ToUpper()
Dim upperFullString As String = fullString.ToUpper()
Dim iStart As Integer = 0
Dim iCount As Integer = 0
Dim iStrLen As Integer = upperFullString.Length
Dim iPos As Integer

While iStart <= iStrLen
iPos = upperFullString.IndexOf(upperSearchString, iStart)
If (iPos > -1) Then
iCount = iCount + 1
iStart = iPos + upperSearchString.Length
Else
Exit While
End If
End While
Return iCount
End Function


Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)
TextBox2.Text = TextBox2.Text.ToUpper()
TextBox1.Text = TextBox1.Text.ToUpper()
End Sub
End Class
[/tt]
 
Sunaj,

Deb18's problem is exactly what I would love an answer to. I would really like to do this with a RegEx - but that counting issue that Deb18 mentioned is truly a problem for me.

Do you have any idea how to fine-tune Deb18's version of your RegEx to actually get it to count all "IS"s correctly?

I would be most appreciative!

BTW, the only reason I posted all of that code above was to make it easy for others (like yourself and RiverGuy) to test out potential solutions - not for you to debug all that code - it works fine! :)

Thanks,
Lazer
 
Try this:

Dim MyRegEx As New Regex("(\s|^|\b)+" & CountStr & "(\W|$)+")

 
Well, sjcidt has already posted a solution (using \b, the word boundary). I guess you could actually just use:
Dim MyRegEx As New Regex("\b" & CountStr & "\b")
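
As a rough sketch of how the \b pattern might be wrapped into a reusable counter: the function name CountWholeWords is invented here, and the use of Regex.Escape and RegexOptions.IgnoreCase is an assumption about what the original poster would want rather than anything stated in the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordBoundarySketch
    ' Hypothetical helper (not from the thread): counts whole-word
    ' occurrences of a word or phrase, ignoring case.
    Function CountWholeWords(ByVal inputText As String, ByVal countStr As String) As Integer
        ' Regex.Escape keeps characters such as "." or "?" in the search
        ' text from being read as regex operators.
        Dim pattern As String = "\b" & Regex.Escape(countStr) & "\b"
        Return Regex.Matches(inputText, pattern, RegexOptions.IgnoreCase).Count
    End Function

    Sub Main()
        Dim sample As String = "IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
        Console.WriteLine(CountWholeWords(sample, "IS"))             ' 6
        Console.WriteLine(CountWholeWords("IS " & sample, "IS"))     ' 7
        Console.WriteLine(CountWholeWords("IS IS " & sample, "IS"))  ' 8
        Console.WriteLine(CountWholeWords(sample, "IS YOUR"))        ' 1
    End Sub
End Module
```

Because \b is zero-width, no delimiter is consumed between matches, so consecutive occurrences are each counted. One caveat: \b only behaves as a word boundary when the search text itself starts and ends with a word character.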

RE: 'IS IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS'
The reason that (\s|^)IS\W|$ only counts one of the leading 'IS's is that each match needs a leading whitespace character (or the start of the string) and a trailing non-word character. The first space in the string is 'used' as the trailing non-word character of the first 'IS', so there is no leading whitespace left for the second 'IS'. Note also that the |$ at the end is a separate alternative rather than part of the trailing \W: it matches the empty position at the end of the string exactly once, which is why the final 'IS' appeared to be counted. Was that understandable?
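
If the shape of Deb18's pattern is to be kept, one possible repair is to turn the trailing part into a lookahead, so the delimiter is checked but not consumed. This is a sketch only: CountWithLookahead is a hypothetical name, and Regex.Escape is added as an assumption.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadSketch
    ' Hypothetical helper: leading whitespace or start of string, then the
    ' word, then a lookahead that checks for a non-word character or the end
    ' of the string without consuming it.
    Function CountWithLookahead(ByVal inputText As String, ByVal countStr As String) As Integer
        Dim pattern As String = "(?:\s|^)" & Regex.Escape(countStr) & "(?=\W|$)"
        Return Regex.Matches(inputText, pattern).Count
    End Function

    Sub Main()
        Dim sample As String = "IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
        Console.WriteLine(CountWithLookahead(sample, "IS"))             ' 6
        Console.WriteLine(CountWithLookahead("IS " & sample, "IS"))     ' 7
        Console.WriteLine(CountWithLookahead("IS IS " & sample, "IS"))  ' 8
    End Sub
End Module
```

The space after each 'IS' stays available as the leading whitespace of the next match, so the odd/even counting effect should disappear. Unlike \b, this version requires whitespace (not just any non-word character) in front of the word.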

To examine which parts of the string are actually matched, you can use something like:
------------------------------------------------------
Private Sub ShowSubStrPos(ByVal InputText As String, ByVal CountStr As String)
    Dim MyRegEx As New Regex("(\s|^)" & CountStr & "\W|$")
    Dim Mc As MatchCollection = MyRegEx.Matches(InputText)
    Dim M As Match
    If Mc.Count > 0 Then
        'Show the starting index of every match
        For Each M In Mc
            MsgBox(M.Index)
        Next
    End If
End Sub
------------------------------------------------------
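
A console-style variant of the same diagnostic might also print the length and text of each match, which makes empty matches visible. This is a sketch with an invented helper name (DescribeMatches); only the pattern comes from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchDumpSketch
    ' Hypothetical helper: one line per match, with index, length and the
    ' matched text in brackets so that empty matches stand out.
    Function DescribeMatches(ByVal inputText As String, ByVal pattern As String) As String()
        Dim mc As MatchCollection = Regex.Matches(inputText, pattern)
        Dim lines(mc.Count - 1) As String
        For i As Integer = 0 To mc.Count - 1
            lines(i) = String.Format("index {0}, length {1}: [{2}]", mc(i).Index, mc(i).Length, mc(i).Value)
        Next
        Return lines
    End Function

    Sub Main()
        ' Deb18's pattern from the thread, with IS as the search word.
        For Each line As String In DescribeMatches("IS IS IS", "(\s|^)IS\W|$")
            Console.WriteLine(line)
        Next
        ' index 0, length 3: [IS ]
        ' index 8, length 0: []
    End Sub
End Module
```

For "IS IS IS" the pattern yields two matches: the first 'IS' together with the space it consumes, and an empty match at the end of the string produced by the $ alternative.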


Sunaj
'The gap between theory and practice is not as wide in theory as it is in practice'
 
Sjcidt and Sunaj,

Thank you both very much for your solutions! I must say, though, Sunaj's RegEx works a bit better for my specific needs. When searching the string:
IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS
with the phrase "IS YOUR " (note the single space after the word "YOUR"), Sunaj's RegEx returned a count of 1 whereas Sjcidt's returned a count of 0.
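
The difference reported above can be reproduced with a small sketch that runs both patterns on the same phrase. CountMatches is a hypothetical helper; the patterns are the two posted in the thread, built by plain concatenation as in the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TrailingSpaceSketch
    ' Hypothetical helper: number of matches of a ready-made pattern.
    Function CountMatches(ByVal inputText As String, ByVal pattern As String) As Integer
        Return Regex.Matches(inputText, pattern).Count
    End Function

    Sub Main()
        Dim sample As String = "IS YOUR NAME JOHN? NO IT IS CHRIS. MAYBE ACTUALLY IT IS ISAAC. IT IS IT IS! IS"
        Dim phrase As String = "IS YOUR "   ' note the trailing space

        ' Sunaj's pattern: \b after the space matches because NAME follows.
        Console.WriteLine(CountMatches(sample, "\b" & phrase & "\b"))               ' 1
        ' Sjcidt's pattern: after the space it needs \W or end of string, but finds N.
        Console.WriteLine(CountMatches(sample, "(\s|^|\b)+" & phrase & "(\W|$)+"))  ' 0

        ' With the phrase trimmed first, both patterns agree.
        Dim trimmed As String = phrase.Trim()
        Console.WriteLine(CountMatches(sample, "\b" & trimmed & "\b"))               ' 1
        Console.WriteLine(CountMatches(sample, "(\s|^|\b)+" & trimmed & "(\W|$)+"))  ' 1
    End Sub
End Module
```

Trimming the search phrase before building the pattern is one way to keep the count from depending on stray whitespace in the input box, though whether that is desirable depends on the application.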

Thank you both so much for your efforts!!!

Lazer
 