Char count with RegEx

Status
Not open for further replies.

xhat

Technical User
May 18, 2009
25
US
Environment: Win XP
Access 2000
Microsoft VBScript Regular Expressions 5.5

I'm teaching myself regular expressions, so I'm starting slow. I want to test for the presence of a defined number of occurrences of a particular character. For example, if the string were "Mary had a little lamb," I might test for exactly 4 'a' characters. My understanding of VB's implementation of regex is that the following pattern ought to evaluate to true when tested:
Code:
regexobj.Pattern = "a{4}"
Unfortunately, it does not.

Here is my actual code, and the output it produces:
Code:
Public Sub SeeHand()
    'Test sub to get hand data and sort it
    Dim rs As DAO.Recordset
    Dim rsq As DAO.QueryDef
    Dim rgxp As New RegExp
    Dim strBoardandHoleCards As String
    
    Set rsq = CurrentDb().QueryDefs("qrySeeShowdownCards")
    
    Set rs = rsq.OpenRecordset
    
    strBoardandHoleCards = rs.Fields("HandStr")
    
    rgxp.Global = True
    rgxp.IgnoreCase = True
    rgxp.Pattern = "h{3}"
    
    
    Debug.Print strBoardandHoleCards
    Set rest = rgxp.Execute(strBoardandHoleCards)
    
    Debug.Print rest.Count
    
    
End Sub


Output (from the immediate window): 2c9sAhTsTh7h 3s
                                    0

As you can see from the output, the test string has 3 'h' chars in it, yet the count shows 0. Any idea where I'm going wrong?

TIA

"There is no spoon..." - anonymous enlightened child
 
[0]
>My understanding of VB's implementation of regex is that the following pattern ought to evaluate to true when tested:
>regexobj.Pattern = "a{4}"
That pattern matches four consecutive "a" characters.

[1] To verify the count with a single regex pattern, you can do this.
rgxp.Pattern = "^([^a]*a[^a]*){4}$"

PS: If your interest is purely VB, I would suggest posting the question in the VBA forum. This is the VBScript forum, which is not exactly the same.
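
The difference between the two patterns can be seen with a minimal VBA sketch. It assumes a reference to Microsoft VBScript Regular Expressions 5.5, as in the original post; the Sub name CompareQuantifiers is made up for illustration.

```vba
Public Sub CompareQuantifiers()
    ' Hypothetical demo sub; requires a reference to
    ' Microsoft VBScript Regular Expressions 5.5.
    Dim rgx As New RegExp
    Const TEST_STRING As String = "Mary had a little lamb"

    ' a{4} only matches four "a" characters in a row ("aaaa").
    rgx.Pattern = "a{4}"
    Debug.Print rgx.Test(TEST_STRING)   ' False

    ' Four groups, each holding exactly one "a" surrounded by non-"a" text,
    ' anchored so that the whole string must be consumed.
    rgx.Pattern = "^([^a]*a[^a]*){4}$"
    Debug.Print rgx.Test(TEST_STRING)   ' True: the string holds exactly four "a"
End Sub
```

Because of the ^ and $ anchors, the second pattern fails for any string that holds more or fewer than four "a" characters.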
 
tsuji,

Thank you for the post and the possible solution. My need is not to search for a particular character in succession within a string, but rather to search for a particular count of characters within a string, no matter where they appear in the string.

The Sub code snippet's purpose is to figure out how to do that using regular expressions. Every string that will be passed to the Sub will contain some combination of 'h', 'd', 'c', or 's'. The count could be anywhere from 0-7, but I will always be looking for exactly 5 of any of those 4 chars within the string.

However, I decided to try out your solution (modified as you'll see below to correspond with the test string I'm using) anyway to see if I could gain some more insight into regex. Weirdly, I get two different results depending upon which method I use. Using....
Code:
rgxp.Pattern = "^([^h]*h[^h]*){3}$"
rgxp.Execute("2c9sAhTsTh7h 3s")
Debug.Print rgxp.Count
... the output is 1. However, using...
Code:
rgxp.Pattern = "^([^h]*h[^h]*){3}$"
Debug.Print rgxp.Test("2c9sAhTsTh7h 3s")
... the output is False. I was surprised to see the False as the output, since rgxp.Count = 1 seems to suggest a True evaluation of the pattern!

"There is no spoon..." - anonymous enlightened child
 
[2] .Count belongs to the match collection. There is no such thing as rgxp.Count after calling .Execute(). The .Count could appear like this, for instance.
rgxp.Execute("2c9sAhTsTh7h 3s").Count

[3] The second snippet should give exactly True with rgxp.Test(...) as well. So I suspect you made some other mistake in testing it.
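
As a rough sketch of points [2] and [3], the collection returned by .Execute can be held in its own variable. The Sub and variable names here are hypothetical, and a reference to Microsoft VBScript Regular Expressions 5.5 is assumed.

```vba
Public Sub ShowMatchCollection()
    ' Hypothetical demo sub; requires a reference to
    ' Microsoft VBScript Regular Expressions 5.5.
    Dim rgx As New RegExp
    Dim found As MatchCollection
    Dim m As Match

    rgx.Global = True
    rgx.IgnoreCase = True
    rgx.Pattern = "^([^h]*h[^h]*){3}$"

    ' Execute returns a MatchCollection; Count is a property of that
    ' collection, not of the RegExp object itself.
    Set found = rgx.Execute("2c9sAhTsTh7h 3s")
    Debug.Print found.Count                    ' 1

    ' Each item is a Match; here the single match spans the whole string.
    For Each m In found
        Debug.Print m.FirstIndex, m.Value
    Next m

    ' Test agrees with Execute: one match means True.
    Debug.Print rgx.Test("2c9sAhTsTh7h 3s")    ' True
End Sub
```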
 
tsuji,

I'm going to cut and paste my test code and output. I don't believe I have any transcription errors, but it is possible, I realize.

Code and output with .Count:
Code:
Public Sub SeeHand()
    'Test sub to get hand data and sort it
    Dim rs As DAO.Recordset
    Dim rsq As DAO.QueryDef
    Dim rgxp As New RegExp
    Dim strBoardandHoleCards As String
    'Dim CharacterCount As Integer
    
    Set rsq = CurrentDb().QueryDefs("qrySeeShowdownCards")
    
    Set rs = rsq.OpenRecordset
    
    strBoardandHoleCards = rs.Fields("HandStr")
    
    rgxp.Global = True
    rgxp.IgnoreCase = True
    rgxp.Pattern = "^([^h]*h[^h]*){3}$"
    
    Debug.Print "-------------------"
    Debug.Print strBoardandHoleCards
    Debug.Print rgxp.Execute(strBoardandHoleCards).Count
    'Debug.Print rgxp.Test("strBoardandHoleCards")
    
    rs.Close
    rsq.Close
    Set rs = Nothing
    Set rsq = Nothing
    Set rgxp = Nothing
    'Set rest = Nothing
    
End Sub

Output from Immediate Window:
-------------------
2c9sAhTsTh7h 3s
 1

And code and output from .Test:
Code:
Public Sub SeeHand()
    'Test sub to get hand data and sort it
    Dim rs As DAO.Recordset
    Dim rsq As DAO.QueryDef
    Dim rgxp As New RegExp
    Dim strBoardandHoleCards As String
    'Dim CharacterCount As Integer
    
    Set rsq = CurrentDb().QueryDefs("qrySeeShowdownCards")
    
    Set rs = rsq.OpenRecordset
    
    strBoardandHoleCards = rs.Fields("HandStr")
    
    rgxp.Global = True
    rgxp.IgnoreCase = True
    rgxp.Pattern = "^([^h]*h[^h]*){3}$"
    
    Debug.Print "-------------------"
    Debug.Print strBoardandHoleCards
    'Debug.Print rgxp.Execute(strBoardandHoleCards).Count
    Debug.Print rgxp.Test("strBoardandHoleCards")
    
    rs.Close
    rsq.Close
    Set rs = Nothing
    Set rsq = Nothing
    Set rgxp = Nothing
    'Set rest = Nothing
    
End Sub

Output from Immediate Window:
-------------------
2c9sAhTsTh7h 3s
False

Perhaps something is wrong with my implementation of VBScript Regular Expressions 5.5. Recall that my environment is WinXP, Access2000, and VBS RegEx 5.5.

PHV,

Thanks for the link, but I'm already familiar with it. I have been working with that document, and several others on the MSDN site, trying to teach myself regex. I got stuck on (what I believed would be a simple thing) counting the number of chars in a string using regex.

"There is no spoon..." - anonymous enlightened child
 
>what I believed would be a simple thing

But why would you think that? Regular Expressions are for matching patterns, not counting (there are some minor exceptions to this); any counting ability is simply a side effect of the pattern matching. Basically you can only get a count AFTER the matching has been done.

So the trick is to write a function that counts a particular character's (or string's) occurrence using a pattern, e.g. (and this is VBA, not VBScript, so as tsuji says you might want to move this over to forum707)
Code:
Public Function CountChars(source As String, chars As String, Optional IgnoreCase As Boolean = True) As Long
    With New RegExp
        .IgnoreCase = IgnoreCase
        .Global = True                          'find every occurrence, not just the first
        .Pattern = "(.*?)" & chars
        CountChars = .Execute(source).Count     'one match per occurrence of chars
    End With
End Function
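
One way this kind of counting might be applied to the flush check is sketched below. CountSuit and HasFlush are hypothetical names, the lower-case suit letters are taken from the sample string in this thread, and the threshold of five or more cards is an assumption. A reference to Microsoft VBScript Regular Expressions 5.5 is assumed as well.

```vba
' Hypothetical helpers; require a reference to
' Microsoft VBScript Regular Expressions 5.5.
Public Function CountSuit(hand As String, suit As String) As Long
    With New RegExp
        .Global = True          ' find every occurrence, not just the first
        .IgnoreCase = False     ' suits are lower case in the sample data
        .Pattern = suit         ' a single literal character such as "h"
        CountSuit = .Execute(hand).Count
    End With
End Function

Public Function HasFlush(hand As String, Optional needed As Long = 5) As Boolean
    Dim suit As Variant
    ' Check each suit in turn and stop at the first one that is frequent enough.
    For Each suit In Array("h", "d", "c", "s")
        If CountSuit(hand, CStr(suit)) >= needed Then
            HasFlush = True
            Exit Function
        End If
    Next suit
End Function
```

With the sample hand, CountSuit("2c9sAhTsTh7h 3s", "h") returns 3 and HasFlush("2c9sAhTsTh7h 3s") returns False. The suit argument is used as a regex pattern, so a character with special meaning in a regex would need escaping first.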
 
A word about the choice of which forum to post in: Microsoft themselves cannot seem to make up their mind about where regular expressions live. If you follow this link to the Regular Expression introductory page on the MSDN website you'll notice on the navigation pane that the information lives under JScript. Meanwhile, the implementation of regular expressions inside of VB 6.0 requires setting a reference to Microsoft VBScript Regular Expressions 5.5. Lastly, I have yet to find a section in the MSDN under either Office or Access that refers to regular expressions without linking back to one of the scripting languages. So, I decided this post was less about VBA than VBScript, but hey, I'm just learning, so what do I know...

StrongM,

I believed it would be simple to do this because, at essence, finding a defined count of chars that come from a defined set (n chars from a set of 4 chars) is essentially identifying a pattern, which is what regex is supposed to do. So basically, I simply want to say to regex, "Find out for me if 5 'h' chars live in this string," or "Find out for me if 5 'd' chars live in this string," etc. Basic pattern matching to my simplistic understanding of regex.

Anyway, I tested your code, strongm, and it works, so thank you for that. I'm going to go back to the documentation now and see if I can explain to myself why this works. Thanks for the assistance everyone.

"There is no spoon..." - anonymous enlightened child
 
BTW, the reason your earlier tests produce mismatched results is that:

Debug.Print rgxp.Test("strBoardandHoleCards")

should be

Debug.Print rgxp.Test(strBoardandHoleCards)
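
A small sketch of why the quoted version prints False, assuming the same pattern and settings as in the test code above (the Sub name is hypothetical):

```vba
Public Sub QuotedVersusVariable()
    ' Hypothetical demo sub; requires a reference to
    ' Microsoft VBScript Regular Expressions 5.5.
    Dim rgx As New RegExp
    Dim strBoardandHoleCards As String

    strBoardandHoleCards = "2c9sAhTsTh7h 3s"
    rgx.IgnoreCase = True
    rgx.Pattern = "^([^h]*h[^h]*){3}$"

    ' The quoted form tests the 20-character literal "strBoardandHoleCards",
    ' which contains only one "h" (the H in "Hole").
    Debug.Print rgx.Test("strBoardandHoleCards")   ' False

    ' The unquoted form tests the variable's contents, which hold three "h".
    Debug.Print rgx.Test(strBoardandHoleCards)     ' True
End Sub
```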

> Basic pattern matching to my simplistic understanding of regex

Not at all. Having 5 characters randomly distributed through a string is hardly a pattern. It is possible to write a pattern to match against this, but it can involve some lateral thinking, as tsuji's pattern should show.
 
strongm,

Thanks for pointing out that error with my test of .Test. Completely missed the quote marks. When I properly referenced the variable, tsuji's pattern did, indeed, work correctly.

I'm still not convinced that what I'm looking for in a string is not a pattern. As you've probably deduced by now, the string represents 7 playing cards, and I'm trying to write an all-encompassing, or series of all-encompassing, regex patterns to find things like straights, flushes, full houses, etc, that are in the string. So, my little brain goes, in the case of a full house, for example, the presence of three cards with identical pip values, followed by the presence of two cards with identical pip values, is a pattern. By definition, a full house is a pattern in an otherwise random distribution of playing cards. If it wasn't, the odds of making a full house would simply be 50-50 with each deal! Similarly, a flush is a pattern.

The distribution of playing cards per full hand of poker, however, is a random event. So from that perspective, I see why you say the presence of 5 chars in a string is hardly a pattern. But I think we're looking at the same problem from different perspectives. You're saying on any given hand, the presence of 5 chars distributed within a finite string is a random event, and I'm saying I'm looking for patterns within this random event. So, just my thoughts...

Thanks for your assistance and thoughts on this matter. I'm still going to pursue using regex to tackle this pattern, if only because I think it represents a great learning opportunity for me!

"There is no spoon..." - anonymous enlightened child
 
>I'm still not convinced that what I'm looking for in a string is not a pattern.
I don't think you have to insist too much on this. To me, it is no doubt a pattern. Asserting the contrary should be taken as highlighting one aspect of it. The pattern, as you will notice, involves a global consideration, i.e., it has the form "^...$", so .Execute().Count is at most one (1). If you instead consider something like 5 consecutive "a" characters, that is something "local": its .Execute().Count can reflect multiple local matches. (Do you know enough mathematics to appreciate global vs. local?) I wouldn't take strongm's assertion too much to heart. I think it was just a slip of the tongue.
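
The local versus global distinction can be sketched in VBA as follows. The Sub name and sample string are made up for illustration, and a reference to Microsoft VBScript Regular Expressions 5.5 is assumed.

```vba
Public Sub LocalVersusGlobal()
    ' Hypothetical demo sub; requires a reference to
    ' Microsoft VBScript Regular Expressions 5.5.
    Dim rgx As New RegExp
    Const SAMPLE As String = "aa-aaaa-a"

    rgx.Global = True

    ' Local: every run of two consecutive "a" characters is its own match.
    rgx.Pattern = "a{2}"
    Debug.Print rgx.Execute(SAMPLE).Count   ' 3

    ' Global: the anchored pattern describes the whole string, which holds
    ' seven "a" characters, so there is a single match.
    rgx.Pattern = "^([^a]*a[^a]*){7}$"
    Debug.Print rgx.Execute(SAMPLE).Count   ' 1

    ' Asking for a different total yields no match at all.
    rgx.Pattern = "^([^a]*a[^a]*){6}$"
    Debug.Print rgx.Execute(SAMPLE).Count   ' 0
End Sub
```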
 