Regular Expression Help

Status
Not open for further replies.

combs

Programmer
Apr 18, 2002
78
US
This should be very easy and straightforward, but I can't figure it out....

I need to grab a number out of a text file that I have captured off the web. I've stripped out the newlines, returns, form feeds, and spaces. Here is a sample of the data I'm sorting through (I need the number 10,500):

Code:
...<strong>LotSize:</strong></font></td><td><fontsize="2">10,500Sq.Ft.</td>...

I am using this as the pattern:
Code:
strPattern = "LotSize:.{39}(\d+,*\.*\d+)"

It is returning no matches! Is there something obvious that I'm missing here? I'd really appreciate any help anyone is able to offer....

Thanks!
 
Are you sure that's the exact data? I only ask because it returns one match on my copy (which contains LotSize...10,500).

I don't know what the rest of your data looks like, or whether the numbers will always be formatted the way you posted, so some of this might be overkill. Here's a sample expression that works for the data posted, and also for the same data with the number formatted without commas:
Code:
"\d{1,6}(|,\d{1,5})(?=Sq.Ft.)"
Hope this helps
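
For anyone who wants to try the expression above against both number formats, here is a small console sketch. The module name, the helper function, and the two sample strings are made up for illustration, and the dots in Sq.Ft. are escaped in this version so they only match literal periods:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper: returns the number that sits directly before "Sq.Ft.",
    ' or an empty string when nothing matches.
    Function LotSizeBeforeSqFt(ByVal data As String) As String
        ' (?=Sq\.Ft\.) is a lookahead: the text must follow the digits,
        ' but it is not included in the match.
        Dim m As Match = Regex.Match(data, "\d{1,6}(|,\d{1,5})(?=Sq\.Ft\.)")
        If m.Success Then Return m.Value
        Return String.Empty
    End Function

    Sub Main()
        ' Number with a comma, then the same number without one.
        Console.WriteLine(LotSizeBeforeSqFt("<fontsize=""2"">10,500Sq.Ft.</td>"))
        Console.WriteLine(LotSizeBeforeSqFt("<fontsize=""2"">10500Sq.Ft.</td>"))
    End Sub
End Module
```

The "2" inside the fontsize attribute is not picked up, because the lookahead only accepts digits that are immediately followed by Sq.Ft.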

HarleyQuinn
---------------------------------
Carter, hand me my thinking grenades!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before posting.

 
HarleyQuinn,

Thanks for your reply.
Yes, that will always be the format of the data (at least from the "LotSize:" to the number). The exception is that the size could be listed in Acres instead of square feet, so I can't always count on there being a "Sq.Ft." in the data.
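
One way to handle both units is to capture the unit text together with the number and convert based on what was captured. The sketch below is only an illustration: it assumes the unit appears as exactly "Sq.Ft." or "Acres" right after the number (the real page may spell it differently), and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module LotSizeUnits
    Const SquareFeetPerAcre As Double = 43560

    ' Returns the lot size in square feet, or -1 when no lot size is found.
    ' ASSUMPTION: the unit text is exactly "Sq.Ft." or "Acres".
    Function LotSizeInSquareFeet(ByVal data As String) As Double
        ' Group 1 = the number (optional commas and decimals), group 2 = the unit.
        Dim m As Match = Regex.Match(data, "LotSize:.{39}(\d[\d,]*(?:\.\d+)?)(Sq\.Ft\.|Acres)")
        If Not m.Success Then Return -1.0

        ' Parse with the invariant culture so "," is the thousands separator
        ' and "." is the decimal point, whatever the machine's regional settings are.
        Dim size As Double = Double.Parse(m.Groups(1).Value, _
            NumberStyles.AllowThousands Or NumberStyles.AllowDecimalPoint, _
            CultureInfo.InvariantCulture)

        If m.Groups(2).Value = "Acres" Then size *= SquareFeetPerAcre
        Return size
    End Function

    Sub Main()
        ' The 39 characters of markup between "LotSize:" and the number.
        Dim prefix As String = "LotSize:</strong></font></td><td><fontsize=""2"">"
        Console.WriteLine(LotSizeInSquareFeet(prefix & "10,500Sq.Ft.</td>"))
        Console.WriteLine(LotSizeInSquareFeet(prefix & "0.5Acres</td>"))
    End Sub
End Module
```

Reading the unit from the data avoids having to guess it from the size of the number.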

I will post the code for that Sub below since the error may be somewhere else... (especially if it's working for you)

Code:
Public Sub FetchCamaData(ByVal strGeoCode As String)
        Dim strData2 As String
        Dim strURL As String
        strData2 = ""
        Dim dblTemp As Double

        strURL = "http://***.******.*****.****.asp?Geo_code=" & strGeoCode
        Dim uriWebSite As New Uri(strURL)
        Dim wReq As WebRequest = WebRequest.Create(uriWebSite)
        Dim wResp As WebResponse = wReq.GetResponse()
        Dim sr As New StreamReader(wResp.GetResponseStream)

        Try
            strData2 = sr.ReadToEnd
        Catch WebExcp As WebException
            MsgBox(WebExcp.Message, MsgBoxStyle.Exclamation, "Error:")
        End Try
        sr.Close()

        If InStr(1, strData2, "The page cannot be found") > 0 Then
            MsgBox("I couldn't get the CAMA data from the County Web Site!" _
            & vbNewLine & vbNewLine & "Verify that the County web site is " _
            & "available..." & vbNewLine & vbNewLine & "A cause of this " _
            & "problem may be that the County has updated their webpage format." _
            & "  If this is the case, then someone will need to update the code " _
            & " to search the new format.  Until the code is updated, this feature " _
            & "will not be available...", vbExclamation, "County WebSite Format Changed:")
            Exit Sub
        End If

        strPattern = "[\n+\t+\r+\f+\s]+"
        Dim RegExp As New Regex(strPattern)

        'STRIP OUT ALL THE NEW LINES, LINE FEEDS, CARRIAGE RETURNS, SPACES AND TABS
        strData2 = RegExp.Replace(strData2, "")
        'MsgBox(strData2)
        strPattern = "LotSize:.{39}(\d+,*\.*\d+)"
        Dim oMatch As Match
        oMatches = RegExp.Matches(strData2)
        If oMatches.Count > 0 Then
            For Each oMatch In oMatches
                strTemp = oMatch.Groups(1).ToString
                dblTemp = CDbl(oMatch.Groups(1).ToString)
                If dblTemp < 100 Then
                    'CONVERT ACRES TO SQUARE FEET (THIS CASE THE Sq.Ft. WOULD BE Acres)
                    dblTemp = dblTemp * 43560
                End If
                strTemp = dblTemp
                If Form1.txtSquareFeet.Text = vbNullString Then
                    If strTemp <> vbNullString Then
                        Form1.txtSquareFeet.Text = strTemp
                    Else
                        MsgBox("Cannot find Square Feet!", vbInformation, "No Square Feet:")
                    End If
                Else
                    MsgBox(strTemp, vbInformation, "Square Feet (Lot Size):")
                End If
            Next
        Else
            MsgBox("----->Couldn't find the Square Feet!<-----")
        End If
        RegExp = Nothing
        strData2 = Nothing
    End Sub

The "Couldn't find the Square Feet!" message at the end of the Sub is the one that I am receiving ALL the time!

Thanks again for your help!
 
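You will: you never tell RegExp to use the new pattern ;) The RegExp object is still holding the whitespace pattern it was created with, so Matches never looks for LotSize.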

How about:
Code:
strPattern = "[\n+\t+\r+\f+\s]+"
'STRIP OUT ALL THE NEW LINES, LINE FEEDS, CARRIAGE RETURNS, SPACES AND TABS
strData2 = Regex.Replace(strData2, strPattern, "")
'MsgBox(strData2)
strPattern = "(?<=LotSize:.{39})\d+,?\d+"
Dim RegExp As New Regex(strPattern)
Dim oMatch As Match
oMatches = RegExp.Matches(strData2)
This pattern uses positive lookbehind, which lets you check for the preceding text without including it in your match (so it removes the need for groups).
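
To make the difference concrete, here is a short side-by-side sketch of the capture-group version and the lookbehind version, run against the sample from the first post. The module and function names are made up for the example:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo
    ' Sample from the first post; 39 characters of markup sit between
    ' "LotSize:" and the number.
    Const Sample As String = "<strong>LotSize:</strong></font></td><td><fontsize=""2"">10,500Sq.Ft.</td>"

    ' Capture-group version: the whole match includes "LotSize:" and the markup,
    ' so the number has to be read from Groups(1).
    Function WithGroup(ByVal data As String) As String
        Return Regex.Match(data, "LotSize:.{39}(\d+,?\d+)").Groups(1).Value
    End Function

    ' Lookbehind version: the prefix is only checked, not consumed,
    ' so the match itself is the number.
    Function WithLookbehind(ByVal data As String) As String
        Return Regex.Match(data, "(?<=LotSize:.{39})\d+,?\d+").Value
    End Function

    Sub Main()
        ' Both calls print the same number.
        Console.WriteLine(WithGroup(Sample))
        Console.WriteLine(WithLookbehind(Sample))
    End Sub
End Module
```

One thing to watch when switching: the lookbehind pattern has no group 1, so any code that still reads oMatch.Groups(1) will get an empty string back and should read oMatch.Value instead.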

On a final note, the Regex used in Regex.Replace does not need to be declared: Replace is also available as a Shared method on the Regex class in System.Text.RegularExpressions, so you can call it without creating an object first.
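
The sketch below shows why the original Sub never found anything: a Regex object keeps the pattern it was constructed with, and changing the string variable afterwards has no effect on it. The names are hypothetical, and \s+ is used as a shorter whitespace pattern because \s already covers newlines, tabs, returns, and form feeds:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternReuseDemo
    ' Mirrors the original Sub: the Regex object is built from the whitespace
    ' pattern, and only the string variable is changed afterwards.
    Function CountWithStaleRegex(ByVal data As String) As Integer
        Dim strPattern As String = "\s+"
        Dim re As New Regex(strPattern)
        strPattern = "LotSize:"            ' changes the string only, not re
        Return re.Matches(data).Count      ' still searching for whitespace
    End Function

    ' The Shared overload takes the pattern on every call, so nothing can go stale.
    Function CountWithSharedCall(ByVal data As String) As Integer
        Return Regex.Matches(data, "LotSize:").Count
    End Function

    Sub Main()
        Dim data As String = "<strong>LotSize:</strong>10,500Sq.Ft."
        Console.WriteLine(CountWithStaleRegex(data))   ' prints 0
        Console.WriteLine(CountWithSharedCall(data))   ' prints 1
    End Sub
End Module
```

Creating a new Regex from the new pattern, as in the code above, fixes the problem just as well as the Shared call does.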

Hope this helps

HarleyQuinn

 
HarleyQuinn,

Thanks for your help - that works perfectly.

I've used RE in Perl and VBA before, but this is my first experience with VB 2008....

I've not seen that notation before ("?<=")... Can you point me to a URL that does a good job of explaining RE's for VB2008?

Thanks again and a star for you!
 
Hi combs,

Glad I could help, thanks for the star :)

The notation you refer to is positive lookbehind. This wasn't available in VBA or VB6 (which caused me no end of annoyance ;) ), though I can't say if you can use it in Perl as I've never touched it! :)

The following thread705-1552734 contains good links for RE from both PHV and myself.

Hope this helps

HarleyQuinn

 