Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Microsoft Excel - Filter Cells by Format 5

Status
Not open for further replies.

kjv1611

New member
Jul 9, 2003
10,758
US
I've come accross a particular need to be able to filter a column in Excel by the format. The specific problem consists of a field that is populated with various types of personal information from various consumers. It for a report showing what employees searched for with a particular application.

The current report only gives us what that person searched for. For instance, Agent Smith might have searched for "123 Main St", whereas Agent Jones might have searched for "123-45-6789", and Agent Miller searched for "555-555-1212" just as some examples as to the actual data in that column.

What I would like to do (and I can import it into Access if that would make it simpler) is to split all of this out into separate Excel Spreadsheets (or tables/queries in Access) based on what type of information is in that field.

I was not sure of a way to do this in Excel with an autofilter (it is possible that I just overlooked it, or am just not aware of the particular method, as of yet). So, I thought I could come up with a VBA method to loop through the cells in the one column, and based on the findings for each column, copy that record to the appropriate spreadsheet.

So, what I would want to eventually have (in whatever is the best format - Excel, Access, SQL, whatever) is probably up to 5 or 6 different files, and each would be for a different search criteria. One would be address, one = SSN, and one = telephone (there are a few others, but I think I can just take care of those rather easily, based on a different field).

I have done some VBA in Excel (haven't done much in the past year or so), but I can't think of what would be the best way to check for specific formatting in the cells. I am sure there is a way, and there may be an altogether better method than I am currently dealing with.

Here is what I've gotten started with so far:
Code:
Private Sub FilterSSN()
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim x As Long [green]'Long Integer Variable used for counting rows[/green]
    Dim bolLoop As Boolean [green]'True if to continue Looping because there is still a record, or False if no record to check.[/green]
    
    Set wb = ActiveWorkbook
    Set ws = wb.Worksheets(ActiveWorksheet)
    x = 2
    bolLoop = True
    Do While bolLoop = True
        If ws.Cells(x, 1) = vbNullString Then bolLoop = True
        Else
            Select Case ws.Cells(x, 3).Formula
             ~just psuedo-code here...
                Case Format1
                    Copy Row to Sheet1

                Case Format2
                    Copy Row to Sheet2

                Case Format3
                    Copy Row to Sheet3

                Case Format4
                    Copy Row to Sheet4

                Case Else
                    Copy Row to SheetUnknowns 'For manual review
            End Select
        End If
    Loop
End Sub

Of course, the error handling and every detail has not been worked out yet, but I'm trying to get the idea down pat before I worry about the extras.

Any suggestions would be greatly appreciated. Thanks in advance.
 
Awesome! That will be the perfect way to accomplish my task, here! Thanks a ton!
 
Ok, an additional question on this. I'm not terribly sure at the how to yet for setting the formats to search by. If I can get that down pat, I can probably make this thing REALLY accurate. One format I want to include is this:

AA0010016540

So, I would say something like this?

Const newformat As String = "\a{2}\d{10}*"

Or something like that?

Does anyone know of an online guide showing what characters mean what, and such for using this sort of registry check - at least I think that's what this is.

Thanks for any info/advice/help.

--

"If to err is human, then I must be some kind of human!" -Me
 
Thanks a bunch for the link, PHV. That should prove MOST helpful, as I can get a good understanding of what's going on! [smile]

--

"If to err is human, then I must be some kind of human!" -Me
 
Also, just so you know, VB has a Like operator similar to SQL Server's Like operator, for tasks which are simple enough to not absolutely require regular expressions.

Code:
If sText Like "([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9]" Then
...

Wilcards are * and ?, character set negation mark is !, as opposed to SQL Server wildcards % and _, character set negation mark ^.
 
Interesting.. Although from what I've seen, I'd be made to think that using the regular expressions would be faster when the program runs. But, this will be helpful information for comming back to later. Thanks, E2

--

"If to err is human, then I must be some kind of human!" -Me
 
I'd test to see if regex is actually faster. It might have faster native code, but you're dealing with a call into another library. Also, the Like operator has a very simple syntax and this means far less code behind it, so it could be faster, anyway.
 
I'll keep that in mind, as I could try that to see if I can use one method or the other to reduce the total time and resources required to run various little pieces of this particular project. I added quite a bit to the original function as posted by stevexff, and it now runs a good bit slower. But, part of the reason is that for some particular matching, I'm running through a loop of each character to see what all is in the string value.

I'm sure I can use a filter of this sort, or either a reg expression filter to make that shorter. But, I'm just not sure where to go with it just yet.

What I am looking for is to see what is a SSN vs what is a PHONE entry. If they were all 100% the same format, I'd already have it, but I found some that were not, and at least a couple here and there that are not following ANY known pattern, possibly because of typos. Nonetheless, I'm trying to make it work for close to 99% accuracy - I am sure I can't reach 100%, b/c there are a few records which do not seem to match any format of a searchable criteria - these are probably major typos. [wink]

Anyway, in case anyone is interested in looking at what I've got so far, and how I can speed the thing up, here is what I have for the edited function so far:
Code:
Function SearchFormat(text As String) As String

    Const phonefmt As String = "(\d{3}\D+){2}\d{4}\D*"
    Const ssnfmt As String = "\d{3}\D+\d{2}\D+\d{4}\D*"

    SearchFormat = "ADDRESS"
    
    If Len(Trim(text)) < 7 Then
        SearchFormat = "UNKNOWN"
        Exit Function
    End If

[green]'    If Len(Trim(text)) > 16 Then Exit Function[/green]
    If regCheck(text, phonefmt) = True Then
        SearchFormat = "PHONE"
        Exit Function
    End If
    
    If InStr(text, "(") Then
        If Len(Trim(text)) < 17 Then
            SearchFormat = "PHONE"
            Exit Function
        End If
    End If
    
    If InStr(text, ")") Then
        If Len(Trim(text)) < 17 Then
            SearchFormat = "PHONE"
            Exit Function
        End If
    End If
    
    If regCheck(text, ssnfmt) = True Then
        SearchFormat = "SSN"
        Exit Function
    End If

[green]'    If (Len(Trim(text)) = 7) Or (Len(Trim(text)) = 10) Then[/green]
        Dim idx As Long
        Dim intNumber As Integer
        Dim intAlpha As Integer
        Dim intDash As Integer
        intNumber = 0
        intAlpha = 0
        intDash = 0
        'Debug.Print "LengthOfString = " & Len(Trim(text))
        For idx = 1 To Len(text)
[green]            'Debug.Print "idx = " & idx
'            If Mid$(text, idx, 1) = " " Then
'                Debug.Print "Character = Space"
'            Else
'                Debug.Print "Character = " & Mid$(text, idx, 1)
'            End If[/green]
            Select Case Asc(Mid$(text, idx, 1))
                Case 48 To 57 [green]'Number characters 0 through 9[/green]
                    intNumber = intNumber + 1
                Case 65 To 90, 97 To 122 [green]'Letter characters[/green]
                    intAlpha = intAlpha + 1
                Case 45 [green]'- dash[/green]
                    intDash = intDash + 1
            End Select
        Next idx
    
        If intAlpha > 0 Then
            If intNumber > 0 Then
                SearchFormat = "ADDRESS"
                Exit Function
            ElseIf intNumber = 0 Then
                SearchFormat = "NAME"
                Exit Function
            Else
                If intNumber = 9 Then
                    SearchFormat = "SSN"
                    Exit Function
                ElseIf intNumber = 10 Or intNumber = 7 Then
                    SearchFormat = "PHONE"
                    Exit Function
                End If
            End If
        Else
            Select Case intNumber
                Case 8, 9
                    SearchFormat = "SSN"
                    Exit Function
                Case 7, 10
                    SearchFormat = "PHONE"
                    Exit Function
                Case Else
                    SearchFormat = "UNKNOWN"
                    Exit Function
            End Select
            Exit Function
        End If
[green]        
'        If IsNumeric(text) Then
'        Else
'            Exit Function
'        End If
'        If InStr(text, "-") Then
'            SearchFormat = "SSN"
'            Exit Function
'        Else
'            SearchFormat = "PHONE"
'            Exit Function
'        End If
'    End If
    
'    If (Len(Trim(text)) > 7) And (Len(Trim(text)) < 10) Then
'        If IsNumeric(text) Then
'        Else
'            Exit Function
'        End If
'        SearchFormat = "SSN"
'        Exit Function
'    End If
[/green]

End Function

Private Function regCheck(inText As String, pattern As String) As Boolean

    regCheck = True
    Dim re As VBScript_RegExp_55.RegExp
    Set re = New VBScript_RegExp_55.RegExp
    re.pattern = pattern
    regCheck = re.Test(inText)

End Function

Now, of course, I can probably clean up a piece or two here or there. I already commented out some portions, because I do not think they are necessary after all. Just what I've changed in commenting out some sections, and adding in another "Exit Function" here or there which I left out by accident has already probably trippled the speed which I did have it at.

In the mean time, I'll be working on some other non-related items, as well as trying to review the regular expressions info from Microsoft's website which PHV posted earlier.

--

"If to err is human, then I must be some kind of human!" -Me
 
Skip,

You said:
Yes. Insert/Function - User Defined Category -- and then select the function in your PERSONAL.xls workbook

I publish a MODULE to my users that they IMPORT into their PERSONAL.xls. It has a plethora of functions that they use on spreadsheets that return database and screenscrape parameters.

I tried this out, and it worked great. But, in order for the function to work this way, I have to reference it like this:
Code:
    =PERSONAL.XLS!SearchFormat(C4)

Is there a way to make it accessible form the PERSONAL macro workbook apart from this format, to where it just shows as =SearchFormat(C4)? - C4 happens to be the cell reference for the particular row.

I can deal with it the long way of referencing the spreadsheet, but I would like to find a way to do it withotu the spreadsheet reference.

Would the only way to be to create an add-in? If so, I may just hold off on that for now, but I'd be interested in how to get that going. If that's the case, I'll just mabye do some searching on that, as I'm sure it's been referenced here before.

Thanks for all the help!

--

"If to err is human, then I must be some kind of human!" -Me
 




I hope that you're not TYPING...
[tt]
=PERSONAL.XLS!whatever
[/tt]
Insert/Function - User Defined - And SELECT the macro from the drop down.

Skip,

[glasses] [red][/red]
[tongue]
 
What I did to test what you stated (and it did work) was:
Insert Menu->
Function->
then choose User Defined from the drop down->
and then the function was listed as
"PERSONAL.XLS!SearchFormat."

When I had the VBA code in the Workbook where the data was, it just listed it as:
"SearchFormat"

I would like, if possible, for it to show "SearchFormat" just for when the function is listed under the PERSONAL.XLS macro workbook, and thus not have to be imported into every workbook or either the data always imported back into this specific workbook.

I hope this is making some sense. [blush] [smile]



--

"If to err is human, then I must be some kind of human!" -Me
 
Okay, here is the whole procedure, currently. It is doing everything correctly so far. Personally, I think it's pretty cool! [wink]

First, the FindFormat Function (I probably could make it a bit faster with some more regular expressions, but I've not gotten there yet - if any suggestions, fire away, I guess.
Code:
Function SearchFormat(text As String) As String
'For this to work correctly, the Microsoft VBScript Regular Expressions needs to be enabled/checked
'Use most current if set the reference in the future.  Used 5.5 as of 20070215

    Const phonefmt As String = "(\d{3}\D+){2}\d{4}\D*"
    Const ssnfmt As String = "\d{3}\D+\d{2}\D+\d{4}\D*"

    SearchFormat = "ADDRESS"
    
    If Len(Trim(text)) < 7 Then
        SearchFormat = "UNKNOWN"
        Exit Function
    End If

    If regCheck(text, phonefmt) = True Then
        SearchFormat = "PHONE"
        Exit Function
    End If
    
    If InStr(text, "(") Then
        If Len(Trim(text)) < 17 Then
            SearchFormat = "PHONE"
            Exit Function
        End If
    End If
    
    If InStr(text, ")") Then
        If Len(Trim(text)) < 17 Then
            SearchFormat = "PHONE"
            Exit Function
        End If
    End If
    
    If regCheck(text, ssnfmt) = True Then
        SearchFormat = "SSN"
        Exit Function
    End If

        Dim idx As Long
        Dim intNumber As Integer
        Dim intAlpha As Integer
        Dim intDash As Integer
        intNumber = 0
        intAlpha = 0
        intDash = 0
        
        For idx = 1 To Len(text)
            Select Case Asc(Mid$(text, idx, 1))
                Case 48 To 57 'Number characters 0 through 9
                    intNumber = intNumber + 1
                Case 65 To 90, 97 To 122 'Letter characters
                    intAlpha = intAlpha + 1
                Case 45 '- dash
                    intDash = intDash + 1
            End Select
        Next idx
    
        If intAlpha > 0 Then
            If intNumber > 0 Then
                SearchFormat = "ADDRESS"
                Exit Function
            ElseIf intNumber = 0 Then
                SearchFormat = "NAME"
                Exit Function
            Else
                If intNumber = 9 Then
                    SearchFormat = "SSN"
                    Exit Function
                ElseIf intNumber = 10 Or intNumber = 7 Then
                    SearchFormat = "PHONE"
                    Exit Function
                End If
            End If
        Else
            Select Case intNumber
                Case 8, 9
                    SearchFormat = "SSN"
                    Exit Function
                Case 7, 10
                    SearchFormat = "PHONE"
                    Exit Function
                Case Else
                    SearchFormat = "UNKNOWN"
                    Exit Function
            End Select
            Exit Function
        End If

End Function

Private Function regCheck(inText As String, pattern As String) As Boolean

    regCheck = True
    Dim re As VBScript_RegExp_55.RegExp
    Set re = New VBScript_RegExp_55.RegExp
    re.pattern = pattern
    regCheck = re.Test(inText)

End Function

Okay, then this procedure creates a copy of one worksheet onto another, and enters the SearchFormat function into the appropriate cells:
Code:
Public Sub EnterFormula()

    Dim wb As Workbook
    Dim ws As Worksheet
    Dim ws1 As Worksheet
    Dim br As Long 'Row - used to find very last row with actual records
    Dim tr As Long 'Row - used to find where column titles located
    
    Set wb = ActiveWorkbook
    Set ws = wb.ActiveSheet
    ws.Copy After:=ws
    Set ws1 = wb.ActiveSheet

    ws1.Range("H:H").Insert Shift:=xlToRight
    br = ws1.Range("A65536").End(xlUp).Row
    tr = ws1.Cells.Find(What:="Login ID", After:=ws1.Cells(1, 1), LookIn:=xlValues, LookAt:= _
        xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, _
        SearchFormat:=False).Row
    
    Range("H" & tr + 1).Select
    ActiveCell.Formula = "=PERSONAL.XLS!SearchFormat(C" & tr + 1 & ")"
    ws1.Range("H" & tr + 1 & ":H" & br).FillDown
    
End Sub

In the function, I'm referring to the user defined function as PERSONAL.XLS!SearchFormat. If there is a different way while still having the user function in the PERSONAL macro workbook, I'm all ears. [BIGEARS]

--

"If to err is human, then I must be some kind of human!" -Me
 
Sorry, yet another post. I yet cleaned up my code even a bit more on the EnterFormula procedure by combining a couple lines, and probably saving a few processor cycles - not that it'd be noticeable. [wink]
Here's the new code for the EnterFormula bit:
Code:
Public Sub EnterFormula()

    Dim wb As Workbook
    Dim ws As Worksheet
    Dim ws1 As Worksheet
    Dim br As Long 'Row - used to find very last row with actual records
    Dim tr As Long 'Row - used to find where column titles located
    
    Set wb = ActiveWorkbook
    Set ws = wb.ActiveSheet
    ws.Copy After:=ws
    Set ws1 = wb.ActiveSheet

    ws1.Range("H:H").Insert Shift:=xlToRight
    br = ws1.Range("A65536").End(xlUp).Row
    tr = ws1.Cells.Find(What:="Login ID", After:=ws1.Cells(1, 1), LookIn:=xlValues, LookAt:= _
        xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, _
        SearchFormat:=False).Row
    
    Range("H" & tr).Formula = "Search Format"
    Range("H" & tr + 1).Formula = "=PERSONAL.XLS!SearchFormat(C" & tr + 1 & ")"
    ws1.Range("H" & tr + 1 & ":H" & br).FillDown
    
End Sub

Of course, I'll have to give credit to PHV in another thread for some of that. I had a real mess before he helped me get it cleaned up and sorted out!

--

"If to err is human, then I must be some kind of human!" -Me
 
Another thought:

In considering about whether to query directly with Microsoft Query instead of SQL Server, I am not sure how I would want to go about this.

You see, I want to be able to do all of this with new workbooks in the future that are set to the same format.

I would say that I could use a template, but if I remember corectly, the reports we receive from the 3rd party are already in Excel format. So, to use the data, the only way possible there would be to import from one workbook to another or copy and paste. That wouldn't be a big deal, but I would think there a better way.

I've not used Microsoft Query other than to just import a particular table, or at least just certain fields of a table from a SQL server table.

Any suggestions here, or am I making any sense? I realize it's possible for me to not make sense on this sort of thing. [wink]

--

"If to err is human, then I must be some kind of human!" -Me
 
I realize it's been a while since discussing this topic, but I've apparently found a hole in one of the SearchFormat possibilities. I'll post below what I have found, and I'm asking if anyone has any ideas as to the why, and as to the how to edit the filter in order to fix the error. Thanks in advance for any advice.

This is where the Error happens, apparently, using the SSN filter as listed regCheck examples from earlier in the thread:
Code:
    If regCheck(text, ssnfmt) = True Then
        SearchFormat = "SSN"
        Exit Function
    End If

Here are some examples that end up showing as SSN, but they are actually addresses! I'm thinking it has something to do with them having several digits in the house number:

[OL]
[LI]12345 12TH PL N WEST HIGH BEACH XX 12345
[LI]12345 SW 13TH ST APPLE XX 12345
[LI]12345 FOURTREE ROAD GREEN VALLEY XX 123-45-6789
[/OL]

The first 2 are JUST addresses, and the last one is an address with a SSN as well. I'm not going to worry about those that have extra stuff besides the address (at least for now). Those can be taken care of manually after the fact.

Does anyone know if it is possible to fix the regex expression to where it will filter out the addresses, or do I need to use something like this to fix it?
Code:
If Len(Trim(Text)) < 12 Then
    If regCheck(text, ssnfmt) = True Then
        SearchFormat = "SSN"
        Exit Function
    End If
End If

Again, thanks for any suggestions. I'd say this is returning a fair amount of addresses as SSNs, and most of them do not have an attached SSN.

Yippee! It's Friday - and it's almost EOB! EOBOEOW!

[WINK]

That stands for "End Of Business Of End Of Week" - just through the 2 together with an "O" in the middle. [smile]

--

"If to err is human, then I must be some kind of human!" -Me
 
Okay, I tried my own suggestion, and it worked great so far. Does anyone have any thoughts as to why the SSN filter was getting tripped up? Also, if there is a way to fix that filter, should I just do that, and do away with the Len() function usage. I'm thinking I might as well leave it sort of as a safe guard in case anything else slips through.

--

"If to err is human, then I must be some kind of human!" -Me
 
The regular expression is imprecise:

"\d{3}\D+\d{2}\D+\d{4}\D*"

This means exactly three digits, then one to any number of non-digits, then 2 digits, then 1+ non-digits, then 4 digits, then 0 to infinite non-digits (this part can be completely left off in any case).

So

12345 12TH PL N WEST HIGH BEACH XX 12345

has
345, 3 digits
1 character, a space which is a non-digit
12, 2 digits,
TH through XX , one ore more non-digits
1234, 4 digits.
Then 0 non-digits.

So how about "\b\d{3}-\d{2}-\d{4}\b"?

which is:
word boundary, 3 digits, a dash, 2 digits, a dash, 4 digits, word boundary?
That should only match real SSNs.

The phone regex string also has similar issues with the + operator.

There are lots of regex help sites online. One useful one to you might be a regular expression tester, such as at regextester.com.
 
Thanks, ESquared, I'll give that a shot and post back.

--

"If to err is human, then I must be some kind of human!" -Me
 
Oh, and ESquared (haven't made the change yet - was working on another little quirk), but I did see the problem you were mentioning with the PHONE format as well, as it's giving me addresses for that as well. So, I'll need to get all that sorted out.

--

"If to err is human, then I must be some kind of human!" -Me
 
Well, the SSN format does seem to work correctly, even if the SSN is mixed in with other stuff! So, now I've got to work in a filter to iliminate some of the other stuff, or either just run my query based on containing the SSN only, and not all the other stuff somehow... this is an interesting conundrum, I believe, as the SSN would be more precise to matching to one real specific person, I would think. More to think about. [banghead]

--

"If to err is human, then I must be some kind of human!" -Me
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top