First I'll provide a description of what I'm trying to do.
When I send an outbound email I want to screen the body of the message and remove any SS# information. So far I think I have this under control, with the exception of some of the REGEX patterns I'm trying to find. I've been researching and trying different combinations for the last couple of days, and I'm at the point where I think I need someone else's input.
Basically I'm looking for patterns in a String that I've pulled from the HTMLBody of the email. The patterns would include standard digit formatting for SS# such as:
xxx-xx-xxxx or xxx xx xxxx. I'm also looking for 9 consecutive digits as long as the first digit is not a "9", so 887654321 would qualify, but 987654321 would not. Finally, I'm looking for any run of exactly 8 consecutive digits, such as 98765432.
The problem I encounter is that the string is pulling in HTML tags (which I want), so the sequence may look like "<tag>87654321</tag>" or there could be other non-numeric values next to the digit sequence...
I can't figure out the REGEX that I need to use to find 8 digits regardless of the leading or trailing characters, as long as neither is another digit (I don't want to pull an 8 digit number out of a 10 digit number, etc.)
I've tried [0-9]{8} but this will extract 8 digit numbers from larger number sequences, and I've tried ^\d{5}$ but this will not locate the sequences due to leading and trailing characters...
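To make the two failures concrete, here is a small test sub. It is only a sketch: the sub name and sample strings are made up for illustration, and I'm assuming the anchored attempt is meant to be 8 digits wide (`^\d{8}$`) so both patterns are compared on the same input.

```vba
' Hypothetical demo sub; name and sample strings are illustrative only.
Sub DemoTriedPatterns()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Unanchored: finds 8 digits anywhere, even inside a longer run.
    re.Pattern = "[0-9]{8}"
    Debug.Print re.Test("<tag>87654321</tag>")   ' True  (wanted)
    Debug.Print re.Test("id 1234567890")         ' True  (not wanted)

    ' Fully anchored: matches only when the digits are the whole string.
    re.Pattern = "^\d{8}$"
    Debug.Print re.Test("87654321")              ' True
    Debug.Print re.Test("<tag>87654321</tag>")   ' False (tags get in the way)
End Sub
```

So what I seem to need is something in between: a check on the single character on each side of the digits, rather than on the start and end of the whole string.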
I've placed the code below for those who are interested, if anyone can offer any assistance on building my ".pattern" line with REGEX, please feel free to help! Thanks everyone for reading this!
Code:
Private Sub Application_ItemSend(ByVal Item As Object, Cancel As Boolean)
    Dim Itm As Outlook.MailItem

    ' Only mail items carry an HTMLBody to screen.
    If TypeName(Item) <> "MailItem" Then
        Exit Sub
    Else
        Set Itm = Item
        Itm.HTMLBody = StripPHIFromText(Itm.HTMLBody)
    End If
End Sub

Function StripPHIFromText(ByVal RTFString As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        .Global = True      ' replace every match, not just the first
        .IgnoreCase = True
        .MultiLine = True
        ' Matches xxx xx xxxx, xxx-xx-xxxx, or 9 digits not starting with 9.
        .Pattern = "[0-9]{3}[ ][0-9]{2}[ ][0-9]{4}|[0-9]{3}[-][0-9]{2}[-][0-9]{4}|[0-8]{1}[0-9]{8}"
    End With

    '--------------------------------------
    ' Debug aid: dump the raw HTML body to a file so I can inspect it.
    Dim NameFile As String
    Dim FileNum As Integer
    NameFile = "C:\Documents and Settings\t329323\Desktop\emailinfo.doc"
    FileNum = FreeFile      ' next free file number instead of a hard-coded #1
    Open NameFile For Output As #FileNum
    Write #FileNum, RTFString
    Close #FileNum
    '---------------------------------------

    StripPHIFromText = RegEx.Replace(RTFString, "(PHI DELETED)")
    Set RegEx = Nothing
End Function
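For reference, here is a rough sketch of the direction I'm leaning, not a confirmed solution. VBScript.RegExp supports lookahead (`(?!\d)`) but not lookbehind, so the character before the number is captured in a group and put back with `$1`. The function name `StripPHISketch`, the parameter name and the sample strings are all hypothetical.

```vba
' Hypothetical variant of StripPHIFromText; names and samples are illustrative.
Function StripPHISketch(ByVal HtmlText As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True
        .MultiLine = True
        ' Group 1: start of line, or one non-digit just before the number.
        ' Group 2: xxx-xx-xxxx, xxx xx xxxx, 9 digits not starting with 9,
        '          or exactly 8 digits.
        ' (?!\d): the character after the number must not be a digit.
        .Pattern = "(^|\D)(\d{3}-\d{2}-\d{4}|\d{3} \d{2} \d{4}|[0-8]\d{8}|\d{8})(?!\d)"
    End With
    ' $1 restores the non-digit character consumed by group 1.
    StripPHISketch = re.Replace(HtmlText, "$1(PHI DELETED)")
    Set re = Nothing
End Function

Sub DemoStripPHISketch()
    Debug.Print StripPHISketch("<tag>87654321</tag>")
    ' <tag>(PHI DELETED)</tag>
    Debug.Print StripPHISketch("ref 1234567890 or 987654321")
    ' ref 1234567890 or 987654321   (unchanged)
    Debug.Print StripPHISketch("SSN 123-45-6789, 123 45 6789")
    ' SSN (PHI DELETED), (PHI DELETED)
End Sub
```

One caveat with this approach: it will also replace any other 8 or 9 digit run in the HTML, such as numbers inside tag attributes or URLs, so it probably needs testing against real message bodies.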