regex help 1

Status
Not open for further replies.

marduk813

Programmer
Jul 18, 2002
89
US
I've been tasked with fixing someone else's VB.Net application, and this is the first time I've ever used regex. I understand some of the basics, but I'm having trouble doing what I need to do. I have to parse through an email header and extract email addresses, as well as the "From", "To", and "CC" headers. I also need to be able to determine which addresses fall under which headers. (i.e. which email addresses are in the "From" line, which are in the "To" line, and which are in the "CC" line)

So far, I have this:

Code:
(?i)(from:|to:|cc:).*([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})

The email pattern was written by someone else in my company, and it seems to work for what I need. I just added (?i)(from:|to:|cc:).*

I'm having two problems:
1. I'm getting all the text between the header and the email address, in addition to the two patterns I want. (i.e. From: "John Doe" <john.doe@email.com) I need to be able to get only the "From:" and the "john.doe@email.com" patterns. I know that .* is causing this, but I found that if I left it out, I got no results at all.
2. I'm only getting one email address per header, and there are multiple addresses per header (or there can be).

I think if I can get only the patterns I need, then I can do the rest as far as sorting out which addresses are recipients, which are senders, and which are carbon copies.

Any help would be much appreciated.

Jas
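
Both problems trace back to the greedy .* in front of the address pattern: it consumes as much of the line as it can and only gives back what the rest of the pattern needs, so the match lands on the last address on the line and the local-part group keeps only its final character. A minimal sketch of one way around this, assuming .NET's Regex class, is to identify the header name with one small pattern and collect every address with Regex.Matches. The address pattern here is a simplified stand-in for the one in the post, and the names AddressSketch, HeaderName, and ExtractAddresses are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AddressSketch
    ' Simplified address pattern (an assumption, not the full pattern from the post).
    Private ReadOnly AddressPattern As New Regex("[A-Za-z0-9_.\-]+@(?:[A-Za-z0-9\-]+\.)+[A-Za-z]{2,}")

    ' Header name at the very start of the line, matched case-insensitively.
    Private ReadOnly HeaderPattern As New Regex("^(from|to|cc):", RegexOptions.IgnoreCase)

    ' Returns "From", "To" or "Cc" as written in the line, or "" for any other line.
    Function HeaderName(ByVal line As String) As String
        Dim m As Match = HeaderPattern.Match(line)
        If m.Success Then Return m.Groups(1).Value
        Return ""
    End Function

    ' Returns every address found in the text, in order of appearance.
    Function ExtractAddresses(ByVal text As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In AddressPattern.Matches(text)
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim line As String = "To: ""Helpdesk"" <helpdesk@company.com>, ""Email, Sample"" <sample@company.com>"
        ' Prints: To -> helpdesk@company.com, sample@company.com
        Console.WriteLine(HeaderName(line) & " -> " & String.Join(", ", ExtractAddresses(line).ToArray()))
    End Sub
End Module
```

Because no .* sits between the header and the address, the display name and angle brackets never end up in a capture group, and each address is returned as its own match.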
 
marduk813,
I'm not real good with regex either. I use a program called RegexBuddy to create, test, and debug my regexes. I pasted your regex into RegexBuddy and here's how it broke it down. Hope this helps. Sorry I couldn't help more.

[Image: regex.jpg, RegexBuddy's breakdown of the regex]
 
Can you post a dummy email header with multiple addresses? I need to look at the format.
 
bmgmzp, thanks for the info. I downloaded RegexBuddy and am toying with it now. I had Expresso, but RegexBuddy seems to have a few more options.

Drederick, here's a sample header that I'm using:

Code:
MIME-Version: 1.0 
Content-Type: text/plain; 
	charset="us-ascii" 
Content-Transfer-Encoding: quoted-printable 
Content-class: urn:content-classes:message 
Return-Path: <Jas@company.com> 
X-MimeOLE: Produced By Microsoft Exchange V6.5 
X-OriginalArrivalTime: 20 Oct 2006 20:11:39.0389 (UTC) FILETIME=[F3BEAAD0:01C6F483] 
Subject: test 
Date: Fri, 20 Oct 2006 15:11:39 -0500 
Message-ID: <ABB18274684B9034895C8792CDCBF557225646F00@mail.company.com> 
X-MS-Has-Attach:  
X-MS-TNEF-Correlator:  
Thread-Topic: test 
Thread-Index: Acb0g/M9Jf/HoHDTRx+hqbGxUQ7qPA== 
From: "Last, First" <Jas@company.com> 
To: "Helpdesk" <helpdesk@company.com>, 
	"Email, Sample" <sample@company.com> 
Cc: "Copy, Carbon" <carboncopy@email.net>, 
	"Carbon, Second" <secondcarbon@inbox.org>

I finally figured out how to use the regex options in VB.Net, which has helped...I think. Now it returns the entire block below as one match:

Code:
From: "Last, First" <Jas@company.com> 
To: "Helpdesk" <helpdesk@company.com>, 
	"Email, Sample" <sample@company.com> 
Cc: "Copy, Carbon" <carboncopy@email.net>, 
	"Carbon, Second" <secondcarbon@inbox.org

I may be able to work with this, but ideally I'd like to return 3 matches: 1 for the "From" address, 1 for the "To" address(es), and 1 for the "CC" address(es).
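
One way to get one result per header, sketched here under a few assumptions: the To and Cc values in the sample continue onto lines that start with a tab, so joining those continuation lines first turns each header into a single line. A Multiline regex can then return one match each for From, To, and Cc, and the addresses can be pulled out of each match. The names HeaderGroupingSketch, Unfold, and GroupAddresses are hypothetical, and the address pattern is a simplified stand-in for the one in the original post.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module HeaderGroupingSketch
    ' Joins continuation lines (a line break followed by spaces or tabs) onto the previous line.
    Function Unfold(ByVal rawHeader As String) As String
        Return Regex.Replace(rawHeader, "\r?\n[ \t]+", " ")
    End Function

    ' Returns the addresses found under each of From, To and Cc.
    Function GroupAddresses(ByVal rawHeader As String) As Dictionary(Of String, List(Of String))
        Dim groups As New Dictionary(Of String, List(Of String))(StringComparer.OrdinalIgnoreCase)
        ' [^\r\n]* is used instead of .* so a trailing carriage return is not captured.
        Dim headerLine As New Regex("^(from|to|cc):([^\r\n]*)", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        ' Simplified address pattern (an assumption).
        Dim address As New Regex("[A-Za-z0-9_.\-]+@(?:[A-Za-z0-9\-]+\.)+[A-Za-z]{2,}")

        For Each h As Match In headerLine.Matches(Unfold(rawHeader))
            Dim name As String = h.Groups(1).Value
            If Not groups.ContainsKey(name) Then groups(name) = New List(Of String)
            For Each a As Match In address.Matches(h.Groups(2).Value)
                groups(name).Add(a.Value)
            Next
        Next
        Return groups
    End Function

    Sub Main()
        Dim sample As String = _
            "Subject: test" & vbCrLf & _
            "From: ""Last, First"" <Jas@company.com>" & vbCrLf & _
            "To: ""Helpdesk"" <helpdesk@company.com>," & vbCrLf & _
            vbTab & """Email, Sample"" <sample@company.com>" & vbCrLf & _
            "Cc: ""Copy, Carbon"" <carboncopy@email.net>," & vbCrLf & _
            vbTab & """Carbon, Second"" <secondcarbon@inbox.org>"

        Dim groups As Dictionary(Of String, List(Of String)) = GroupAddresses(sample)
        For Each key As String In New String() {"From", "To", "Cc"}
            Console.WriteLine(key & ": " & String.Join(", ", groups(key).ToArray()))
        Next
    End Sub
End Module
```

Anchoring the header name to the start of a line keeps headers such as Thread-Topic and Return-Path from being picked up as To or From.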
 
It's not a one-liner, but it seems to work.

Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim line As String = "", from_to_cc As String = "", address As String = ""
        Dim more_addresses As Boolean = False
        Dim re As New Regex("^\s*(From:|To:|Cc:)*.*<(.*)>,*\s*$")

        ' Input.txt contains the sample header code
        FileOpen(1, "Input.txt", OpenMode.Input)
        Do While Not EOF(1)
            line = LineInput(1)

            If Regex.IsMatch(line, "^\s*(From:|To:|Cc:)") Then
                from_to_cc = re.Replace(line, "$1")
                address = re.Replace(line, "$2")

                Debug.Print(from_to_cc & " -> " & address)

                ' trailing comma indicating more addresses
                If Regex.IsMatch(line, ".*,\s*$") Then
                    more_addresses = True
                Else
                    more_addresses = False
                End If
            ElseIf more_addresses Then
                address = re.Replace(line, "$2")

                Debug.Print(from_to_cc & " -> " & address)
            End If
        Loop
        FileClose(1)
    End Sub
End Class
 
'Revised ElseIf block

            ElseIf more_addresses Then
                address = re.Replace(line, "$2")

                ' Trailing comma indicating more addresses
                If Regex.IsMatch(line, ".*,\s*$") Then
                    more_addresses = True
                Else
                    more_addresses = False
                End If

                Debug.Print(from_to_cc & " -> " & address)
            End If
 
Drederick,

Thanks for the code sample. I wasn't able to get it to work like you have it here. The regex wouldn't return any matches. I was able to come up with another regex that gave me what I needed. I took the results, starting with "From:" and put them into an array. From there it was a matter of sorting out which email addresses were under each heading.

Thanks to both of you guys for your help!

Jas
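
The regex Jas ended up with was not posted, so the following is only a sketch of the general approach described: collect header names and addresses into one flat list beginning at the first address header, then walk the list to see which heading each address falls under. The pattern, the module name TokenWalkSketch, and the function Tokenize are all assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TokenWalkSketch
    ' Hypothetical combined pattern: a header name at the start of a line, or an address anywhere.
    Private ReadOnly Token As New Regex( _
        "^(?<header>from|to|cc):|(?<addr>[A-Za-z0-9_.\-]+@(?:[A-Za-z0-9\-]+\.)+[A-Za-z]{2,})", _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Returns a flat list such as "From:", address, "To:", address, address, ...
    ' Addresses seen before the first From/To/Cc header (Return-Path, Message-ID) are skipped.
    Function Tokenize(ByVal rawHeader As String) As List(Of String)
        Dim tokens As New List(Of String)
        Dim seenHeader As Boolean = False
        For Each m As Match In Token.Matches(rawHeader)
            If m.Groups("header").Success Then
                seenHeader = True
                tokens.Add(m.Groups("header").Value & ":")
            ElseIf seenHeader Then
                tokens.Add(m.Value)
            End If
        Next
        Return tokens
    End Function

    Sub Main()
        Dim sample As String = _
            "Return-Path: <Jas@company.com>" & vbCrLf & _
            "From: ""Last, First"" <Jas@company.com>" & vbCrLf & _
            "To: ""Helpdesk"" <helpdesk@company.com>," & vbCrLf & _
            vbTab & """Email, Sample"" <sample@company.com>" & vbCrLf & _
            "Cc: ""Copy, Carbon"" <carboncopy@email.net>"

        ' Walk the flat list, remembering the most recent header name.
        Dim current As String = ""
        For Each t As String In Tokenize(sample)
            If t.EndsWith(":") Then
                current = t
            Else
                Console.WriteLine(current & " " & t)
            End If
        Next
    End Sub
End Module
```

This works on the sample because From, To, and Cc are the last headers in the block. If another address-bearing header followed them, its addresses would be counted under the preceding heading, so grouping per header line is the safer choice for arbitrary input.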
 