Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Saving Text from Email _Cliptext to Cursor 8

Status
Not open for further replies.

David Higgs

Programmer
May 6, 2012
392
GB
I receive an Email containing Data to update a Database.

The format of the Email is:

Code:
Date          Number       Callsign Area  Notes
01/01/2020    52931        M9TEK    SP83
05/01/2020    52932        GW9TIP   TL14  Any additional information here

Although the Date and Number Fields are constant width, the others can be of variable length. Another complication is that the number of spaces between each Field can also vary in different Emails.

The Email Data is copied to the Clipboard (_Cliptext). I envisaged using STREXTRACT to populate the Cursor Fields, but how can I deal with the variable spaces?

Any help would be most appreciated.



Regards,

David.

Recreational user of VFP.
 
I have made a few alterations to the original REGEX Code so that it reflects some changes made to the format in the information I receive from a Third Party.

Code:
(\d{2}\/\d{2}\/\d{4})\s+(\d+)\s+([A-Z0-9\/]+)\s+([A-Z0-9\/]+)\s*(.*)$"

The latest code I use is:

Code:
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+)\s+([A-Z0-9\/]+)\s*(.*)$"

I would like to change ([A-Z0-9\/]+) in the above code to accept e.g. SX86 or a Blank String.

Any help would be most appreciated.


Regards,

David.

Recreational user of VFP.
 
You have two places with ([A-Z0-9\/]+) and SX86 does not tell me which one you could mean.

So what column number (match number) should be allowed to be blank alternatively to a match of its usual expression?

Can you give an example line of data you'd like to match? That would be the simplest way to tell us.

Bye, Olaf.





Olaf Doschke Software Engineering
 
David, assuming that a Blank String cannot be followed by notes and it's only possible for the Area column:

Code:
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+)[highlight #FCE94F]([/highlight]\s+([A-Z0-9\/]+)\s*(.*)[highlight #FCE94F])?\s*[/highlight]$"

Please note that the grouping of the expression changes, therefore the Area column now receives data from the Submatch(4), and Notes from Submatch(5).
 
If you mean the area code, my idea would be to give blank as the alternative match of the area code:

Code:
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+)\s+([A-Z0-9\/]+[highlight #FCE94F]|\s+[/highlight])\s+(.*)$"

Match count, grouping/numbering stays the same.

Bye, Olaf.

Olaf Doschke Software Engineering
 
atlopes,

Your revised code did not Import any Data.

I wasn't sure how to modify the Submatches, here is my latest code prior to any modifications. Submatch 3 replaces an Area Code of "N/A" with " " (4 Spaces)

Code:
IF m.Match.Submatches.Count = 5

			INSERT INTO cNewBooks ;
				VALUES (CTOD(m.Match.Submatches.Item(0)), ;
						 VAL(m.Match.Submatches.Item(1)), ;
					     	 m.Match.Submatches.Item(2), ;
					     	 IIF(m.Match.Submatches.Item(3) == "N/A","    ", m.Match.Submatches.Item(3)), ;	
					   	 	 m.Match.Submatches.Item(4))					   	 	 				   	 	 					   	 	 				   	 	 				   	 	 					   	 	 
					   	 	 
		ENDIF

Please be advised, I no longer use the Notes Field so although the Notes are still being received it is ignored.

Sample Data

Date Number Callsign Area Notes
01/01/2020 52931 M9TEK SP83
05/01/2020 52932 GW9TIP TL14 Any additional information here
06/06/2020 62014 MM9ATK

NOTE As previously, the number of spaces between fields can vary.

I would like to change the RegExp which currently accepts 3 or 4chr Area Codes to accept a Blank Area Code. See last entry in above Sample Data.



Regards,

David.

Recreational user of VFP.
 
Olaf,

Your code only imported entries that had 3 spaces between Callsign & Area. All other entries with 1 or 2 spaces were ignored.

Regards,

David.

Recreational user of VFP.
 
David,

You're not taking into account that now there is a different number of groups in the regular expression, and they may differ depending on the incoming data.

My expression gets all your cases.

Code:
LOCAL Source AS String

TEXT TO m.Source NOSHOW
01/01/2020 52931 M9TEK SP83
05/01/2020 52932 GW9TIP TL14 Any additional information here
06/06/2020 62014 MM9ATK
ENDTEXT

SET DATE TO DMY
SET SEPARATOR TO "/"

CREATE CURSOR DataStore (Date Date, Number Int, Callsign Varchar(20), Area Varchar(20), Notes Memo)

LOCAL ARRAY Lines(1)
LOCAL LineIndex AS Integer

LOCAL RegExp AS VBScript.RegExp

m.RegExp = CREATEOBJECT("VBScript.RegExp")
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+)(\s+([A-Z0-9\/]+)\s*(.*))?\s*$"

FOR m.LineIndex = 1 TO ALINES(m.Lines, m.Source)

	m.Matches = m.RegExp.Execute(m.Lines(m.LineIndex))

	IF m.Matches.Count = 1
	
		m.Match = m.Matches.Item(0)

		[highlight #EDD400]IF m.Match.Submatches.Count >= 3[/highlight]

			INSERT INTO DataStore ;
				VALUES (CTOD(m.Match.Submatches.Item(0)), ;
					VAL(m.Match.Submatches.Item(1)), ;
					m.Match.Submatches.Item(2), ;
					[highlight #EDD400]IIF(m.Match.Submatches.Item(4) == "N/A", "", NVL(m.Match.Submatches.Item(4), "")), ;[/highlight]
					[highlight #EDD400]NVL(m.Match.Submatches.Item(5), "")[/highlight])

		ENDIF

	ENDIF

ENDFOR

BROWSE
 
Correcting my previous post: the number of groups in the regular expression being six, the collection of sub-matches will always be six for a matched expression.
 
Then you have to allow some s* where you had s+ before, as you need to not just allow an intermediate blank, but an early end of line.

I put together this and it extracts 5 times in the sample. You should try one without Area and still a note, even though you don't store them anymore, it should still match the pattern and not be skipped.

Code:
clear

* pattern for Date, Number, Callsign, Area, Notes
Text To lcMail NoShow
Dear David,

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis rhoncus ante eu pellentesque rutrum. Nam sit amet scelerisque
arcu. Maecenas suscipit tincidunt dui id aliquam. Aenean lobortis ullamcorper euismod. Suspendisse at sapien rhoncus,
ultrices lacus imperdiet, porttitor lectus. Maecenas ut urna elementum, ullamcorper magna vitae, fermentum tortor. Quisque
dui mi, lacinia in mollis sed, imperdiet a leo. In in laoreet ante. Aliquam tortor ligula, efficitur ut erat quis, imperdiet
rutrum turpis. Nunc ultrices euismod nibh, in faucibus mauris. Nunc rutrum mauris diam, eget tempus lorem suscipit non. In
malesuada risus vel erat fringilla, sed aliquam risus sodales.

01/01/2020 52931 M9TEK SP83
05/01/2020 52932 GW9TIP TL14 Any additional information here
06/06/2020 62014 MM9ATK
EndText

m.RegExp = CREATEOBJECT("VBScript.RegExp")
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+)\s*([A-Z0-9\/]+|\s*)\s*(.*)$" 

FOR m.LineIndex = 1 TO ALINES(paLines, m.lcMail)
   m.Matches = m.RegExp.Execute(m.paLines(m.LineIndex))
   IF m.Matches.Count = 1
      m.Match = m.Matches.Item(0)
      IF m.Match.Submatches.Count >0
         ? m.paLines(m.LineIndex)
         ? 'has '+Transform(m.Match.Submatches.Count)+' items:'
         For lnCount = 1 to m.Match.Submatches.Count
             ? m.Match.Submatches.Item(lnCount-1)
         EndFor 
         ? '---'
      ENDIF
   ENDIF
ENDFOR

Bye, Olaf.

Olaf Doschke Software Engineering
 
atlopes said:
You're not taking into account that now there is a different number of groups in the regular expression, and they may differ depending on the incoming data.

Although I've done a little studying on RegExp I need to study your code a little further. I managed to modify your original code to accept dates in dd/mm/yyyy or d/m/yy formats. Small steps!

atlopes said:
My expression gets all your cases.

It does indeed, many thanks for your help. I've inserted the relevant parts of your code into my application and all is working well. The latest data I received today identified another issue in that the callsign field could contain a persons name which contains a period (.) instead of a callsign. eg BANKS.A, Banks.D or Trump.D

Anyway, once again, thank you very much for you help, much appreciated.

Regards,

David.

Recreational user of VFP.
 
Olaf said:
Then you have to allow some s* where you had s+ before, as you need to not just allow an intermediate blank, but an early end of line.

Thank you for your code. I'm compiling a list of the RegExp's and comparing the differences in my learning process, so your input is most appreciated. I break the code down into each field and work out what each piece of code does. I knew of the " | " char to match a pattern or an empty string, but couldn't quite get it to work.


Regards,

David.

Recreational user of VFP.
 
David said:
The latest data I received today identified another issue in that the callsign field could contain a persons name which contains a period (.) instead of a callsign. eg BANKS.A, Banks.D or Trump.D

This,

Code:
m.RegExp.Pattern = "^\s*(\d{1,}\/\d{1,}\/\d{2,})\s+(\d+)\s+([A-Z0-9\/]+[highlight #FCE94F]|[A-Za-z]+\.[A-Z][/highlight])(\s+([A-Z0-9\/]+)\s*(.*))?\s*$"

will accept a name in the form you described instead of a regular call sign.
 
atlopes said:
This Code will accept a name in the form you described instead of a regular call sign.

Thank you so much, you have been a great help to me, not only in this thread but also previous threads where I have had issues with importing and modifying data. Much appreciated.

I have documented all of the RegExp variations I've used and will break them down so that I can learn from it.

Regards,

David.

Recreational user of VFP.
 
Another idea to keep the patterns simpler (also simpler to maintain and understand) is to test them separately and look out for rows where all patterns match as full matches.

It doesn't pay to strip it down to testing all the single item patterns, especially as CALLSIGN and AREA have the same. You'd introduce the task of puzzling together which is what again, but still this can be a nice recipe to "unstir" the soup in two passes instead of figuring out the one regexp for all items.

Atlopes has made that a fine single pattern and I don't mean to degrade this because of its necessary aftermath, but here's the idea in general to easier put together what you need:

Code:
#Define cSTART '^'
#Define cEND '$'

#Define cDATE '\d{1,}\/\d{1,}\/\d{2,}'
#Define cNUMBER '\d+'
#Define cSPACES '\s+'
#Define cOPTIONALSPACES '\s*'
#Define cUPPERALPHANUMERICSLASHES '[A-Z0-9\/]+'
#Define cNAME '[A-Za-z]+\.[A-Z]'
#Define cCALLSIGN cUPPERALPHANUMERICSLASHES+'|'+cNAME
#Define cAREA '[A-Z0-9\/]{4}'

#Define cANYTHING '.*'

TEXT To lcMail NoShow
Dear David,

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis rhoncus ante eu pellentesque rutrum. Nam sit amet scelerisque
arcu. Maecenas suscipit tincidunt dui id aliquam. Aenean lobortis ullamcorper euismod. Suspendisse at sapien rhoncus,
ultrices lacus imperdiet, porttitor lectus. Maecenas ut urna elementum, ullamcorper magna vitae, fermentum tortor. Quisque
dui mi, lacinia in mollis sed, imperdiet a leo. In in laoreet ante. Aliquam tortor ligula, efficitur ut erat quis, imperdiet
rutrum turpis. Nunc ultrices euismod nibh, in faucibus mauris. Nunc rutrum mauris diam, eget tempus lorem suscipit non. In
malesuada risus vel erat fringilla, sed aliquam risus sodales.

01/01/2020 52931 M9TEK SP83
05/01/2020 52932 GW9TIP TL14 Any additional information here
06/06/2020 62014 MM9ATK
13/13/0823 12345 Trump.D SP83 Bulls**t
13/13/0823 12345 Trump.D      F*** ***s
05/06/2020 345789 .................
ENDTEXT

m.RegExp = Createobject("VBScript.RegExp")

* extract Date, Number, Callsign, Area
For m.LineIndex = 1 To Alines(paLines, m.lcMail)
   * PASS 1, detecting DATE and NUMBER
   m.RegExp.Pattern = cSTART + cOPTIONALSPACES + '('+cDATE + ')' + cSPACES + '(' + cNUMBER + ')' + cANYTHING + cEND
   m.Matches = m.RegExp.Execute(m.paLines(m.LineIndex))
   If m.Matches.Count = 1
      m.Match = m.Matches.Item(0)
      If m.Match.Submatches.Count = 2
         lcDate   = m.Match.Submatches.Item(0)
         * You could veryfy valid date here easier than in regexp by trying to cast to date.
         lcNumber = m.Match.Submatches.Item(1)
      Endif
      * remove Date and number for PASS 2 matching
      m.paLines(m.LineIndex) = Strtran(m.paLines(m.LineIndex),lcDate,'',1,1,2)
      m.paLines(m.LineIndex) = Strtran(m.paLines(m.LineIndex),lcNumber,'',1,1,2)

      * PASS 2, CALLSIGN and AREA
      m.RegExp.Pattern = cSTART + cOPTIONALSPACES + '(' + cCALLSIGN + ')' + cSPACES + '(' + cAREA + '|' + cSPACES+ ')' + cANYTHING + cEND
      m.Matches = m.RegExp.Execute(m.paLines(m.LineIndex))
      If m.Matches.Count = 1
         m.Match = m.Matches.Item(0)
         If m.Match.Submatches.Count = 2
            lcCallsign = m.Match.Submatches.Item(0)
            lcArea     = m.Match.Submatches.Item(1)
            ? 'Date:',lcDate, 'Number:',lcNumber,'Callsign:', lcCallsign, 'Area:',lcArea
            * Here you have a full record and can insert it.

            Insert Into DataStore ; && (date, number,. callsigm, area, note)
            Values (Ctod(lcDate), ;
               VAL(lcNumber), ;
               lcCasllsign, ;
               IIF(lcArea == 'N/A', '', Nvl(lcArea, '')), ;
               '')
         Endif
      Else
         * You might do something with Date and Number here anyway.
         * For example, store it with the rest of the information in m.paLines(m.LineIndex) for a Memo for later manual or extended regex parsing
         ? 'Only date and number:', lcDate, lcNumber
      Endif
   Endif
Endfor

It obviously is more code, but perhaps easier to understand. I intentionally didn't put the brackets defining submatches in the constants, because the single expressions can be combined, like it's the case for CALLSIGN and AREA, which each have 2 alternative patterns commbined.

You may want to mend the SQL-Insert, if you don't store the note anymore. What I did there with the constant '' as the note was necessary because the way it is the insert needs values for all fields, including '' for the non-parsed note.

I could say something about dismissed ELSE branches, but the poor guy who has to read everything I write to get angry about all this Bulls**t will already be angry enough.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Hello Olaf,

Thank you for your explanation on RegExp and for your comments in each stage of the code. That will be useful to help me better understand how it all works.

It’s interesting that you make use of #Define, it was my intention to further expand my VFP knowledge by taking a look at #Define. Your code has shown me how useful it is.

To get the code to work, I created a Cursor DataStore and removed the extra “s” from lcCasllsign and all ran as expected. Did I pass the Test?

As you say a little more code but a little easier to understand, which is the main thing. I’ll study your code further at my leisure.

I did notice the unnecessary reference to Bulls**t on the other thread; I’d sooner read ten of your threads than that rubbish.


Regards,

David.

Recreational user of VFP.
 
Thanks, David.

And take your time with going through this, it's still quite convoluted.

In short, the idea of the #DEFINEs is to have the simple single patterns you put together to larger patterns including the definition of where sub-matches are to be found in the normal round brackets. Therefore you find the highest level composition within the code, not within the definitions. For example the extension of the previous area pattern to alternatively also be blank as '(' + cAREA + '|' + cSPACES+ ')'.

And the defines are just simple constants here. Placeholders, which give the regexp a name. If you want to change a definition you can now do it at that place of definitions, and you may add your repository of expressions here.

Regarding constants, the usual name convention is to use all UPPERCASE names for them, because when you finally put constants into include files and their definition becomes "invisible" that way, unless you open the .h include file in parallel, it would not become apparent they are constants when written in normal lower or mixed case. I prefixed them with c, because DATE, NUMBER, and SPACE are reserved words and color the code wrong unintentionally.

Anyway, that part could also be done with variables. There's just no reason to set variables to values that never change, well, at least not within one run, but over time you need to cover more cases like the 'N/A' surely added the need to allow slashes.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Late to the conversation.. just to add a reminder.. in VFPX ( )
you can find a Regex tool I shared to check as you type your regular expressions.

You can test against your sample text and get the matches directly highlited
in editor and the match list.

regex_bcmmzl.jpg







Marco Plaza
@nfoxProject
 
Great tool. I'd like to see a building block concept in there, too. Also naming the groups and submatches would be helpful. Just don't have the time right now.

And a good reminder why I don't like \w for word characters, as I rarely have digits in words, and underscores, too. Otherwise a great shortener for [A-Za-z0-9_]. You can of course live with a little watering of matches when they never happen or you can spot them in aftermath. Allowing slash to also see N/A as callsign or area in David's case also is such watering, if you only want to allow slashes in the context of N/A or n/a and no other combination of letters and digits with slashes. And a date expression rejecting a 31st of a month not having 31 days? Possible, because its proven regexp are turning complete, which means you could program with them theoretically. (see for example
Refining patterns and make them more strict makes them lengthier, though, and less readable, but naming could turn this to the better, even if names are longer than the expressions, they are better human readable/understandable/maintainable.

Anyway, all this doesn't defy my first two words. Great tool.

Bye, Olaf.

Olaf Doschke Software Engineering
 
mplaza said:
Late to the conversation.. just to add a reminder.. in VFPX ( )
you can find a Regex tool I shared to check as you type your regular expressions.

Marco,

Thank you very much for sharing the link to your VFPRegExpTool Utility.

I have just downloaded the utility and can see that it will be a very useful tool in learning how to compile RegExp.

I've only just downloaded the utility so I need to check if there is a description of the various colours used to display the results.

Thanks again for sharing a great utility.

Regards,

David.

Recreational user of VFP.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top