Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Extracting a String

Status
Not open for further replies.

white605

Technical User
Jan 20, 2003
394
US
This code uses RegExp to extract a string, in this case a domain without the ".com". I will eventually use this type of code to extract a large string (a news story) from an even longer string (the source of a web page).
1. Is there a way to get at the submatches without the FOR/ENDFOR loops? I don't know how to write the VBScript (0).SubMatches(0) syntax directly in VFP, but I can get at them with the FOR/ENDFOR loops.
2. Is there a better approach to extracting a string from within a larger string using only VFP?

Code:
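LOCAL RegEx, strTemp, oMatches, thing, foo

RegEx = CREATEOBJECT("VBScript.RegExp")
RegEx.Pattern = "@(\w+)\."    && "@", capture one or more word characters, then a dot
RegEx.Global = .T.
RegEx.IgnoreCase = .T.

strTemp = "emailaddress@domain.com"
oMatches = RegEx.Execute(strTemp)

* Outer loop: each Match; inner loop: each captured group of that Match
FOR EACH thing IN oMatches
    FOR EACH foo IN thing.SubMatches
        ? foo   && returns "domain"
    ENDFOR
ENDFOR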


*!* VBScript equivalent
*!*    Dim RegEx : Set RegEx = New RegExp
*!*    RegEx.Pattern = "@(\w+)\."
*!*    RegEx.Global = True
*!*    RegEx.IgnoreCase = True
*!*    Dim strTemp : strTemp = "anyaddress@anydomain.com"
*!* ->   WScript.Echo RegEx.Execute(strTemp) (0).SubMatches(0) <-
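For question 1, here is a minimal sketch assuming the same VBScript.RegExp object. VFP does not accept the VBScript shorthand `(0).SubMatches(0)`, but the Matches and SubMatches collections expose an `Item()` method, so the chain can be written out explicitly. Checking `Count` first avoids an error when nothing matches. The function name `GetDomain` is hypothetical.

```foxpro
? GetDomain("emailaddress@domain.com")   && domain

* Hypothetical helper: returns the first captured group, or "" if there is no match
FUNCTION GetDomain(tcEmail)
	LOCAL loRegEx, loMatches
	loRegEx = CREATEOBJECT("VBScript.RegExp")
	loRegEx.Pattern = "@(\w+)\."
	loRegEx.IgnoreCase = .T.
	loMatches = loRegEx.Execute(m.tcEmail)
	* VFP spelling of VBScript's Execute(...)(0).SubMatches(0)
	IF loMatches.Count > 0
		RETURN loMatches.Item(0).SubMatches.Item(0)
	ENDIF
	RETURN ""
ENDFUNC
```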
Thanks in advance from a confused wjwjr; these regexps make my head hurt a bit!

This old world keeps spinning round - It's a wonder tall trees ain't layin' down
 
I am looking at this code from Foxite:
Code:
CREATE CURSOR xx (title c(254))
INSERT INTO xx VALUES ("Unable to determine IP address from host name")
INSERT INTO xx VALUES ("am having some issues trying to determine the status of a host within a task.")
INSERT INTO xx VALUES ("host is not after determine in this one or doesn't exist")
INSERT INTO xx VALUES ("This one is to DETERMINE if it can catch case free Host case")

LOCAL oRX
oRX = CREATEOBJECT("VBScript.RegExp")
WITH oRX
  * whole word "determine", then any text, then whole word "host"
  .Pattern = "\bdetermine\b(.+)\bhost\b"
  .IgnoreCase = .T.
  .Global = .T.
ENDWITH

* Keep only rows where the pattern matches; return the full text of the first match
SELECT CAST(GetFirstOccurrence(oRX, title) AS c(254)) FROM xx WHERE oRX.Test(title)

FUNCTION GetFirstOccurrence(toRX, tcString)
  RETURN toRX.Execute(m.tcString).Item(0).Value
ENDFUNC
And from the help:
Code:
Retrieves a string between two delimiters.

STREXTRACT(cSearchExpression, cBeginDelim [, cEndDelim [, nOccurrence [, nFlag]]])
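A minimal sketch of STREXTRACT() applied to the email example from the first post. Using "." as the end delimiter stops at the first dot after the "@", so only the first label of a multi-part domain comes back; nFlag = 1 makes the delimiter search case-insensitive. The sample strings are illustrative only.

```foxpro
LOCAL lcEmail
lcEmail = "emailaddress@domain.com"

* Text between the first "@" and the next "."
? STREXTRACT(lcEmail, "@", ".")                            && domain

* Case-insensitive end delimiter (nOccurrence = 1, nFlag = 1)
? STREXTRACT("JohnJones@MyDomain.COM", "@", ".com", 1, 1)  && MyDomain

* Begin delimiter not found: an empty string comes back
? STREXTRACT("no-at-sign.com", "@", ".")
```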
As I said, these things make my head hurt.

 
Assuming that all of the final extensions will be ".COM",
you could use something like:

Code:
lcString = "JohnJones@MyDomain.com"
lcDomain = SUBSTR(lcString, AT("@",lcString)+1, (AT(".COM",UPPER(lcString))-1) - AT("@",lcString))
?lcDomain

From there you can modify it and incorporate it as needed.

Good Luck,
JRB-Bldr
 

One way would be to use a UDF:

Code:
lcString = "JohnJones@MyDomain.com"
lcDomain = ATXRIGHT(lcString, "@", ".com")
?lcDomain

This assumes that you are looking for everything between "@" and ".com".

Code for this UDF is found at: faq184-5975

mmerlinn


"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond
 
Thanks for the replies. We've gotten the regex working (with the help of my son), but I will look at each method since they use native VFP commands. I will post the code we end up with.

Is there a downside to using regular expressions through the VBScript.RegExp object, or are they just harder to get a handle on?
wjwjr

 
Regular expressions are more powerful, as you can define general patterns (e.g. a certain group of characters that can match), not only concrete strings.

This flexibility comes at a price: the topic is harder to understand, and building working regular expressions takes practice.
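To illustrate matching a pattern rather than a fixed string, here is a minimal sketch assuming a made-up input with two addresses. Global = .T. makes Execute() return every match, and the Matches collection is walked by index with Item() instead of FOR EACH.

```foxpro
LOCAL loRegEx, loMatches, lnI, lcInput
lcInput = "Mail jo@alpha.com or sam@beta.org"

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.Pattern = "@(\w+)\."   && "@", capture word characters, then a literal dot
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.            && return all matches, not only the first

loMatches = loRegEx.Execute(lcInput)
* Matches and SubMatches are zero-based collections
FOR lnI = 0 TO loMatches.Count - 1
	? loMatches.Item(lnI).SubMatches.Item(0)   && alpha, then beta
ENDFOR
```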

Since you say you want to extract some text from an HTML page, there are other ways to do so, e.g. loading the HTML into an automatable browser such as IE, which builds up the document object model (DOM). Then you can simply strip off all HTML tags and get the net text by inspecting ...body.innertext

If the text is always within a certain HTML tag, e.g. some DIV class or DIV id, you can easily use STREXTRACT with the left and right delimiters, e.g.

STREXTRACT(lcHTML, '<div id="text">', '</div>') will extract what's inside the div tag with id "text".
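A minimal sketch of the div approach, assuming a made-up HTML fragment where the wanted text sits directly inside `<div id="text">`. Single-quoted VFP strings let the double quotes of the HTML attribute appear literally.

```foxpro
LOCAL lcHTML, lcText
* Made-up page fragment; a real page will differ
lcHTML = '<html><body><div id="nav">Menu</div>' + ;
	'<div id="text">The news story</div></body></html>'

* Everything between the opening tag and the next closing </div>
lcText = STREXTRACT(lcHTML, '<div id="text">', '</div>')
? lcText   && The news story
```

Two caveats: if the target div contains a nested div, STREXTRACT() stops at the first "</div>" and the text is cut short, and the begin delimiter must match the page's markup exactly (quoting, spacing, extra attributes).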

The advantage of the DOM and innertext is that you get all the text, no matter how the HTML may change.

I wouldn't use RegExp for such a job, but I'm no regular expressions guy.

For the email, a delimiter-based solution was already given by mmerlinn. But it only works with ".com" as the ending; you'd need to change it to ".net", ".org" etc. to make other TLDs work for email domain parsing. You could instead simply set the right delimiter to ".", but unfortunately there may be two dots in the domain part of an email address, and you'd only want to strip off the TLD after the rightmost dot. In this case, RegExps can do wonders that neither the DOM nor STREXTRACT can.
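For comparison, here is a minimal sketch of the rightmost-dot case using only native functions: AT() finds the "@" and RAT() finds the last ".". It assumes a single well-formed address per string and returns "" otherwise; `GetDomainPart` is a hypothetical name.

```foxpro
? GetDomainPart("jo@mail.example.org")   && mail.example
? GetDomainPart("jo@example.net")        && example

* Hypothetical helper: text between "@" and the last ".", TLD dropped
FUNCTION GetDomainPart(tcEmail)
	LOCAL lnAt, lnDot
	lnAt = AT("@", m.tcEmail)
	lnDot = RAT(".", m.tcEmail)      && position of the rightmost dot
	IF lnAt = 0 OR lnDot <= lnAt + 1
		RETURN ""                     && no "@", or no usable dot after it
	ENDIF
	RETURN SUBSTR(m.tcEmail, lnAt + 1, lnDot - lnAt - 1)
ENDFUNC
```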

Bye, Olaf.
 
Just to add to Olaf's good advice ....

I can't recall ever needing to use regular expressions in VFP. The FoxPro language's string-handling functions can handle just about any job that can be thrown at them. I'd strongly suggest looking for a solution with the native functions before turning to external tools.

Note that I make no comment on which approach is more efficient or requires less coding. It's a question of using the tools that you are most comfortable with.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 