extracting data from web site

Status
Not open for further replies.

aarrgghh

Technical User
Sep 5, 2002
60
US
Hey all,

Does anyone know how I can search a paragraph for a particular string and delete all text that occurred before that string?

Thanks in advance!
 
Yes, we do. Regular expressions are the way to go. They might look like black magic in the beginning; however, once you get into them, they're extremely powerful.

Just post an example for the type of paragraph. I'll help you with the regex.
 
Hello Dr.,

I want to erase everything before the word checkmark below.

<-------------------- // ----------------------->

You were notified of this activity because you requested it. To turn
off notification, log in to the Tek-Tips site click on the red checkmark
next to the "extracting data from web site" thread. Contact
sitesupport@tek-tips.com if you have any problems or questions
regarding this feature.

Thanks for the help!
 
Ok, here it is:
Code:
preg_match('/^.*(checkmark.*)/s',$text,$mArray);

Some explanation is necessary:
This expression uses the 's' pattern modifier. That means the dot (.) also matches newline characters. If you remove the 's', the dot stops at the end of a line, and because the pattern is anchored with ^, it will only match when 'checkmark' is on the first line of $text. Whether your paragraph is a single line or not, I don't know. It depends on the formatting of your text.

You will find the extracted text in $mArray[1], which is the first subexpression of the regex.
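
To make the effect of the 's' modifier concrete, here is a minimal sketch that runs the pattern against the sample text from the post above. It assumes the line breaks in $text are plain "\n" characters; the variable names $fromMarker, $matchedWithoutS and $unused are made up for this illustration. It also checks the return value of preg_match before reading $mArray, since $mArray[1] is not set when nothing matches.

```php
<?php
// Sample text from the thread (shortened; line breaks assumed to be "\n").
$text = "You were notified of this activity because you requested it. To turn\n"
      . "off notification, log in to the Tek-Tips site click on the red checkmark\n"
      . "next to the \"extracting data from web site\" thread.";

// With 's', the dot also matches newlines, so the match can span lines.
$fromMarker = null;
if (preg_match('/^.*(checkmark.*)/s', $text, $mArray) === 1) {
    $fromMarker = $mArray[1];   // "checkmark" and everything after it
}

// Without 's', the dot stops at "\n". Because of the ^ anchor, the pattern
// can only match if "checkmark" is on the first line, which it is not here.
$matchedWithoutS = preg_match('/^.*(checkmark.*)/', $text, $unused);

echo $fromMarker, "\n";
var_dump($matchedWithoutS);     // 0 means "no match"
```

If the goal is to change $text itself rather than to pull the result out of $mArray, assigning $fromMarker back to $text after a successful match should do it.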
 
Great! Thanks. How would it work if I wanted to erase everything after checkmark?
 
It just has to do with how you set the sub-expressions. Every sub-expression (I'm not talking sandwiches) is delimited by (). So all you do is move them:
Code:
preg_match('/^(.*checkmark).*/s',$text,$mArray);
Again, $mArray[1], where [1] stands for the content of the first sub-expression, will hold the result.
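
One thing worth knowing about both patterns: .* is greedy, so if 'checkmark' appears more than once, the cut happens at the last occurrence. The sketch below shows the difference from the lazy form .*?, which cuts at the first occurrence. The sample string and the names $greedy, $lazy, $upToLast and $upToFirst are made up for illustration and are not from the original posts.

```php
<?php
// Hypothetical sample with two occurrences of the marker word.
$text = "click the red checkmark\nthen the blue checkmark\nto finish";

$upToLast  = null;
$upToFirst = null;

// Greedy: group 1 runs up to and including the LAST "checkmark".
if (preg_match('/^(.*checkmark).*/s', $text, $greedy) === 1) {
    $upToLast = $greedy[1];
}

// Lazy (.*?): group 1 stops at the FIRST "checkmark".
if (preg_match('/^(.*?checkmark).*/s', $text, $lazy) === 1) {
    $upToFirst = $lazy[1];
}

echo $upToLast, "\n---\n", $upToFirst, "\n";
```

With only one 'checkmark' in the text, as in the notification example, both forms give the same result.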
 