substr to the nearest word

Status
Not open for further replies.

jedel

Programmer
Jan 11, 2003
430
AU
Hi all,

Just working on a project of mine and I've been learning how to capture a segment of text using the substr function.
The problem is that it counts characters, so the cut can land in the middle of a word.

A few questions:

1. Will this code ignore HTML tags?
2. Is there a way to capture up to the nearest word?
3. If substr is not the right function, what is?

Cheers

Dean

-------------------------------------------------------------
"The most overlooked advantage of owning a computer is that if they foul up there's no law against whacking them around a bit."
 
Strip_tags is definitely one of the functions I need, but then I need to get, say, the first paragraph or the first 20 words of a text so that my site does not grow disproportionately. The phrase does not need to make too much sense (although the users will soon learn to keep their phrases shorter); it just has to be around 20 words long.

 
Hi

You want a kind of excerpt from articles/comments, suitable for a list or table of contents?
Code:
$excerpt=preg_replace("/((\S+\s+){1,20}).*/s","\\1",strip_tags($text));

Feherke.
 
BANG!!! That was perfect! Exactly what I wanted. I'm very raw with regex and still trying to get my head around it. It seems like a whole language of its own.

Can you break it down for me? What I have is
match one or more of the:
whitespace characters (S+)
non whitespace characters(s+)

up to 20 characters {1,20}

as well as 0 or more of any other character except a new line (.*)

That's about it. I can't figure out the rest.


 
Hi

jedel said:
match one or more of the:
whitespace characters (S+) [nope, that means a non-whitespace character]
non whitespace characters(s+) [nope, that means a whitespace character]

upto 20 characters{1,20} [nope, that means 1 to 20 repetitions of the previous entity; in our case, the previous group]
See the PCRE documentation.

Feherke.
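
For readers who want the pattern above spelled out, here is a minimal sketch that rewrites it with the x modifier so each piece can carry a comment. The sample $text and the limit of 3 words are assumptions made to keep the output short; the pattern in the thread uses {1,20}.

```php
<?php
// Hypothetical sample text; any HTML string would do.
$text = '<p>The <b>quick</b> brown fox jumps over the lazy dog.</p>';

// Same structure as the one-line pattern, laid out with the x modifier.
$pattern = '/
    (                  # group 1: the part we keep
      (\S+\s+){1,3}    # a word (non-whitespace run, then whitespace run), 1 to 3 times
    )
    .*                 # the rest of the text; the s modifier lets the dot match newlines too
/sx';

// Replace the whole match with group 1 only, after removing the tags.
$excerpt = preg_replace($pattern, '\\1', strip_tags($text));

echo rtrim($excerpt), "\n";   // The quick brown
```

The excerpt keeps the whitespace that follows its last word, hence the rtrim(). One thing to watch for: a final word with no whitespace after it is not matched by (\S+\s+), so on a text shorter than the limit the last word is dropped.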
 
If you are working with words, this function may in some cases be preferable to regexes. It takes care of word boundaries for you. Note the '2' parameter, which returns an array keyed by each word's position in the string.

Code:
<?php
$str1 = 'hello today is nice';
// Format 2: each word becomes an array element, keyed by its offset in the string.
$SomeWords = str_word_count($str1, 2);
echo '<br> $SomeWords=';
var_dump($SomeWords);
?>

OUTPUT:

$SomeWords=array(4) { [0]=> string(5) "hello" [6]=> string(5) "today" [12]=> string(2) "is" [15]=> string(4) "nice" }
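
The offsets that format 2 returns as array keys can also be used to cut the original string at a word boundary with substr(), which keeps the original punctuation and spacing. A minimal sketch, assuming a hypothetical helper name first_words_by_offset() and PHP 7 or later for the type declarations:

```php
<?php
// Hypothetical helper: cut $str after its first $limit words, using the
// character offsets that str_word_count() returns in format 2.
function first_words_by_offset(string $str, int $limit): string
{
    $words = str_word_count($str, 2);      // offset => word
    if (count($words) <= $limit) {
        return $str;                       // short enough already, nothing to cut
    }
    $offsets = array_keys($words);
    // $offsets[$limit] is where word number $limit + 1 starts; cut just before it.
    return rtrim(substr($str, 0, $offsets[$limit]));
}

echo first_words_by_offset('hello today is nice', 2), "\n";   // hello today
```

As far as I know, str_word_count() only treats letters, apostrophes and hyphens as word characters by default, so numbers are not counted as words; check that this suits your text.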
 
sen5241b

I have read your post with jpadie and I did try this method and successfully placed my text into an array of some 100 or so words. I also did exactly the same thing with the explode function.

My problem came when I wanted to take just the first 20 words of the array and implode them. I couldn't split up the array.

I'd be happy to try a new method if you could explain how it is done.

I have just had a thought... is it as simple as just placing the first 20 keys into another array variable?

Code:
// Take elements 0 to 19, i.e. the first 20 words.
$shorttxt = array_slice($strArray, 0, 20);
$newstr = implode(" ", $shorttxt);

 
Or you could take the string of (say) 100 words and, in a loop, read along it, keeping count of how many spaces you encounter (but probably ignoring multiple spaces).
Append each character to an output string as you go; when you reach 20 spaces, the output string holds 20 words.
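
The loop described above could be sketched roughly as follows. The function name first_words_by_spaces() is made up for illustration, only the plain space character is treated as a separator, and $limit is assumed to be at least 1.

```php
<?php
// Hypothetical helper: copy characters until $limit separating spaces are seen.
function first_words_by_spaces(string $str, int $limit): string
{
    $out = '';
    $spaces = 0;
    $inRun = true;                  // inside a run of spaces; starts true so leading spaces are skipped
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === ' ') {
            if ($inRun) {
                continue;           // ignore repeated (or leading) spaces
            }
            $inRun = true;
            $spaces++;
            if ($spaces === $limit) {
                break;              // $limit spaces counted, so $limit words copied
            }
        } else {
            $inRun = false;
        }
        $out .= $ch;                // append the character to the output string
    }
    return rtrim($out);
}

echo first_words_by_spaces('one  two three four', 2), "\n";   // one two
```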
 
feherke,

Just tried that array_slice code and it worked just as well. In fact, it may have been a little better than using the regex. I found that with the regex, any punctuation in the text messed up the code a little. Here is the function I set up:

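Code:
	// Return $length words of $str, starting at word index $start.
	// Note: array_slice() takes a length as its third argument, not an end index.
	function strip_text($str, $start, $length)
	{
		$body = explode(" ", $str);                      // split on single spaces
		$intstr = array_slice($body, $start, $length);   // keep $length words from $start
		$newstr = implode(" ", $intstr);
		return $newstr;
	}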
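
One detail worth checking when calling a function like this: array_slice() takes an offset and a length, so the third argument is a word count rather than an end position. A short sketch with made-up sample text, which also uses preg_split() instead of explode() as one possible way to cope with repeated spaces or newlines:

```php
<?php
$text = "one two  three\nfour five six";   // sample text with a double space and a newline

// Split on any run of whitespace and drop empty pieces.
$words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// Offset 2, length 3: three words starting at the third word.
$slice = implode(' ', array_slice($words, 2, 3));
echo $slice, "\n";                         // three four five

// The first 20 words (or fewer, if the text is shorter).
$first = implode(' ', array_slice($words, 0, 20));
echo $first, "\n";
```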


 