mountainbiker
Programmer
Are there any good modules or nice ways to strip HTML tags, entities, multiple whitespace, and leading/trailing space from a string?
I want to take the first 1024 characters of the first paragraph of text, strip the unwanted data and stuff it into a meta element
<meta name="description" content="...." />
and an RSS element
<description>....</description>
For example, my unfiltered string is 'here is tom & jerry'. If I placed it into the RSS description element above as-is, the bare "&" would make the XML invalid.
I could wrap the string in a "CDATA" section, but the wrapper itself uses up characters I could otherwise spend on the description.
I could do a bunch of regular expressions, but I would think that this has already been tackled--so there must be a refined, robust, elegant solution.
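To make the goal concrete, here is a minimal sketch of the pipeline described above, written in Python using only the standard library. The thread does not name a language, so Python is an assumption, and the helper names (plain_text, rss_description, meta_description) are hypothetical. It drops the tags, decodes the entities, collapses whitespace, trims to 1024 characters, and only then escapes for the target element.

```python
import re
from html import escape as html_escape
from html.parser import HTMLParser
from xml.sax.saxutils import escape as xml_escape


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(fragment, limit=1024):
    """Hypothetical helper: tag-free, entity-free, whitespace-normalised text."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    # Text nodes are joined without a separator so inline tags do not split words.
    text = "".join(parser.parts)
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs, trim both ends
    return text[:limit].rstrip()


def rss_description(fragment):
    # Escapes &, < and > so the text is safe inside <description>...</description>
    return xml_escape(plain_text(fragment))


def meta_description(fragment):
    # quote=True also escapes " and ', which matters inside content="..."
    return html_escape(plain_text(fragment), quote=True)


print(rss_description("<p>here is <b>tom</b> &amp; jerry</p>"))
```

The length limit is applied to the plain text before escaping, so the escaped output can be somewhat longer than 1024 characters; if the limit has to hold for the escaped form, the truncation would need to move after the escape step and take care not to cut an entity in half.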