word counter? 1

Status
Not open for further replies.

GUJUm0deL

Programmer
Jan 16, 2001
3,676
US
Is there a JS version of a word counter that ignores HTML tags like <strong> and <i>, or <div id="something"> and <p>?

_____________________________
Just Imagine.
 
Hi

Theoretically that could be reduced to :
JavaScript:
document.body.textContent.match(/\w+/g).length
But unfortunately that will not work correctly:
  • content between tag pairs such as <script> and </script> is kept
  • words previously separated by tags get concatenated
(Besides those, Internet Explorer does not know textContent; it uses the proprietary innerText.)
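
The concatenation problem can be imitated without a browser. The sketch below is only a string-level stand-in for textContent (an assumption made so it runs anywhere): deleting tags with nothing in their place glues neighbouring words together, while putting a space in their place keeps them apart.

```javascript
// Stand-in for textContent on adjacent elements: the tags vanish
// and nothing is put in their place.
const html = '<p>one</p><p>two</p>';

const glued = html.replace(/<\/?\w+[^>]*>/g, '');   // "onetwo"
const spaced = html.replace(/<\/?\w+[^>]*>/g, ' '); // " one  two "

const gluedCount = (glued.match(/\w+/g) || []).length;   // 1
const spacedCount = (spaced.match(/\w+/g) || []).length; // 2

console.log(gluedCount, spacedCount);
```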

So you will have to do it the hard way :
JavaScript:
document.body.innerHTML
  .replace(/<script[\s\S]*?>[\s\S]*?<\/script\s*>/gi, ' ') // remove scripts
  .replace(/<style[\s\S]*?>[\s\S]*?<\/style\s*>/gi, ' ')   // remove styles
  .replace(/<\/?\w+[\s\S]*?>/gi, ' ')                      // remove tags
  .replace(/&.+?;/g, '')                                   // remove character entities
  .match(/\w+/g)                                           // get the words
  .length                                                  // get their count
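
For trying the chain outside a page, it can be wrapped in a function that takes the HTML as a string. This is a minimal sketch: the name countWordsInHtml is hypothetical, and the only behavioural change is a guard for match returning null when no word is found, which would otherwise make .length throw.

```javascript
// Hypothetical helper: same replace chain as above, applied to a string.
function countWordsInHtml(html) {
  const words = html
    .replace(/<script[\s\S]*?>[\s\S]*?<\/script\s*>/gi, ' ') // remove scripts
    .replace(/<style[\s\S]*?>[\s\S]*?<\/style\s*>/gi, ' ')   // remove styles
    .replace(/<\/?\w+[\s\S]*?>/g, ' ')                       // remove tags
    .replace(/&.+?;/g, '')                                   // remove character entities
    .match(/\w+/g);                                          // null when nothing matches
  return words ? words.length : 0;
}

console.log(countWordsInHtml('hello my name is <a href="a.html">bob</a>')); // 5
console.log(countWordsInHtml('<p></p>'));                                   // 0
```

In a browser it would be called as countWordsInHtml(document.body.innerHTML).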
But even that will fail, as the character entities may or may not evaluate to word characters:
  • El Ni&ntilde;o - word character, should be kept
  • Hello&nbsp;World - non-word character, should be removed
Personally I know of no good (or at least satisfactory) way to handle the character entities.
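
One possible direction, offered as a sketch rather than a full answer: decode the entities first, then count with a Unicode-aware pattern. In JavaScript \w only matches A-Z, a-z, 0-9 and _, so even a decoded ñ would split a word; the pattern below uses \p{L} and \p{N} with the u flag instead. The names NAMED_ENTITIES, decodeEntities and countWordsInText are hypothetical, the entity table is deliberately tiny, and treating unknown entities as separators is a guess.

```javascript
// Illustrative table only (an assumption): real pages may use many more named entities.
const NAMED_ENTITIES = {
  nbsp: '\u00A0', amp: '&', lt: '<', gt: '>', quot: '"', ntilde: 'ñ', Ntilde: 'Ñ'
};

function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#\d+|\w+);/g, (whole, body) => {
    if (body[0] === '#') {
      // Numeric reference, decimal (&#241;) or hexadecimal (&#xF1;).
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10FFFF ? String.fromCodePoint(code) : ' ';
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : ' '; // unknown name: treated as a separator (a guess, not a rule)
  });
}

function countWordsInText(text) {
  // Letters, digits and underscore in any script count as word characters.
  const words = decodeEntities(text).match(/[\p{L}\p{N}_]+/gu);
  return words ? words.length : 0;
}

console.log(countWordsInText('El Ni&ntilde;o'));   // 2
console.log(countWordsInText('Hello&nbsp;World')); // 2
```

In a browser, letting the parser decode the entities (for instance by reading the text of a detached element) avoids maintaining a table at all.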

Maybe you can get some hints from there anyway.


Feherke.
 
With jQuery you can get the text:

Code:
var renderedText = $('#element').text();

googling "jquery word counter" may help you out?

Cheers,
Daddy

-----------------------------------------------------
What You See Is What You Get
Never underestimate tha powah of tha google!
 
Thanks. I looked at some examples of jQuery word counters and saw that they still counted the <a href> tag as part of the wording.

For example:
hello my name is <a href="a.html">bob</a>

Came back with 6 words, and not 5.

_____________________________
Just Imagine.
 
Code:
<div id="test">
 hello my name is <a href="a.html">bob</a>
</div>

alert($('#test').text());

gives me "hello my name is bob".
5 words?

/Daddy



-----------------------------------------------------
What You See Is What You Get
Never underestimate tha powah of tha google!
 
One approach you can take is to loop through each node, and count the words if it's a text node, or loop through the child nodes if it's not.

This requires a few extra details: for example, some elements you want to skip (<script> tags, say), and for some you want the "value" instead. But with a few minor amendments this algorithm seems to work pretty well.
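
A minimal sketch of that traversal follows. It is not the testing version mentioned in this post; the function name countWordsInNode, the list of skipped elements and the choice of INPUT and TEXTAREA as "value" elements are all assumptions. It only reads nodeType, nodeName, nodeValue, value and childNodes, so the demo at the end can use plain objects shaped like DOM nodes and run without a browser.

```javascript
const TEXT_NODE = 3; // same number as Node.TEXT_NODE in the DOM
const SKIPPED = new Set(['SCRIPT', 'STYLE', 'NOSCRIPT']); // assumption: elements to ignore
const VALUE_ELEMENTS = new Set(['INPUT', 'TEXTAREA']);    // assumption: count .value instead

function countMatches(text) {
  return ((text || '').match(/\w+/g) || []).length;
}

function countWordsInNode(node) {
  if (node.nodeType === TEXT_NODE) return countMatches(node.nodeValue);
  if (SKIPPED.has(node.nodeName)) return 0;
  if (VALUE_ELEMENTS.has(node.nodeName)) return countMatches(node.value);
  let total = 0;
  for (const child of Array.from(node.childNodes || [])) {
    total += countWordsInNode(child); // recurse into element children
  }
  return total;
}

// Plain objects standing in for: <div> hello my name is <a>bob</a><script>var x = 1;</script></div>
const text = (value) => ({ nodeType: 3, nodeName: '#text', nodeValue: value });
const fakeDiv = {
  nodeType: 1, nodeName: 'DIV', childNodes: [
    text(' hello my name is '),
    { nodeType: 1, nodeName: 'A', childNodes: [text('bob')] },
    { nodeType: 1, nodeName: 'SCRIPT', childNodes: [text('var x = 1;')] }
  ]
};

console.log(countWordsInNode(fakeDiv)); // 5
```

In a page the call would be countWordsInNode(document.body). Because every text node is counted separately, a word split by inline markup, such as foo<b>bar</b>, is counted as two.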

I've made a testing version of it here:

It uses the regular expression (/\w+/g) which means find a sequence of word characters (\w), of any length (+) and don't stop after the first match (g for global).
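
A quick illustration of that pattern on a plain string, including one limitation worth knowing about: \w only covers A-Z, a-z, 0-9 and _, so an apostrophe splits a word in two.

```javascript
const found = 'hello my name is bob'.match(/\w+/g);
console.log(found);        // [ 'hello', 'my', 'name', 'is', 'bob' ]
console.log(found.length); // 5

// Without the g flag, match stops at the first hit.
const first = 'hello my name is bob'.match(/\w+/)[0];
console.log(first); // 'hello'

// The apostrophe is not a word character, so "don't" yields two matches.
const split = "don't stop".match(/\w+/g);
console.log(split); // [ 'don', 't', 'stop' ]
```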

There's more explanation in the source code at the link. Let me know if anything is unclear.
 