
count string 2

Status
Not open for further replies.

hunt00

Technical User
Mar 6, 2005
79
US
Is there any way to count the bytes in a string?
 
Thanks... doesn't str.length count characters instead of bytes?
 
Hi

A very ugly workaround, which works with a mixture of one- and two-byte characters.
Code:
[blue]>>> s='tek-советы forums'[/blue]
[red]"tek-советы forums"[/red]
[blue]>>> s.length[/blue]
[COLOR=#008]17[/color]
[blue]>>> escape(s.replace(/%/g,'.')).replace(/%u..../g,'..').replace(/%../g,'.').length[/blue]
[COLOR=#008]23[/color]
Seems to work in Mozillas, Opera, Safari and Explorer.
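
For reuse, the one-liner above could be wrapped in a small function. This is only a sketch: the name countBytesViaEscape is made up for illustration, and the sample string is written with \u escapes so it survives any encoding. Note that it counts 2 bytes for every character that escape() writes as %uXXXX and 1 byte for everything else, which is not necessarily the UTF-8 size.

```javascript
// Hypothetical helper name; wraps the one-liner from the post.
// Counts 2 for each character escape() writes as %uXXXX, 1 for all others.
function countBytesViaEscape(s) {
  return escape(s.replace(/%/g, '.')) // neutralise literal percent signs first
    .replace(/%u..../g, '..')         // %uXXXX -> two placeholder characters
    .replace(/%../g, '.')             // %XX    -> one placeholder character
    .length;
}

// "tek-" + six Cyrillic letters + " forums"
console.log(countBytesViaEscape('tek-\u0441\u043E\u0432\u0435\u0442\u044B forums')); // 23
```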

Feherke.
 
Hi, thanks very much for this one line of code! I am afraid I will probably need your explanation to help me understand it, if possible.

Thanks greatly!
 
Hi

First of all, I asked Google Translate about "tips" in Russian. Regarding the FireBug Console, I am sure you guessed that my inputs are displayed in [blue]blue[/blue], while the String results are in [red]red[/red]. ( In my previous code the numeric results were in dark blue. ) Besides the Console output, I will insert my comments in [COLOR=gray #eee]gray[/color].
Code:
[blue]>>> s='tek-советы forums'[/blue]
[red]"tek-советы forums"[/red]
[COLOR=gray #eee]// the escape() function URL-encodes the String; this means that characters other than the allowed ones [sup](*)[/sup] are replaced with a percent sign ( % ) and their hexadecimal code[/color]
[blue]>>> escape(s)[/blue]
[red]"tek-%u0441%u043E%u0432%u0435%u0442%u044B%20forums"[/red]
[COLOR=gray #eee]// Unicode characters are represented as "%u...." where the 4 dots represent the 4 hexadecimal digits of the 2 byte character code; so we replace each of those with 2 regular characters for easier counting[/color]
[blue]>>> escape(s).replace(/%u..../g,'..')[/blue]
[red]"tek-............%20forums"[/red]
[COLOR=gray #eee]// normal but not allowed characters are represented as "%.." where the 2 dots represent the 2 hexadecimal digits of the 1 byte character code; so we replace each of those with 1 regular character for easier counting[/color]
[blue]>>> escape(s).replace(/%u..../g,'..').replace(/%../g,'.')[/blue]
[red]"tek-.............forums"[/red]
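(*) - characters allowed in URLEncoded text : upper- and lowercase letters, digits, dash ( - ), underscore ( _ ), period ( . ) and tilde ( ~ ). Note that escape() itself uses a slightly different set: it leaves letters, digits and @ * _ + - . / unchanged, and it does encode the tilde.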
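
The walkthrough above counts 2 bytes for every %uXXXX character, which is not always the same as the UTF-8 size of the string. If UTF-8 bytes are what is needed, one alternative that is not part of the original answer is TextEncoder. The sketch below assumes TextEncoder is available (modern browsers and recent Node.js); the name utf8ByteLength is made up for illustration.

```javascript
// Assumption: a global TextEncoder exists (modern browsers, recent Node.js).
// Hypothetical helper name; returns the number of bytes in the UTF-8 encoding.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Cyrillic letters take 2 bytes in UTF-8, so this matches the escape() count.
console.log(utf8ByteLength('tek-\u0441\u043E\u0432\u0435\u0442\u044B forums')); // 23

// The euro sign (U+20AC) takes 3 bytes in UTF-8, while the escape() trick counts 2.
console.log(utf8ByteLength('\u20AC')); // 3
```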

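Because the character codes are prefixed with a percent sign, I replace the percent signs of the original String with a "harmless" character first, so they cannot interfere with the [tt]replace()[/tt]s later. ( I skipped this step in the explanation to keep the code cleaner. Strictly speaking it is only a precaution: escape() turns a literal % into %25, which the last [tt]replace()[/tt] counts as one character anyway. )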

Note that the order of replacing the character codes is important : first the larger Unicode characters have to be processed.
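
To see why the order matters, here is a small sketch that runs the two replacements in both orders on the sample string (written with \u escapes; the variable names are just for illustration). In the reversed order, /%../ consumes the "%u0" at the start of each "%u0441" and leaves "441" behind, so the count comes out too high.

```javascript
const s = 'tek-\u0441\u043E\u0432\u0435\u0442\u044B forums';
const escaped = escape(s);

// Order from the post: %uXXXX first, then %XX.
const rightOrder = escaped.replace(/%u..../g, '..').replace(/%../g, '.');

// Reversed order: each "%u0441" becomes ".441" and /%u..../ no longer matches.
const wrongOrder = escaped.replace(/%../g, '.').replace(/%u..../g, '..');

console.log(rightOrder.length); // 23
console.log(wrongOrder.length); // 35
```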

Feherke.
 
Thanks so much for the clear explanation, Feherke!! I hope you have a great day!
 