count string 2

Status
Not open for further replies.

hunt00

Technical User
Mar 6, 2005
US
Is there any way to count the bytes in a string?
 
Thanks... doesn't str.length count characters rather than bytes?
 
Hi

A very ugly workaround which works with a mixture of one and two byte characters.
Code:
[blue]>>> s='tek-советы forums'[/blue]
[red]"tek-советы forums"[/red]
[blue]>>> s.length[/blue]
[COLOR=#008]17[/color]
[blue]>>> escape(s.replace(/%/g,'.')).replace(/%u..../g,'..').replace(/%../g,'.').length[/blue]
[COLOR=#008]23[/color]
Seems to work in Mozilla browsers, Opera, Safari and Internet Explorer.
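
For comparison, here is a minimal sketch of a different technique, not the escape() workaround above: when the byte count in one specific encoding (UTF-8) is wanted, the standard TextEncoder API can do the counting. The helper name utf8ByteLength is hypothetical, and the sketch assumes an environment where TextEncoder is a global (current browsers and Node.js).

```javascript
// Hypothetical helper, not part of the workaround above.
// TextEncoder always encodes to UTF-8, so this counts UTF-8 bytes.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Cyrillic letters take 2 bytes each in UTF-8, so this agrees with the workaround.
console.log(utf8ByteLength('tek-советы forums')); // 23

// These characters take 3 bytes each in UTF-8, where the workaround counts 2.
console.log(utf8ByteLength('日本')); // 6
```

The two approaches agree only while every character fits the "one or two bytes" assumption, so which one is appropriate depends on the encoding the bytes are being counted for.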

Feherke.
 
Hi, thanks very much for this one line of code! I'm afraid I will probably need an explanation to help me understand it, if possible.

Thanks greatly!
 
Hi

First of all, I asked Google Translate for "tips" in Russian. Regarding the FireBug Console, I am sure you guessed that my inputs are displayed in [blue]blue[/blue], while the String results are in [red]red[/red]. ( In my previous code the numeric results were in dark blue. ) Besides the Console output, I will insert my comments in [COLOR=gray #eee]gray[/color].
Code:
[blue]>>> s='tek-советы forums'[/blue]
[red]"tek-советы forums"[/red]
[COLOR=gray #eee]// the escape() function encodes the String much like URL encoding; this means every character other than the allowed ones [sup](*)[/sup] is replaced with a percent sign ( % ) followed by its hexadecimal code[/color]
[blue]>>> escape(s)[/blue]
[red]"tek-%u0441%u043E%u0432%u0435%u0442%u044B%20forums"[/red]
[COLOR=gray #eee]// characters with codes above 255 are represented as "%u...." where the 4 dots represent the 4 hexadecimal digits of the 2 byte character code; so we replace each of those with 2 regular characters for easier counting[/color]
[blue]>>> escape(s).replace(/%u..../g,'..')[/blue]
[red]"tek-............%20forums"[/red]
[COLOR=gray #eee]// characters with codes below 256 that are not allowed are represented as "%.." where the 2 dots represent the 2 hexadecimal digits of the 1 byte character code; so we replace each of those with 1 regular character for easier counting[/color]
[blue]>>> escape(s).replace(/%u..../g,'..').replace(/%../g,'.')[/blue]
[red]"tek-.............forums"[/red]
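[sup](*)[/sup] - characters that escape() leaves unchanged : upper- and lowercase letters, digits, and the symbols at sign ( @ ), asterisk ( * ), underscore ( _ ), plus ( + ), dash ( - ), period ( . ) and slash ( / ). Note that the tilde ( ~ ) is escaped.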

Because the character codes are prefixed with a percent sign, the percent signs of the original String are first replaced with any "harmless" character, to avoid confusing the later [tt]replace()[/tt] calls. ( I skipped this step in the explanation to keep the code cleaner. )

Note that the order of replacing the character codes is important : the longer "%u...." codes have to be processed first, otherwise the "%.." pattern would match the first part of each of them.
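
The steps above could be wrapped in a small function, roughly as follows. This is only a sketch: the name countBytes is hypothetical, and it keeps the same assumption as the workaround, namely 1 byte for character codes below 256 and 2 bytes for everything else. It also relies on the legacy escape() function being available, as it is in browsers and Node.js.

```javascript
// Hypothetical wrapper around the escape()-based workaround.
// Assumes 1 byte per character code below 256 and 2 bytes otherwise.
function countBytes(s) {
  return escape(s.replace(/%/g, '.')) // swap literal percent signs for a harmless character
    .replace(/%u..../g, '..')         // "%uXXXX" (2 byte code) becomes 2 placeholder characters
    .replace(/%../g, '.')             // "%XX" (1 byte code) becomes 1 placeholder character
    .length;
}

console.log(countBytes('tek-советы forums')); // 23
console.log(countBytes('100% ok'));           // 7
```

Keeping the "%u...." replacement before the "%.." one matters here for the same reason as in the one-liner.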

Feherke.
 
Thanks so much for the clear explanation, Feherke!! I hope you have a great day!
 