
Finding Double-Byte Characters in a Word Document

Status
Not open for further replies.

Ziggurat

Programmer
Jun 6, 2001
81
GB
Hi All,

I don't think this question has been asked before, but if it has, my apologies.

I am currently doing some translation work in Japan. I have to translate a Japanese Word document into an English Word document and then create a help file and a PDF file from it. The help and PDF files must not contain any Japanese characters; otherwise, anyone who opens them will need to download a language update to display them.

I have tried to change all the characters in the document to an English font by selecting all the text, but I still keep finding that some Japanese double-byte characters remain. The hardest ones to find are the double-byte (full-width) spaces.
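For the full-width spaces in particular, a plain Find/Replace run over every story (body, headers, footers, text boxes and so on) may be enough. A minimal sketch in Word VBA, assuming the troublesome spaces are the ideographic space U+3000; the macro name ReplaceFullWidthSpaces is hypothetical:

```vba
' Hypothetical sketch: replaces every ideographic (full-width) space, U+3000,
' with an ordinary space in every story of the active document
Sub ReplaceFullWidthSpaces()
    Dim oStory As Word.Range
    For Each oStory In ActiveDocument.StoryRanges
        ' Linked stories (further headers, text boxes) are reached via NextStoryRange
        Do
            With oStory.Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = ChrW(&H3000&)        ' ideographic space
                .Replacement.Text = " "      ' ordinary ASCII space
                .Forward = True
                .Wrap = wdFindStop
                .MatchWildcards = False
                .Execute Replace:=wdReplaceAll
            End With
            Set oStory = oStory.NextStoryRange
        Loop Until oStory Is Nothing
    Next
End Sub
```

The replacement spaces keep whatever font the original characters had, so an English font may still need to be applied afterwards.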

So my question is: does anyone know of a good way to locate all the double-byte characters in a Word document, or is there another way I can find where they are?

Another alternative is to convert the fonts to Unicode on an English operating system, but I would rather not do this if possible.


Thanks in advance



#
###
#####
#######
Ziggurat
 
This is a complete guess, but can you do a Find for the Shift-Out and Shift-In characters? (If I recall, on IBM they are x'0E' and x'0F'; I presume they're the same in Word.)

Enjoy,
Tony

--------------------------------------------------------------------------------------------
We want to help you; help us to do it by reading this: Before you ask a question.
Excel VBA Training and more Help at VBAExpress
 
Hi Ziggurat,

I can understand the potential problem with the help file, but why the PDF? If your PDF distiller is set up to embed the fonts used (or the subsets of them that are used), there should be no need to "download a language update to display them".

Cheers
 
Hi again,

Here's some code you might like to try. It tests the selection and, whenever it finds a Unicode character, replaces it with that character's Unicode value enclosed in asterisks.

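Sub FindUnicode()
    ' Replace each non-ANSI character in the selection with *its Unicode value*
    Dim i As Long, oChr As Word.Range
    ' Work backwards so lengthening one character doesn't shift those still to test
    For i = Selection.Characters.Count To 1 Step -1
        Set oChr = Selection.Characters(i)
        ' "And &HFFFF&" maps AscW's negative results (above &H7FFF) to 0-65535
        If Asc(oChr.Text) <> AscW(oChr.Text) Then oChr.Text = "*" & (AscW(oChr.Text) And &HFFFF&) & "*"
    Next i
End Sub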

Cheers

 
Thank you all for your replies.

To try to answer them:

I have tried searching through the file for non-ASCII characters. This works for characters in the main document, but it doesn't seem to work for text in text boxes, headers, footers, etc.

Concerning the PDF file: I didn't realize that you could embed the fonts used in the PDF file. This is a useful piece of information, but it still doesn't solve the problem of Japanese (DBCS) characters being present in the document.

My objective is to find every DBCS character and then replace it.

Thank you again for your replies.
It is always a pleasure to participate in Tek-Tips







Ziggurat
 
Hi Ziggurat,

The following macro processes all Unicode characters in the document, regardless of whether they're in the body or a header, footer, textbox or other object.

As coded, the Unicode characters will be deleted. If you want to change them to something else, type that character or string between the quotes.

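Sub ClearUnicode()
    ' Deletes every non-ANSI character in every story: body, headers, footers,
    ' footnotes, text boxes, etc. To replace instead, put the text between the quotes.
    Dim oRange As Word.Range, i As Long
    For Each oRange In ActiveDocument.StoryRanges
        ' Linked stories (further headers, text boxes) are reached via NextStoryRange
        Do
            ' Work backwards so deleting a character doesn't skip the next one
            For i = oRange.Characters.Count To 1 Step -1
                With oRange.Characters(i)
                    If Asc(.Text) <> AscW(.Text) Then .Text = ""
                End With
            Next i
            Set oRange = oRange.NextStoryRange
        Loop Until oRange Is Nothing
    Next
End Sub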
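The Asc/AscW comparison flags every character that has no equivalent in the current ANSI code page, so on an English system it also catches curly quotes, en dashes and the euro sign. A narrower sketch, assuming the Japanese text falls in the CJK punctuation, kana, kanji and full-width blocks tested below; IsJapaneseChar and ClearJapanese are hypothetical names, and symbols such as ※ or → from other blocks are not covered:

```vba
' Hypothetical helper: True if the first character of s lies in one of the
' Unicode blocks assumed here to hold Japanese text
Function IsJapaneseChar(ByVal s As String) As Boolean
    Dim code As Long
    If Len(s) = 0 Then Exit Function
    ' AscW returns a signed Integer, so mask to get 0-65535
    code = AscW(s) And &HFFFF&
    Select Case code
        Case &H3000& To &H30FF&   ' CJK punctuation (incl. full-width space), hiragana, katakana
            IsJapaneseChar = True
        Case &H3400& To &H4DBF&, &H4E00& To &H9FFF&, &HF900& To &HFAFF&   ' kanji
            IsJapaneseChar = True
        Case &HFF00& To &HFFEF&   ' full-width ASCII and half-width katakana
            IsJapaneseChar = True
    End Select
End Function

' Hypothetical variant of ClearUnicode that deletes only the characters above
Sub ClearJapanese()
    Dim oRange As Word.Range, i As Long
    For Each oRange In ActiveDocument.StoryRanges
        Do
            ' Backwards, so a deletion never shifts characters not yet tested
            For i = oRange.Characters.Count To 1 Step -1
                If IsJapaneseChar(oRange.Characters(i).Text) Then oRange.Characters(i).Text = ""
            Next i
            Set oRange = oRange.NextStoryRange
        Loop Until oRange Is Nothing
    Next
End Sub
```

The &H...& suffix matters: without it, hex literals of &H8000 and above are negative Integers and the Case ranges would silently misbehave.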
 
macropod,

Isn't it the point that DBCS characters are not Unicode?

Ziggurat,

If the Find works in the main document, then code can be written to make it work in headers and footers. Text boxes can be more of a problem. I will post some code later; no time right now.
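If the aim is to see where the double-byte characters are before changing anything, a read-only pass over the same StoryRanges chain can report them. A sketch in Word VBA; ListJapanese and IsJapaneseCode are hypothetical names, and the Unicode blocks treated as Japanese are an assumption:

```vba
' Hypothetical sketch: prints to the Immediate window (Ctrl+G) the story type,
' character position and Unicode value of every assumed-Japanese character
Sub ListJapanese()
    Dim oRange As Word.Range, oChr As Word.Range, code As Long
    For Each oRange In ActiveDocument.StoryRanges
        Do
            For Each oChr In oRange.Characters   ' read-only pass, so For Each is safe
                code = AscW(oChr.Text) And &HFFFF&
                If IsJapaneseCode(code) Then
                    Debug.Print "Story " & oRange.StoryType & ", position " & oChr.Start & ": U+" & Hex(code)
                End If
            Next oChr
            Set oRange = oRange.NextStoryRange
        Loop Until oRange Is Nothing
    Next
End Sub

' Assumed Japanese blocks: CJK punctuation, kana, kanji, full-width forms
Function IsJapaneseCode(ByVal code As Long) As Boolean
    Select Case code
        Case &H3000& To &H30FF&, &H3400& To &H4DBF&, &H4E00& To &H9FFF&, &HF900& To &HFAFF&, &HFF00& To &HFFEF&
            IsJapaneseCode = True
    End Select
End Function
```

The story numbers are WdStoryType values, e.g. 1 for the main text, 5 for text boxes, 7 for the primary header and 9 for the primary footer.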

Enjoy,
Tony

 
Hi Macropod and TonyJollas

I really appreciate the time taken in replying to my question.

I have tried the macro that macropod wrote, and it does indeed find the Japanese in the text. A very educational piece of code. As a side issue, could you tell me where you learned about StoryRanges and so on? It is an area of Word that I know nothing about.

I also did a little more searching on the Internet and found the following site:

which I would like to share with other members of the forum.
It contains Word macros for translators, including the following freely available ones:
Doc2Html2Doc (Converts Word document text into html text, and vice versa.)

Wide2Narrow (Converts double-byte letters, numbers, and symbols (e.g. parentheses/punctuation) into their single-byte equivalents. This is useful if you are translating a Japanese document into English, and need to remove double-byte characters. The built-in "Change Case - Half-width" feature has the side effect of converting Katakana into half-width characters as well.)

FindNextJ (Finds the next Japanese character in the document. Useful for finding "invisible" Japanese characters (such as full-width spaces and punctuation), and for finding small amounts of Japanese text in a mostly European-language document.)
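The core of a Wide2Narrow-style conversion can be sketched without touching katakana, because the full-width forms of printable ASCII (U+FF01 to U+FF5E) sit at a fixed offset of &HFEE0 from U+0021 to U+007E. This is not the Wide2Narrow macro itself; NarrowFullWidth and NarrowDocument are hypothetical names:

```vba
' Hypothetical sketch: maps full-width ASCII variants U+FF01-U+FF5E to
' U+0021-U+007E and the full-width space U+3000 to an ordinary space.
' Katakana and kanji pass through unchanged.
Function NarrowFullWidth(ByVal s As String) As String
    Dim i As Long, code As Long, out As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        Select Case code
            Case &HFF01& To &HFF5E&
                out = out & ChrW(code - &HFEE0&)   ' fixed offset back to ASCII
            Case &H3000&
                out = out & " "
            Case Else
                out = out & Mid$(s, i, 1)
        End Select
    Next i
    NarrowFullWidth = out
End Function

' Hypothetical driver: converts characters one at a time in every story
Sub NarrowDocument()
    Dim oRange As Word.Range, i As Long, t As String
    For Each oRange In ActiveDocument.StoryRanges
        Do
            For i = oRange.Characters.Count To 1 Step -1
                t = oRange.Characters(i).Text
                If NarrowFullWidth(t) <> t Then oRange.Characters(i).Text = NarrowFullWidth(t)
            Next i
            Set oRange = oRange.NextStoryRange
        Loop Until oRange Is Nothing
    Next
End Sub
```

Replacing one character at a time, rather than assigning a whole story's Text, avoids flattening the surrounding formatting; the converted characters may still carry their Japanese font, so re-applying an English font afterwards may be necessary.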

Hope these help anyone else who has a similar problem.

I'm also looking forward to seeing TonyJollas's code for searching text boxes.

Long live Tek-Tips


Ziggurat
 