
Problem with German Characters


sheila11

Hi All,

I am generating a Word document from contents in an XML file. An XSLT converts the XML to WordML and saves as a Word HTML file, and then I open that HTML file with Word, and save it as a Doc file.

It works fine except when the content includes special characters that are not in English.
e.g. NETTOVERMÖGENSAUFSTELLUNG becomes NETTOVERMÃ–GENSAUFSTELLUNG

I tried changing lang="EN-US" to lang="DE" in the body tag and the span tag of the HTML file, but that didn't work.

I also tried doing this:
Selection.WholeStory
Selection.Range.LanguageID = wdGerman

Is there another way to specify the language in Word? Or any other way to achieve what I am trying to do?

TIA,
Sheila



 
You have a multi-stage process (not that I quite follow all the stages - what is the HTML for?). At which stage are you losing the characters?

Enjoy,
Tony

------------------------------------------------------------------------------------
We want to help you; help us to do it by reading this: Before you ask a question.

I'm working (slowly) on my own website
 
The characters appear correctly in the HTML file (e.g. Ö).

Then, using C# code, I start the Word application, open the HTML file in it, and save it back as a Word doc file. At this stage Word corrupts the Ö to Ã–.
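The symptom described here (one accented letter turning into two odd characters) is what you would see if the file's bytes are UTF-8 but the reader decodes them as windows-1252. That cause is an assumption, not something confirmed at this point in the thread. A minimal Python sketch of the effect:

```python
# Assumption: the file was written as UTF-8 but is read as windows-1252 (cp1252).
text = "NETTOVERMÖGENSAUFSTELLUNG"

utf8_bytes = text.encode("utf-8")       # Ö is stored as the two bytes C3 96
misread = utf8_bytes.decode("cp1252")   # each of those bytes is shown as its own character

print(len(text), len(utf8_bytes))       # the UTF-8 form is one byte longer than the text
print(misread)                          # Ö shows up as two characters
```

Notepad detects UTF-8 on its own in many cases, which would explain why the same file can look right there and wrong in a program that trusts the declared charset.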

Note: I first created a sample Word doc file with the German content and saved it manually as an HTML file. That gave me the ASCII file that I use for the conversion from XML to HTML. During the XSLT, I just replace the content and keep all the tags as they are.

Hope I explained everything.

Sheila
 
I'm not sure I have an answer, but a couple more questions:

Do you have a code page specified in the HTML?

Can you open the HTML with a browser? And, if so, does it display correctly?

Enjoy,
Tony

 
This is how the HTML looks:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta name="Generator" content="Microsoft Word 11 (filtered)" />
</head>
<body lang="DE">
<div class="Section1">KONSOLIDIERTE NETTOVERMÖGENSAUFSTELLUNG

<table lang="DE" class="MsoTableGrid" border="1" cellspacing="0" cellpadding="0" style="border-collapse:collapse;border:none" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:xsi=" xmlns:xsd="
<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
<td width="487" valign="top" style="width:365.4pt;border:solid white 1.0pt;padding:0in 0in 0in 0in'">
<p class="MsoNormal" style="mso-layout-grid-align:none;text-autospace:none">
<b>
<span lang="DE" class="GramE" style="font-size:9.0pt;font-family:AArialNarrowMT-Bold;&#xD;&#xA; mso-bidi-font-family:AArialNarrowMT-Bold">
per 31. Oktober 2008</span>
</b>
</p>
</td>

No, when I open the .html file in a browser or in Word, the characters look corrupted. Only in Notepad do they appear correctly.

Sheila
 
That all displays correctly for me both in IE and Word, and the Ö is maintained when I save it in .doc format, so the problem is somewhere in your system.

Unfortunately, I don't know what it may be - part of me suspects it is a code page issue but I don't know what, and you are specifying the Windows standard Western European one which does include Ö, so shouldn't really have a problem on that front. Is your system a German one?
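One way to test the code page suspicion is to compare the charset the HTML declares with what the bytes on disk actually look like. The sketch below is only an illustration: the helper names `declared_charset` and `looks_like_utf8` are made up for this example, and the sample bytes are built in memory rather than read from the real file.

```python
import re

def declared_charset(raw: bytes):
    """Return the charset named in a meta tag (lowercased), or None if absent."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', raw, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None

def looks_like_utf8(raw: bytes) -> bool:
    """True if the bytes are valid UTF-8 and contain at least one non-ASCII byte."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b > 0x7F for b in raw)

# Hypothetical sample content modelled on the HTML posted above.
html = ('<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />\n'
        'KONSOLIDIERTE NETTOVERMÖGENSAUFSTELLUNG')

for label, raw in (("saved as cp1252", html.encode("cp1252")),
                   ("saved as UTF-8", html.encode("utf-8"))):
    mismatch = declared_charset(raw) == "windows-1252" and looks_like_utf8(raw)
    print(label, "-> declaration and bytes disagree:", mismatch)
```

If the declaration says windows-1252 but the bytes only make sense as UTF-8, a reader that trusts the declaration will show two characters where one Ö was intended.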

Sorry. Maybe someone else can help.

Enjoy,
Tony

 
Tony, Thanks for discussing the issue with me.

I opened the HTML file in Visual Studio and also in a browser. Then I saved the file in VS without making any changes to it, and refreshed the browser.

The first time I opened it in the browser, the characters were corrupted. But after the file was saved in VS and the browser was refreshed, the characters corrected themselves.

Also, the HTML file appeared as a web page in Explorer at first. But after saving in VS, it appears as a Word file with an html extension.

What is it that VS did to it?

Any ideas?
TIA,
Sheila
 
VS has a tendency to stick a UTF-8 byte order mark at the front of the file. Do your XML files have an encoding specified?
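A byte order mark is easy to check for by looking at the first few bytes of the file. The following is a rough sketch, with `sniff_bom` as a made-up helper name; it only covers UTF-8 and UTF-16 and ignores UTF-32, whose little-endian BOM starts with the same two bytes as UTF-16 LE.

```python
import codecs

# Known byte order marks and a label for each.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(raw: bytes):
    """Return a label for the BOM at the start of raw, or None if there is none."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None

# In-memory samples; for a real file, read the first few bytes in "rb" mode instead.
print(sniff_bom(codecs.BOM_UTF8 + b"<html>"))
print(sniff_bom(b"<html>"))
```

A file that gains a BOM when it is re-saved will be decoded by the BOM rather than by the meta tag in many readers, which fits the behaviour seen after saving in VS.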

Enjoy,
Tony

 
Thanks, Tony. You solved my problem.

While doing the XSL transformation I was not using Encoding.Unicode. I added it, and now the characters appear correctly.
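The actual C# change is not shown in the thread. In .NET, Encoding.Unicode is UTF-16 little-endian, and a writer created with it normally emits a BOM, so the reader no longer has to guess. As a rough Python analogue (not the poster's code), writing with an explicit encoding looks like this; `write_and_read` is a hypothetical helper and the sample text is invented:

```python
import codecs
import os
import tempfile

def write_and_read(text: str, encoding: str) -> bytes:
    """Write text to a temporary .html file with the given encoding; return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

text = "<html><body>NETTOVERMÖGENSAUFSTELLUNG</body></html>"

raw = write_and_read(text, "utf-16")    # Python's "utf-16" codec prepends a BOM
has_bom = raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print("starts with a UTF-16 BOM:", has_bom)
print("round-trips intact:", raw.decode("utf-16") == text)
```

The general point is that the encoding used to write the file and the encoding the reader assumes have to agree; a BOM is one way of making the choice explicit.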

Thanks a million!
Sheila
 