Hi,
I would like to remove some HTML code from an HTML file that was created with MS Word. The file contains language-specific characters (specifically Italian ones, i.e. accented letters). It seems that if I "simply" read the file (using "open...r"), remove the unwanted HTML code, and save the file again (using "open...w"), the accented characters get scrambled.
Is there a way to handle this transformation by telling it which encoding the file uses?
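For example, in Python the idea would be to pass the encoding explicitly to both `open` calls, so the bytes are decoded and re-encoded with the same character set. This is a minimal sketch that assumes the file is Windows-1252 (`cp1252`), which Word often uses for Western European text; the real value should be checked against the `charset` in the file's `<meta>` tag. The `strip_word_markup` helper and its `<o:p>` pattern are hypothetical stand-ins for "the unwanted HTML code".

```python
import re

# Assumption: Word saved the file as Windows-1252; check the charset in the <meta> tag.
SOURCE_ENCODING = "cp1252"


def strip_word_markup(html: str) -> str:
    """Hypothetical cleanup: drop Word's <o:p> and </o:p> tags."""
    return re.sub(r"</?o:p>", "", html)


def clean_file(src: str, dst: str, encoding: str = SOURCE_ENCODING) -> None:
    # Decode using the file's real encoding so accented letters become proper characters.
    # newline="" disables newline translation, keeping the original line endings.
    with open(src, "r", encoding=encoding, newline="") as f:
        html = f.read()

    cleaned = strip_word_markup(html)

    # Encode with the same encoding so the charset declared in the file still matches.
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(cleaned)
```

Writing back with the same encoding keeps the file consistent with its own `<meta>` declaration; switching to UTF-8 on output would also work, but only if that declaration is updated too.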
Thanks for any help