Hi, I am looking for help with XML and the Microsoft.XMLDOM object. I have an XML document encoded in UTF-8.
When I load the document and read an element with selectSingleNode, the text is automatically converted to my native codepage 1250. But sometimes the text includes characters from other codepages, e.g. foreign names. For example, the name Menière comes back as Meniere, while Kovár comes back correctly as Kovár.
I know that I can't display more than one codepage at once, but I would like to store the name correctly in my database so that it can be exported correctly later. Is it possible to handle this with the XMLDOM object, or is the only way to write my own parser?
Thanks for help.
Vlado
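A quick way to see the symptom outside FoxPro is the Python sketch below. It is only an illustration: it does not use Microsoft.XMLDOM, and `fits_codepage` is a hypothetical helper. It checks which of the two example names can be represented in Windows codepage 1250. "á" exists in codepage 1250 but "è" does not, while UTF-8 can hold both. Python's strict codec raises an error for "è"; the Windows conversion in the question substituted a plain "e" instead.

```python
# Illustration in Python (not FoxPro/XMLDOM): which characters survive codepage 1250?
# Windows-1250 covers Central European letters such as "á", but has no "è".

def fits_codepage(text: str, codepage: str = "cp1250") -> bool:
    """Return True if every character of text can be encoded in the given codepage."""
    try:
        text.encode(codepage)  # strict mode raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False

for name in ["Kovár", "Menière"]:
    # UTF-8 can represent every character, so the round trip is always lossless.
    utf8_ok = name.encode("utf-8").decode("utf-8") == name
    print(name, fits_codepage(name), utf8_ok)
# Kovár True True
# Menière False True
```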
I've never done what you are trying to do, but I would think that if you can detect which language the original request came from, map it to the correct codepage, and switch to that codepage before calling selectSingleNode, you would see the data correctly.
One other thought: is the data you are looking at really changed, or does it only look wrong because the wrong codepage is loaded? Perhaps switching back to the original codepage after the data is loaded would make it display as it did originally.
(Note: do you need to consider double-byte characters?)
Here are a couple of links that might help:
whats-the-difference-between-an-encoding-a-character-set-and-a-code-page
Hope this helps...
Thanks for the reply.
The document was encoded in UTF-8, so characters from several codepages were used. But I finally found the solution thanks to Rick Strahl's article Using Unicode in Visual FoxPro Web and Desktop Applications.
I just have to call SYS(3101,65001) before
utf8name = oxml.selectSingleNode(".//name").text
so that the string comes back in UTF-8. When I want to display it in FoxPro, I can use STRCONV(utf8name,11), which converts the string to my system's default codepage. That is exactly what I wanted.
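For comparison, the same flow (keep the value as UTF-8 for storage, convert to the ANSI codepage only for display) can be sketched in Python. This is an analogy, not the FoxPro code: `ElementTree.find` stands in for selectSingleNode, the sample XML is made up, and `read_name` and `for_display` are hypothetical helpers. Python replaces characters missing from codepage 1250 with "?" on display; the Windows conversion behind STRCONV may substitute differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; in the thread the XML comes from a UTF-8 file.
XML_UTF8 = '<?xml version="1.0" encoding="utf-8"?><people><name>Menière</name></people>'.encode("utf-8")

def read_name(xml_bytes: bytes) -> bytes:
    """Return the text of the first <name> element as UTF-8 bytes (the value to store)."""
    root = ET.fromstring(xml_bytes)  # the parser honours the encoding declaration
    node = root.find(".//name")      # similar in spirit to selectSingleNode(".//name")
    return node.text.encode("utf-8")

def for_display(utf8_value: bytes, codepage: str = "cp1250") -> str:
    """Make a display-only copy in an ANSI codepage; unmappable characters become '?'."""
    text = utf8_value.decode("utf-8")
    return text.encode(codepage, errors="replace").decode(codepage)

stored = read_name(XML_UTF8)
print(stored)               # b'Meni\xc3\xa8re'  (kept intact for the database/export)
print(for_display(stored))  # Meni?re  (lossy copy, used only for display)
```

The point of the split is that only the display copy is lossy; the stored bytes still contain "è" for later export.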
As soon as data containing characters that are invalid for the current codepage gets into VFP's ANSI strings, any of those extended characters are lost. You can't pass a character that is invalid in one codepage into a different codepage without the character getting mangled or translated in some way.
The only way to do this is to keep the data in UTF-8 format and convert it to a specific codepage only at display time, either in a COM component (for UI components) or in HTML using UTF-8 encoded content.
If you need multilingual FoxPro applications with several languages active at the same time, it's just about impossible, and I would advise using a different development tool that supports Unicode (which is just about anything else).
+++ Rick ---
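The advice to keep the data in UTF-8 until output can also be sketched in Python. This is illustrative only: `to_utf8_html` is a hypothetical helper and the page is a minimal example. The stored UTF-8 value goes straight into a page that declares UTF-8, so no ANSI codepage is involved and the name reaches the browser intact.

```python
import html

def to_utf8_html(name_utf8: bytes) -> bytes:
    """Wrap a stored UTF-8 value in a minimal HTML page that declares UTF-8."""
    name = name_utf8.decode("utf-8")
    page = (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
        f"<body><p>{html.escape(name)}</p></body></html>"
    )
    return page.encode("utf-8")  # no conversion to an ANSI codepage anywhere

page_bytes = to_utf8_html("Menière".encode("utf-8"))
print(page_bytes.decode("utf-8"))
```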