FoxPro Programming
how to read utf8 encoded data from xml without conversion
  VladoS
  All
  Jan 29, 2018 @ 06:01am

Hi, I am looking for help with XML and the Microsoft.XMLDOM object. I have an XML document encoded in UTF-8.

When I load the document and read an element with selectSingleNode, the text is automatically converted to my native 1250 code page. But sometimes the text includes characters from other code pages, e.g. in foreign names. For example, the name Menière comes back as Meniere, but Kovár comes back correctly as Kovár.

I know that I can't display multiple code pages at once, but I would like to store the name correctly in my database so that it can be exported correctly later. Is it possible to handle this with the XMLDOM object, or is the only way to write my own parser?

Thanks for help.

Vlado

re: how to read utf8 encoded data from xml without conversion
  Harvey Mushman
  VladoS
  Jan 29, 2018 @ 06:20am

I've never done what you are trying to do, but I would think that if you can detect which language the original request came from, map that to the correct code page, and switch to it before calling selectSingleNode, you would see the data correctly.

One other thought: is the data you are looking at really changed, or is it just shown with the wrong code page loaded? Maybe switching back to the original code page after the data is loaded will make the information display as it did originally.

(Note: do you need to consider double-byte characters?)

Here are a couple of links that might help:

What's the difference between an encoding, a character set, and a code page?

MS Code Page Identifiers

Hope this helps...

re: how to read utf8 encoded data from xml without conversion
  VladoS
  Harvey Mushman
  Jan 29, 2018 @ 06:40am

Thanks for reply.

The document was encoded in UTF-8, so it contains characters from several code pages. But I finally found the solution thanks to Rick Strahl's article Using Unicode in Visual FoxPro Web and Desktop Applications. I just have to call SYS(3101,65001) before reading the node's text:
utf8name = oxml.selectSingleNode(".//name").text

That gives me the string in UTF-8, and when I want to display it in FoxPro I can use STRCONV(utf8name,11) to convert it to my system's default code page, which is exactly what I wanted.
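A minimal sketch of the approach described above, assuming Visual FoxPro 9 and the Microsoft.XMLDOM (MSXML) COM object. The XML string and the name element are hypothetical test data; è is written as its UTF-8 byte pair, CHR(195) + CHR(168), so the source file never has to contain a character that code page 1250 lacks. Note that selectSingleNode() returns a node object, so the string is read from its .text property.

```foxpro
* Sketch: read UTF-8 text from MSXML without losing characters
* that do not exist in the current ANSI code page (e.g. 1250).
LOCAL loXml, lcXml, lcUtf8Name, lcAnsiName

* Exchange strings with COM objects as UTF-8 (code page 65001)
=SYS(3101, 65001)

* Hypothetical test document: "Menière", with è written as UTF-8 bytes C3 A8
lcXml = "<people><name>Meni" + CHR(195) + CHR(168) + "re</name></people>"

loXml = CREATEOBJECT("Microsoft.XMLDOM")
loXml.async = .F.
IF !loXml.loadXML(lcXml)
   ? "XML error: " + loXml.parseError.reason
   RETURN
ENDIF

* .text now comes back as raw UTF-8 bytes instead of being mapped to ANSI
lcUtf8Name = loXml.selectSingleNode(".//name").text

* Convert a copy to the current ANSI code page only for display
lcAnsiName = STRCONV(lcUtf8Name, 11)
? lcAnsiName
```

Because SYS(3101) applies to every COM call in the session, an application that also talks to other COM objects may prefer to set it only around the XML processing.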

re: how to read utf8 encoded data from xml without conversion
  Rick Strahl
  Harvey Mushman
  Jan 29, 2018 @ 05:00pm

As soon as data containing characters that are invalid in the current code page gets into VFP as ANSI characters, any of the extended character information is lost. You can't convert a character into a code page that doesn't contain it without the character getting mangled or translated in some way.

The only way to do this is to keep the data in UTF-8 format and convert it only when you display the result, either in a COM component (for UI components) or in HTML as UTF-8 encoded content.
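A sketch of keeping the value in UTF-8 inside VFP, assuming the string was obtained as in the SYS(3101, 65001) approach earlier in the thread. The cursor Contacts and field FullName are hypothetical; NOCPTRANS is the VFP field option that stops code page translation of a character or memo field, so the raw UTF-8 bytes are stored unchanged and can be exported later as-is.

```foxpro
* Sketch: store UTF-8 bytes untranslated and convert only for display.
LOCAL lcUtf8Name

* Hypothetical UTF-8 value, e.g. as returned by MSXML after SYS(3101, 65001):
* "Menière" with è as the UTF-8 byte pair C3 A8
lcUtf8Name = "Meni" + CHR(195) + CHR(168) + "re"

* NOCPTRANS keeps VFP from code-page translating this memo field
CREATE CURSOR Contacts (FullName M NOCPTRANS)
INSERT INTO Contacts (FullName) VALUES (lcUtf8Name)

* Export: the stored value is still UTF-8 and can be written out unchanged.
* Display: convert a copy to the current ANSI code page only at this point.
? STRCONV(Contacts.FullName, 11)
```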

If you need a multilingual FoxPro application with multiple languages active at the same time, it's just about impossible to do, and I would advise using a different development tool that supports Unicode (which is just about anything else).

+++ Rick ---

© 1996-2024