Sunday, April 13, 2008

Don't Put Non-ASCII In Your Scenery Files!

I don't know how much of a problem this is yet, or how much of a mess it's going to make of people's scenery. Here's the background:
  • ASCII defines 128 character values: the letters A-Z in upper and lower case, the digits, punctuation, and some control codes. With ASCII, you can write English and that's just about it.
  • The byte that ASCII is stored in on all modern computers can store 256 values.
  • Clever people got the idea to put some more letters in the other 128 values to create characters like é and å.
  • People defined different "code pages" that have different sets of characters in those "upper 128" slots. So one code page might be good for French, another for Russian.
  • Modern software uses Unicode characters, which have a lot more than 256 values, and can thus hold all sorts of characters in one string.
Code pages were around for a while, but they're not a good idea. The problem with code pages is that the same numeric values are used for different letters. The result is that a correctly written Russian document, when read using a different code page, looks like gibberish. And if you want one document with both French and Russian, well, one code page doesn't do you much good.
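
To make the code page problem concrete, here's a small Python sketch. I'm assuming Python's `cp1252` and `cp1251` codecs as stand-ins for a Western European and a Russian Windows code page, and the helper name `reinterpret` is just made up for illustration.

```python
def reinterpret(text, written_with, read_with):
    """Encode text under one code page, then decode the same bytes under another."""
    raw = text.encode(written_with)
    return raw.decode(read_with)

# "é" is the single byte 0xE9 in the Western European code page...
print("é".encode("cp1252"))                        # b'\xe9'
# ...but the Cyrillic code page uses 0xE9 for a different letter.
print(reinterpret("été.png", "cp1252", "cp1251"))  # йtй.png
```

The bytes never change; only the table used to read them does.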

Now X-Plane's handling of non-ASCII characters is pretty poor in version 9.00 (and all previous versions). It will draw ASCII and take keyboard input from ASCII but not much else. If you hit the é key on your foreign keyboard, probably nothing will work.

But it turns out there is one way to use foreign characters in X-Plane - I just discovered it tonight. If you use Windows and your system's codepage* is set for a foreign language, you can use those foreign language characters in an OBJ file to name a file on disk with the same name. In other words, you can have textures named été.png and it will work.

Sort of. If you then change your system to work in Russian (which changes the code page), your texture will stop working. The reason is that the file system uses Unicode; that is, the OS knows that été requires a Latin character set that's French friendly, but X-Plane is using Russian since the system is set that way. The result is that there is no way to name the file using the Russian code page, and we fail to load the texture.
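
Here's a rough sketch of that failure in Python. This is not X-Plane's actual loading code: `find_texture` and the set of names standing in for the disk are hypothetical, and I'm assuming `cp1252` and `cp1251` as the French-friendly and Russian code pages.

```python
def find_texture(name_bytes, system_codepage, files_on_disk):
    """Decode the raw name from the OBJ using the system code page, then look it up.

    files_on_disk is a set of Unicode names, standing in for the file system.
    Returns the matching name, or None if nothing on disk has that name.
    """
    wanted = name_bytes.decode(system_codepage)
    return wanted if wanted in files_on_disk else None

files_on_disk = {"été.png"}               # the file system stores Unicode names
name_in_obj = "été.png".encode("cp1252")  # bytes written on a French-friendly system

print(find_texture(name_in_obj, "cp1252", files_on_disk))  # été.png
print(find_texture(name_in_obj, "cp1251", files_on_disk))  # None
```

Same OBJ, same file on disk, different system setting, and the lookup misses.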

So using the "high 128" characters from your system's code page to make non-ASCII characters is a bad idea because your scenery won't work on other people's computers.

But it's going to get worse in the future. X-Plane is going to start using UTF8 in a lot of places. UTF8 encodes Unicode as a stream of bytes: ASCII characters still take one byte each, but every non-ASCII character takes more than one byte. As a result it uses the "high 128" byte values for very different things than a code page does. été.png in UTF8 comes out quite different.
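
You can see the difference with a few lines of Python. Again `cp1252` is my assumption for the French-friendly code page; the point is only that the two encodings produce different bytes for the same name.

```python
name = "été.png"

codepage_bytes = name.encode("cp1252")
utf8_bytes = name.encode("utf-8")

print(codepage_bytes)  # b'\xe9t\xe9.png'
print(utf8_bytes)      # b'\xc3\xa9t\xc3\xa9.png'

# UTF8 bytes read through the code page turn each é into two junk characters.
print(utf8_bytes.decode("cp1252"))  # Ã©tÃ©.png

# Going the other way, the code page bytes are not even valid UTF8.
try:
    codepage_bytes.decode("utf-8")
    codepage_is_valid_utf8 = True
except UnicodeDecodeError:
    codepage_is_valid_utf8 = False
print(codepage_is_valid_utf8)  # False
```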

I'm not sure how we'll handle this yet (use UTF8 in the scenery system or have some kind of backward compatibility). But for now I can only advise one thing: use ASCII only for your file names. In fact, a good guideline for filenames for the scenery system is to use only numbers, letters, and the underscore.
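
If you want to check a scenery pack against that guideline, a few lines of Python will do it. This is only a sketch, not an official tool: `is_safe_filename` is a made-up name, and allowing a single dot for the extension is my assumption, since the guideline itself doesn't mention it.

```python
import re

# ASCII letters, numbers, and underscore, plus an optional extension like ".png".
# Allowing the dot is an assumption made for this sketch.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?")

def is_safe_filename(name):
    """Return True if the whole name uses only the characters allowed above."""
    return SAFE_NAME.fullmatch(name) is not None

print(is_safe_filename("my_texture_01.png"))  # True
print(is_safe_filename("été.png"))            # False
```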

2 comments:

Dan31 said...

I have find this a while ago, for I am french ;-)

Austin G said...

Wow, what fantastic timing for this post! I was just adding some Mexican liveries to OpenSceneryX and was considering using the correct accented character in the folder name (and it would therefore appear in the file path in the library file). Ofc it would probably have failed in cross-platform testing, but even so it's good to know the official line.

Do you have mind readers working on X-Plane?