Part III: wandering through unicode, legacy fonts, and browsers

Unicode — Supplementary Planes

Back to Unicode. Even Unicode needs room to expand. The Basic Multilingual Plane (BMP, also known as Plane 0) has code points 0 through FFFF hex (0 through 65,535) and contains, ignoring various special mathematical characters, historical quirks, and just plain oddities, most of the alphabets and symbols in use today, even including the vast array of Chinese characters (hanzi, or kanji in Japanese). But of course more room is needed once historical alphabets and other special character sets are also considered. Supplementary planes such as Plane 1 are simply further sets of 65,536 code points following Plane 0 (some of us refer to these as astral planes :-) ). Using hexadecimal notation, it’s easy to spot Plane 1: all Plane 0 code points fit in four hexadecimal digits, while Plane 1 code points need five and start with 1 (10000 through 1FFFF).
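As a rough illustration of the plane arithmetic, here is a minimal Python sketch. The function names plane_of and plane_range are made up for this example and are not part of any library. The idea is that the plane number is simply the code point with its low 16 bits dropped.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 through 16) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is what remains after discarding the low 16 bits.
    return code_point >> 16


def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code point of the given plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"no such plane: {plane}")
    first = plane << 16
    return first, first + 0xFFFF


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFF, 0x10000, 0x1D50A):
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
    first, last = plane_range(1)
    print(f"Plane 1 runs from U+{first:04X} to U+{last:04X}")
```

Note that the "count the hex digits" shortcut only separates Plane 0 from the planes above it; the shift is the more reliable test.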

Here is an example of a Plane 1 character: U+1D50A, written in HTML as &#x1D50A;, which renders as 𝔊 (used to mark a Septuagint reference). In case anyone viewing this is having trouble, here’s an image: [image: the character 𝔊]
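For anyone who would rather poke at the character than squint at it, the following Python sketch prints a few facts about it using only the standard library. The helper name describe is invented for this example. It also shows that a Plane 1 character takes two 16-bit units in UTF-16 and four bytes in UTF-8, which is part of why older software needs extra help with it.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few representations of a single character."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")
    # Split the UTF-16 bytes into 16-bit code units (two units for Plane 1).
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "html": f"&#x{cp:X};",
        "utf16": " ".join(f"{u:04X}" for u in units),
        "utf8": ch.encode("utf-8").hex(" ").upper(),
    }


if __name__ == "__main__":
    for key, value in describe("\U0001D50A").items():
        print(f"{key:10} : {value}")
```

Whether the character actually displays in your terminal still depends on the font, which is exactly the problem discussed next.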

Setting things up to view this can be an interesting process. First of all, it’s necessary to find a Unicode font that supports Plane 1 characters. Not all such fonts will actually show the above: for example, Code2001, an otherwise excellent Unicode font, does not include historical Greek musical notation, whereas Cardo works nicely. Just because a font is Unicode-based does not mean it will be “complete” (such a font would be staggeringly large). In particular, when choosing among Unicode fonts, pay attention to which version of Unicode is supported and what the target languages are. Plane 1 characters start to appear in Unicode versions 3.1 and up.

Second, the operating system might need a slight adjustment to see Plane 1. I’m talking, of course, about Windows: XP and earlier versions. This page discusses how to do the registry edits necessary both for the operating system and for IE6 8-|.

Linux/*nix, Mac, and Vista are all presently able to handle Plane 1 without modifications. My understanding is that for Windows Me and anything prior to NT, it’s hopeless.

Third, applications in general and browsers in particular may need slight tweaking to see Plane 1 characters. As usual, IE6 is the worst offender in this regard, requiring not just setup but also a registry edit. For the rest, once the browser is set up with a Plane 1-aware font, it’s good to go. Firefox, Safari, Opera, and Konqueror all fall under this category.

A few notes: IE6 needs to be set to user-defined encoding, and the extended font needs to be listed under User Defined (rather than any of the specific regions listed, which refer back to non-Unicode encodings as discussed in part 1). Opera actually departs from the general MO of other browsers: under Tools->Preferences->Advanced->International Fonts, it’s possible to assign a particular Unicode font to a particular language (going by Unicode’s code blocks). This is the direction other browsers should no doubt take in the future.
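To make the per-block idea concrete, here is a small Python sketch of how a font could be chosen by code block. This is not how Opera implements it, just an illustration. The block ranges are the ones published by Unicode, while the font assignments, the fallback, and the names BLOCK_FONTS and font_for are placeholders chosen for this example.

```python
# (first, last, block name, preferred font).
# Block ranges come from the Unicode standard; the font choices are
# hypothetical and only here to illustrate the lookup.
BLOCK_FONTS = [
    (0x0000, 0x007F, "Basic Latin", "ExampleSerif"),
    (0x0370, 0x03FF, "Greek and Coptic", "Cardo"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols", "Cardo"),
]
FALLBACK_FONT = "sans-serif"  # assumed generic fallback


def font_for(ch: str) -> str:
    """Pick a font for one character based on the block it falls in."""
    cp = ord(ch)
    for first, last, _block_name, font in BLOCK_FONTS:
        if first <= cp <= last:
            return font
    return FALLBACK_FONT


if __name__ == "__main__":
    for ch in ("A", "\u03b1", "\U0001D50A", "\u4e00"):
        print(f"U+{ord(ch):04X} -> {font_for(ch)}")
```

The appeal of this approach is that no single font has to be “complete”: each block can be handed to whichever font covers it best.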
