<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Digital Ramble &#187; standards</title>
	<atom:link href="http://www.digitalramble.com/tags/standards/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.digitalramble.com</link>
	<description>surveyor of the foothills, valleys and occasional sheer cliff drops of the world of computer programming...</description>
	<lastBuildDate>Wed, 30 May 2007 05:37:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<image>
  <link>http://www.digitalramble.com</link>
  <url>http://www.digitalramble.com/wordpress/favinit.gif</url>
  <title>Digital Ramble</title>
</image>
		<item>
		<title>Part II: wandering through unicode, legacy fonts, and browsers</title>
		<link>http://www.digitalramble.com/2006/07/25/57/</link>
		<comments>http://www.digitalramble.com/2006/07/25/57/#comments</comments>
		<pubDate>Tue, 25 Jul 2006 23:46:41 +0000</pubDate>
		<dc:creator>Cindy</dc:creator>
				<category><![CDATA[compatibility]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[konqueror]]></category>
		<category><![CDATA[msie6]]></category>
		<category><![CDATA[msie7]]></category>
		<category><![CDATA[opera]]></category>
		<category><![CDATA[safari]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf-8]]></category>

		<guid isPermaLink="false">http://www.digitalramble.com/2006/07/25/57/</guid>
		<description><![CDATA[Precomposed versus Combining
In the course of putting together the encodings (called code points) in Unicode, a number of decisions had to be made regarding the current existing encodings, particularly well known and/or well established ones.  In some cases, even though the Unicode Consortium has a particular policy regarding some encode vs render issues, there [...]]]></description>
			<content:encoded><![CDATA[<h3>Precomposed versus Combining</h3>
<p>In the course of assigning the encodings (called code points) in Unicode, a number of decisions had to be made regarding existing encodings, particularly well-known and well-established ones.  In some cases, even though the Unicode Consortium has a particular policy on some encode-versus-render issues, there are inconsistent inclusions due to this grandfathering of previously established encodings and (to be quite honest) outright mistakes on the part of the Consortium.  The question of precomposed characters versus combining characters is a classic one.</p>
<p><span id="more-57"></span></p>
<p>As a simple example, let&#8217;s take the Greek letter alpha &#8212; <font face="Cardo">ά</font> &#8212; with an acute accent.  Should this be represented as <code>&amp;#x03AC;</code> or as <code>&amp;#x03B1;&amp;#x0301;</code>?  (See <a target="window" href="http://www.unicode.org/charts/PDF/U0370.pdf">here</a> and <a target="window" href="http://www.unicode.org/charts/PDF/U0300.pdf">here</a> for the code charts.) In other words, should a single fixed code point be used to represent something that&#8217;s really a combination of a letter and an accent, or should there be a code point for the letter and a code point for a <i>combining</i> accent which is then combined with the previous letter?</p>
<p>My two cents&#8217; worth is that since the <i>concept</i> of an accent applies to more than one character, it is an independent concept, and thus combining is the way to go.  Not to mention more economical &#8212; with this approach, adding an accent to any other character requires only one additional code point in the charts.  Implementing it the other way means that for each character that might be accented, a second code point must be reserved &#8212; and as that cannot possibly be comprehensive, it will by nature be restricted to &#8220;legal&#8221; or &#8220;existing&#8221; combinations at the very least.  So why are both present?  That&#8217;s the grandfather clause at work, since the concept of combining characters postdates the establishment of many of the old encodings.  But philosophically, the Unicode Consortium&#8217;s wise enough to agree with me <img src='http://www.digitalramble.com/wordpress/smilies/yahoo_wink.gif' alt='&#59;&#45;&#41;' class='wp-smiley' width='18' height='18' title='&#59;&#45;&#41;' /></p>
<p>Now suppose a web developer needs to represent some manuscript online.  The texts may themselves contain &#8220;mistakes&#8221; which are intended to be reproduced as is.  For example, in (ancient) Greek, an epsilon will never be combined with a circumflex due to the rules of the language.  But perhaps an idiosyncratic author did so anyway, or a manuscript is badly marked up, etc.  If the only encoding at hand for ancient Greek contained only the &#8220;legal&#8221; stuff, I&#8217;d be out of luck if I needed to show this.  Given this, the use of combining diacritics clearly makes perfect sense.  While there is no precomposed character for epsilon-with-circumflex, I could still use <code>&amp;#x03B5;&amp;#x0342;</code> to represent this &#8212; <span style="font-family:Cardo; font-size:100%;">&#x03B5;&#x0342;</span> &#8230;</p>
<p>But wait!  That&#8217;s not quite the end of the story!  Not all browsers are good about combining diacritics.  It&#8217;s actually something of an art form, and the positioning will depend on whether the base letter is small or capital, and whether there are other diacritics also being combined with it.  Frankly, most browsers <i>don&#8217;t</i> cope so well.  So to get a usable display in as many cases as possible, I find it worth scanning ahead in the text to find all the associated diacritics and checking for the existence of a precomposed character before attempting to build it up from combining marks.</p>
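<p>That look-ahead check can be sketched with Unicode normalization.  The following is a minimal Python sketch, not the approach actually used on this site; it assumes Python&#8217;s standard <code>unicodedata</code> module, and the helper name <code>prefer_precomposed</code> is made up for illustration.  NFC composes a base letter and its combining marks into a precomposed code point where one exists, and leaves the sequence alone where none does.</p>

```python
import unicodedata

def prefer_precomposed(text):
    """Hypothetical helper: use precomposed code points wherever Unicode defines them."""
    return unicodedata.normalize("NFC", text)

# alpha + rough breathing + acute + iota subscript, written as combining marks
combining = "\u03b1\u0314\u0301\u0345"
composed = prefer_precomposed(combining)
print([f"U+{ord(c):04X}" for c in composed])

# epsilon + circumflex (perispomeni): no precomposed form exists,
# so the result is still a base letter followed by a combining mark
odd = prefer_precomposed("\u03b5\u0342")
print([f"U+{ord(c):04X}" for c in odd])
```

<p>The first sequence collapses to the single code point U+1F85, while the epsilon-with-circumflex stays as two code points and is left to the browser to combine.</p>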
<h3>Test Case</h3>
<p>Let&#8217;s take an interesting example here: <img src="http://digitalramble.com/public/unex/alpha.png" /><br />
That is an alpha with a rough breathing, an acute accent, and an iota subscript.<br />
If the alpha is uppercased, it looks like this: <img src="http://digitalramble.com/public/unex/Alpha.png" /><br />
Notice how the positioning of the iota subscript changes when alpha is capital, and both the breathing and the accent sidle a little to the side to get out of the base letter&#8217;s way.</p>
<p>I&#8217;m going to demonstrate how this pair of letters looks in precomposed versus combining notation in assorted browsers and operating systems.  First of all, here&#8217;s the table I used to build up my examples.  The first column has the precomposed character for each of the above images.  The second has the base character plus the combining diacritics for the above images.  The diacritics are in &#8220;proper&#8221; order.  This order is particular to each language, and in this case the order is supposed to be breathings-accents-iotas, and that is what drives the order you see in the final representation &#8212; the breathing will always be to the left of an accent, and so on.  The third column has the diacritics in reversed order.  This is a very interesting situation that I don&#8217;t believe has yet been well enough assessed.  I don&#8217;t even know that it <i>should</i> be: just as it&#8217;s reasonable to expect words to make sense only if their letters are ordered properly, it could be just as reasonable to expect diacritics to be listed in proper order.  On the other hand, it might be reasonable for some programs (especially a word processor) to order them properly &#8212; these <i>are</i> computers and can churn out a few more CPU cycles, after all.  I found it very interesting how badly handled this third column was, so I included it for interest&#8217;s sake.</p>
<table class="unicodetest" cellpadding="5">
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td class="unicodetest"><font face="Cardo">&#x1F85;</font></td>
<td class="unicodetest">&#x03B1;&#x0314;&#x0301;&#x0345;</td>
<td class="unicodetest">&#x03B1;&#x0345;&#x0301;&#x0314;</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td class="unicodetest">&#x1F8D;</td>
<td class="unicodetest">&#x0391;&#x0314;&#x0301;&#x0345;</td>
<td class="unicodetest">&#x0391;&#x0345;&#x0301;&#x0314;</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
</table>
<p>There are several things to note here.  First of all, particular diacritics, depending on the language of course, have particular positions.  Some combine below, some along the top, some to either side and some as &#8220;overstrikes.&#8221;  In addition, if there are multiple diacritics that are positioned similarly there is usually an order of precedence.  Finally, if the letter is capitalised, that often affects the placement of the diacritic.  In the images above (and in the table, if your browser is working correctly) the iota subscript moves to the side of the capital alpha.</p>
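<p>Part of why the reversed third column is hard can be seen from Unicode&#8217;s own canonical ordering rules.  A small Python sketch, assuming the standard <code>unicodedata</code> module (the variable names are illustrative only): normalization sorts marks by combining class, but the breathing and the accent share a class, so their relative order is treated as meaningful and is left as typed.</p>

```python
import unicodedata

proper = "\u03b1\u0314\u0301\u0345"     # breathing, accent, iota subscript
reversed_ = "\u03b1\u0345\u0301\u0314"  # iota subscript, accent, breathing

# Canonical combining classes of the three marks
for mark in "\u0314\u0301\u0345":
    print(f"U+{ord(mark):04X}", unicodedata.combining(mark))

nfd_proper = unicodedata.normalize("NFD", proper)
nfd_reversed = unicodedata.normalize("NFD", reversed_)

# The iota subscript is moved to the end, but the accent and the
# breathing keep the order in which they were typed
print([f"U+{ord(c):04X}" for c in nfd_reversed])
print(nfd_proper == nfd_reversed)
```

<p>So a program cannot simply normalize its way from the third column to the second; swapping the breathing and the accent would take language-specific knowledge on top of what normalization provides.</p>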
<p>First up on the chopping block: Internet Explorer (IE6 and IE7):</p>
<table border="1">
<tr style="background-color:#e6f2f2;">
<td></td>
<td>IE6 (SP2)</td>
<td>IE7 (beta 3)</td>
</tr>
<tr>
<td>Windows XP SP2 w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/ie6-windows-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/ie7-windows-cardo.jpg" /></td>
</tr>
<tr>
<td>Windows XP SP2 w/regedits &#038; w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/ie6-windows-regedits-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/ie7-windows-regedits-cardo.jpg" /></td>
</tr>
<tr>
<td>Vista (beta 2) w/Cardo</td>
<td align="center">n/a</td>
<td><img src="http://digitalramble.com/public/unex/ie7-vista-cardo.jpg" /></td>
</tr>
</table>
<p>I will admit I was surprised at how well Internet Explorer did here when it&#8217;s usually so miserable at dealing with odd things, <i>especially</i> internationalization issues.  However, do notice that it&#8217;s clear MS didn&#8217;t consider the case where the combining characters might not be &#8220;in order&#8221;.    In the third column, the diacritics along the top haven&#8217;t been properly spaced from each other, nor are they shifted slightly in the capital letter version.  And the iota fails to adjust to the capital letter.</p>
<p>Moving along to Firefox and Opera, we have the following table:</p>
<table border="1">
<tr style="background-color:#e6f2f2;">
<td></td>
<td>Firefox (1.5)</td>
<td>Opera (9)</td>
</tr>
<tr>
<td>Windows XP SP2 w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/firefox-windows-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/opera-windows-cardo.jpg" /></td>
</tr>
<tr>
<td>Windows XP SP2 w/regedits &#038; w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/firefox-windows-regedits-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/opera-windows-regedits-cardo.jpg" /></td>
</tr>
<tr>
<td>Vista (beta 2) w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/firefox-vista-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/opera-vista-cardo.jpg" /></td>
</tr>
<tr>
<td>Linux (Ubuntu 6.06) w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/firefox-linux-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/opera-linux-cardo.jpg" /></td>
</tr>
<tr>
<td>Mac 10.4.7 w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/firefox-mac-cardo.jpg" /></td>
<td><img src="http://digitalramble.com/public/unex/opera-mac-cardo.jpg" /></td>
</tr>
</table>
<p>What I find very interesting is how the registry edits for Windows XP improve both Firefox and Opera&#8217;s ability to display combining characters properly.  However, this ability immediately disappears in the tricky third column.  </p>
<p>I&#8217;m surprised by Opera&#8217;s complete failure to try combining the characters on the Mac, especially since it did combining in both Windows and Linux.  Note that it simply listed each mark after the base letter in the order given in the table.  Opera lets you define which font you want to use for the &#8216;Greek and Coptic&#8217; and the &#8216;Greek Extended&#8217; blocks. Now, I&#8217;ve set Cardo as my default font for everything, so that should override anything that&#8217;s there.  &#8220;Should&#8221; being the operative word.  Here&#8217;s the problem: I go to Preferences -&gt; Web Pages -&gt; Normal Font and I select Cardo there.  However, this doesn&#8217;t seem to affect the display of Greek text in Mac Opera. This is odd, since it does affect the display of Greek in Opera on Linux and Windows.  I can instead change the display of Greek text in Mac Opera by going through Preferences -&gt; Advanced -&gt; Fonts -&gt; International Fonts and there selecting &#8216;Greek&#8217; as the &#8216;writing system&#8217;. However, Cardo is not listed here. And since &#8216;Greek&#8217; is actually Opera&#8217;s name for the &#8216;Greek and Coptic&#8217; set, to make sure I can render all of Polytonic Greek correctly I also have to select &#8216;Extended Greek&#8217; as the writing system. Cardo is not listed there either. Why is it not listed there when it&#8217;s clearly installed and is available system-wide? I do not know. </p>
<p>Let&#8217;s check the last two browsers of interest: Konqueror and Safari:</p>
<table border="1">
<tr style="background-color:#e6f2f2;">
<td></td>
<td>Konqueror (3.5.2)</td>
<td>Safari (2.0.4)</td>
</tr>
<tr>
<td>Linux (Ubuntu 6.06) w/Cardo</td>
<td><img src="http://digitalramble.com/public/unex/konqueror-linux-cardo.jpg" /></td>
<td align="center">n/a</td>
</tr>
<tr>
<td>Mac OS 10.4.7</td>
<td align="center">n/a</td>
<td><img src="http://digitalramble.com/public/unex/safari-mac-cardo.jpg" /></td>
</tr>
</table>
<p>I&#8217;m giving Konqueror bonus points for the creativity it shows in vertically stacking the rough breathing and accent.  Unfortunately, it&#8217;s completely invalid on technical merits.  The iota subscript isn&#8217;t even underneath the base letter, either.  Konqueror needs to completely fix this aspect of its rendering engine.  Surprisingly, though (and I say that because Safari&#8217;s WebKit engine began as a fork of Konqueror&#8217;s KHTML), Safari comes out a winner here, rendering the combining characters correctly. </p>
<p>So what I see here is IE6 (shockingly enough), IE7, and Safari handling the combining characters the best.  I say that because all three are able to render the test cases properly without any outside modifications.  Firefox&#8217;s correct rendering on Windows with regedits, but <i>not</i> on any other operating system nor on Windows without regedits, leads me to believe the credit lies not with the browser but with whatever effect the registry edits have on Windows.  Still, it did a decent job and it handled the reordered combining letters no worse than the properly ordered combining letters.  Opera was neck and neck with Firefox for all the same reasons, but fell behind for its odd handling of the test cases on the Mac.  </p>
<p>Konqueror and Safari wind up being the most puzzling, since their rendering engines share a common origin.  In theory, then, if one works, so should the other.  I may try to rustle up other versions of Konqueror on other Linux distributions and see if anything else might be going on.</p>
<h3>Microsoft&#8217;s Registry Edits</h3>
<p>Microsoft details up to three registry edits that are necessary to set up Windows NT, 2K and XP for Unicode functionality.  (Vista comes already set up, and does seem to actually work.) The first edit is the most important.  It enables <a target="window" href="http://en.wikipedia.org/wiki/Uniscribe">Uniscribe</a> support, which is what Windows applications use to render Unicode characters.  It&#8217;s responsible for rendering input text, for substituting character variations according to context, and for ordering displayed text based on text flow direction.  </p>
<p>The second registry edit adds support for the supplementary plane characters in IE (which I will cover in the next post).  The third adds the ability (in Win XP only) to specify a default font for supplementary plane characters.</p>
<p>Presumably the first edit is what enables Firefox and Opera to handle combining diacritics.  I&#8217;m at a loss to explain IE6&#8217;s behavior, unless a fairly recent patch to it enables the same thing.  Unfortunately, I don&#8217;t have an earlier unpatched IE6 to test that theory out with (if I&#8217;m correct, such an earlier version of IE6 would need this registry edit to display properly).  Installing certain language packs will also install the Uniscribe module &#8220;behind the scenes,&#8221; but as I had the same language packs before and after adding the registry edits, that doesn&#8217;t explain this either.</p>
<p>The full instructions for doing the registry edits may be found <a target="window" href="http://www.i18nguy.com/surrogates.html">here</a>.</p>
<h3>Excellent Resources</h3>
<p>For much more informative commentary, check these pages out:</p>
<ul>
<li><a target="window" href="http://www.i18nguy.com/UnicodeBenefits.html">Benefits of the Unicode™ Character Standard</a>
<li><a target="window" href="http://www.i18nguy.com/unicode/codepages.html">Character Sets And Code Pages At The Push Of A Button</a>
<li><a target="window" href="http://www.cs.tut.fi/~jkorpela/chars.html#characters">More about the character concept</a> (the entire article is excellent as well)
<li><a target="window" href="http://www.alanwood.net/unicode/combining_diacritical_marks.html">Combining Diacritical Marks</a>
</ul>
<hr/>Copyright &copy; 2010 <strong><a href="http://www.digitalramble.com">Digital Ramble</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement. Please contact legal@www.digitalramble.com so we can take legal action immediately.<br/><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span>]]></content:encoded>
			<wfw:commentRss>http://www.digitalramble.com/2006/07/25/57/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Part I: wandering through unicode, legacy fonts, and browsers</title>
		<link>http://www.digitalramble.com/2006/07/18/54/</link>
		<comments>http://www.digitalramble.com/2006/07/18/54/#comments</comments>
		<pubDate>Wed, 19 Jul 2006 05:01:46 +0000</pubDate>
		<dc:creator>Cindy</dc:creator>
				<category><![CDATA[browsers]]></category>
		<category><![CDATA[compatibility]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[konqueror]]></category>
		<category><![CDATA[msie6]]></category>
		<category><![CDATA[opera]]></category>
		<category><![CDATA[safari]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf-8]]></category>

		<guid isPermaLink="false">http://www.digitalramble.com/2006/07/18/54/</guid>
		<description><![CDATA[In the beginning was ASCII, at seven bits.  And it was good, until someone noticed a few missing characters.  In this way, ASCII with eight bits was born.  But alas!  There were even more characters to be represented.  And  thus began the exodus in search of ways to show [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p><i>In the beginning was ASCII, at seven bits.  And it was good, until someone noticed a few missing characters.  In this way, ASCII with eight bits was born.  But alas!  There were even more characters to be represented.  And  thus began the exodus in search of ways to show these missing characters.</i></p>
<p><i>And the way split and commingled between encoding and rendering.  After all, since no encoding existed for some characters, those attempting to render them on the screen made up their own encodings as they went.  And things continued in this sorry mess for years, until Unicode was born&#8230;</i></p></blockquote>
<p>Of course, at this point various programs such as mailers, forums, word processors, internet browsers and even programming languages are all in the process of being updated to understand Unicode.  It&#8217;s a giant bear of a mess along many fronts.  Although I do believe Unicode is the way forward, the essential problem is that it&#8217;s being introduced now instead of at, say, the point ASCII originally came on the scene.  But no matter, things will sort themselves out.  Over the years and years, but hey.</p>
<p>So what, exactly, am I talking about?  In order to represent the alphabet, two things are needed.  First is a kind of representation that says which character we&#8217;re talking about.  So for example (decimal) 97 is used to represent the letter &#8216;a&#8217;.  The second kind of thing is whatever is used to <i>render</i> &#8216;a&#8217;.  It could be <tt>a</tt>, or <i>a</i> or <b>a</b>.   These two things are the encoding and the rendering of a given character.  Conceptually these two properties of a character should be distinct.  In practice, of course, it&#8217;s not always been that cleanly handled, and there are some issues where the lines are legitimately blurred.</p>
<p>The problem was, of course, that the original encoding table (<a target="window" href="http://www.asciitable.com">ASCII</a>) was much too limited to handle languages other than English.  To address this, a number of <a target="window" href="http://en.wikipedia.org/wiki/ISO_8859">ISO 8859</a> variants were developed to cover additional characters such as ß or ñ and other marks and symbols such as © and £.  However, since rendering (or typography) was not considered in these representations, a number of <i>legacy fonts</i> developed that used additional proprietary (and conflicting) encodings for additional information not covered in the standards.  And in all of this, languages that did not even use the Latin alphabet (such as Greek, Russian, Arabic, Hebrew, Japanese, Chinese, and so on and on) were definitely ill-represented overall.  In most cases there are several possible representations and standards to use, which results in a nightmare for anyone trying to represent extended or other character sets in programs that make use of them.  (Which potentially includes any program that ever tries to communicate with its user in anything other than international symbols, but I digress.)</p>
<h3>Unicode</h3>
<p>The idea behind <a target="window" href="http://en.wikipedia.org/wiki/Unicode_Consortium">Unicode</a> is to create one giant encoding standard for all of this (leaving the typography alone and up to whatever rendering a particular program wishes to use, or whatever font set the user has installed).  It sounds simple enough, although even this idea is fraught with complexity and inconsistencies.  For example, Unicode absorbed many of the original encoding standards in order to ensure backward compatibility, and has made various inconsistent decisions on the inclusion of other characters.   However, the underlying concept is sound, and if it takes another twenty years to refine it, the end result should still be better than the cacophony there is now.</p>
<p>Enough with the background.  I want to discuss most of this in the context of browser rendition, since this is a good deal of what I work on anyhow.</p>
<p><span id="more-54"></span></p>
<p>Let&#8217;s take a quick look at this:<br />
<img src="http://digitalramble.com/public/unex/sample.jpg" /><br />
I have it up here as an image to guarantee that everyone here can see this word.  (I should note, by the way, that I picked this word out at random, but I do not know ancient Greek.  When I looked it up, it turns out it&#8217;s an adjective meaning &#8220;eager to be of service&#8221;, which I find vastly amusing.)</p>
<p>In Unicode, I could encode this as follows:<br />
<code>&amp;#x03C0;&amp;#x03C1;&amp;#x03CC;&amp;#x03B8;&amp;#x03C5;&amp;#x03BC;&amp;#x03BF;&amp;#x03C2;</code><br />
I am using <a target="window" href="http://www.w3.org/TR/html401/charset.html#h-5.3.1">Numeric Character References</a> here, instead of raw UTF-8, mostly because IE6 expects such representation although in theory either should be acceptable and certainly Firefox, Safari, and Opera are all happy with either.  The codes can be found <a target="window" href="http://www.unicode.org/charts/PDF/U0370.pdf">here</a> and <a target="window" href="http://www.unicode.org/charts/PDF/U1F00.pdf">here</a>.</p>
<p>In any case the above should render similar to the image above:<br />
&#x03C0;&#x03C1;&#x03CC;&#x03B8;&#x03C5;&#x03BC;&#x03BF;&#x03C2; (NCR)<br />
I could also put it down like this:<br />
πρόθυμος  (direct UTF-8)<br />
but Internet Explorer may not render this correctly.</p>
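<p>Converting between the two forms is mechanical.  As an illustration only (this is not how the references above were produced), Python&#8217;s built-in <code>xmlcharrefreplace</code> error handler and <code>html.unescape</code> do the round trip; note that the handler emits decimal references, which are equivalent to the hexadecimal ones used above.  The function name <code>to_ncr</code> is made up for this sketch.</p>

```python
import html

def to_ncr(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The sample word, spelled out by code point to keep this listing pure ASCII
word = "\u03c0\u03c1\u03cc\u03b8\u03c5\u03bc\u03bf\u03c2"
ncr = to_ncr(word)
print(ncr)

# Decoding the references gives back the original string
print(html.unescape(ncr) == word)
```

<p>Because the NCR form is pure ASCII, it survives whatever encoding the page is ultimately served in, which is the practical reason to prefer it when the target browser is in doubt.</p>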
<p>Now if it did not, there are several considerations to check.</p>
<p>The first will be whether or not the page itself is correctly set up for viewing Unicode.  Since I&#8217;m the one running this show, of course it is.  There needs to be a declaration at the top something like this:<br />
<pre><code>
&lt;meta 
&nbsp;&nbsp;&nbsp;&nbsp;http-equiv=&quot;Content-Type&quot; 
&nbsp;&nbsp;&nbsp;&nbsp;content=&quot;text/html; charset=UTF-8&quot; 
/&gt;
</code></pre><br />
where &#8220;<a target="window" href="http://www.utf-8.com/">UTF-8</a>&#8221; tells the browser that the text that follows is Unicode encoded as UTF-8.  By the way, it&#8217;s a good habit to start using this in web pages: UTF-8 is byte-for-byte compatible with the original ASCII, so it will not break existing plain-ASCII pages, but easily allows them to expand to cover characters found only in Unicode.  (Pages saved in a legacy 8-bit encoding such as ISO 8859-1 need to be re-saved as UTF-8 before the declaration is changed.)  I&#8217;m going to <a target="window" href="http://unicode.org/faq/utf_bom.html">leave alone the entire issue of byte representation</a> that&#8217;s actually tucked away here, with UTF-16 and UTF-32 lurking around the corner.</p>
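<p>What the charset declaration controls can be shown at the byte level.  A rough Python sketch (the sample strings are only for illustration): ASCII text has identical bytes in UTF-8, while each Greek letter takes two bytes, and reading those bytes under the wrong declared charset yields garbage rather than an error.</p>

```python
ascii_text = "standards"
# Plain ASCII is byte-for-byte identical in UTF-8
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))

word = "\u03c0\u03c1\u03cc\u03b8\u03c5\u03bc\u03bf\u03c2"
utf8_bytes = word.encode("utf-8")
# Eight Greek letters, two bytes apiece
print(len(word), len(utf8_bytes))

# What a browser would be decoding if the page were wrongly declared as ISO-8859-1:
# every byte becomes its own Latin-1 character
mojibake = utf8_bytes.decode("iso-8859-1")
print(len(mojibake))

# Decoding with the declared-and-correct charset recovers the word
print(utf8_bytes.decode("utf-8") == word)
```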
<p>Since I can eliminate that as a problem off the bat, the next consideration is the browser being used to view this.  All the modern browsers (meaning: Firefox (all OS), Opera (all OS), Safari, Internet Explorer 6 (latest patches), Netscape 7.2 plus and derivatives) should understand this.  Some of the older ones may not.  In particular IE on the Mac never contained Unicode support.  In many cases, setting several menu options may be necessary to enable the support &#8212; check the documentation for the particular browser for &#8220;character encoding&#8221;.  For IE, check the &#8220;user-defined&#8221; options.   View->Encoding needs to be set to user-defined, and then in the Internet Options, a suitable font needs to be selected for the &#8220;user-defined&#8221; font (eg, not the Latin, etc fonts).</p>
<p>Different browsers will support unicode at different levels.  For example, on IE6, it&#8217;s not only necessary to enable the character encoding support, but also to do certain registry edits for both the browser and the operating system (IE7 appears to contain more support for Unicode, fortunately).  Firefox and Opera only need to be informed of a compatible font.  Safari actually breaks down the &#8220;unicode&#8221; fonts into the different regions on the (very reasonable) assumption that one might use different unicode fonts for different languages and not some &#8220;universal&#8221; unicode font.  So in Safari, setting the unicode &#8220;Greek&#8221; to the correct font will allow the above to display.</p>
<p>A good font for the above display that I would recommend is <a target="window" href="http://scholarsfonts.net/cardofnt.html">the Cardo font</a>.  There are <a target="window" href="http://www.alanwood.net/unicode/fonts.html">several out there</a>, including Arial Unicode MS, but I do not recommend this or any of the older ones, as their support only goes up to version 2.0 of the Unicode standard, and we&#8217;re well past that now.  I&#8217;ll return to this point later on.  For now, I&#8217;m going to detour briefly into legacy fonts, just to illustrate why they were such a bad idea.</p>
<h3>Legacy Fonts</h3>
<p>I&#8217;m not even going to render them on this page <img src='http://www.digitalramble.com/wordpress/smilies/yahoo_smiley.gif' alt='&#58;&#45;&#41;' class='wp-smiley' width='18' height='18' title='&#58;&#45;&#41;' />.  Actually, I cannot render them on this page, because they require different meta declarations, e.g.:<br />
<pre><code>
&lt;META http-equiv=&quot;Content-Type&quot; 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;content=&quot;text/html; charset=ISO-8859-1&quot;
/&gt;
</code></pre><br />
For Polytonic Greek, there are several well known fonts out there, including WinGreek, GreekKeys, and Ismini.  The first thing to note off the bat, is that some of these are actually OS dependent.  The second is that they can be broken, even within their original scope.   Take Ismini, for example.  ί is supposed to be represented as <code>¼</code> in this font.  But regardless of the browser, regardless of proper installation of the font, regardless of proper encapsulation (well, I&#8217;m staying away from the content/presentation arguments in css/html &#8212; enough principles are violated as it is <img src='http://www.digitalramble.com/wordpress/smilies/yahoo_wink.gif' alt='&#59;&#45;&#41;' class='wp-smiley' width='18' height='18' title='&#59;&#45;&#41;' /> ) within a <code>&lt;font face=&quot;Ismini&quot;&gt;</code> element it still displays in the Latin-1 representation of ¼.  GreekKeys only currently supports Macintosh; there was a Windows version offered several years back, but support for that has been discontinued.  In its defense, though, I should note that a <a target="window" href="http://ist-socrates.berkeley.edu/~pinax/greekkeys/GreekKeys.html">unicode version is now offered</a>.</p>
<p>In any case, the representations for the above word in these fonts, respectively, are<br />
<pre><code>
&lt;font face=&quot;Greek Old Face 98&quot;&gt;prÒqumoj&lt;/font&gt;
&lt;font face=&quot;Athenian&quot;&gt;prñyumow&lt;/font&gt;
&lt;font face=&quot;Ismini&quot;&gt;pr¿uymoq&lt;/font&gt;
</code></pre></p>
<p>Note the need to explicitly switch font face in order to obtain the results I want.  Unless the entire page will be rendered in one of these fonts, I have to switch around as needed.  Unicode simplifies all of this, and theoretically, given sufficient support (a Unicode keyboard, Unicode display in the editing software, etc.), I would not even need to use the numeric character references for specific characters.  However, as I cannot guarantee all visitors to this site would be similarly equipped, this is a fallback for now (and indeed, since neither I nor anyone else will ever have a keyboard that includes all the possible alphabets and symbols for non-alphabet languages, this kind of workaround will always be needed anyway).</p>
<p>Interestingly, some of the newer browsers themselves have bugs in attempting to handle legacy fonts.  For example Firefox had trouble with the style font-family method of switching between fonts, but was able to display correctly once it was changed to inline font face notation.<br />
So <pre><code>
&lt;font face=&quot;Greek Old Face 98&quot;&gt;prÒqumoj&lt;/font&gt;</code></pre> worked but <code>&lt;span style=&quot;font-family: Greek Old Face 98;&quot;&gt;prÒqumoj&lt;/span&gt;</code> did not.   Trying to get this kind of stuff to display among all the browser/operating system combos out there can really take quite a bit of detective work.</p>
<h3>Minor note on WordPress</h3>
<p>This is getting lengthy and I still have more stuff I want to discuss, so I&#8217;m going to wrap up here and finish the rest in the next post.  But I thought I&#8217;d point out that somewhere along the line in the course of editing and re-editing this post, the NCRs I was using to illustrate the various Unicode words and such above kept getting converted into their UTF-8 representations.  I wanted NCRs for several reasons, partly because they let me look up which characters they are in the Unicode charts, but also because some browsers require NCRs.  I&#8217;m not sure whether it was WordPress internally or some combination of cut and paste in either Windows XP or Ubuntu that did it, but it&#8217;s a classic example of the annoyances that are still present when dealing with &#8220;exotic&#8221; character sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalramble.com/2006/07/18/54/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
