Tamil Discussion archive


Re: [WMASTERS] Re: Old Orthography, response to Kalyan

At 11:03 AM 9/16/97 +0800, Leong Kok Yong wrote:

>It seems there are two camps of thought:
>(1) Keep old styles out of the proposed code set.
>(2) Allocate codepoints for the old styles in the proposed code set.
>Camp 1 says that encoding two glyphs (old & new form) as two
>codepoints will cause duplication, especially when it comes to sorting and
>other software operations.
>Camp 2 says that the old styles are very important, especially for ancient
>Tamil literary works. So we need one codeset (or rather fontset) that
>allows a publisher to represent both old and new glyphs. Keeping
>them separate causes much confusion when one needs to use more than one
>font to see old and new style letters.
>The only way out of this deadlock is to ask ourselves this:
>Are we trying to encode glyphs or characters?
>If we choose to encode only characters,
>a character can have many glyphs (new style, old style)
>but it will still be encoded as one unique codepoint only.
>(One codeset using two fonts)
>However, if we choose to encode glyphs,
>old and new style letters will have different codepoints.
>(One codeset using only one font)
>From a desktop publishing point of view, using two fonts for the same
>codeset to write old and new style letters does not seem too difficult.
>Most word processors allow you to specify different fonts for different
>text paragraphs.
>When you bring this to the web, the user can still switch fonts
>accordingly when viewing the HTML doc. The only drawback here is that one may
>need to resort to using the <FONT FACE> tag to indicate the preferred font
>face for displaying a certain part of the document when both styles are
>used in the same document. Is this solution too much of a compromise in
>the eyes of many people?
>There is no right or wrong way here actually. As some here may be aware,
>Chinese comes in Traditional and Simplified forms. Traditional = old form.
>Simplified = new form. Because both forms are so entrenched in use in
>China and Taiwan, even Unicode, which says it should encode characters
>only, does not do so for Chinese. A character with new and old forms is
>given two code points in Unicode!

Kok Yong,

The premises are sound but with due respect, I question the application of
the logic.

First a recap. I spent some more time digging into the basis of Unicode.
And as you "dead-on" point out, the Unicode rule is to encode characters,
not glyphs. This makes a lot of sense to me.

Unicode didn't (? see next para) follow its own rule for the commonly used
forms (yes, the plural "s" is key here) of _modern_ Chinese, but it sure
did follow it, even for Chinese, when it comes to ancient forms that are
unused today versus contemporary Chinese. In fact it goes one step
further - it bunches together Chinese, Japanese Kanji and Korean characters
of similar nature and relies on fonts to display the different glyphs.

One could very easily argue that Unicode is actually following its rules -
because Simplified Chinese and Traditional Chinese are _both_ commonly used
(in PRC and Taiwan respectively) it is actually dealing with them as two
contemporary well-used languages - PRC Chinese and Taiwan Chinese. 
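
A minimal sketch of the Simplified/Traditional point, using Python's standard
unicodedata module. The example pair (the character for "study") is an
illustrative choice, not something taken from this thread.

```python
import unicodedata

traditional = "\u5b78"  # Traditional form of the character for "study"
simplified = "\u5b66"   # Simplified form of the same character

# The two forms sit at separate code points, so they compare unequal.
forms_differ = traditional != simplified

# Normalization does not merge them either, because neither character
# has a decomposition mapping to the other.
still_differ = (
    unicodedata.normalize("NFKC", traditional)
    != unicodedata.normalize("NFKC", simplified)
)

# Print the formal character names assigned to each code point.
print(unicodedata.name(traditional))
print(unicodedata.name(simplified))
```

Software that wants to treat the two forms as "the same word" has to carry its
own mapping table on top of the encoding.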

Now let's look at Tamil - the case is completely different. The ORNL is not
used in contemporary Tamil anywhere. Fortunately, because the reform was only
a small change to the Tamil character set, the new form has become
universally accepted. Tamil is fortunate in that there are not 2 contemporary
sets of Tamil characters - there's one (and boy am I happy about that -
knowing Tamil ethos this would normally not be easy to achieve!)

Given that, and given Unicode's rule, I believe we should deal with ORNL at
the display font level. Exceptions to Unicode rules (or any standard's)
should only be made if there's a very strong case. On the face of it,
I do not see how we can, or should, argue for an exception. The more
exceptions, the less useful the standard, and the less "the standard" will
work with processing applications - simple as that.
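
To make the processing concern concrete, here is a rough Python sketch of what
a separately encoded old-style form would do to a plain text search. It is an
illustration under stated assumptions, not a description of any real code set:
U+E000 (a Private Use Area code point) is a hypothetical stand-in for an
old-style form of one syllable, the modern spelling uses today's Unicode Tamil
code points, and the helper names are made up for the example.

```python
# Assumption: U+E000 stands in for an old-style syllable form that has been
# given its own code point. No real standard assigns it this way.
OLD_STYLE_LAI = "\ue000"
NEW_STYLE_LAI = "\u0bb2\u0bc8"  # TAMIL LETTER LA + TAMIL VOWEL SIGN AI

MODERN_WORD = "\u0bae" + NEW_STYLE_LAI  # TAMIL LETTER MA + "lai": "hill"
OLD_WORD = "\u0bae" + OLD_STYLE_LAI     # same word stored with the old form


def fold_old_style(text):
    """Map the hypothetical old-style code point to the modern sequence."""
    return text.replace(OLD_STYLE_LAI, NEW_STYLE_LAI)


def find_word(needle, documents):
    """Return indexes of documents containing needle, compared by code point."""
    return [i for i, doc in enumerate(documents) if needle in doc]


documents = [MODERN_WORD, OLD_WORD]

# With glyph-level encoding, searching for the modern spelling misses the
# document that stores the same word in old-style form.
naive_hits = find_word(MODERN_WORD, documents)

# Every application needs an extra folding step before comparing.
folded_hits = find_word(
    fold_old_style(MODERN_WORD),
    [fold_old_style(doc) for doc in documents],
)

print(naive_hits, folded_hits)
```

With one code point per character, both documents would hold identical text
and the folding step would not be needed; the old or new look would come from
the font alone.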

The very few who need to see Tamil script in its old form (far fewer than is
the case with Japanese kanji users of Unicode, for example, because the
former is ancient and the latter is contemporary) can deal with it at the
font display level, as users of contemporary Japanese kanji will have to.


bala pillai* bala@sydney.net*the asia pacific internet co, sydney
V I R T U A L   C O M M U N I T Y   E X P E R T S
<http://apic.net>   <http://sydney2000.net> <http://malaysia.net>
<http://tamil.net>      for info send blank <mailto:info@apic.net>
ph:+61 2 9419 5333		           fax: + 61 2 9419 5155
