Tamil Discussion archive


[WMASTERS] Re. 8-bit scheme and Unicode 2.0 Tamil



Srinivasan wrote:
>I suggest that we propose this scheme (version 1.4) for the
>UNICODE Tamil set. Positions U+0B80 to U+0BFF
>I feel that suggesting a standard where none exist
>is more urgent than arguing about where there are
>already many.

First, a small clarification:
It is true that we do not have any standard as far as the
7-bit and 8-bit font-based DTP packages are concerned.
But the Unicode 2.0 Tamil segment is an established 16-bit standard.
There are already a handful of packages that have implemented
this current version of Unicode Tamil. If we are not happy with
its contents, there is scope for revision. I gather it will be a
long, drawn-out process to introduce revisions into Unicode.
Apparently, for Indic languages the official point of contact
for Unicode is the Govt. of India. So for Tamil, even the TN Govt.
has to go through the DOE, Govt. of India! Revisions come about
initially with a revision of ISCII (Indian Script Code for
Information Interchange), and this is followed by incorporation
of these revisions in Unicode.
I do not know if any revisions were introduced for Tamil during
the current (1997) revision of ISCII. Earlier revisions of ISCII
were in 1988 and 1992. (Anyway, this was the
picture Anbarasan gave during the last TamilNet'97 conference.)

As some of the participants have pointed out, the philosophy
of the Unicode standard is quite different from that of the present
approach. The former is based on character encoding; the latter
starts from glyph encoding, with the character encoding coming
out of this.
I will try to illustrate the differences between the two approaches
by discussing an analogy here (I would much appreciate it if the
Unicode experts in the forum would correct me wherever my
presentation goes wrong).
The analogy is an architect trying to build a house consisting of
many walls. The Unicode scheme is like the architect defining the
dimensions of the wall without really getting into the details of
how you construct the wall (the size and the number of bricks
that you use). The architect defines only a minimum of foundation
blocks/bricks. The walls can be made out of red clay, marble
or concrete, from a handful of large bricks or many small ones.
Unicode tries to define the basic characters that constitute the
alphabet of the language but leaves the details of its implementation
(the type and the number of glyphs) to the software developers.
Like large bricks, you can use a single glyph to represent each
character. Or, like small bricks, you can use many glyphs to
construct the same single character.
Unicode sees the different Indic languages under one umbrella,
as an architect sees the many houses that constitute a housing colony.
In an attempt to provide aesthetics, the architect can impose some
uniformity in the way all the houses of the colony are built.
Unicode starts with Devanagari as the most complex house to
build and sees all others (incl. Tamil) as simplified versions of it.
So we have many basic/foundation bricks in Devanagari but only
a handful in Tamil (only one ka in Tamil instead of the four k's:
ka, kha, ga, gha).
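
The "one ka instead of four" point can be checked with a small
Python sketch. This is only an illustration under stated assumptions:
it assumes the Tamil block (starting at U+0B80) lines up with the
Devanagari block (starting at U+0900) offset by offset, which holds
for the ka row; the helper names name_or_none and compare_slots are
made up for this example; and Python's unicodedata module follows a
later Unicode version than 2.0.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900  # first position of the Devanagari block
TAMIL_BASE = 0x0B80       # first position of the Tamil block


def name_or_none(code_point):
    """Return the character name, or None if the position is unassigned."""
    return unicodedata.name(chr(code_point), None)


def compare_slots(offsets):
    """Pair up the Devanagari and Tamil names found at the same block offsets."""
    return [
        (off, name_or_none(DEVANAGARI_BASE + off), name_or_none(TAMIL_BASE + off))
        for off in offsets
    ]


# Offsets 0x15..0x18 hold ka, kha, ga, gha in Devanagari.
rows = compare_slots(range(0x15, 0x19))
for off, dev, tam in rows:
    print(f"+0x{off:02X}  {dev or '(unassigned)':<24} {tam or '(unassigned)'}")
```

Only the first row has a Tamil letter; the other three positions are
left empty in the Tamil block, which is the "simplified house" of the
analogy.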

To assist all of us, I have put up a gif under the URL
This gives examples of how various Tamil words will be
displayed on screen, with the corresponding storage formats
under the proposed 8-bit encoding (version 1.4) and under
Unicode 2.0. Please consult this gif and comment.
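
For the Unicode side of such a comparison, the storage format can be
sketched in a few lines of Python. This is a rough illustration, not
a reproduction of the gif: the sample word, the syllable "ko" and the
helper name code_points are my own choices, UTF-16 (big-endian) is
assumed as the 16-bit storage form, and the 8-bit scheme is left out
because its table is not given here.

```python
import unicodedata


def code_points(text):
    """Return (U+XXXX, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The word "Tamil" in Tamil script: ta, ma, vowel sign i, llla, virama.
word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
for cp, name in code_points(word):
    print(cp, name)

# In a 16-bit encoding each stored character takes two bytes.
stored = word.encode("utf-16-be")
print(len(stored), "bytes:", stored.hex())

# The syllable "ko" is stored as the consonant followed by one vowel sign,
# although on screen one vowel glyph is drawn to the left of ka and
# another to the right.
ko = "\u0B95\u0BCA"
print(code_points(ko))

# Canonical decomposition splits that vowel sign into its two parts,
# the "small bricks" version of the same character.
ko_parts = unicodedata.normalize("NFD", ko)
print(code_points(ko_parts))
```

The stored sequence always follows the logical (spoken) order of the
characters; which glyphs are drawn, and where, is left to the
rendering software.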

Muthu gave a nice presentation at the last TamilNet'97 on how
the Unicode scheme is envisaged. You can read his paper at



