A proposal for font encoding scheme for tamil

To: RANGARAJAN S <sujatha@md2.vsnl.net.in>
Subject: A proposal for font encoding scheme for tamil
From: "Dr.K. Kalyanasundaram" <kalyan@igcsun3.epfl.ch>
Date: Mon, 01 Sep 1997 14:45:03 +0000
CC: tamilnet@tamilnews.org.sg, swami@pondiuni.ren.nic.in, Naa Govindasamy <govind@irdu.nus.sg>, Tan Tin Wee <tinwee@irdu.nus.sg>, ananda@md2.vsnl.net.in, Muthu Nedumaran <muthu@murasu.com>, Bala Pillai <bala@apic.net>, webmasters@tamil.net
Content-Length: 7068
Content-transfer-encoding: 8bit
Content-Type: text/plain; charset="iso-8859-1"
Organization: Swiss Federal Inst. of Technology
Reply-To: "Dr.K. Kalyanasundaram" <kalyan@igcsun3.epfl.ch>
Sender: owner-tamilnet@irdu.nus.sg

Dear Sujatha:
You may recall that early this year there were extensive discussions in
tamil.net on possible standards for font encoding and keyboard layouts.
Even though I am not a computer scientist in any sense, I participated
in many of these discussions, giving my viewpoints, which have evolved
over the last three years through my interest in tamil computing.
Topics discussed were: 7-bit (128 characters) vs. 8-bit (256 characters)
encoding schemes; keeping or leaving out old style tamil characters such
as the forward kokki in lai/Nai, Ra, ..; the possible limitations to
kerning in point-of-sale systems; and the concern that a limited
character choice can impose reforms in the way tamil is currently
written. Standards for the transliterated/romanized form of writing
tamil were not discussed even though, in my opinion, this issue must be
discussed concurrently with font encoding for tamil scripts. Many of
these points are summarized in my presentation at the recent Singapore
TamilNet'97 Conference.

In the tamil.net discussions and in my presentation at Singapore, I
suggested a possible font encoding scheme that is modelled on the
ISO 8859-X schemes currently used widely for handling all the european
languages. Such a scheme will be easily understood by all in the
computing world, even if they are not conversant with tamil. It can be
implemented at very short notice and can co-exist happily even with a
possible Unicode standard.

After giving due consideration to the discussions held in the tamil.net
email discussion group and also to the views expressed publicly or
privately during the Singapore Conference, I would like to make the
following proposal for possible character choices for a tamil font
encoding scheme in an 8-bit encoding scenario. After presenting the
character choices as such, I elaborate a bit on the motivations behind
this choice. I think this character set is a reasonable, viable one,
acceptable to the majority of tamils irrespective of the nature of the
tamil dtp software/font being used. I would much appreciate it if you
could bring this proposal to the attention of the Tamilnadu Advisory
Committee. I humbly request the distinguished members of the Committee
to consider this proposal in their deliberations. It may not be the
perfect one for the committee to adopt as such, but it can certainly
serve as the starting point for further refinement if necessary.
Needless to say, I am at the disposal of the committee for any
clarifications or follow-up.

A PROPOSAL FOR POSSIBLE FONT ENCODING SCHEME FOR TAMIL
(you need to have the anjal/inaimathi font to see the tamil letters
in tamil!!)

 SCHEME:
  8-BIT (256 CHARACTER SLOTS) with the standard
  roman characters occupying the first 128 slots as in Latin-1 or
  the Lower ASCII scheme.
  The scheme is modelled on the ISO 8859-X schemes currently in use
  (such as 8859-1/Latin-1, 8859-2/Latin-2, ..)
  I leave open the issue of actual assignment of tamil characters to
  the upper ASCII slots (128-255) for the moment.
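
As a rough illustration of this layout, the sketch below treats byte
values 0-127 as plain ASCII and values 128-255 as slots reserved for
tamil. The names ASCII_LIMIT, TAMIL_SLOTS and describe are made up for
this example, and the three slot assignments are placeholders only,
since the proposal deliberately leaves the actual assignment open.

```python
# Sketch of an ISO 8859-X style 8-bit layout: 0-127 ASCII, 128-255 tamil.
# The slot assignments below are placeholders, NOT part of the proposal.

ASCII_LIMIT = 128   # slots 0-127 stay identical to ASCII
TABLE_SIZE = 256    # an 8-bit scheme has 256 slots in total

# Hypothetical assignments, for illustration only.
TAMIL_SLOTS = {
    0x80: "vowel a",
    0x81: "vowel aa",
    0x82: "vowel i",
}


def describe(byte_value):
    """Say which half of the table a byte value falls in."""
    if not 0 <= byte_value < TABLE_SIZE:
        raise ValueError("not an 8-bit value")
    if byte_value < ASCII_LIMIT:
        return "roman: " + chr(byte_value)
    return "tamil: " + TAMIL_SLOTS.get(byte_value, "unassigned")


def describe_text(data):
    """Describe every byte of a bytes object in order."""
    return [describe(b) for b in data]
```

With such a layout, existing roman text passes through unchanged, and
only byte values of 128 and above need a tamil-specific table.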

CHARACTER CHOICES
vowels: 12   
        (  ‚  ƒ  „  …  †  ‡  ˆ  ‰ Š ‹ Œ )
consonants:  18 
       ( š  œ £ ¥ «  °  µ  º  À  Æ Ì Ò × Ý é ã î )
modifiers:  10 
      (virama dot, O‘, O ’ , O “, O”, O•, —O, þO, €O, 
            and the kokki/ hook for old style lai/Lai/Nai/nai  )
unique uyirmeis
   akaram eRRiya iyir  18
                 ( ™ › ¢ ¤ ª ¯ ´ ¹ ¿ Å Ë Ñ Ö Ü â í è  ò  )
   aakara varisai   3       ( old style Raa, Naa, naa )
   ikara varisai   1     (  ¦ )
   iikaara varisai  1  (  § )
   ukara varisai   16   (ngu, NYu are omitted )
  uukara varisai  16   (ngU, NYU are omitted )
   aikaara varisai:  0   
           (use € and modifier for old style lai, Nai, nai and Lai !! )
   grantha:  6  ( ÷, õ, û, ù, ó, sri ) 
   Note: for all grantha ones use the modifiers to get the ikara,
         iikaara and ukara varisais !!
and
   diacritical markers:  4  (two dashes, one above and one below the
                        character; two dots, one above and one below)
total:   105
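
The total can be checked with a short tally. The sketch below simply
copies the counts from the list above; the dictionary keys are
shortened labels chosen for this example, and UPPER_SLOTS assumes that
all 128 upper slots are available.

```python
# Tally of the character choices listed in the proposal.
CHARACTER_CHOICES = {
    "vowels": 12,
    "consonants": 18,
    "modifiers": 10,
    "akara uyirmeis": 18,
    "aakaara (old style Raa, Naa, naa)": 3,
    "ikara": 1,
    "iikaara": 1,
    "ukara": 16,
    "uukaara": 16,
    "aikaara": 0,
    "grantha": 6,
    "diacritical markers": 4,
}

UPPER_SLOTS = 256 - 128   # slots 128-255 of an 8-bit table

total = sum(CHARACTER_CHOICES.values())
unassigned = UPPER_SLOTS - total

print("characters:", total)               # characters: 105
print("unassigned upper slots:", unassigned)   # unassigned upper slots: 23
```

The figure of 23 counts every upper slot as usable. Any slots that have
to be avoided for platform reasons would reduce it.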

FEATURES OF THIS PROPOSED CHARACTER SET
The scheme accommodates almost all the points raised in the
tamil.net discussions. In addition, it has the following key features:

a) In the present proposal, kerning is invoked ONLY FOR TWO SERIES:
   ikara varisai using the ’ modifier and iikaara varisai using the “
   modifier. All other unique tamil characters (uyir, uyirmeis) are
   kept as such!

b) has all key grantha characters. It is proposed to use modifiers to
   get all the required compound ones in the currently used form.

c) has provisions to get old style lai/Lai/Nai/nai and also old style
   Raa, Naa and naa.

d) has the four diacritical markers required for use along with roman
   letters to write transliterated tamil in the Library of Congress
   transliteration scheme.

e) still has more than a dozen (12) empty slots, which can hold tamil
   numerals or be left empty for future revisions (preferred choice).
   (On Windows there have been problems using characters placed
   at 14-144, 160, double quotes, bullets etc...)

Muthu Nedumaran in his proposals prefers to keep as many of the
uyirmeis as such as possible, because of the difficulty of implementing
kerning on simple LCD displays/point-of-sale systems and the need for
high quality production of tamil texts comparable to current printing.
In the present proposal, kerning is invoked ONLY FOR TWO SERIES:
ikara varisai using the ’ modifier and iikaara varisai using the “
modifier. (All other unique tamil characters (uyir, uyirmeis) are kept
as such!) Since each of these is a right end modifier, there should not
be any problem in implementation.

Secondly, the demanding requirements of professional printing houses
can be readily met by storing high quality versions of the entire
uyirmeis in the software and calling them during the printing process.
In fact many of the tamil DTP software packages (incl. those that use
romanized/transliterated input) are of the "interpreted" type, where a
given sequence of typed characters is replaced by the equivalent tamil
characters. Even the displays of LCD screens in point-of-sale systems
are not permanent. The screen is constantly re-written and so the
complex tamil characters can be called and displayed, as is currently
done for many south asian languages. Many of the computer professionals
I talked to confirm that this is indeed feasible with present-day
technology.

In short, I do not see any serious problems in delivering high quality
output using the above character choices in a font encoding scheme.
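
The "interpreted" approach described above can be sketched as a
longest-match substitution: each typed sequence is looked up in a table
and replaced by the stored character. The table KEY_TABLE and the
function interpret below are hypothetical and cover only one consonant;
the Unicode tamil strings stand in for whatever glyphs a real package
would store, and this is not taken from any particular DTP product.

```python
# Minimal sketch of "interpreted" input: typed sequences are replaced by
# stored tamil characters, trying the longest sequence first.
# KEY_TABLE is a hypothetical romanized table for illustration only.

KEY_TABLE = {
    "ka": "\u0b95",          # ka
    "ki": "\u0b95\u0bbf",    # ka + ikara sign
    "kii": "\u0b95\u0bc0",   # ka + iikaara sign
    "k": "\u0b95\u0bcd",     # ka + virama dot
}


def interpret(typed, table):
    """Replace typed sequences using greedy longest-match lookup."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(typed):
        # Try the longest candidate first so "kii" wins over "ki" and "k".
        for size in range(min(longest, len(typed) - i), 0, -1):
            chunk = typed[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through.
            out.append(typed[i])
            i += 1
    return "".join(out)


print(interpret("kiikak", KEY_TABLE))
```

Because the lookup happens in software, the stored output for each
sequence can be as detailed as the display or printer requires.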

I may also add that the scheme proposed above is very similar to
Dr. Nandasara's proposals for tamil presented at the recent
TamilNet'97 conference held in Singapore.

With best regards,
Kalyan
(K. Kalyanasundaram)

--
*******************************************************************
Dr. K. Kalyanasundaram,            |
Institute of Physical Chemistry,   | Tel: 41-21-693 3622 (off)
Swiss Federal Inst. of Technology  | Fax: 41-21-693 4111
CH-1015 Lausanne, Switzerland      | Email:kalyan@igcsun3.epfl.ch
*******************************************************************

Follow-Ups:
- Re: A proposal for font encoding scheme for tamil
  - From: RANGARAJAN S <sujatha@md2.vsnl.net.in>
