Tamil Discussion archive


tamil character set choice for a font encoding scheme



Dear Sujatha:
In my earlier email with the proposal for a possible font
encoding scheme for tamil, I used the "inaimathi/anjal" font
to display the tamil character choices. I failed to realise
that many of the members of the Tamilnadu Advisory
Committee may not be using the Anjal/Inaimathi font.

So I am reproducing herewith the possible character
set alone but this time the choices are indicated using 
a plain ASCII transliteration scheme. This way the proposal
can be read in all terminals including plain dumb ones.
So you can replace the earlier one with the present one.

Sorry for this lapse/short-sighted action. My sincere
apologies.

Kalyan


----
Dear Sujatha:
You may recall that early this year, there were extensive discussions 
in tamil.net on possible standards for font encoding and keyboard 
layouts. Even though I am not a computer scientist in any sense, 
I participated in many of these discussions, offering viewpoints 
that have evolved over the last three years of my interest in tamil computing. 

Topics discussed were: 7-bit (128 characters) vs. 8-bit (256 characters)
encoding schemes; keeping or leaving out old style tamil characters
such as the forward kokki in lai/Nai, Ra, ..; the possible limitations 
to kerning in point-of-sale systems; and whether a limited character
choice could impose reforms on the way tamil is currently written. 
Standards for Transliterated/romanized form of writing tamil 
were not discussed even though, in my opinion, this issue must 
be discussed concurrently with font encoding for tamil scripts. 
Many of these points are indicated/summarized in my presentation 
at the recent Singapore TamilNet'97 Conference. 

In the tamil.net discussions and in my presentation at Singapore, I
suggested a possible font encoding scheme that is modelled on the
ISO 8859-X schemes currently used widely for handling all 
the european languages. Such a scheme will be easily understood 
by all in the computing world, even if they are not conversant 
with tamil. It can be implemented at very short notice and can 
co-exist happily even with a possible Unicode standard.

After giving due consideration to the discussions held in the tamil.net 
email discussion group and also to the views expressed publicly or 
privately during the Singapore Conference, I would like to make 
the following propositions for possible character choices
for a tamil font encoding scheme in an 8-bit encoding scenario. 

After presenting the character choices as such, I elaborate 
a bit on the motivations behind this choice. I think this character
set is a reasonable, viable one, acceptable to the majority of tamils 
irrespective of the nature of the tamil dtp software/font being 
used. I would much appreciate it if you could bring this proposal 
to the attention of the Tamilnadu Advisory Committee.
I humbly request the distinguished members of the Committee 
to consider this proposal in their deliberations. It may not be 
the perfect one for the committee to adopt as such, but it can 
certainly serve as the starting point for further refinement if 
necessary. Needless to say, I am at the disposal 
of the committee for any clarifications or follow-up. 

A PROPOSAL FOR POSSIBLE FONT ENCODING 
SCHEME FOR TAMIL
(the tamil character choices are indicated in romanized format!) 

 SCHEME: 
  8-BIT (256 CHARACTER SLOTS) with the standard
  roman characters occupying the first 128 slots (0-127), as in the
  Latin-1 or lower ASCII scheme.
  The scheme is modelled on the ISO 8859-X schemes currently in use
  (such as 8859-1/Latin-1, 8859-2/Latin-2, ..)
  I leave open the issue of the actual assignment of tamil characters to 
  the upper ASCII slots (128-255) for the moment.
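
A minimal sketch of the split described above, assuming nothing beyond
the two halves of the 256-slot table. The function names are made up
for illustration, and no tamil character is tied to any particular
slot, since the proposal leaves that assignment open.

```python
# Hypothetical sketch: only the 0-127 / 128-255 split comes from the proposal.
ASCII_SLOTS = range(0, 128)      # standard roman characters, as in ASCII / Latin-1
TAMIL_SLOTS = range(128, 256)    # upper half, reserved for tamil characters


def slot_kind(code: int) -> str:
    """Say which half of the 256-slot table a character code falls in."""
    if code in ASCII_SLOTS:
        return "roman"
    if code in TAMIL_SLOTS:
        return "tamil"
    raise ValueError(f"{code} does not fit in 8 bits")


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Group a byte string into consecutive roman and tamil runs."""
    runs: list[tuple[str, bytes]] = []
    for byte in data:
        kind = slot_kind(byte)
        if runs and runs[-1][0] == kind:
            # Same half as the previous byte: extend the current run.
            runs[-1] = (kind, runs[-1][1] + bytes([byte]))
        else:
            runs.append((kind, bytes([byte])))
    return runs
```

Because the lower half is untouched, plain roman text passes through
such a scheme unchanged, which is what lets it sit alongside the
existing ISO 8859-X encodings.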

CHARACTER CHOICES
vowels: 12   
       (a, aa/A, i, ii/I, u, oo/U, e, ee/E, ai, o, O, au, ak)
consonants:  18 
      (ka, nga, ca, nya, da/ta, Na, tha, n^a, pa, ma, ya, 
     ra, la,  va, zha, Ra, La, na )
modifiers:  10 
      (virama dot, kaal for aakara varisai as in paa, kokki for
      ikara varisai as in pi, kokki for iikara varisai as in pii,
      left addon for ekara varisai as in pe, left addon for
      Ekara varisai as in pE, left addon for aikara varisai as
      in pai and the kokki/hook for old style lai/Lai/Nai/nai )
unique uyirmeis
   akaram eRRiya iyir  18
       (ik, ing, ic, iny, it/id, iN, ith, in^, ip, im, iy, ir,
           il, iv, izh, iR, iL, in )
   aakara varisai    3   ( old style Raa, Naa, naa )
   ikara varisai     1   ( di )
   iikaara varisai   1   ( dii )
   ukara varisai    18
   uukara varisai   18
   aikaara varisai:  0   
           (use € and modifier for old style lai, Nai, nai and Lai !! )
   grantha:  6  ( ÷, ő, ű, ů, ó, sri ) 
 Note: for all grantha ones use the modifiers to get ikara, 
                 iikara, ukara varisais !!
and
  diacritical markers:  4  (two dashes one above and one below the
                        character,  two dots one above,  one below) 
total:   109
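
The total can be checked by adding up the counts as stated in the list
above. The short sketch below does only that arithmetic; the dictionary
name and keys are labels chosen for illustration.

```python
# Counts copied from the CHARACTER CHOICES list as stated in the proposal.
STATED_COUNTS = {
    "vowels": 12,
    "consonants": 18,
    "modifiers": 10,
    "akaram eRRiya": 18,
    "aakara varisai": 3,
    "ikara varisai": 1,
    "iikaara varisai": 1,
    "ukara varisai": 18,
    "uukara varisai": 18,
    "aikaara varisai": 0,
    "grantha": 6,
    "diacritical markers": 4,
}

UPPER_SLOTS = 256 - 128          # slots 128-255 of the 8-bit table

total = sum(STATED_COUNTS.values())
free_slots = UPPER_SLOTS - total

print(total, free_slots)         # 109 19
```

The stated counts add up to 109, leaving 19 of the 128 upper slots
unassigned before any positions are set aside for other reasons.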


FEATURES OF THIS PROPOSED CHARACTER SET
The scheme accommodates almost all the points raised in the
tamil.net discussions. In addition, it has the following key features:

a) In the present proposal, kerning is invoked ONLY FOR 
TWO SERIES: ikara varisai using the ikara kokki modifier and 
iikaara varisai using the iikara kokki modifier. 
  All other unique tamil characters (uyir, uyirmeis) are kept as such!

b)  has all key grantha characters. It is proposed to use modifiers to
   get all the required compound ones in the currently used form

c) has provisions to get old style lai/Lai/Nai/nai and also old style
          Raa, Naa and naa.

d)  has the four diacritical markers required to use along with roman
      letters to write transliterated tamil in the Library of Congress 
     transliteration scheme

e) still has more than a dozen (12) empty slots, which can hold tamil
      numerals or be left empty for future revisions (preferred choice)
   (On Windows there have been problems using characters placed
   at 14-144, 160, double quotes, bullets etc...)

Muthu Nedumaran, in his proposals, prefers to keep as many of the 
uyirmeis as such, citing the difficulty of implementing kerning on
simple LCD displays/point-of-sale systems and the need for high quality
production of tamil texts comparable to current printing. 
In the present proposal, kerning is invoked ONLY FOR TWO SERIES:
ikara varisai using the ikara kokki modifier and iikaara varisai using
the iikara kokki modifier. 
(All other unique tamil characters (uyir, uyirmeis) are kept as such!)
Since each is a right-end modifier, there should not be any problem in
implementation. Secondly, the demanding requirements of professional
printing houses can be readily met by storing high quality versions
of the entire set of uyirmeis in the software and calling them during the 
printing process. In fact many of the tamil DTP software packages (incl.
those that use romanized/transliterated input) are of the "interpreted"
type, where a given sequence of typed characters is replaced by the
equivalent tamil characters. Even the displays of LCD screens in
point-of-sale systems are not permanent. The screen is constantly
re-written and so the complex tamil characters can be called and
displayed, as is currently done for many south asian languages.
Many of the computer professionals I talked to confirm that this
is indeed feasible with present-day technology.
In short, I do not see any serious problems in delivering high quality
output using the above character choices in a font encoding scheme.
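
As a rough illustration of the "interpreted" approach mentioned above,
the sketch below replaces a typed romanized sequence with a list of
glyphs by longest match. Everything in it is an assumption made for
illustration: the glyph names are invented labels, the table covers
only the "pa" examples used in this proposal, and no slot numbers are
implied.

```python
# Hypothetical glyph names; the proposal does not assign slot numbers.
GLYPHS_FOR = {
    "pa":  ["pa"],                     # plain consonant glyph
    "paa": ["pa", "kaal"],             # kaal follows the base glyph
    "pi":  ["pa", "ikara-kokki"],      # right-end modifier (kerned series)
    "pii": ["pa", "iikara-kokki"],     # right-end modifier (kerned series)
    "pu":  ["pu"],                     # unique uyirmei kept as such
    "puu": ["puu"],                    # unique uyirmei kept as such
    "pe":  ["ekara-addon", "pa"],      # left addon precedes the base glyph
    "pE":  ["Ekara-addon", "pa"],
    "pai": ["aikara-addon", "pa"],
}

# Try longer keys first so that "paa" is not read as "pa" followed by "a".
KEYS_LONGEST_FIRST = sorted(GLYPHS_FOR, key=len, reverse=True)


def to_glyphs(typed: str) -> list[str]:
    """Replace a typed romanized sequence with the glyphs to display."""
    glyphs: list[str] = []
    pos = 0
    while pos < len(typed):
        for key in KEYS_LONGEST_FIRST:
            if typed.startswith(key, pos):
                glyphs.extend(GLYPHS_FOR[key])
                pos += len(key)
                break
        else:
            raise ValueError(f"no glyph rule for input at position {pos}")
    return glyphs
```

For example, `to_glyphs("pipaa")` yields the base glyph followed by the
ikara modifier, then the base glyph followed by kaal. A printing system
could use the same lookup step to call stored high quality versions of
the full uyirmeis instead.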

I may also add that the scheme proposed above is very 
similar to Dr. Nandasara's proposals for tamil presented at the 
recent TamilNet'97 conference held in Singapore. 

With best regards,
Kalyan
(K. Kalyanasundaram)

--
*******************************************************************
Dr. K. Kalyanasundaram,            |
Institute of Physical Chemistry,   | Tel: 41-21-693 3622 (off)
Swiss Federal Inst. of Technology  | Fax: 41-21-693 4111
CH-1015 Lausanne, Switzerland      | Email:kalyan@igcsun3.epfl.ch
*******************************************************************

