Tamil Discussion archive
tamil character set choice for a font encoding scheme
In my earlier email with the proposal for a possible font
encoding scheme for tamil, I used "inaimathi/anjal" font
to display the tamil character choices. I failed to realise
that many members of the Tamilnadu Advisory
Committee may not be using the Anjal/Inaimathi font.
So I am reproducing herewith the possible character
set alone but this time the choices are indicated using
a plain ASCII transliteration scheme. This way the proposal
can be read on all terminals, including plain dumb ones.
So you can replace the earlier one with the present one.
Sorry for this lapse/short sighted action. My sincere
You may recall that early this year, there were extensive discussions
in tamil.net on possible standards for font encoding and keyboard
layouts. Even though I am not a computer scientist in any sense,
I participated in many of these discussions, giving my viewpoints
evolved over the last three years with my interests in tamil computing.
Topics discussed were: 7-bit (128 characters) vs. 8-bit (256 characters)
encoding schemes; keeping or leaving out old style tamil characters
such as the forward kokki in lai/Nai, Ra, ...; the possible limitations
on kerning in point-of-sale systems; and whether a limited character
choice might impose reforms on the way tamil is currently written.
Standards for Transliterated/romanized form of writing tamil
were not discussed even though, in my opinion, this issue must
be discussed concurrently with font encoding for tamil scripts.
Many of the points are indicated/summarized in my presentation
at the recent Singapore TamilNet'97 Conference.
In tamil.net discussions and in my presentation at Singapore, I
suggested a possible font encoding scheme that is modelled on
ISO 8859-X schemes currently used widely for handling all
the european languages. Such a scheme will be easily understood
by all in the computing world, even if they are not conversant
with tamil. It can be implemented at very short notice and can
co-exist happily even with a possible Unicode standard.
After giving due considerations to discussions held in tamil.net
email discussion group and also those expressed publicly or
privately during Singapore Conference, I would like to make
the following propositions for possible character choices
for a tamil font encoding scheme in an 8-bit encoding scenario.
After presentation of the character choices as such, I elaborate
a bit on the motivations behind this choice. I think this character
set is a reasonable, viable one, acceptable to the majority of tamils
irrespective of the nature of the tamil dtp software/font being
used. I would much appreciate it if you could bring this proposal
to the attention of the Tamilnadu Advisory Committee.
I humbly request the distinguished members of the Committee
to consider this proposal in their deliberations. It may not be
the perfect one for the committee to adopt as such, but it can
certainly serve as the starting point for further refinement if
necessary. Needless to say, I am at the disposal
of the committee for any clarifications or follow-up.
A PROPOSAL FOR POSSIBLE FONT ENCODING
SCHEME FOR TAMIL
(the tamil character choices are indicated in romanized format!)
8-BIT (256 CHARACTER SLOTS) with the standard
roman characters occupying the first 128 slots, as in Latin-1 or
the lower ASCII scheme.
The scheme is modelled on ISO 8859-X schemes currently in use
(such as 8859-1/Latin-1, 8859-2/Latin-2, ...)
I leave open the issue of actual assignment of tamil characters to
the upper ASCII slots (128-255) for the moment.
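The ISO 8859-X model referred to above can be illustrated with a small Python sketch. It only uses the standard "iso-8859-1" and "iso-8859-2" codecs to show that the lower 128 slots are shared while the upper 128 differ from part to part. The helper is_tamil_slot is a hypothetical name introduced here; the proposal leaves the actual slot assignment open.

```python
# In every ISO 8859-X part, bytes 0-127 are plain ASCII; only 128-255 differ.
LOWER = bytes(range(0, 128))

def decode_in(part: str, data: bytes) -> str:
    """Decode bytes using one of Python's standard ISO 8859 codecs."""
    return data.decode(part)

# The lower half decodes identically under Latin-1, Latin-2 and ASCII.
same_lower = (
    decode_in("iso-8859-1", LOWER)
    == decode_in("iso-8859-2", LOWER)
    == LOWER.decode("ascii")
)

# The same upper-half byte means different characters in different parts.
upper_sample = bytes([0xA1])
latin1_char = decode_in("iso-8859-1", upper_sample)  # inverted exclamation mark
latin2_char = decode_in("iso-8859-2", upper_sample)  # capital A with ogonek

def is_tamil_slot(byte: int) -> bool:
    """Hypothetical helper: slots 128-255 would be available for tamil."""
    return 128 <= byte <= 255

if __name__ == "__main__":
    print(same_lower, repr(latin1_char), repr(latin2_char))
```

A tamil part built the same way would keep roman text readable everywhere and change only the meaning of the upper slots.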
(a, aa/A, i, ii/I, u, oo/U, e, ee/E, ai, o, O, au, ak)
(ka, nga, ca, nya, da/ta, Na, tha, n^a, pa, ma, ya,
ra, la, va, zha, Ra, La, na )
(virama dot, kaal for aakara varisai as in paa, kokki for
ikara varisai as in pi, kokki for iikara varisai as in pii,
left addon for ekara varisai as in pe, left addon as in
Ekara varisai as in pE, left addon for aikara varisai as
in pai and the kokki/ hook for old style lai/Lai/Nai/nai )
akaram eRRiya iyir 18
(ik, ing, ic, iny, it/id, iN, ith, in^, ip, im, iy, ir,
il, iv, izh, iR, iL, in )
aakara varisai 3 ( old style Raa, Naa, naa )
ikara varisai 1 ( di )
iikaara varisai 1 ( dii )
ukara varisai 18
uukara varisai 18
aikaara varisai: 0
(use € and modifier for old style lai, Nai, nai and Lai !! )
grantha: 6 ( ÷, ő, ű, ů, ó, sri )
Note: for all grantha ones use the modifiers to get the ikara,
iikara and ukara varisais !!
diacritical markers: 4 (two dashes one above and one below the
character, two dots one above, one below)
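As a rough check of the slot budget, the counts listed above can be added up and compared with the 128 upper slots. This is only a sketch: the labels for the three unlabelled lists (vowels with aytham, consonants with inherent a, and modifiers) are my reading of the romanized entries, and their counts come from counting the items as listed.

```python
# Counts taken from the lists above; group labels for the unlabelled
# lists are an assumed reading, not the author's own headings.
GROUPS = {
    "vowels and aytham (a ... au, ak)": 13,
    "consonants with inherent a (ka ... na)": 18,
    "modifiers (virama dot, kaal, kokkis, left addons, hook)": 8,
    "pure consonants (ik ... in)": 18,
    "aakara varisai (old style Raa, Naa, naa)": 3,
    "ikara varisai (di)": 1,
    "iikaara varisai (dii)": 1,
    "ukara varisai": 18,
    "uukara varisai": 18,
    "aikaara varisai": 0,
    "grantha": 6,
    "diacritical markers": 4,
}

UPPER_SLOTS = 256 - 128  # slots 128-255

used = sum(GROUPS.values())
free = UPPER_SLOTS - used

if __name__ == "__main__":
    for name, count in GROUPS.items():
        print(f"{count:3d}  {name}")
    print(f"used: {used}, free: {free}")
```

Any upper-range positions that have to be avoided on a given platform would come out of the free count.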
FEATURES OF THIS PROPOSED CHARACTER SET
The scheme accommodates almost all the points raised in the
tamil.net discussions. In addition, it has the following key features:
a) In the present proposal, kerning is invoked ONLY FOR
TWO SERIES ikara varisai using ’ modifier and iikaara
varisai using “ modifier.
All other unique tamil characters (uyir, uyirmeis) are kept as such!
b) has all key grantha characters. It is proposed to use modifiers to
get all the required compound ones in the currently used form
c) has provisions to get old style lai/Lai/Nai/nai and also old style
Raa, Naa and naa.
d) has the four diacritical markers required to use along with roman
letters to write transliterated tamil in the Library of Congress
e) still has more than a dozen (12) empty slots, which can hold tamil
numerals or be left empty for future revisions (preferred choice)
(On windows there have been problems using characters placed
at 14-144, 160, double quotes, bullets etc...)
Muthu Nedumaran in his proposals prefers to keep as many of the
uyirmeis as such, due to difficulties in implementing kerning on
simple LCD displays/point-of-sale systems and the need for high quality
production of tamil texts comparable to current printing.
In the present proposal, kerning is invoked ONLY FOR TWO SERIES
ikara varisai using ’ modifier and iikaara varisai using “ modifier.
(All other unique tamil characters (uyir, uyirmeis) are kept as such!)
Since it is a right end modifier, there should not be any problem in
implementation. Secondly, demanding requirements of professional
printing houses can be readily met by storing high quality versions
of the entire uyirmeis in the software and calling them during the
printing process. In fact many tamil DTP software packages (incl.
those that use romanized/transliterated input) are of the "interpreted"
type, where a given sequence of typed characters is replaced by the
equivalent tamil characters. Even the displays of LCD screens in
point-of-sale systems are not permanent. The screen is constantly
re-written and so the complex tamil characters can be called and
displayed, as is currently done for many south asian languages.
Many of the computer professionals I talked to confirm that this
is indeed feasible with present-day technology.
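The "interpreted" approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation of any existing DTP package: the slot numbers are hypothetical (the proposal leaves assignments open), only two base letters are included, and Unicode strings stand in for the stored high-quality glyphs.

```python
# Hypothetical slot numbers, chosen only for this illustration.
PA, KA = 0xB0, 0xB1
IKARA_MOD, IIKARA_MOD = 0x92, 0x93

# Unicode strings stand in for glyphs: PA, KA, vowel sign I, vowel sign II.
BASES = {PA: "\u0baa", KA: "\u0b95"}
MODIFIERS = {IKARA_MOD: "\u0bbf", IIKARA_MOD: "\u0bc0"}

def to_display_units(stored: bytes) -> list[str]:
    """Replace each base + right-end modifier pair by one composed unit."""
    units: list[str] = []
    for byte in stored:
        if byte in BASES:
            units.append(BASES[byte])
        elif byte in MODIFIERS:
            if units and units[-1] in BASES.values():
                # The modifier follows its base, so it joins the previous unit.
                units[-1] += MODIFIERS[byte]
            else:
                units.append("\ufffd")  # modifier without a base letter
        elif byte < 128:
            units.append(chr(byte))  # lower half is plain ASCII
        else:
            units.append("\ufffd")  # slot not assigned in this sketch
    return units

if __name__ == "__main__":
    sample = bytes([PA, IKARA_MOD, KA, IIKARA_MOD, ord("!")])
    print(to_display_units(sample))
```

Because the modifier sits at the right end of its base, a single left-to-right pass is enough; no reordering of the stored bytes is needed before display.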
In short I do not see any serious problems in delivering high quality
outputs using the above character choices in font encoding scheme.
I may also add that the scheme proposed above is very
similar to Dr. Nandasara's proposals for tamil presented at the
recent TamilNet'97 conference held in Singapore.
With best regards,
Dr. K. Kalyanasundaram, |
Institute of Physical Chemistry, | Tel: 41-21-693 3622 (off)
Swiss Federal Inst. of Technology | Fax: 41-21-693 4111
CH-1015 Lausanne, Switzerland | Email:firstname.lastname@example.org