Tamil Discussion archive


Re: font encoding -possible slot assignments for glyphs

Dear Kalyan and friends,

--- Kalyan wrote ---
>Dear Friends:
>To keep advancing in our discussions for a possible standard 
>font-encoding scheme for tamil, I have now put up a gif image file 
>that shows possible tamil character glyphs that can go in the
>future (standard) font encoding scheme (font) for tamil. The 
>URL reference is:
>    http://www.geocities.com/Athens/5180/charset1.gif

First - thanks for a nice job !  We finally have a first cut that
we can now tune :-)

>a) To ensure high quality output, kerning process is kept to the
>bare minimum and majority of the tamil character glyphs are 
>kept as such.  Only the ikara and iikara varisai uyirmeis are 
>to be typed using the modifier keys. (thus only 36 out of 256
>are generated this way). Since these are right-end
>modifiers there should not be problems in implementation.
>Also most of the DTP packages that allow romanized/phonetic
>input are "interpreted output" type. For these, high quality 
>versions of the above set of uyirmeis can be stored and 
>called up when necessary.

I find this acceptable.  It's consistent and logical.  'di' and 'dI'
are the only odd ones (from the ikara and iikara varisai), but that
is still acceptable, as it would not be presentable to show the
kokkis falling away from 'da'.

>b) old style characters (for lai/Nai/nai, Ra, Naa, naa) are kept
>mainly to ensure that, electronic archiving of ancient tamil 
>literature in the original form in which they were first written 
>is possible. We do not have to sacrifice anything for this.
>In Singapore conference, many including Prof. H. Schiffman
>emphasised that only if we make provisions for old  style
>characters it will be possible to electronically reproduce/publish
>literatures that are still in palm-leaf manuscripts. TamilNet'97
>conference held recently in Singapore officially decided to 
>keep these old  style characters in font encoding scheme.
>No one is obliged to use them if they do not want. 
>I propose that DTP packages be written in such a way that the
>default option is the modern version. Old version output given
>as a possible pull-down menu option.

I don't find this acceptable and would appreciate comments and/or
education from the team ;-). Including the old style set (I just call
those a set for simplicity) in a character set will certainly cause
ambiguity as there is duplication in the *value* of the character.
(i.e. kaiyakaram and vaathu are of the same *value*).  We *should not*
see a character encoding *purely* from a publishing perspective. 
Retaining palm-leaf manuscripts in their original forms means *scanning*
them and storing them as images.  The moment we decide to store them
as etext, the originality of appearance is lost.  If they must be rendered
in the old form, this can be (and I suggest *has* to be) done with
electronic manipulation.  The new and emerging OpenType font standard
(which most o/s vendors have committed support for - even java)
allows for *glyph substitution*.  In fact this can already be done
in Windows NT today !
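
To make the glyph substitution idea concrete, here is a minimal sketch in Python. It is not the OpenType API; the character values, glyph names and the `render` function are all hypothetical, chosen only to show that one stored text can be displayed in either style.

```python
# Hypothetical character values and glyph names, for illustration only.
DEFAULT_GLYPH = {
    "NAA": "naa.modern",
    "LAI": "lai.modern",
    "KA": "ka",
}

# Substitution table a renderer could apply when old-style output is requested.
OLD_STYLE_SUBSTITUTION = {
    "naa.modern": "naa.old",
    "lai.modern": "lai.old",
}

def render(text, old_style=False):
    """Map stored character values to glyph names for display."""
    glyphs = [DEFAULT_GLYPH[ch] for ch in text]
    if old_style:
        # Swap only the glyphs that have an old-style variant.
        glyphs = [OLD_STYLE_SUBSTITUTION.get(g, g) for g in glyphs]
    return glyphs

stored = ["NAA", "KA"]                  # one stored form, whatever the display style
print(render(stored))                   # modern glyphs
print(render(stored, old_style=True))   # old-style glyphs, same stored text
```

The stored text never changes; only the display step consults the substitution table.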

By *not* including the old set into the character set, we avoid any
kind of *overloading* in the way we represent a word.  For example,
we can be absolutely sure that "nandraaka" (I'm not using any translit.
std here) can only be represented in exactly *one* sequence of characters
- whichever the platform may be.  I need not explain again the benefits
of being able to do this :-).
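
A small sketch of the overloading problem, assuming hypothetical slot numbers (0xA1 for the modern 'Naa', 0xF1 for an old-style duplicate, 0x95 for 'ka'; none of these come from the proposed chart). With a duplicate slot, the same word has two byte sequences and a plain comparison fails until the text is normalised.

```python
# Hypothetical slot numbers, for illustration only.
MODERN_NAA = 0xA1
OLD_NAA = 0xF1      # duplicate slot with the same *value* as MODERN_NAA
KA = 0x95

# Fold every old-style slot onto its modern equivalent.
OLD_TO_MODERN = {OLD_NAA: MODERN_NAA}

def canonical(codes):
    """Return the text with old-style duplicates replaced by modern slots."""
    return bytes(OLD_TO_MODERN.get(c, c) for c in codes)

word_modern = bytes([MODERN_NAA, KA])
word_old = bytes([OLD_NAA, KA])

print(word_modern == word_old)                        # same word, different bytes
print(canonical(word_modern) == canonical(word_old))  # equal only after folding
```

If the duplicate slots are never in the encoding, the folding step is not needed at all, and searching and sorting can work directly on the stored text.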

(BTW - the old-style RA is missing ;-) )

>c) grantha characters: are kept for the same reasons indicated
>above under (b). The modifiers are to be used to get the ikara,
>iikara, ukara, uukara varisais. Since these grantha characters
>are rarely used, one can accept some medium quality output
>for these grantha ones. Here again, softwares can be written
>in such a way to provide high quality output required for 
>commercial publishing houses.

I see a space after sri.  Is this to place the 'Sa' character ?  
If no - I'm fine, if yes - I'll have questions that may become
sensitive ;-)

>d) four diacritical markers are included that will allow typing
>tamil in the classical transliterated format, familiar and widely
>used by indologists. This way we can have one single integrated,
>bilingual font that allow typing tamil in tamil script and in
>transliterated format (romanized with or without diacritical
>markers) all at the same time!

I have some questions running in my mind on the *usefulness* of
this.  Hope to see (hear) some light from the team :

1. My original idea of a romanised (Sujatha - please allow me to
   use this word here, as it is (technically) correct in this
   context) input method was to facilitate input of Tamil through
   any non-graphical terminal/system that understands only
   7-bit ASCII.  I am of the opinion that if a transliteration
   scheme is based on this (plain 7-bit) ASCII, that scheme will
   help in both inputting Tamil with roman keys (using a roman
   keyboard) and storing Tamil text in roman characters.  I 
   see tremendous benefits in doing this.

2. Having these marks, again, is opening up an opportunity to
   store Tamil etext in two formats - Transliterated and Tamil.
   Correct ?  I see some rough edges with this.  Can someone help
   convince me if this is OK ?  Couldn't we decide on just one
   format (preferably Tamil) and electronically transliterate
   the text on the fly if we need to render it transliterated
   (or romanised ;-)) ?

3. I think it is *possible* to have transliterated Tamil text
   *without* diacritical marks - I expect more disagreement on
   this - but would like to hear some deep thoughts ;-)
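
As a rough illustration of points 1 and 2, the sketch below stores the text once in Tamil and produces a plain 7-bit ASCII romanisation on the fly. It assumes the text is held as Unicode Tamil code points, and the roman spellings ("th", "zh", "aa") are a hypothetical scheme picked for the example, not an agreed standard. Only a handful of letters are covered.

```python
# Assumption: text is stored as Unicode Tamil; roman spellings are hypothetical.
CONSONANTS = {"\u0b95": "k", "\u0ba4": "th", "\u0bae": "m", "\u0bb4": "zh"}
VOWEL_SIGNS = {"\u0bbe": "aa", "\u0bbf": "i"}      # dependent vowel signs
VOWELS = {"\u0b85": "a", "\u0b86": "aa"}           # independent vowels
PULLI = "\u0bcd"                                   # removes the inherent 'a'

def romanise(text):
    """Transliterate stored Tamil text into 7-bit ASCII for display."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            if nxt == PULLI:
                out.append(CONSONANTS[ch])                  # bare consonant
                i += 2
            elif nxt in VOWEL_SIGNS:
                out.append(CONSONANTS[ch] + VOWEL_SIGNS[nxt])
                i += 2
            else:
                out.append(CONSONANTS[ch] + "a")            # inherent vowel
                i += 1
        elif ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        else:
            out.append(ch)                                  # pass through as-is
            i += 1
    return "".join(out)

print(romanise("\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"))  # -> thamizh
```

Because the romanised form is generated at display time, there is only one stored format to search, sort and exchange.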

>e) space is still there to accommodate tamil numerals. 
>Yes, no body uses them these days. But these are there
>for reasons listed under (b) and also, most importantly, to 
>have the unicode/iscii standard set for tamil as a sub-set.
>If we can keep the unicode set as a sub-set of integrated font, 
>it will be possible to write up a one-to-one mapping table and allow 
>softwares to save tamil text files in these unicode/iscii format.
>This way we can make the present scheme co-exist happily
>in the unicode world and also facilitate smooth transitions
>at a later date. 

I think including Tamil numerals *is* important, and there should
be no problem doing this :-)
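
The one-to-one mapping table Kalyan mentions could look something like the sketch below. The font slot numbers (0xE1 to 0xE3) are hypothetical; the Unicode values are the Tamil digits one to three (U+0BE7 to U+0BE9). The point is only that a table with no duplicates converts in both directions without loss.

```python
# Hypothetical font slots mapped to the Unicode Tamil digits one, two, three.
SLOT_TO_UNICODE = {
    0xE1: 0x0BE7,
    0xE2: 0x0BE8,
    0xE3: 0x0BE9,
}

# Build the reverse table; equal sizes confirm the mapping is one-to-one.
UNICODE_TO_SLOT = {u: s for s, u in SLOT_TO_UNICODE.items()}
assert len(UNICODE_TO_SLOT) == len(SLOT_TO_UNICODE)

def to_unicode(slots):
    """Convert font-encoded bytes to a Unicode string."""
    return "".join(chr(SLOT_TO_UNICODE[s]) for s in slots)

def to_slots(text):
    """Convert a Unicode string back to font-encoded bytes."""
    return bytes(UNICODE_TO_SLOT[ord(c)] for c in text)

font_text = bytes([0xE1, 0xE3])
print(to_slots(to_unicode(font_text)) == font_text)  # round trip is lossless
```

A duplicated character (two slots with the same value) would make the reverse table smaller than the forward one, which is exactly what the size check catches.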

Though I understand and appreciate the urgency for us to have this
done like *now*, I think we have to think through the issues (like
*yesterday* ;-)) before we press the button.  I really look forward
to hearing your thoughts - this is keeping me awake at all times ;-)


