Tamil Discussion archive


Re: Tamil computing - what it is all about



Dear Prof. Hart, Nagu, Muthu and friends:
>George Hart wrote:
>> ------Stuff deleted----
>> In any event, I do not understand Kalyan's insistence that works be
>> published with their original "glyphs."  Why in the world would we want to
>> do that?  The original Tolkappiyam was probably written in a variant of
>> Brahmi -- only a handful of people could even decipher it!  The fact is, we
>> do NOT need the old-style script or the numerals.  This is for a very good
>> reason: as I have stated many times, a standard should be invariant.  Users
>> should not have 2 choices about how to encode the same text.  There is
>> absolutely no need for the old-style letters -- the new ones produce
>> exactly the same combination of signs.  I should note that if someone is
>> uncomfortable with the new writing system, there is nothing to prevent him
>> or her from designing a font that has the old one and converting from the
>> new to the old with a simple Word macro.
>> 
>> George Hart
>> 
>Thank you very much for such a nice explanation. Except for **showing**
>that: "see, we had letters with such styles", I really do not see any
>use of them.
>
>As you explained further, if at all somebody desperately needs that to
>be shown in old style, we still have the data, they can always do some
>manipulation and show it.

Frankly, I am very much disappointed with Prof. Hart's statement and
Nagu's seconding of it. Prof. Hart, I know, is anxious, like me, to round
up these discussions and agree on a standard as soon as possible.
But please, let us not sacrifice objectives on the way and
go for a scheme with known handicaps. As for the topic of having or
not having ORNL, "I do not use them, so do not put them" and "I do not
like anyone using them, so we should not put them" are purely emotional
statements. They do not address the technical issue. Even if we do not
want to put them in the encoding scheme, we should address the question
of how anyone can access them. The only sensible thread has been that
of Muthu, trying to focus on an alternative: if we do not have
them in the scheme, how can we bring them to the interested parties?
Glyph substitution was one proposed approach.
I will take up the technical part in a separate posting so that those
not interested in technical details can skip it.
I would like to emphasise that I am interested in pursuing technical
discussions of the various issues involved and not in political/emotional
debate. Please bear with me when I point out the risks
we run in going for a quick consensus. It is like 'cutting arms
to fit the newly bought shirt'. It is only we humans who impose
restrictions. With computers we do not have to cut corners. Let us
design a shirt that we can all be proud of wearing.

No one can deny the fact that a language's script and words evolve
all the time. If I am not mistaken, the whole subject of linguistics is
nothing but the study of this interesting human-related phenomenon. The
only place you can find any evidence for this evolutionary nature of the
language is its literature. Now if the archived electronic texts do not
reflect it by sticking as much as possible to the original version, where
on earth can one find the evidence for the evolution of the language? In
my opinion, any electronic text that uses the modern script to REWRITE
old literature
is nothing but an edited version and can never claim to be an AUTHENTIC
replica. Just for argument's sake, let us consider this scenario: the TN
Govt decides to go for the script scheme Anu proposed earlier, dropping
all the unique uyirmei glyphs of tamil and using only uyirs and meis to
write tamil (no, no, I do not want to open up this topic here again; this
is just for argument). Should we then archive early literature ONLY in
this new format? If so, it would suffice to have an encoding scheme that
has only the uyirs and meis. On an evolving topic of the present kind,
there can always be additional options to have versions "tailored to the
day's needs". But we can never run away from the need for a true
"authentic" replica. Else, knowingly or unknowingly, we will be burying
all the history associated with the language.

Even the UC Berkeley Library requirement for any unicode-compatible
font states that it should be able to handle glyphs that are no longer
in use. Librarians involved in electronic archiving are very much aware
of this necessity. Can we stretch this argument (let us not use the
glyphs we do not use) a bit further and say, let us also not use the
words we do not use any more? Can we ask Dr. Malten and others to
archive the early Sangam literature using only words that are currently
in use? In such a case we will be producing "commentaries" at most,
which are evolutionary markers/milestones but not, I repeat, "authentic
reproductions".  Prof. Hart cites Tolkappiyam. Precisely to accommodate
cases of this kind, in one of my very early posts I raised the
possibility of just leaving 8-10 slots as "user-defined" so that special
needs are met readily this way. The Unicode authorities have assigned a
big block for uses of this kind.
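
As a rough illustration of the "user-defined" idea, here is a minimal
Python sketch. It assumes the Unicode Private Use Area (the block
starting at U+E000) as the place for such slots; the glyph names and
the particular slots chosen are hypothetical and are not taken from
any proposal in this thread.

```python
import unicodedata

# Hypothetical assignments: these names and slots are illustrative only.
USER_DEFINED_SLOTS = {
    "old_style_lai": 0xE000,
    "old_style_Raa": 0xE001,
}


def is_private_use(code_point: int) -> bool:
    """Return True if Unicode classifies the code point as private use ('Co')."""
    return unicodedata.category(chr(code_point)) == "Co"


def user_defined_char(name: str) -> str:
    """Look up a user-defined glyph and return its private-use character."""
    code_point = USER_DEFINED_SLOTS[name]
    if not is_private_use(code_point):
        raise ValueError(f"{name} is not assigned to a private-use slot")
    return chr(code_point)
```

A font that wants to show the old glyph would draw it at that slot.
The meaning of a private-use slot is only an agreement between the
parties exchanging the text, which is the price of this approach.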

I find even Prof. Hart's statements on etexts a bit contradictory.
In earlier posts he made a strong case to keep grantha glyphs because
Bharathiar, Ramalinga Adigal and many others have used them. Now
for a couple of ORNL glyphs he suggests that we replace them with the
current ones. I am not sure if he has revised his opinion.

Early this year, when I was talking to Profs. John Samuel and
M. Shanmugam Pillai (notable tamil scholars based in Chennai),
I asked them about the sources I could use to prepare etexts of the
entire thirumarais. They warned me that I should be extremely careful
in choosing the right one. Tamil scholars, particularly the "purists",
are very particular about how one reproduces early literature. Strict
observance of sandhi rules is one such example. (Alas, the reference
editions of the thirumarais they suggested to me are no longer in
print!!) So electronic archiving is a serious business and there is no
place for sloppiness.  Any language encoding scheme that does not
seriously consider the needs associated with electronic archiving can be
used only as a font for the general public, with very restricted usage.
Right from the beginning of this, I have had only the multi-usage font
in mind. From all the discussions up till now, it appears most
are interested only in the latter type: one for routine use, tailored
for the day.

Let me now come to the second point raised, regarding tamil numerals
and the associated "two standards in the same scheme". This is a case
of misleading statements leading to wrong conclusions.
If Muthu or Nagu have not stated it explicitly, let me make it clear
here. Tamil numerals are not the same as roman numerals in
the scheme. They are plain glyphs like kuu or cuu etc. and
do not have any functional role whatsoever. You cannot use them
to do any arithmetic with the keypad, just as you cannot do
any arithmetic when we write X to represent ten and IV to
represent four. The numeric keypad, for example, will not accept
input of these, just as you cannot compute X + IV = ?.
The same argument holds in full for old style
lai, Raa etc. If these glyphs have unique codes, they are
functionally very different from the corresponding modern
versions. (On the contrary, it is only if we do not assign unique
codes that phonetic/romanized input methods can have difficulty
deciding what to give to the user. With unique codes,
one can easily define default options and provide the correct ones.)
Let us get this picture clear: neither tamil numerals nor
ORNL duplicate the corresponding roman numerals or modern
letters in any practical sense. Hence we DO NOT HAVE TWO standards.
Period.
In the 8-bit scheme we are currently discussing, if at all there
are two standards, they are the tamil script standard and
romanized tamil. Within the tamil script scenario, there
could be two representations possible, but not two standards.
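
To make the "two representations, not two standards" point concrete,
here is a minimal Python sketch. All the 8-bit code values in it are
hypothetical (0xF0 and 0xF1 for the old-style glyphs, and the modern
sequences they map to); none of them comes from any table proposed
here. It only shows that when the old-style glyphs have unique codes,
an archived text records which form the source used, and a
modern-script rendering is a plain table lookup away.

```python
# Hypothetical code values, for illustration only.
# Each old-style glyph has one slot; its modern equivalent is a code sequence.
OLD_TO_MODERN = {
    0xF0: bytes([0xA8, 0xC4]),  # assumed: old-style lai -> modern ai-sign + la
    0xF1: bytes([0xC8, 0xA1]),  # assumed: old-style Raa -> modern Ra + aa-sign
}


def uses_old_style(data: bytes) -> bool:
    """Return True if the text contains any old-style glyph code."""
    return any(byte in OLD_TO_MODERN for byte in data)


def modernize(data: bytes) -> bytes:
    """Replace each old-style code with its modern sequence; leave the rest."""
    out = bytearray()
    for byte in data:
        out += OLD_TO_MODERN.get(byte, bytes([byte]))
    return bytes(out)


archival = bytes([0x41, 0xF0, 0x42])
modern = modernize(archival)  # bytes([0x41, 0xA8, 0xC4, 0x42])
```

The archival bytes keep the evidence of the old form, and the
"tailored to the day" version is derived from them. Going the other
way, from modern text back to old style, would need outside knowledge
of which form the source used.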

I am prepared to go for a scheme without having ORNL and others
with specific codes. But we should be clear on how these
can be made available even on primitive machines. So let us
discuss the technical merits of the alternative propositions.

Kalyan

