Tamil Discussion archive
Re: Tamil computing - what it is all about
Dear Prof. Hart, Nagu, Muthu and friends:
>George Hart wrote:
>> ------Stuff deleted----
>> In any event, I do not understand Kalyan's insistence that works be
>> published with their original "glyphs." Why in the world would we want to
>> do that? The original Tolkappiyam was probably written in a variant of
>> Brahmi -- only a handful of people could even decypher it! The fact is, we
>> do NOT need the old-style script or the numerals. This is for a very good
>> reason: as I have stated many times, a standard should be invariant. Users
>> should not have 2 choices about how to encode the same text. There is
>> absolutely no need for the old-style letters -- the new ones produce
>> exactly the same combination of signs. I should note that if someone is
>> uncomfortable with the new writing system, there is nothing to prevent him
>> or her from designing a font that has the old one and converting from the
>> new to the old with a simple Word macro.
>> George Hart
>Thank you very much for such a nice explanation. Except for **showing**
>that: "see, we had letters with such styles", I really do not see any
>use of them.
>As you explained further, if at all somebody desparately needs that to
>be shown in old style, we still have the data, they can always do some
>manipulation and show it.
Frankly, I am very much disappointed with Prof. Hart's statement and
the seconding of it. Prof. Hart, I know, is as anxious as I am to round
up these discussions and agree on a standard as soon as possible.
But please, let us not sacrifice our objectives on the way and
go for a scheme with known handicaps. On the topic of having or
not having ORNL, "I do not use them, so do not put them" and "I do not
see anyone using them, so we should not put them" are purely emotional
statements. They do not address the technical issue. Even if we do not
want to put them in the encoding scheme, we should address the question
of how one can access them. The only sensible thread has been that
of Muthu, trying to focus on an alternative: if we do not have
them in the scheme, how can we provide them to the interested parties?
Glyph substitution was one proposed approach.
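To make the glyph substitution idea concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: it uses Unicode
code points for the modern letters rather than the 8-bit scheme under
discussion, and the old-style glyph slots (U+E000, U+E001) and the function
names are hypothetical. The stored text stays in the modern encoding; only
the display step swaps in old-style glyphs, and the swap can be undone.

```python
# Sketch of display-time glyph substitution. The text itself stays in the
# modern encoding; only the rendering step swaps in old-style glyphs.

# Hypothetical font slots for old-style ligatures. The code points are
# illustrative Private Use Area values, not part of any proposed scheme.
OLD_STYLE_GLYPHS = {
    "\u0BB2\u0BC8": "\uE000",  # lai: letter LA + vowel sign AI
    "\u0BB1\u0BBE": "\uE001",  # Raa: letter RRA + vowel sign AA
}


def to_old_style(text: str) -> str:
    """Return a display string with old-style glyph codes substituted."""
    for modern, old in OLD_STYLE_GLYPHS.items():
        text = text.replace(modern, old)
    return text


def to_modern(display: str) -> str:
    """Invert the substitution, recovering the stored modern text."""
    for modern, old in OLD_STYLE_GLYPHS.items():
        display = display.replace(old, modern)
    return display
```

Because the mapping is one-to-one, the archived text has a single encoding
and the old-style rendering is purely a presentation choice.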
I will take up the technical part in a separate posting so that those
not interested in technical details can skip them.
I would like to emphasise that I am interested in pursuing the technical
aspects of the various issues involved and not in a political/emotional
debate. Please bear with me when I point out the risks
we run in going for a quick consensus. It is like 'cutting the arms
to fit the newly bought shirt'. It is only we humans who impose
restrictions. With computers we do not have to cut corners. Let us
design a shirt that we can all be proud of wearing.
No one can deny the fact that a language, its script and its words evolve
all the time. If I am not mistaken, the whole subject of linguistics is
but the study of this interesting human-related phenomenon. The only place
you can find any evidence for this evolutionary nature of the language
is its literature. Now if the archived electronic texts do not reflect it
by sticking as closely as possible to the original version, where on earth
can one find the evidence for the evolution of the language? In my opinion,
any electronic text that uses the modern script to REWRITE old literature
is nothing but an edited version and can never claim to be an AUTHENTIC
replica. Just for argument's sake, let us consider this scenario: the TN
Govt decides to go for the script scheme Anu proposed earlier, dropping
all unique uyirmei glyphs of Tamil and using only uyirs and meis to write
Tamil (no no, I do not want to open up this topic here again - this is
for argument only). Should we archive early literature ONLY in this new
script? Then it would suffice to have an encoding scheme that has only the
uyirs and meis. In an evolutionary topic of the present kind, there can
always be additional options to have versions "tailored to the day's
needs". But we can never run away from going for a true "authentic"
replica. Else, knowingly or unknowingly, we will be burying all the history
associated with the language.
Even the UC Berkeley Library requirement for any Unicode-compatible
font states that it should be able to handle glyphs
that are no longer in use. Librarians involved in electronic archiving
are very much aware of this necessity. Can we stretch these arguments
(let us not use the glyphs we do not use) a bit further and say let us
not use the words we do not use any more? Can we ask Dr. Malten
and others to archive the early Sangam literature using only words that
are currently in use? In such a case we will be producing versions
that are at most evolutionary markers/milestones but not, I repeat not,
"authentic reproductions". Prof. Hart cites Tolkappiyam. Only to
accommodate cases of this kind, in one of my very early posts I raised the
possibility of just leaving 8-10 slots as "user-defined" so that special
needs are met readily this way. Unicode authorities have assigned a big
block for uses of this kind.
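As a rough illustration of the "user-defined" slots idea, the sketch below
reserves ten code points at the start of the Unicode Private Use Area in the
Basic Multilingual Plane (U+E000 to U+F8FF). The choice of ten slots starting
at U+E000 and the names `USER_SLOTS` and `is_private_use` are assumptions
made for this example, not part of any proposal on this list.

```python
import unicodedata

# First code point of the Basic Multilingual Plane Private Use Area
# (U+E000 to U+F8FF), which Unicode leaves to private agreement.
PUA_START = 0xE000

# Hypothetical allocation of ten user-defined slots, as suggested above.
USER_SLOTS = [chr(PUA_START + i) for i in range(10)]


def is_private_use(ch: str) -> bool:
    """True if the character belongs to a Unicode private-use area."""
    return unicodedata.category(ch) == "Co"
```

What each slot displays is decided entirely by the font and by agreement
among the users who exchange the text.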
I find even Prof. Hart's statements on etexts a bit contradictory.
In earlier posts he made a strong case to keep grantha glyphs because
Bharathiar, Ramalinga Adigal and many others have used them. Now
for a couple of ORNL glyphs he suggests that we replace them by the
current ones. I am not sure if he has revised his opinion.
Early this year, when I was talking to Profs. John Samuel and
M. Shanmugam Pillai (notable Tamil scholars based in Chennai),
I asked them about the sources I could use to prepare etexts of the
entire thirumurais. They warned me that I should be extremely careful
in choosing the right one. Tamil scholars, particularly the "purists",
are very particular about how one reproduces early literature. Strict
observance of sandhi rules is one such example. (Alas, the reference
editions of the thirumurais they suggested to me are no longer in print!)
So electronic archiving is a serious business and there is no place for
sloppiness. Any language encoding scheme that does not seriously
consider the needs associated with electronic archiving can be used
only as a font for the general public and with very restricted usage.
Right from the beginning of this discussion, I have had only the
multi-usage font in mind. From all the discussions up till now, it appears
most are interested only in the latter type - one for routine use,
tailored for the day.
Let me now come to the second point raised, regarding Tamil numerals
and the associated two standards in the same scheme. This is a case
of misleading statements leading to wrong conclusions.
If Muthu or Nagu have not stated it explicitly, let me make it clear
here. Tamil numerals are not the same as roman numerals in
the scheme. They are plain glyphs like kuu or cuu etc. and
do not have any functional role whatsoever. You cannot use them
to do any arithmetic with the keypad, just as you cannot do
any arithmetic when we write X to represent ten and IV to
represent four. The numeric keypad, for example, will not accept
these as input, just as you cannot compute X + IV = ?.
The same argument holds in full for the old-style
lai, Raa etc. If these glyphs have unique codes, they are
functionally very different from the corresponding modern
versions. (On the contrary, it is only if we do not assign unique
codes that phonetic/romanized input methods will have difficulty
deciding what to give to the user. With unique codes,
one can easily define default options and provide the correct ones.)
Let us get this picture clear: neither Tamil numerals nor
ORNL duplicate the corresponding roman numeral in any
practical sense. Hence we DO NOT HAVE TWO standards.
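The point that such numerals are glyphs rather than working digits can be
checked with a small Python sketch. As an assumption, it uses the Unicode
sign TAMIL NUMBER TEN (U+0BF0) as a stand-in for a numeral glyph, since the
slots of the 8-bit scheme are not fixed here; `usable_as_digit` and
`ROMAN_VALUES` are names invented for this example.

```python
import unicodedata

ROMAN_TEN = "X"
TAMIL_TEN = "\u0BF0"  # TAMIL NUMBER TEN


def usable_as_digit(glyph: str) -> bool:
    """True if Python's int() accepts the glyph as a decimal digit."""
    try:
        int(glyph)
    except ValueError:
        return False
    return True


# The sign carries a numeric property, but it is not a decimal digit,
# so it cannot be typed into a calculation directly.
TAMIL_TEN_VALUE = unicodedata.numeric(TAMIL_TEN)

# An explicit lookup table is needed before any arithmetic is possible.
ROMAN_VALUES = {"X": 10, "IV": 4}
ROMAN_SUM = ROMAN_VALUES["X"] + ROMAN_VALUES["IV"]
```

In both cases the glyph has to be translated into a number by software
before X + IV can be evaluated; the glyph code itself does no arithmetic.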
In the 8-bit scheme we are currently discussing, if there are two
standards at all, they are the Tamil script standard and
romanized Tamil. Within the Tamil script scenario, there
could be two possible representations, but not two standards.
I am prepared to go for a scheme without having ORNL and others
with specific codes. But we should be clear on how these
can be made available even on primitive machines. So let us
discuss the technical merits of alternative propositions.