Tamil Discussion archive


[WMASTERS] Anbu Arasan mail-repost






Mani wrote:
>>On Sep 17 Anbu Arasan clearly explained that the proposed encodings
>>should be based on basic characters, not glyphs, and he gave the
>>reasons. At this point we have to look at his mail seriously. Can any
>>one of you repost that mail?
>
>Unfortunately I don't have that post.  I second this request.

Here I repost the posting of Anbu Arasan referred to above.

-----REPOST------

Received-Date: Wed, 17 Sep 97 19:55:53 +0530
Posted-Date: Wed Sep 17 19:29:09 1997
Message-Id: <9709171359.AA07487@axcess.nbanglore.axcess.net.in>
To: tamilnet@tamilnews.org.sg
X-Sent-To-Axcess: sujatha@md2.vsnl.net.in(1|NN| )
X-Sent-To-Axcess: govind@irdu.nus.sg(1|NN| )
X-Sent-To-Axcess: ananda@md2.vsnl.net.in(1|NN| )
X-Sent-To-Axcess-Cc: ghart@socrates.berkeley.edu(1|NN| )
X-Sent-To-Axcess-Cc: haroldfs@ccat.sas.upenn.edu(1|NN| )
Date: Wed Sep 17 19:29:09 1997
X-Mailer: aXcess Mail (version 2.0)
Subject: Thought provoking on Tamil encoding
Sender: owner-tamilnet@irdu.nus.sg
Precedence: bulk
Reply-To: anbu.arasan@axcess.net.in

--------
I sincerely want to make it very clear that I in no way intend to hurt
or offend anyone; my sole interest is to put forward my views for the
enshrinement of sweet Tamil. If anything in my writing offends someone
doing their mighty service for the enshrinement of TAMIL, or those using
it, I once again ask them to overlook and forgive it for the sake of the
prosperity of the language for which such unprecedented discussions are
taking place. No doubt, these discussions will pave the way for better
understanding and the best solutions for TAMIL.

The Tamil language has evolved and been reformed since time immemorial.

Since we believe what we see, some of the participants in the
discussion believe that representing (displaying and printing) Tamil on
computers is Tamil encoding.

Tamil has witnessed and withstood many changes in its script form as
well as in its character set.

Before starting to encode Tamil glyphs, a requirement (aim) has to be
formulated about what is going to be encoded. Without it, it is not
advisable to select and segregate the Tamil glyphs according to one's
taste.

I want to make one point very clear. There appears to be some confusion
in the discussion between font/glyph and character set. A set of glyphs
is not itself a Tamil character set. The character set of Tamil is
"uyir eluthukkal and mei eluthukkal" (vowels and consonants). These
thirty letters are the basis for Tamil; combinations of these letters
form hundreds of characters, and it is not possible to encode all these
characters on computers. The basic common characters are considered the
character set of Indian languages and are encoded in ISCII (new
standard).
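
To illustrate the arithmetic behind the thirty basic letters and the
combinations they form, here is a minimal sketch. It assumes the Unicode
Tamil code points (the post itself does not prescribe any code values),
where a bare consonant code carries the inherent "a" and the pulli mark
produces the pure consonant (mei). The variable names are made up for
this example.

```python
# Illustrative sketch only. The Unicode code points below are an assumption
# of this example, not something taken from the original post.

# 12 vowels (uyir)
VOWELS = [chr(c) for c in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                           0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]

# 18 consonants; in Unicode each code stands for consonant + inherent "a"
CONSONANTS = [chr(c) for c in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                               0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                               0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]

PULLI = chr(0x0BCD)  # removes the inherent vowel, giving the pure consonant

# "" stands for the inherent "a", followed by the 11 written vowel signs
VOWEL_SIGNS = [""] + [chr(c) for c in (0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
                                       0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB,
                                       0x0BCC)]

MEI = [c + PULLI for c in CONSONANTS]            # 18 pure consonants
BASIC_LETTERS = VOWELS + MEI                     # 12 + 18 = 30 basic letters
UYIRMEI = [c + s for c in CONSONANTS for s in VOWEL_SIGNS]  # 18 x 12 composites

print(len(BASIC_LETTERS), len(UYIRMEI))  # prints: 30 216
```

The composites are built from a handful of codes rather than stored one
by one, which is the sense in which a character set differs from a set
of glyphs.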

Because of some misinterpretation by people involved in the earlier
versions of the ISCII Standards, an unnecessary coding appears to have
been done for matra characters. These matras (vowel signs) indicate the
corresponding vowel present in the "uyir mei eluthukkal". Across the
Indian languages a vowel sign may consist of one, two or three signs
(in Tamil, only up to two signs are used). It can come only on the
right side as in "kA, ki, kI" etc., only on the left side as in
"kai, ke, kE" etc., or on both sides as in "ko, kO, kou" etc. This is so
not only in Tamil, but also in some of the other Tamil influenced
languages like Malayalam, Oriya, Bengali (Bangla), Assamese, etc.

Even though we call the composite characters "uyir mei", their actual
composition is consonant (mei) followed by vowel (uyir). Indian scripts
are being coded on computers using this as a basis. This applies even to
the earlier ISCII Standard, ISCII-91 (called level 1). In ISCII-91
consonants are followed by matras. It is the same in Unicode also.
KANNAL Kanpadthum poi....... theera vicharippadhe mei (what the eye
sees may be false ... only thorough inquiry yields the truth). It seems
that most of the participants did not understand the encoding followed
in ISCII as well as in Unicode.
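
The "consonant followed by matra" order can be checked against Unicode
with Python's standard unicodedata module. This is only a sketch: the
sample syllables "ke" and "ko" and the helper name are chosen for this
example. It shows that the vowel sign is stored after the consonant even
when it is drawn to the left, and that the two-part sign for "o"
decomposes into a left part and a right part.

```python
import unicodedata

# Sample syllables chosen for this illustration (not from the original post)
ke = "\u0b95\u0bc6"  # KA + VOWEL SIGN E: the sign is drawn to the LEFT of KA
ko = "\u0b95\u0bca"  # KA + VOWEL SIGN O: the sign surrounds KA on both sides


def names(text):
    """Return the Unicode character names in stored (logical) order."""
    return [unicodedata.name(ch) for ch in text]


# Stored order is consonant first, vowel sign second, whatever the display
print(names(ke))

# Canonical decomposition splits the two-part sign into left and right parts,
# still stored after the consonant
print(names(unicodedata.normalize("NFD", ko)))
```

Placing the left part before the consonant on screen is then the job of
the font or rendering layer, not of the character encoding.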

I humbly repeat: character encoding and font design are two different
issues, and they are not to be mixed up.

Some have misunderstood the current discussion on encoding glyphs to be
the encoding of Tamil on computers, and take it as the basis for
enshrining Tamil electronically. This appears to be a misconception and
a false image that has engulfed the discussion.

Font encoding cannot solve many issues such as sorting, searching,
indexing and preserving Tamil itself. Font encoding is just one way of
displaying (rendering) Tamil on computers (since software development
still has a lot of maturing to do).
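
A small sketch of why searching suffers under a glyph-order (font)
encoding. The function to_visual_order below is hypothetical: it mimics
a scheme that stores vowel signs where they are drawn, and it uses
Unicode code points purely for illustration. The three sample words all
begin with the consonant "ka".

```python
# Hypothetical glyph-order scheme, written only to illustrate the problem.
LEFT_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # e, E, ai: drawn left of consonant
TWO_PART = {                                  # signs drawn on both sides
    "\u0bca": ("\u0bc6", "\u0bbe"),           # o
    "\u0bcb": ("\u0bc7", "\u0bbe"),           # O
    "\u0bcc": ("\u0bc6", "\u0bd7"),           # au
}


def to_visual_order(text):
    """Reorder character-order text so signs sit where they are displayed."""
    out = []
    for ch in text:
        if ch in TWO_PART:
            left, right = TWO_PART[ch]
            base = out.pop() if out else ""
            out.extend([left, base, right])
        elif ch in LEFT_SIGNS:
            base = out.pop() if out else ""
            out.extend([ch, base])
        else:
            out.append(ch)
    return "".join(out)


KA = "\u0b95"
# Three words beginning with KA: kodi, kai, kal
words = ["\u0b95\u0bca\u0b9f\u0bbf", "\u0b95\u0bc8", "\u0b95\u0bb2\u0bcd"]

by_character = [w for w in words if w.startswith(KA)]
by_glyph = [w for w in words if to_visual_order(w).startswith(KA)]

print(len(by_character), len(by_glyph))  # prints: 3 1
```

With character order, a prefix search on the consonant finds all three
words; with glyph order, two of them are missed because a vowel sign now
precedes the consonant. Sorting and indexing run into the same problem.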

Regarding "glyph substitution" (wrongly stated as font substitution; a
font substitution means substituting one font, say 'Arial' in the
Windows environment, with 'Times New Roman'), I feel we can consider it
as one of the options. Since glyph substitution is already implemented
in Windows NT and Windows 95, TrueType fonts (this is not OpenType) are
the best option. It all depends on our requirement (all of us: we have
not yet decided which environment we are discussing the issue for). If
we are talking about the future, including the present day computers
capable of running Windows 95 on PCs or System 7.x on Apple, we can
definitely adopt the "Glyph substitution method". If our target is
something else, glyph substitution will fail to support us. "Future
international extensions to True type may require a unique Glyph" is
mentioned in the TrueType documentation "True type 1.0 font files -
Technical specification Version 1.66" by Microsoft. Since TrueType is
being promoted by both Microsoft and Apple, it seems that glyph
substitution will continue.

The glyph ordering followed by Dr Kalyan seems to be illogical; to
arrive at the correct order, just follow the thamizh nedunkanakku.

I feel the glyph encoding has to be discussed: whether we need 8 bit or
7 bit; whether to support only GUI computers or to go back at least as
far as the AT 286 (most of the Government offices in India still use
these outdated machines); or whether to cater to all electronic
gadgets, as someone pointed out about POS. I have implemented a few
Indian languages on pagers.

Someone may ask why we cannot use 128-160. The positions 128-160 are
just a replication of 0-32. It means that 160 (no-break space) has to
be the same as 32 (space), with the same advance width.
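
The mirroring of 0-32 by 128-160 can be checked in a few lines. This
sketch assumes the ISO 8859-1 / Unicode layout, where 128-159 are the C1
control codes; some vendor code pages (Windows code page 1252, for
instance) place printable characters in that range instead.

```python
import unicodedata

# Assumption: ISO 8859-1 / Unicode layout of the first 256 positions.
c0_controls = [unicodedata.category(chr(c)) for c in range(0, 32)]
c1_controls = [unicodedata.category(chr(c)) for c in range(128, 160)]

# Both ranges consist entirely of control characters (category "Cc")
print(set(c0_controls), set(c1_controls))

# 32 and 160 are both space separators (category "Zs")
print(unicodedata.category(chr(32)), unicodedata.name(chr(32)))
print(unicodedata.category(chr(160)), unicodedata.name(chr(160)))
```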

The requirement of Dr Harold Schiffman and like-minded linguists (the
old Tamil letters are nothing but different 'varivadivam', that is,
written forms, of the same basic constituents) is taken care of in the
ISCII Standard. That is, any Tamil literature could be stored using the
ISCII encoding scheme to preserve Tamil. Since no common interface
software is available yet, the current developers can provide some kind
of converter to store text in the ISCII Standard (Apple has implemented
ISCII - level in their machine and Microsoft is going for ISCII
level 2).

I remember Dr Harold Schiffman referring to quote marks. I would like
to present my view here. I feel his requirement is for us to have the
quote marks as used in Tamil texts (and in Indian languages and English
in India), that is, a single quote that looks like a comma shifted up to
match the ascender of the character. Since the glyph encoding revolves
around an 8 bit encoding that retains English, they now have to be
accommodated in the upper slot. Quote marks used in Tamil are different
from the ones used in English. There are two different single quote
marks, the open quote and the close quote. They are similar to the
inverted comma and comma seen at ANSI character positions 145 and 146
in the Arial font used in Windows.
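
To see which characters sit at positions 145 and 146, one can decode
those two bytes with Python's cp1252 codec. Taking "ANSI" to mean
Windows code page 1252 is an assumption of this sketch.

```python
import unicodedata

# Assumption: "ANSI" here means Windows code page 1252.
quotes = bytes([145, 146]).decode("cp1252")

for ch in quotes:
    # Show the code point and Unicode name of each decoded character
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The two bytes decode to the left and right single quotation marks, the
"inverted comma" and raised "comma" shapes described above.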

In India, the Indian language numerals are gaining popularity (except
in Tamil) because of the effort put into promoting them and because
they are being recognised as part of the language itself (I feel a
language cannot be complete without its own numbering system).

I have not seen the romanised keyboard proposed by the Tamilnadu
Standardisation Committee (has it been finalised?). If it is finalised,
does it use only English alphabets, or diacritic marks as well? If it is
based only on the English alphabets, it provides keyboarding without
any 'extras', and that is the end of the transliteration subject. I
feel the transliteration scheme should make it possible to key in Tamil
without any extra font or software.

In conclusion, I suggest encoding Tamil based on its basic character
set. Tamil is not like English, which has a one to one relationship
between character coding and display. Tamil has to be handled at two
levels, i.e., an encoding based on Tamil characters and a font to
render (display) Tamil script. In the present scenario, it is not
possible to have a single character encoding scheme and a single font
encoding scheme to cater to all the living computers and their
operating systems. ASCII in the DOS environment and ANSI in the WINDOWS
(and other) environment are two different encoding schemes.
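
The difference between the DOS and Windows schemes shows up as soon as a
byte above 127 is decoded. In this sketch, code page 437 stands in for
the DOS character set and code page 1252 for "ANSI"; both choices are
assumptions, since the post does not name specific code pages.

```python
# Assumptions: cp437 represents the DOS environment, cp1252 represents "ANSI".
raw = bytes([0xE9])  # one byte from the upper (8 bit) half

dos = raw.decode("cp437")
win = raw.decode("cp1252")

# The same stored byte is a different character in each environment
print(repr(dos), repr(win), dos == win)

# The lower (7 bit) half decodes identically under both code pages
lower = bytes(range(128))
print(lower.decode("cp437") == lower.decode("cp1252"))
```

Any Tamil scheme that uses the upper half therefore depends on which
environment interprets the bytes, while the lower half stays common to
both.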

ANBU arasan.

