Tamil Discussion archive


Re: [WMASTERS] Design goals of 8-bit Tamil encoding standard


This week's sponsors -The Asia Pacific Internet Company (APIC)
  @  Nothing Less Than A Tamil Digital Renaissance Now   @
<http://www.apic.net> Click now<mailto:info@apic.net> for instant info

Mani M. Manivannan wrote:
> Since September we have been discussing the original draft proposed
> standard of Dr. Kalyan that has been revised four times as a result of this
> discussion.  Recently Dr. Selvaa proposed a serious alternative standard.
> During the discussion of this draft standard, we have considered
> linguistic, political, historical as well as emotional issues.  The
> discussions have at times wandered into areas that have contributed little
> to the standards process.  This is perhaps inevitable with a truly
> multinational group very interested in the welfare of the Tamil language
> and particularly since the design goals of the standard have not been
> clearly spelled out.
> Standards bodies have a tough task.  They should be conservative and desist
> from any attempt to innovate, reform or otherwise tamper with an
> established convention. At the same time they should anticipate the users'
> needs. Otherwise the standard risks getting rejected by the users of the
> standard.  We will not be discussing a new standard if we are happy with
> UNICODE or ISCII for our current needs.

Mani, thanks for pulling me in to give my comments. You posted a valid
question, and you are right. The draft of the proposed ISCII-1997, which
is being prepared by the Department of Electronics (DoE) committee, will
take care of all the things that we are discussing, with slight
modifications.
Just for information, the 1997 ISCII draft aims to arrive at an encoding
that satisfies the following important requirements:
1.  An encoding scheme that is independent of the input process
(keyboard, speech, OCR, handwriting, etc.)
2.  An encoding scheme that is independent of the output or rendering
process (display, print, speech, etc.)
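A minimal sketch of what input independence can mean in practice, assuming Unicode-style storage rather than ISCII itself: the Tamil syllable கொ (ko) can arrive from a keyboard either with the single vowel sign U+0BCA or with its two parts U+0BC6 + U+0BBE. Normalizing to NFC before storing makes both routes produce the same code points. The function name `store` is hypothetical; only Python's standard `unicodedata` module is used.

```python
import unicodedata

def store(text: str) -> str:
    """Hypothetical storage step: normalize to NFC so that the stored code
    points do not depend on which input sequence produced the text."""
    return unicodedata.normalize("NFC", text)

# Two keyboard routes to the same syllable "ko" (KA + vowel sign O).
precomposed = "\u0b95\u0bca"       # TAMIL LETTER KA + TAMIL VOWEL SIGN O
two_part = "\u0b95\u0bc6\u0bbe"    # KA + VOWEL SIGN E + VOWEL SIGN AA

if __name__ == "__main__":
    print(two_part == precomposed)                # raw input sequences differ
    print(store(two_part) == store(precomposed))  # stored forms agree
```

NFC is only one reasonable choice here; NFD would serve equally well, provided every program that writes the text uses the same form.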

The committee, which was set up by the Government of India in November
1996, took the following into account before drafting ISCII-97:
1.  The use of the Internet as a vast source of electronic information,
2.  The increasing demands of information management applications, and
3.  The interest expressed by a number of international computer system
manufacturers in providing Indian language support in their systems.
Mani, how many of us involved in this discussion have given thought to
the proposed ISCII-97 and the implications it will have for Unicode? The
present Unicode encoding for Indian scripts was implemented on the
recommendation of DoE and BIS (Bureau of Indian Standards), based on the
ISCII-91 version. If this draft of ISCII-97 is accepted as a standard,
the next version of Unicode will have the new code. Unicode is tied to
ISCII, DoE and BIS.
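The ISCII heritage is still visible in the Unicode code charts: the Indic blocks share a common layout, so a consonant present in both Devanagari and Tamil sits at the same position within its block. The sketch below illustrates this using only Python's standard `unicodedata` module; the list of letters is chosen purely for illustration.

```python
import unicodedata

# Consonants that exist in both the Devanagari and Tamil blocks
# (an illustrative subset, not a complete inventory).
TAMIL_CONSONANTS = ["KA", "NGA", "CA", "NYA", "TTA", "NNA", "TA", "NA",
                    "PA", "MA", "YA", "RA", "LA", "VA"]

def block_offsets(letters):
    """Return (letter, Devanagari code point, Tamil code point, difference)."""
    rows = []
    for name in letters:
        deva = unicodedata.lookup(f"DEVANAGARI LETTER {name}")
        taml = unicodedata.lookup(f"TAMIL LETTER {name}")
        rows.append((name, ord(deva), ord(taml), ord(taml) - ord(deva)))
    return rows

if __name__ == "__main__":
    for name, d, t, diff in block_offsets(TAMIL_CONSONANTS):
        print(f"{name:4} U+{d:04X}  U+{t:04X}  +0x{diff:X}")
```

Every row shows the same difference, which is the distance between the start of the Devanagari block (U+0900) and the Tamil block (U+0B80). A change to the shared ISCII layout would therefore matter for Tamil in Unicode as well.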
Even though I am on the TNSC, I know very well that the Tamil Nadu
government cannot overrule DoE's decision. The TNSC can recommend that
the Tamil Nadu government put forward a proposal to make slight changes
to ISCII-97, provided we have a strong case to argue. The Tamil Nadu
government cannot make a new encoding standard.
On 11 Sep you posed some very good questions (What is the recommended
process for creating character encoding standards and getting them
accepted by international standards bodies?). I hope somebody will
answer you.

> I am glad that most of the Tamil font and software developers are active in
> this discussion.  I am not a Tamil software developer and I have no vested
> interest in the outcome except as a Tamil software user.  I hope the group
> will not be offended with my attempt to focus our efforts by explicitly
> stating the design goals.  I have used the Unicode standard and the archive
> of the Tamil standard discussions that Dr. Kalyan kindly provided me.  I
> expect the participants of this discussion to augment and refine these
> statements.
> ------------------------------------------------------------------------------
> Tamil character encoding standard design goals:
> 1. Establish a consistent international Tamil character standard that can
> be used by software publishers, Web publishers, newspaper and book
> publishers, bibliographic information services, and academic researchers.
> 2. Encode characters, not glyphs.

On Sep 17 AnbuArasan clearly explained that the proposed encodings should
be based on basic characters, not glyphs, and he gave his reasons. At
this point we have to look at his mail seriously. Can any one of you
repost that mail?
> 3. Must be universal. The standard must include all characters that are
> likely to be used in everyday Tamil text interchange.
> 4. Must be efficient. "Plain text, composed of a sequence of fixed-width
> characters, provides an extremely useful model because it is simple to
> parse: software does not have to maintain state, look for special escape
> sequences, or search forward or backward through text to identify
> characters." From Unicode 2.0 standard.  (Note: This goal may be
> problematic for a Tamil character standard. When we violate this goal, we
> must carefully consider alternatives and explicitly state the
> reasons, if any, for violating it.)
> 5. Must be Uniform.  "A fixed character code allows efficient sorting,
> searching, display, and editing of text." From Unicode 2.0. (Note: Once
> again, the Tamil standard violates this goal in some places by encoding some
> characters with a single byte, but other characters with a modifier.
> However, Unicode itself does exactly that for Indic languages. When
> considering exceptions, we can use Unicode as a useful guide for
> implementation.)
> 6. Must be Unambiguous.  Any 8-bit value should always represent the same
> character. (Note:  This pretty much kills ORNL.)
> 7. Must be usable and coexist with popular software until Unicode compliant
> software becomes available.
> 8. Must be Unicode compatible; that is, it shall not use characters that cannot
> be saved in Unicode format. (Note:  The TNC/DOE-India may require an ISCII
> compatibility as well. )
> 9. Must be in the public domain. The character encoding standard will have
> no restrictions on its use.  It can be used freely for both commercial and
> private purposes.  Enhancements, alterations and other changes to the
> standard will be done only by the standards body.
> ------------------------------------------------------------------------------
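Goals 2, 4 and 5 pull against each other for Tamil, and a short sketch shows why. If characters rather than glyphs are encoded (as Unicode does), the word தமிழ் is stored as five code points but displayed as three units: த, மி (MA + vowel sign I) and ழ் (LLLA + virama). The grouping function below is a simplification written only for this example: it attaches combining marks (Unicode categories Mn and Mc) to the preceding character and is not the full Unicode grapheme-cluster algorithm.

```python
import unicodedata

def syllables(text):
    """Group code points into display units: a base character followed by
    any combining marks (vowel signs, virama). Simplified, for illustration."""
    units = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch      # combining mark: extend the current unit
        else:
            units.append(ch)     # anything else starts a new unit
    return units

word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil" in Tamil script

if __name__ == "__main__":
    for unit in syllables(word):
        print(len(unit), [unicodedata.name(c) for c in unit])
```

The display units are one or two code points long, so the text is no longer fixed-width per visible unit; this is the trade-off the notes on goals 4 and 5 describe.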
> I will include another extract from the Unicode standard that can guide us
> to focus on the task at hand and bring it to a reasonable conclusion:
> "The .. standard does not encode idiosyncratic, personal, novel, rarely
> exchanged, or private-use characters, nor does it encode logos or graphics.
>  Artificial entities, whose sole function is to serve transiently in the
> input of text, are excluded.  Graphologies unrelated to text, such as
> musical and dance notations, are outside the scope of the .. standard.
> Font variants are explicitly not encoded.  The .. standard includes a
> private use area, which may be used to assign codes to characters not
> included."
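Goals 6 and 8, together with the private use area mentioned in the Unicode extract, can be checked mechanically once a proposed 8-bit table is written down as a mapping from byte values to Unicode characters. The sketch below assumes such a table in which each byte maps to a single code point; the function name `check_table` is hypothetical, and the category codes come from Python's standard `unicodedata` module.

```python
import unicodedata

def check_table(table):
    """Check a hypothetical byte -> Unicode character mapping.

    Returns a list of problems (empty if none were found):
    - a key outside the 8-bit range 0..255,
    - a target that is unassigned (Cn) or a lone surrogate (Cs), i.e. not
      something that can be saved as ordinary Unicode text,
    - a target in the private use area (Co),
    - two bytes mapped to the same character (the table is ambiguous).
    """
    problems = []
    seen = {}
    for byte, ch in sorted(table.items()):
        if not 0 <= byte <= 0xFF:
            problems.append(f"0x{byte:X}: not an 8-bit value")
            continue
        cat = unicodedata.category(ch)
        if cat in ("Cn", "Cs"):
            problems.append(f"0x{byte:02X}: U+{ord(ch):04X} is unassigned")
        elif cat == "Co":
            problems.append(f"0x{byte:02X}: U+{ord(ch):04X} is private use")
        if ch in seen:
            problems.append(f"0x{byte:02X}: duplicates 0x{seen[ch]:02X}")
        else:
            seen[ch] = byte
    return problems
```

A table in which one byte stands for a glyph that Unicode represents as a sequence (for example a consonant with a vowel sign) does not fit this simple model and would need a one-to-many mapping.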
> It is my belief that the current draft proposed standard and Dr. Selvaa's
> alternative are identical in essential details, except those that Dr.
> Selvaa himself has noted.  I notice that there is some consensus around
> the proposed standard from such font creators as Dr. Kalyan, Muthu
> Nedumaran, Dr. Srinivasan and Prof. Hart.  Prof. Schiffman had expressed a
> willingness to accept the draft proposed standard.  Mr. Ravi Paul has
> raised specific problems with the draft standard but seems to support the
> standard otherwise.  Prof. Naa. Govindasamy, a pioneer of the Tamil on the
> Internet effort and a member of the TNC has not commented on the draft
> proposed standard. 

Mani, I am not a professor; just address me as Govind.
In fact, I did not want to participate in this discussion. I preferred to
be a silent observer and a silent worker. Since I was dragged in, I may
give my comments in a few days.

> I am eagerly waiting for his technical documentation of
> TamilNet Font's character encoding

There is a paper (INET96, Montreal, June 1996) at this site
(http://aua.am/aua/auacs/inet'96/a5/) where you will find some technical
explanation of the TamilNet font encodings.

> and his comparison of TamilNet encoding
> with alternate encoding conventions.

Mani, I have not thought of that. If there is a need, I may consider it.

Naa Govindasamy
> Mani M. Manivannan
> Fremont, CA, USA.
> P.S.
> I am sorry for this very long post.  I hope this nudges everybody to focus
> on the goal and iron out differences using technical guidelines.  By
> training I am a technician not a writer.  I hope I haven't offended anybody
> with my writing style.  If I did, I apologize to all in advance.  And if
> anybody wants to correct my style, please, please send me personal e-mail.
> ________________________________________________
> Sponsors/Advertisers  needed -  please email bala@tamil.net
> Check out the tamil.net web site on <http://tamil.net>
> Postings to <webmasters@tamil.net>. To unsubscribe send
> the text - unsubscribe webmasters - to majordomo@tamil.net
> ________________________________________________
