Tamil Discussion archive


[WMASTERS] Design goals of 8-bit Tamil encoding standard



Since September we have been discussing Dr. Kalyan's original draft
proposed standard, which has been revised four times as a result of this
discussion.  Recently Dr. Selvaa proposed a serious alternative standard.
During the discussion of this draft standard, we have considered
linguistic, political, historical, and emotional issues.  The discussions
have at times wandered into areas that have contributed little to the
standards process.  This is perhaps inevitable with a truly multinational
group deeply interested in the welfare of the Tamil language, particularly
since the design goals of the standard have not been clearly spelled out.

Standards bodies have a tough task.  They should be conservative and desist
from any attempt to innovate, reform or otherwise tamper with an
established convention.  At the same time they should anticipate the users'
needs.  Otherwise the standard risks being rejected by its users.  We would
not be discussing a new standard if we were happy with UNICODE or ISCII for
our current needs.

I am glad that most of the Tamil font and software developers are active in
this discussion.  I am not a Tamil software developer and I have no vested
interest in the outcome except as a Tamil software user.  I hope the group
will not be offended by my attempt to focus our efforts by explicitly
stating the design goals.  I have drawn on the Unicode standard and the
archive of the Tamil standard discussions that Dr. Kalyan kindly provided
me.  I expect the participants of this discussion to augment and refine
these goals.


Tamil character encoding standard design goals:

1. Establish a consistent international Tamil character standard that can
be used by software publishers, Web publishers, newspaper and book
publishers, bibliographic information services, and academic researchers.

2. Encode characters, not glyphs.

3. Must be universal. The standard must include all characters that are
likely to be used in everyday Tamil text interchange.

4. Must be efficient. "Plain text, composed of a sequence of fixed-width
characters, provides an extremely useful model because it is simple to
parse: software does not have to maintain state, look for special escape
sequences, or search forward or backward through text to identify
characters." From Unicode 2.0 standard.  (Note: This goal may be
problematic for a Tamil character standard.  When we violate this goal, we
must be very careful to consider alternatives and explicitly state the
reasons, if any, for violating it.)

5. Must be Uniform.  "A fixed character code allows efficient sorting,
searching, display, and editing of text." From Unicode 2.0. (Note: Once
again, the draft Tamil standard violates this goal in places by encoding
some characters with a single byte, but other characters with a modifier.
However, Unicode itself does exactly that for Indic languages. When
considering exceptions, we can use Unicode as a useful guide for

6. Must be Unambiguous.  Any 8-bit value should always represent the same
character. (Note:  This pretty much kills ORNL.)

7. Must be usable and coexist with popular software until Unicode compliant
software becomes available.

8. Must be Unicode compatible.  That is, it shall not use characters that
cannot be saved in Unicode format.  (Note: The TNC/DOE-India may require
ISCII compatibility as well.)

9. Must be in the public domain. The character encoding standard will have
no restrictions on its use.  It can be used freely for both commercial and
private purposes.  Enhancements, alterations and other changes to the
standard will be done only by the standards body.
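
To make goals 4, 6 and 8 concrete, here is a minimal sketch in Python.  The
byte values in TABLE are hypothetical, chosen only for illustration; they
are not taken from Dr. Kalyan's draft, Dr. Selvaa's alternative, or any
existing font.  Only the Unicode code points on the right are real.  The
sketch shows a stateless, one-byte-per-character decoder, a check that no
character has two byte values, and a round trip through Unicode.

```python
# Hypothetical 8-bit assignments (illustration only, not from any draft).
# Each byte value maps to exactly one Unicode Tamil character.
TABLE = {
    0xA1: "\u0B85",  # TAMIL LETTER A
    0xA2: "\u0B95",  # TAMIL LETTER KA
    0xA3: "\u0BA4",  # TAMIL LETTER TA
    0xA4: "\u0BAE",  # TAMIL LETTER MA
    0xA5: "\u0BB4",  # TAMIL LETTER LLLA
    0xA6: "\u0BBF",  # TAMIL VOWEL SIGN I
    0xA7: "\u0BCD",  # TAMIL SIGN VIRAMA (pulli)
}

# Reverse mapping, used to save Unicode text in the 8-bit form.
REVERSE = {ch: b for b, ch in TABLE.items()}


def has_unique_characters(table):
    """True if no character is assigned to more than one byte value.

    A dict already guarantees one character per byte value; this checks
    the other direction, so that encoding is also unambiguous.
    """
    return len(set(table.values())) == len(table)


def decode(data):
    """Convert 8-bit text to Unicode, one byte at a time, with no state."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))    # ASCII passes through unchanged
        else:
            out.append(TABLE[b])  # raises KeyError for an unassigned byte
    return "".join(out)


def encode(text):
    """Convert Unicode text to the hypothetical 8-bit form."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out.append(REVERSE[ch])  # raises KeyError if not in the table
    return bytes(out)


# The word "Tamil": TA, MA, VOWEL SIGN I, LLLA, VIRAMA.
word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
raw = encode(word)
```

Here encode(word) yields five bytes, one per character, and decode(raw)
returns the original Unicode string.  A table in which one byte value
could stand for different characters depending on the font or context
could not be written as a plain lookup like this, which is the practical
meaning of goal 6.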


I will include another extract from the Unicode standard that can guide us
to focus on the task at hand and bring it to a reasonable conclusion:

"The .. standard does not encode idiosyncratic, personal, novel, rarely
exchanged, or private-use characters, nor does it encode logos or graphics.
 Artificial entities, whose sole function is to serve transiently in the
input of text, are excluded.  Graphologies unrelated to text, such as
musical and dance notations, are outside the scope of the .. standard.
Font variants are explicitly not encoded.  The .. standard includes a
private use  area, which may be used to assign codes to characters not

It is my belief that the current draft proposed standard and Dr. Selvaa's
alternative are identical in essential details, except those that Dr.
Selvaa  himself has noted.  I notice that there is some consensus around
the proposed standard from such font creators as Dr. Kalyan, Muthu
Nedumaran, Dr. Srinivasan and Prof. Hart.  Prof. Schiffman has expressed a
willingness to accept the draft proposed standard.  Mr. Ravi Paul has
raised specific problems with the draft standard but seems to support the
standard otherwise.  Prof. Naa. Govindasamy, a pioneer of the Tamil on the
Internet effort and a member of the TNC, has not commented on the draft
proposed standard.  I am eagerly waiting for his technical documentation of
TamilNet Font's character encoding and his comparison of TamilNet encoding
with alternate encoding conventions.

Mani M. Manivannan
Fremont, CA, USA.

I am sorry for this very long post.  I hope this nudges everybody to focus
on the goal and iron out differences using technical guidelines.  By
training I am a technician, not a writer.  I hope I haven't offended anybody
with my writing style.  If I have, I apologize to all in advance.  And if
anybody wants to correct my style, please, please send me personal e-mail.

