Re: [WMASTERS] Re: glyph choices for char.encoding -version 1.2

To: Muthu Nedumaran <MNEDUMAR@sg.oracle.com>, webmasters@tamil.net
Subject: Re: [WMASTERS] Re: glyph choices for char.encoding -version 1.2
From: Nagarajan Chinnasamy <nagu@ncms1.cb.lucent.com>
Date: Thu, 11 Sep 1997 09:45:16 -0400
Cc: kalyan@igcsun3.epfl.ch, tamilnet@tamilnews.org.sg, muthu@murasu.com
Content-Length: 9811
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Wipro Infotech Group, Bangalore, India
Original-CC: kalyan@igcsun3.epfl.ch, tamilnet@tamilnews.org.sg, muthu@murasu.com
References: <199709110925.RAA25451@sgdcn.sg.oracle.com>
Reply-To: Nagarajan Chinnasamy <nagu@ncms1.cb.lucent.com>
Sender: owner-tamilnet@irdu.nus.sg

Dear Muthu,

Was really waiting for this mail :-)

Please read further...


Muthu Nedumaran wrote:
> 
> >> Also, I see no problems in starting the first character
> >> (kaal) at slot 128. Any objections ? (and why please ;-))
> 
> >In case you have missed, Dr. Srinivasan wrote this monday
> >specifically the following:
> 
> Thanks for including this Kalyan - yes, I missed this.  Must
> have come in while I switched mailers during my last travel.
> 
> >"The positions 128, 129, 130, 141, 142, 143, 144, 157, 158,
> >222, 234 do not display in either MS WORD or in COREL WORD
> >PERFECT. That covers about 90% of the MS Windows word
> >processor market. That is why I left those few places
> >empty in the 8 Bit Roman-Tamil bilingual ADHAWIN.TTF.
> >Of course, we can standardize some other layout like
> >ISO-8859-X, ignore the 90% of existing softwares and
> >develop "our own softwares"  :-)  "
> >Ravi Paul also expressed similar reservations on putting
> >key characters in the slots 128-159. These two have
> >extensive track experience writing softwares for PCs
> >and for Windows. So I wanted to implement their advice.
> >So, I masked off these slots in the version 1.2 and placed
> >only unused tamil characters ngu, nguu, nyu and nyuu.
> >Probably in Anjal you use all slots 128-255.
> >Are you asking for opinions from others for clarifications?
> 
> Yes, I'd like to hear more on this as I have not seen problems
> with these slots with Word or Word Perfect.  Also, could
> Dr. Srinivasan and/or Ravi comment on why you think this is
> happening ? It may take us to the root of the problem.  I
> have experienced problems with slot 160 - which I do not
> understand why to date.  I also see other character sets
> avoiding 160 (almost all the time).
> 
> >ii) I am not sure which is the best way to decide on
> >to have or not to have a certain sets of glyphs - be
> >it diacritical markers or old style or grantha ones:
> >No objections were raised for tamil numerals and so
> >the case was clear.
> 
> I'm not sure if this is how it should be concluded.  IMHO
> the decision should be based on a logical diagnosis of the
> implications. These take up space so it has an effect on
> the encoding at large. If it does, we should identify and
> work on solutions or workarounds. I took it that we place
> low priority on this (am I right ?)
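
On the slot question quoted above (positions 128-159, and 160): one
possible, unconfirmed explanation is that some of the reported positions
are simply unassigned in the Windows code page, and that 160 is the
no-break space there. The sketch below is only a way to check that idea.
It assumes Python's built-in "cp1252" codec is a fair stand-in for the
Windows ANSI code page; the word processors of 1997 may have behaved
differently, and the helper name undefined_slots is made up here.

```python
# Sketch only: Python's "cp1252" codec stands in for the Windows ANSI
# code page. REPORTED is the list of positions quoted in the mail above.
REPORTED = [128, 129, 130, 141, 142, 143, 144, 157, 158, 222, 234]


def undefined_slots(codec="cp1252", start=128, stop=256):
    """Return the byte values in [start, stop) that the codec cannot decode."""
    missing = []
    for value in range(start, stop):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


holes = undefined_slots()
print("undefined in cp1252:", holes)
print("reported and undefined:", [v for v in REPORTED if v in holes])
# Slot 160 decodes to U+00A0 (no-break space), which many programs
# treat as a kind of space rather than as a printable letter.
print("slot 160 decodes to:", repr(bytes([160]).decode("cp1252")))
```

This does not explain every reported position; it only shows that part
of the 128-159 range has no character assigned in that code page.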


In my opinion we can leave out the Tamil numerals. They are
99.99% out of use in Tamil. Even if they are needed in some special
cases (assuming that the user will use some GUI system),
that can be done using different "fonts".

Also, it does not look good to "encode" two number
systems in an "encoding standard". One number can have only
one code. How does it matter for calculations whether it's
a Tamil number or a Roman number?

We have the code already. As for "Will it be rendered as
a Tamil number or a Roman number?", we will leave that to the
Tamil community. And the answer is "Trivial".
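
A minimal sketch of the "one code, two renderings" idea: the number is
stored once, and only the display step chooses Tamil or Roman digit
shapes. The Unicode Tamil digits (U+0BE6 to U+0BEF) are used here purely
as stand-ins for Tamil numeral glyphs; they are an assumption of this
sketch and not part of the 8-bit scheme under discussion. The function
name render_number is likewise made up.

```python
# Assumption: Unicode Tamil digits U+0BE6..U+0BEF stand in for the
# "Tamil numeral glyphs"; the 8-bit proposal assigns no such codes.
TAMIL_DIGITS = "".join(chr(0x0BE6 + d) for d in range(10))
TO_TAMIL_GLYPHS = str.maketrans("0123456789", TAMIL_DIGITS)


def render_number(value, style="roman"):
    """Return the display form of an integer; the stored value is unchanged."""
    text = str(value)
    if style == "tamil":
        return text.translate(TO_TAMIL_GLYPHS)
    return text


price = 25 * 3  # arithmetic works on the single stored value
print(render_number(price))           # Roman-style digits
print(render_number(price, "tamil"))  # same number, Tamil-style digits
```

The calculation never needs to know which digit shapes the reader will
see, which is the point of keeping the choice out of the encoding.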

About diacritical marks...please read further...


> 
> >For grantha ones, amongst those who cared to let know
> >their views, majority are for keeping them.
> 
> Yeap, agreed, and IMHO it's rather incomplete without them.


Personally, I too support the "Thani Thamiz" (pure Tamil) concept, because,
as Manivannan put it, Tamil has the grammar to pronounce them.
It is just that we have forgotten the rules of the game. We have even
started thinking that those rules are wrong. We agreed that
we would pronounce "ka" as "ha", "ga", "ka", or "kha". It is just that
after looking at "sh", "bushes", etc., some people developed a
real "Moham" (infatuation) for them.

At least, to keep Sanskritization(!) to the minimum, we can easily
leave out "Ksha" and "Sri" without any problem. And yes, it should
be done by the Tamil Nadu Government as a step towards "Cleaning/Recovering"
Tamil.
 

> 
> >For the old-style tamil characters, the best approach
> >appears to be assigning them the lowest priority.
> >For the diacritical markers, I listed several reasons
> >why it is useful to have them. You have not questioned
> >any of these nor gave any specific objections as to
> >why they should not be there. No one else made any
> >statements one way or other.
> 
> I think I have said it enough several times. You may
> want to check my last posting.


I too have written enough on diacritical marks.
I vote against them.


> 
> >I tend to agree with your preferences to have transl.
> >tamil using plain/lower ASCII roman without diacritics.
> >But I am not sure what the majority opinion is.
> 
> Let's hope we hear them too - not just a *vote* of yes
> or no - but with some supporting notes :-)



I really don't know of any technical points on which to SELECT one
of the given choices. But I am SURE it IS a TRANSLITERATION
issue, NOT an ENCODING issue.



> 
> >It would be better if we debate specifically
> >this point, viz standards for transliterated form
> >of writing tamil before we go further. Vasu
> >Ranganathan raised this issue (and also Sujatha)
> >earlier and listed several questions to be answered.
> >
> >i) Should we go for transliteration schemes that are
> >    based on plain ASCII without diacritics or
> >    adopt a scheme with diacritics ?
> >ii) what should be the actual scheme under either of
> >   the above two possibilities?
> >
> >If we decide to go for plain ASCII without diacritics,
> >then there is no need to keep these markers in the
> >character encoding scheme.
> >If we agree for keeping transliteration scheme with
> >diacritics as the standard for translit. tamil, then
> >we cannot have a second font just for handling these.
> >I would like to hear from you and others, why not have
> >such standards in solid grounds with specific encoding
> >in a single font.  I am not in favour of leaving
> >it as a playground for software developers to choose the
> >way they want to treat it. Having a code assigned
> >for a marker makes it standardised.
> >I am sure Profs. Hart, Schiffman, Vasu and many others
> >on this net have something to say on standards for
> >transliterated tamil. I request them to post their
> >views.
> 
> I think we *can* take the issue of transliteration out of
> the encoding (refer to my points at the end).


Me too.


> 
> >> Having diac. marks *encoded* suggests that we
> >> can store text in both formats - right ?  Is this OK ?
> >> Something's not quite right here - right ?
> >> Comments please......
> >Let me explain what I think. May be others can comment.
> >Having diac. marks encoded means that, using a single
> >font, tamil texts can be entered in either format (
> >in tamil script OR in transliterated format).
> >
> >In self-standing fonts with direct output features,
> >the format will be decided already by the input
> >process. I can also open up any of the thousands
> >of archived materials in either format using the
> >same std. tamil font and read them. All tamil related
> >materials are handled by one single font. Period.
> 
> Kalyan, you only talked about *viewing* Tamil text.
> How would one implement a search function in a database
> of documents that has both format text ?  Will the
> search need to be done twice ?


Yes the crux of the problem is we are trying to have TWO
ENCODING SCHEMES for ONE LANGUAGE in ONE STANDARD.
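
The search question quoted above can be sketched as follows. With two
stored formats, every query has to be tried in both forms; with one
stored format, the query is converted once and searched once. Everything
in this sketch is hypothetical: the tiny romanization table, the file
names, and the use of Unicode Tamil letters as the "script" form are
assumptions made only for illustration.

```python
# Hypothetical toy table: plain-ASCII units mapped to Tamil script.
# Unicode code points are used here only as an illustrative script form.
UNITS = {
    "a": "\u0b85",          # vowel a
    "m": "\u0bae\u0bcd",    # pure consonant m
    "maa": "\u0bae\u0bbe",  # syllable maa
    "p": "\u0baa\u0bcd",    # pure consonant p
    "paa": "\u0baa\u0bbe",  # syllable paa
}
KEYS = sorted(UNITS, key=len, reverse=True)  # try longest units first


def to_script(roman):
    """Greedy longest-match conversion; unknown characters pass through."""
    out, i = [], 0
    while i < len(roman):
        for key in KEYS:
            if roman.startswith(key, i):
                out.append(UNITS[key])
                i += len(key)
                break
        else:
            out.append(roman[i])
            i += 1
    return "".join(out)


DOCS = {
    "script.txt": to_script("ammaa"),  # stored in Tamil script
    "roman.txt": "ammaa appaa",        # stored transliterated
}


def search_mixed(query_roman):
    """Two stored formats: each query must be tried in both forms."""
    forms = (query_roman, to_script(query_roman))
    return sorted(n for n, text in DOCS.items() if any(f in text for f in forms))


def search_single(query_roman, docs_in_script):
    """One stored format: convert the query once and search once."""
    needle = to_script(query_roman)
    return sorted(n for n, text in docs_in_script.items() if needle in text)


NORMALIZED = {name: to_script(text) for name, text in DOCS.items()}
print(search_mixed("ammaa"))
print(search_single("ammaa", NORMALIZED))
```

Both searches find the same documents here, but the mixed store pays for
it with a second pass per query and a converter that must stay in step
with both formats.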


> 
> >Of course in specialised DTP softwares with convertor
> >routines incorporated, additional options are possible
> >to store in either format - store in tamil script
> >even if the input is in transliterated format.
> >The latter is no different from the romanized
> >input method already accepted as a standard
> >inputting process.
> 
> No, I'm not talking about converter routines - they are
> plain and simple and we know how those work.  I'm suggesting
> that we store Tamil text in *only* one format - whatever the
> transliteration scheme we adopt - and have the glyph substitution
> implemented in the font.  This is in line with what the UNICODE
> folks thought about and what OpenType font specification is
> all about.  We are encoding characters - we are *not* designing
> a font.
> 
> With this, we will have the following :
> 
> 1. Ability to render Tamil text in its current form (i.e. without
>    old style nai etc) with any one-on-one mapped font.  In other
>    words, anyone will be able to develop a TTF or Type1 (or in
>    one of the zillion formats around) that maps every character to
>    a glyph.  If we agree to include the grantha meys as single
>    characters, we *can* implement this on character terminals,
>    POS systems, and display boards as well !


I would love to see Tamil in Display Boards !!!! 
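In a washing machine (all in Tamil):

   "Nalvaravu...muthalil muunru 25 paisa naanayangalai thulaikalil
    ittu pin arukil ulla piththaanai amukkavum...."
   (Welcome... first put three 25-paisa coins into the slots,
    then press the button nearby....)

aakaa enna arumai. Paarathi paarthaal...!!!
(Aha, how wonderful. If only Bharathi could see this...!!!)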



> 
> 2. Ability to render Tamil text in old-style (nai etc) with the
>    glyph substitution technology that is *embedded* in OpenType
>    fonts.  (no programming reqd).
>    Note that the old style chars need *not* be encoded.


A GOLDEN Note!!!


>  The number of glyphs in a (OpenType)
>    font need not match the number of encoded characters. I believe
>    this is also true for True Type - except that the latter does
>    not provide for glyph substitution.
> 
> 3. Ability to render Tamil text in transliterated form using
>    either :
>         a. Conversion routines that trans-es the text on the fly.
>        - useful for both graphical and non-graphical (dumb) terminals
>             as well.
>        - needs coding
>      b. Using glyph substitution (as in (2)), substituting
>         roman (diac-ed or otherwise) glyphs for tamil character
>         sequences.
>        - no coding reqd. (I'll need to verify this).
> 
> Comments please....
> 
> anbudan,
> 
> ~ MUTHU
> 
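
Points 1 to 3 in the quoted mail can be sketched with a toy model: the
text is stored once as a sequence of character codes, and the "font"
decides which glyphs to show, so the number of glyphs need not match the
number of encoded characters. This is not how any real OpenType engine
is written and it uses no font library; the character names, glyph
names, and the shape function are all hypothetical.

```python
# Hypothetical character names and glyph names, invented for this sketch.
ONE_TO_ONE = {"KA": "ka", "NA": "na", "AI": "ai_sign"}

# Substitution tables: a sequence of characters maps to a list of glyphs.
OLD_STYLE = {("NA", "AI"): ["nai_old"]}  # extra glyph, no extra code
ROMAN = {
    ("KA",): ["k", "a"],
    ("NA",): ["n", "a"],
    ("KA", "AI"): ["k", "a", "i"],
    ("NA", "AI"): ["n", "a", "i"],
}


def shape(chars, substitutions=None):
    """Map stored characters to glyphs, preferring the longest substitution."""
    substitutions = substitutions or {}
    longest = max((len(seq) for seq in substitutions), default=1)
    glyphs, i = [], 0
    while i < len(chars):
        for size in range(longest, 0, -1):
            seq = tuple(chars[i:i + size])
            if len(seq) == size and seq in substitutions:
                glyphs.extend(substitutions[seq])
                i += size
                break
        else:
            # No substitution applies: fall back to the one-to-one mapping.
            glyphs.append(ONE_TO_ONE[chars[i]])
            i += 1
    return glyphs


stored = ["KA", "NA", "AI"]      # the text is stored in one format only
print(shape(stored))             # point 1: one-to-one mapped font
print(shape(stored, OLD_STYLE))  # point 2: old-style glyph by substitution
print(shape(stored, ROMAN))      # point 3b: romanized glyphs by substitution
```

In all three cases the stored text is identical; only the table handed
to the rendering step changes, so a search over the stored text never
has to care which form the reader sees.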

Not familiar with OpenType. So, no comments on that.

anbudan,
nagu.

