Tamil Discussion archive


Re: 7-bit and Grantha chars.



Dear George,

>It also seems to me that HTML is evolving in such a way that the font face
>command will soon be universal.  Netscape mail (but not yet Eudora)
>recognizes HTML.  Soon, therefore, there will be no need for a font that
>combines Roman (lower 128) and Tamil (upper 128).

We discussed this at great length during the first phase of this debate.
HTML is just one element of Tamil text usage.  You are right about the
font face tag becoming universal - in fact, I understand it will be part
of the 3.x standard.

However, are we going to limit Tamil to *viewing* (or reading) only?
Encoding Tamil in the 7-bit space *overloads* the *value* of the codes,
i.e. when a program sees a bare 65, it cannot tell whether it is an A or
some Tamil character.  Can you imagine the implications this will have
for CGI scripts and other non-web text manipulation tasks?
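
To make the overloading concrete, here is a minimal Python sketch.  The
mapping of code 65 to a Tamil letter is purely hypothetical (no actual
7-bit font layout is given in this thread), and the names `interpret` and
`count_latin_a` are made up for illustration.

```python
# Hypothetical 7-bit font layout: we ASSUME code 65 displays a Tamil letter
# when the Tamil font is selected. This is not the layout of any real font.
HYPOTHETICAL_7BIT_TAMIL = {65: "\u0b85"}  # assumption: 65 -> TAMIL LETTER A


def interpret(code: int, font: str) -> str:
    """Return what a reader would see for `code` under the given font."""
    if font == "tamil7":
        return HYPOTHETICAL_7BIT_TAMIL.get(code, "?")
    return chr(code)  # plain ASCII / Roman font


def count_latin_a(data: bytes) -> int:
    """A font-unaware tool (say, a CGI script) sees only byte values."""
    return data.count(b"A")


stored = bytes([65])  # the font information is not part of the stored text
```

The same stored byte reads as "A" under the Roman font and as a Tamil
letter under the assumed Tamil font, yet `count_latin_a(stored)` reports
one Latin "A" either way, because the byte value alone carries no hint of
which was meant.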

I think we should do away with overloading the 7-bit space.  In fact, putting
Tamil in the 8-bit space is *also* overloading, since most other non-English
languages do the same.  But the impact there is minimal, because the Roman
codes in the lower 128 keep their meaning.

The *real* answer is Unicode, which will take some more time to become
universally available.  The effort to define an 8-bit encoded Tamil
character set is meant to fill the gap until we get there.

>Here, then, is my suggestion:  Let's settle on a standard for both a 7-bit
>and 8-bit font.  AND let's make the two compatible -- that is, positions of
>the 8-bit font are the same as the positions of the 7-bit one, except with
>the first bit set to 1 instead of 0 (i.e. the 7-bit position + 128).  This
>will make conversion extremely simple -- and it will mean that we can
>easily create sorting algorithms etc. that work on both fonts.  George Hart

Again, I do not think it's a good idea to have two standards, 7-bit and
8-bit.  We would not be able to *integrate* the archives around the world
into a single client application (a browser, for example) that *manipulates*
Tamil text (search, match, sort, etc.).  When we move towards an
integrated standard, we should ultimately do away with *conversions* -
the only one that will probably exist for a while is the one that converts
back and forth between the new standard and Unicode.
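
A table-driven converter is one way such a single remaining conversion
could look.  The sketch below is only an illustration: the three 8-bit
code positions are invented (no code table has been agreed in this
thread), and only the Unicode code points for the Tamil letters A, KA and
MA are real.

```python
# HYPOTHETICAL 8-bit positions; only the Unicode code points are real.
TO_UNICODE = {
    0xC1: "\u0b85",  # TAMIL LETTER A
    0xC2: "\u0b95",  # TAMIL LETTER KA
    0xC3: "\u0bae",  # TAMIL LETTER MA
}
FROM_UNICODE = {char: code for code, char in TO_UNICODE.items()}


def decode(data: bytes) -> str:
    """8-bit text -> Unicode. Lower 128 stays ASCII; unmapped codes become U+FFFD."""
    return "".join(
        chr(b) if b < 0x80 else TO_UNICODE.get(b, "\ufffd") for b in data
    )


def encode(text: str) -> bytes:
    """Unicode -> 8-bit text. Raises ValueError for unmapped characters."""
    out = bytearray()
    for ch in text:
        if ch in FROM_UNICODE:
            out.append(FROM_UNICODE[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError(f"no 8-bit code for {ch!r}")
    return bytes(out)
```

Because Roman text stays in the lower 128 and Tamil in the upper 128, a
mixed string such as `b"A\xc1"` round-trips through `decode` and `encode`
without any font information.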

We also understand that the 7-bit space reserves 32 slots for control
characters.  Although that reservation is replicated in the 8-bit space, it
has become fairly easy to transmit 8-bit characters (even those that fall on
the control codes) over the wire worldwide.
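
As a rough illustration of the slot arithmetic, the sketch below assumes
the usual convention that codes 0-31 are the 7-bit control characters and
that codes 128-159 mirror them in the 8-bit space.  The function names
are made up for this example.

```python
C0_CONTROLS = range(0, 32)     # control slots in the 7-bit space
C1_CONTROLS = range(128, 160)  # the mirrored control slots in the 8-bit space


def is_control_slot(code: int) -> bool:
    """True if `code` falls on a reserved control position."""
    return code in C0_CONTROLS or code in C1_CONTROLS


def free_upper_slots(avoid_c1: bool) -> list[int]:
    """Upper-128 positions available to Tamil, optionally skipping 128-159."""
    return [c for c in range(128, 256) if not (avoid_c1 and c in C1_CONTROLS)]
```

Under these assumptions, avoiding the mirrored control codes leaves 96
positions for Tamil, while using them gives the full 128.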

anbudan,

~ MUTHU
 ------------------------------- End of Message --------------------------------


---- Begin included message ----
I would like to keep all the normally used grantha characters, including kS
and sri.  This is not because of any political agenda, only because I think
many people will like to use those characters.  They have become a pretty
standard part of Tamil.

With regard to upper or lower 128, I think we must differentiate between a
font meant for the Web and one for use at home with word processors.  We
have a font we have used extensively (it's available in the public domain --
tamlasr) which uses only the lower 128.  It works extremely well with word
processors etc.  In fact, I've written a Mac program that takes text in
Roman transliteration copied to the clipboard, converts it, and puts the
converted text on the clipboard.  It will selectively change only bold
text, and put the text into proper fonts.  With this sort of tool, there is
absolutely no need to use the same font for Tamil and Roman.

It also seems to me that HTML is evolving in such a way that the font face
command will soon be universal.  Netscape mail (but not yet Eudora)
recognizes HTML.  Soon, therefore, there will be no need for a font that
combines Roman (lower 128) and Tamil (upper 128).

Here, then, is my suggestion:  Let's settle on a standard for both a 7-bit
and 8-bit font.  AND let's make the two compatible -- that is, positions of
the 8-bit font are the same as the positions of the 7-bit one, except with
the first bit set to 1 instead of 0 (i.e. the 7-bit position + 128).  This
will make conversion extremely simple -- and it will mean that we can
easily create sorting algorithms etc. that work on both fonts.  George Hart


---- End included message ----
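
The "7-bit position + 128" rule suggested in the included message can be
sketched in a few lines of Python.  This is only an illustration of the
arithmetic, assuming the input is a run of Tamil-font codes with no Roman
text or control characters mixed in; the function names are made up.

```python
def to_8bit(data: bytes) -> bytes:
    """Set the high bit on every code (7-bit position + 128)."""
    return bytes(b | 0x80 for b in data)


def to_7bit(data: bytes) -> bytes:
    """Clear the high bit on every code (8-bit position - 128)."""
    return bytes(b & 0x7F for b in data)


# Arbitrary byte strings standing in for words typed in a 7-bit font.
words_7bit = [b"kal", b"amma", b"kadal"]
words_8bit = [to_8bit(w) for w in words_7bit]
```

Since every code shifts by the same amount, a plain byte-wise sort gives
the same order in both encodings, which is the sorting property the
suggestion relies on.  The ambiguity of the 7-bit form itself is not
removed by this conversion.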
