there are still combinations of two code points that map to only one glyph (e.g. c̦).

?!? what the ....
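Here is a quick way to see it concretely; a minimal Python sketch (the c̦ from above really is two code points):

```python
import unicodedata

# "c̦" is LATIN SMALL LETTER C followed by COMBINING COMMA BELOW:
# two code points, one glyph on screen.
s = "c\u0326"

print(s)                        # c̦  (rendered as a single glyph)
print(len(s))                   # 2  (two code points)
print(len(s.encode("utf-8")))   # 3  (1 byte for 'c', 2 bytes for U+0326)

for ch in s:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0063 LATIN SMALL LETTER C
# U+0326 COMBINING COMMA BELOW
```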
Wikipedia (emphases mine):

"Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character."

Whether or not it is truly consistent depends on the interpretation of "character", because the section https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters talks about "main characters" and "diacritical marks" combining into what earlier sections call "abstract characters".
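To make that "ready-made versus composite" duality concrete: the same abstract character often exists both as a precomposed code point and as a base-plus-combining-mark sequence, and Unicode normalization converts between the two. A small Python sketch using ç as the example:

```python
import unicodedata

precomposed = "\u00E7"    # U+00E7 LATIN SMALL LETTER C WITH CEDILLA (ready-made)
composite   = "c\u0327"   # U+0063 + U+0327 COMBINING CEDILLA (composite)

print(precomposed == composite)                                 # False: different code point sequences
print(unicodedata.normalize("NFC", composite) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == composite)   # True
```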

I personally believe it's a bad approach: not only does it make things harder for the computing industry, it is in principle inconsistent with the treatment of most, if not all, characters. For example, an A is made of three bars, a B of one bar and two semi-circles or partial circles; thus (almost) any visible character could be regarded as a combination of small, primitive "marks" (and historically probably evolved that way).

Not a perfect standard at all.


But my practical take-away is still that a character on the screen (a visible character, the c̦ example included) is represented by one or, for "complex" characters, more code points (and therefore bytes).
And the caret tries to step in between those units, sometimes ending up in "illegal positions" and not showing up.
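A minimal sketch of what "legal positions" could mean, assuming a caret indexed by code point (the helper below is hypothetical, not taken from any editor's actual code; indexing by raw bytes would only make things worse):

```python
import unicodedata

def legal_caret_positions(text: str) -> list[int]:
    """Code-point indices where a caret may sit: never between a base
    character and the combining mark that follows it. (Hypothetical helper.)"""
    if not text:
        return [0]
    positions = [0]
    for i in range(1, len(text)):
        if unicodedata.combining(text[i]) == 0:  # not a combining mark
            positions.append(i)
    positions.append(len(text))
    return positions

s = "ac\u0326b"                    # 'a', then c̦ (two code points), then 'b'
print(legal_caret_positions(s))    # [0, 1, 3, 4] -- index 2 would split c̦
```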

