there are still combinations of two code points that map to only one glyph (e.g. c̦).

?!? what the ....
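Here is a quick way to see it concretely; a minimal Python sketch (the c̦ from above really is two code points):

```python
import unicodedata

# "c̦" is LATIN SMALL LETTER C followed by COMBINING COMMA BELOW:
# two code points, one glyph on screen.
s = "c\u0326"

print(s)                        # c̦  (rendered as a single glyph)
print(len(s))                   # 2  (two code points)
print(len(s.encode("utf-8")))   # 3  (1 byte for 'c', 2 bytes for U+0326)

for ch in s:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0063 LATIN SMALL LETTER C
# U+0326 COMBINING COMMA BELOW
```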
Wikipedia (emphases mine):

"Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character."

Whether or not it is truly consistent depends on the interpretation of "character", because the section https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters talks about "main characters" and "diacritical marks" combining into what earlier sections call "abstract characters".
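To make that "ready-made versus composite" duality concrete: the same abstract character often exists both as a precomposed code point and as a base-plus-combining-mark sequence, and Unicode normalization converts between the two. A small Python sketch using ç as the example:

```python
import unicodedata

precomposed = "\u00E7"    # U+00E7 LATIN SMALL LETTER C WITH CEDILLA (ready-made)
composite   = "c\u0327"   # U+0063 + U+0327 COMBINING CEDILLA (composite)

print(precomposed == composite)                                 # False: different code point sequences
print(unicodedata.normalize("NFC", composite) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == composite)   # True
```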

I personally believe it's a bad approach: not only does it make things harder for the computing industry, it is in principle inconsistent with the treatment of most, if not all, characters. For example, an A is made of three bars, a B of one bar and two semi-circles or partial circles; thus (almost) any visible character could be regarded as a combination of small, primitive "marks" (and historically probably evolved that way).

Not a perfect standard at all.


But my practical take-away is still that a character on the screen (a visible character, the c̦ example included) is represented by one or, for "complex" characters, more code points (and therefore bytes).
And the caret tries to step in between those units, sometimes ending up in "illegal positions" and not showing up.
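A minimal sketch of what "legal positions" could mean, assuming a caret indexed by code point (the helper below is hypothetical, not taken from any editor's actual code; indexing by raw bytes would only make things worse):

```python
import unicodedata

def legal_caret_positions(text: str) -> list[int]:
    """Code-point indices where a caret may sit: never between a base
    character and the combining mark that follows it. (Hypothetical helper.)"""
    if not text:
        return [0]
    positions = [0]
    for i in range(1, len(text)):
        if unicodedata.combining(text[i]) == 0:  # not a combining mark
            positions.append(i)
    positions.append(len(text))
    return positions

s = "ac\u0326b"                    # 'a', then c̦ (two code points), then 'b'
print(legal_caret_positions(s))    # [0, 1, 3, 4] -- index 2 would split c̦
```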

