I suspect the terminology comes from the Windows edit control that Scintilla originally emulated.

And that control was probably designed well before multi-byte characters, hence the code page crap that is still in Scintilla.

It may seem unusual in this age of Unicode, but the world didn't suddenly wake up to Unicode; it evolved towards it, with many false steps along the way, each leaving its legacy scars on applications like Scintilla.

And even Unicode isn't perfect. Even if you stored each code point in the same number of bytes (i.e. a fixed-width encoding such as UTF-32 rather than UTF-8 or UTF-16), there are still combinations of two code points that map to only one glyph. For example, c̦ is two code points: if you copy it into Geany, deleting forward removes the c but not the combining mark, backspacing does the reverse, and it takes two forward cursor movements to move over it.
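To make the code point vs. glyph distinction concrete, here is a small Python sketch (not Geany or Scintilla code). It assumes the example character is "c" followed by U+0326 COMBINING COMMA BELOW. The `naive_clusters` helper is hypothetical and only a simplification of real grapheme-cluster segmentation (Unicode UAX #29): it just attaches combining marks to the preceding base character.

```python
import unicodedata

# "c" followed by U+0326 COMBINING COMMA BELOW (assumed to be the example
# character): one glyph on screen, two code points in the string.
text = "c\u0326"


def code_points(s: str) -> list[str]:
    """Describe each code point in s as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


def naive_clusters(s: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Hypothetical simplification of UAX #29 grapheme clusters, which also
    handle emoji sequences, Hangul syllables, etc.
    """
    clusters: list[str] = []
    for ch in s:
        # combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(code_points(text))
print(len(text), "code points,", len(naive_clusters(text)), "cluster")

# Deleting one code point at a time splits the glyph apart:
after_delete_forward = text[1:]  # cursor at start, delete: only the mark is left
after_backspace = text[:-1]      # cursor at end, backspace: only the "c" is left
print(repr(after_delete_forward), repr(after_backspace))
```

An editor that wants delete and cursor movement to treat c̦ as one unit has to step by clusters like these rather than by code points.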

Geany always uses UTF-8 in the buffer, so it only meets the weird world of other encodings at load or save time. But that still means variable-length code points, plus issues like the one above, which make screen positions and buffer positions hard to relate.
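A rough Python sketch of that mapping problem, assuming (as in Scintilla's UTF-8 mode) that buffer positions are byte offsets into the UTF-8 text. The helper names are hypothetical, not Geany or Scintilla API. The conversion relies on every UTF-8 encoded code point having exactly one byte that is not a `10xxxxxx` continuation byte.

```python
def byte_offset_to_char_index(buf: bytes, offset: int) -> int:
    """Convert a byte offset in a UTF-8 buffer to a code-point index.

    Hypothetical helper. Raises ValueError if the offset is out of range
    or points into the middle of a multi-byte sequence.
    """
    if not 0 <= offset <= len(buf):
        raise ValueError(f"offset {offset} is outside the buffer")
    if offset < len(buf) and (buf[offset] & 0xC0) == 0x80:
        raise ValueError(f"offset {offset} is inside a multi-byte sequence")
    # Count lead bytes (anything that is not a 10xxxxxx continuation byte).
    return sum(1 for b in buf[:offset] if (b & 0xC0) != 0x80)


def char_index_to_byte_offset(text: str, index: int) -> int:
    """Convert a code-point index to a byte offset in the UTF-8 encoding."""
    return len(text[:index].encode("utf-8"))


# Mix of 1-, 2- and 3-byte code points, including the c + U+0326 pair.
text = "na\u00efve c\u0326 \u20ac"
buf = text.encode("utf-8")

print(len(text), len(buf))  # 10 code points, 14 bytes
euro = text.index("\u20ac")
print(euro, char_index_to_byte_offset(text, euro))
try:
    byte_offset_to_char_index(buf, 3)  # second byte of the two-byte "ï"
except ValueError as err:
    print(err)
```

Even the code-point index is not a screen column: the c + U+0326 pair counts as two code points but draws as one glyph, so a real editor needs a further step from code points to clusters.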
