[Github-comments] [geany/geany] Problem select char counting (#1599)

elextr notifications at xxxxx
Mon Sep 4 13:59:54 UTC 2017


As @codbrainz said, Geany keeps the text in its buffer in UTF-8, not in ISO8859-1. The buffer in memory does not know which encoding it will be saved in; the user can change that at save time.

Code points, the bytes of their UTF-8 encoding, and the resulting glyphs and their widths are all defined by the Unicode standard.

For example, you can construct an `ä` as the single code point with hex value 0xe4, whose UTF-8 encoding is the two bytes 0xc3 0xa4. It shows as one single-width glyph.

But it can also be represented by two code points: `a`, with code 0x61 and the same single-byte UTF-8 encoding, followed by the combining diaeresis code point 0x308, with UTF-8 encoding 0xcc 0x88. It shows the same glyph as above.

So the same "character" you see, i.e. the one glyph, can be one or two code points and two or three UTF-8 bytes.
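
To make the numbers above concrete, here is a minimal Python sketch (not Geany code; the helper name `describe` is made up for illustration) that reports the code point count and UTF-8 bytes of both forms:

```python
import unicodedata

precomposed = "\u00e4"   # single code point U+00E4
decomposed = "a\u0308"   # 'a' (U+0061) followed by combining diaeresis (U+0308)

def describe(text):
    """Return (code point count, UTF-8 byte count, UTF-8 bytes as hex)."""
    encoded = text.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    return len(text), len(encoded), hex_bytes

print(describe(precomposed))  # one code point, two bytes
print(describe(decomposed))   # two code points, three bytes

# Unicode normalization (NFC) composes the two-code-point form into the single one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Both strings render as the same glyph, but they compare as different strings unless they are normalized first.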

So a selection can be counted in bytes, code points, or glyphs (which can be single or double width when you have Asian characters in the mix, making "column" even more difficult).
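
As a rough illustration of the three ways of counting, here is a Python sketch. It is not what Geany does; `selection_counts` is a hypothetical helper, and the column count is only an approximation that skips combining marks and treats East Asian wide and fullwidth characters as two columns. It ignores other cases such as zero-width joiners and tabs.

```python
import unicodedata

def selection_counts(text):
    """Return (UTF-8 bytes, code points, approximate display columns)."""
    byte_count = len(text.encode("utf-8"))
    code_points = len(text)
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous glyph and take no column.
            continue
        # 'W' (wide) and 'F' (fullwidth) characters occupy two columns.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return byte_count, code_points, columns

print(selection_counts("abc"))       # plain ASCII: all three counts agree
print(selection_counts("a\u0308"))   # decomposed a-diaeresis
print(selection_counts("\u65e5\u672c"))  # two wide CJK characters
```

For pure ASCII text all three numbers are the same, which is why a single count is enough for that use case.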

Geany is a programmer's editor, and most programming languages are still in ASCII, so the numbers that someone contributed code to count and show on the status bar are still for ASCII. That is their use case, even if it differs from yours.

Since the status bar is configurable, as I said above, if somebody contributes code to count and display some other options, it is likely to be accepted as an alternative.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/1599#issuecomment-326970313