Hello, I use French accented characters; my WM locale is set to fr_FR.utf8.
When selecting characters in a document encoded as "none" or ISO8859-1, each accented character counts for two bytes instead of one.
Example: create a document, type the "é" character, and select it.
![screenshot_20170904_105345](https://user-images.githubusercontent.com/12949168/30019317-66f45b6e-9160-11...)
Yes, UTF-8 encodes characters outside the ASCII set as two, three, or four bytes, so your Latin-1 characters are encoded as two bytes.
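As a quick illustration of those byte counts, here is a minimal Python sketch. It is not Geany code, and the sample characters are chosen only for illustration:

```python
# Number of bytes each character occupies in UTF-8 versus ISO 8859-1 (Latin-1).
samples = ["e", "é", "€", "😀"]

# Map each sample character to the length of its UTF-8 encoding.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"{ch!r}: {n} byte(s) in UTF-8")

# "é" fits in a single byte in Latin-1, but needs two bytes in UTF-8.
latin1_length = len("é".encode("latin-1"))
print(f"'é': {latin1_length} byte(s) in ISO 8859-1")
```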
Hi,
Even though the character can be represented in one byte (in UTF-8 or ISO 8859-1), Geany converts to UTF-8 for in-memory representation, and then it must do some processing (ex. normalization) causing it to get split out into the standalone e and the combining diacritical mark ́. This is my guess.
That still doesn't explain why Geany shows 2 characters selected. The likely reason is that Geany[0] uses very naive byte-based code for the selection count rather than counting the number of glyphs selected.
Regards, Matthew Brush
[0]: and/or Scintilla, the widget which provides the editor buffer/manipulations/information.
I know about UTF-8 character encoding, but it is still annoying, even though the column count is correct.
Closed #1599.
Reopened #1599.
Oops, I closed the issue by mistake.
@codebrainz yeah, Geany shows the scintilla selection count, which is the position (ie byte) difference between the anchor and the cursor.
@lep42 the "column" count is actually a glyph count, so it may not agree with the code point count for double wide Unicode characters or for combining characters where two code points make one glyph, eg `u` followed by a combining mark.
Unless you use purely ASCII
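To make the "position difference" concrete, here is a rough Python sketch of a byte-oriented buffer. The names `byte_position`, `anchor`, and `cursor` are hypothetical and only model the idea; this is not Scintilla's actual API:

```python
# Hypothetical model of a byte-oriented editor buffer (not Scintilla's API).
text = "café au lait"
buffer = text.encode("utf-8")  # assume the in-memory buffer is UTF-8


def byte_position(s: str, char_index: int) -> int:
    """Byte offset in the UTF-8 buffer of the given character index."""
    return len(s[:char_index].encode("utf-8"))


# Select just the "é" (character index 3 up to 4).
anchor = byte_position(text, 3)
cursor = byte_position(text, 4)

# A selection count computed as a position difference is a byte count.
selection_bytes = abs(cursor - anchor)

# Counting code points instead requires decoding the selected bytes.
start, end = sorted((anchor, cursor))
selection_code_points = len(buffer[start:end].decode("utf-8"))

print(selection_bytes, selection_code_points)
```

With pure ASCII text the two numbers are always equal, which is why the difference only shows up with accented or other non-ASCII characters.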
The thing is, when you convert a text to ISO8859-1, the selection count should be the number of glyphs selected and displayed. I think that is the most common use case and should be the default! But I agree with you about having extra options.
As @codebrainz said, Geany keeps text in the buffer in UTF-8, not in ISO8859-1. The buffer in memory does not know it will be saved in any particular encoding; the user can change that at save time.
The definition of code points, the bytes in the UTF-8 encoding of them, and the resulting glyphs and their width is provided by the Unicode standard.
For example you can construct say a `ä` as the single code point with hex value 0xe4 but two byte UTF-8 encoding 0xc3 0xa4. It shows as one single wide glyph.
But can also be represented by the two code points `a` with code 0x61 and the same UTF-8 encoding followed by the diaeresis combining code point 0x308 and UTF-8 encoding 0xcc 0x88. It shows the same glyph as above.
So the same "character" you see, ie the one glyph, can be one or two code points and two or three UTF-8 bytes.
So a selection can be counted in bytes, code points, or glyphs (which can be single or double width when you have Asian characters in the mix making "column" even more difficult).
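The three ways of counting can be sketched in Python for the two forms of `ä` described above. The `counts` helper is hypothetical, and its glyph count is only a rough approximation that skips combining marks: it does not implement full Unicode grapheme cluster segmentation and ignores double-width characters.

```python
import unicodedata

composed = "\u00e4"     # ä as the single code point 0xe4
decomposed = "a\u0308"  # a (0x61) followed by combining diaeresis (0x308)


def counts(s: str) -> tuple[int, int, int]:
    """Return (UTF-8 bytes, code points, approximate glyphs) for s."""
    n_bytes = len(s.encode("utf-8"))
    n_code_points = len(s)
    # Approximation: a code point with a non-zero combining class attaches
    # to the preceding character, so it does not start a new glyph.
    n_glyphs = sum(1 for ch in s if unicodedata.combining(ch) == 0)
    return n_bytes, n_code_points, n_glyphs


composed_counts = counts(composed)
decomposed_counts = counts(decomposed)
print(composed_counts, decomposed_counts)

# NFC normalization turns the two code point form into the single code point.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```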
Geany is a programmer's editor, and most programming languages are still in ASCII, so the numbers that someone has contributed the code to count and show on the status bar are still for ASCII. That is their use-case, even if it differs from yours.
Since the status bar is configurable then, as I said above, if somebody contributes the code to count and display some other options they are likely to be accepted as alternative options.
@lep42, this looks like a duplicate of #745.
If so, it might be closed. But make sure to upvote the original ticket to show the demand:
1. Go to #745 2. Click the smiley button (top right) and 3. Click 👍
@jesus2099 stop giving bad advice, this project does not run on the number of +1s
Understood. It's just a thing I use in GitHub, usually. :) Can come in handy in some projects.
Since Geany does not have a written set of rules for github usage it can be difficult for new contributors. Geany is a totally volunteer project, people do what they _want_ to do if they have time and ability. There isn't a cadre of paid programmers who are looking for the next thing to do, so most statistical things that are useful on corporate sponsored projects are unused here.
And having lots of people adding +1 to things has in the past annoyed people more than encouraged them.