![select1](https://cloud.githubusercontent.com/assets/7548378/11047114/cc867938-8728-11...) ![select2](https://cloud.githubusercontent.com/assets/7548378/11047115/ccb51d10-8728-11...) When I select one character, the <b>sel</b>ection count and the <b>col</b>umn number are correct, but when I extend the selection by one character, the <b>sel</b>ection increases by 4 not 1, although the <b>col</b>umn number only increases by the expected value of 1.
--- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/issues/745
The column counts in glyphs, the selection counts in octets. You do not provide information on what the second entity is, so I'm assuming it's code point 1d41a, which has a four-octet encoding in UTF-8, which is what the buffer uses.
The manual should be updated to not describe the selection as "characters", since that only applies to ASCII.
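The octet-versus-glyph difference described above can be reproduced outside Geany. This is only a sketch: it assumes, as the comment does, that the second character is code point U+1D41A, and the `selection` string is a made-up stand-in for the screenshots.

```python
# Hypothetical selection: an ASCII letter followed by U+1D41A,
# the code point assumed in the comment above (not confirmed by the reporter).
selection = "a\U0001D41A"

# A byte-based "sel" counter sees the UTF-8 encoding of the text.
utf8_bytes = len(selection.encode("utf-8"))

# Python's str length counts code points, not bytes.
code_points = len(selection)

print(f"bytes: {utf8_bytes}, code points: {code_points}")
```

Under that assumption the first character contributes one byte and the second contributes four, which matches the jump reported in the issue.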
Could you please show both _bytes_ and _characters_ counts in the status bar? It is essential for non-English texts. E.g. "line: 2 col: 3 sel: 48 chars: 24".
@andreysm what's a character?
One symbol. Let's just count how many positions they occupy, i.e. sum selected columns for each row.
So glyphs. The Scintilla editing component doesn't supply that count to Geany, but someone could write the code to request glyph counts from Scintilla and add them up across multiple lines, deciding whether to count line ends and noting that tab characters count as multiple glyphs. That could then be another display item for the status bar.
But it would probably be better to make a separate issue rather than hijack a three-year-old one.
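The "sum the selected columns for each row" idea, including the tab caveat, can be sketched in a few lines. This does not use Scintilla at all; the function names are hypothetical, and the sketch assumes each selected row starts at column 0, that line ends are not counted, and that every non-tab code point is exactly one column wide.

```python
def display_width(line: str, tab_width: int = 8) -> int:
    """Columns occupied by `line` when it starts at column 0.

    Assumption: every code point other than a tab is one column wide.
    """
    # expandtabs() pads each tab out to the next multiple of tab_width.
    return len(line.expandtabs(tab_width))


def selected_columns(selection: str, tab_width: int = 8) -> int:
    """Sum the column widths of every row in a multi-line selection."""
    # Line ends are dropped by the split, so they are not counted.
    return sum(display_width(row, tab_width) for row in selection.split("\n"))


print(selected_columns("a\tb\n\tx"))                # tab width 8
print(selected_columns("a\tb\n\tx", tab_width=4))   # tab width 4
```

The same selection yields a different total depending on the tab width, which is one reason a column-based count is hard to interpret.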
@elextr if it's a matter of counting columns it should be doable as Scintilla gives the column info, it's then a matter of counting how many columns are selected.
However, I'm not sure it makes much sense to count this, as some "characters" take up more than one column -- the most obvious one being the tab character, which even takes a variable amount of columns, but there are others. Counting code points might be slightly better, or rather whatever Scintilla counts as "stops", e.g. "*the number of positions the caret can be at*" (and this should be fairly easy to count, although probably in a fairly expensive way). Or even the number of actually composited characters. Meh, displaying symbols is so complicated. My preference would probably go to counting the code points, regardless of their composition because that's the number of "items" stored in the file, and that's often more interesting to know in a programming context than actual rendered characters on screen; but all these pieces of information (bytes, code points, columns, composited characters) are useful in some situations and not in others.
One fairly important piece of information to take into account here is that Geany uses UTF-8 internally, but that does not have to be the file's encoding. This means that the byte count in Geany does not necessarily make sense in the target encoding -- and suggests the current info is kind of irrelevant, yet is often useful as well.
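To illustrate why a UTF-8 byte count may not match the file on disk, here is a small sketch comparing the encoded size of the same text in a few encodings. The sample string and the list of encodings are arbitrary choices for illustration, not anything Geany itself uses.

```python
def byte_counts(text, encodings=("utf-8", "iso-8859-1", "utf-16-le")):
    """Return the encoded length of `text` in each of the given encodings."""
    return {enc: len(text.encode(enc)) for enc in encodings}


# Hypothetical selection: "naive" spelled with U+00EF (i with diaeresis).
sample = "na\u00efve"

for enc, count in byte_counts(sample).items():
    print(f"{enc}: {count} bytes")
```

The five code points stay the same in every case, while the byte count changes with the target encoding.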
> if it's a matter of counting columns it should be doable as Scintilla gives the column info, it's then a matter of counting how many columns are selected.
That's what I said :)
> some "characters" take up more than one column -- the most obvious one being the tab character, which even takes a variable amount of columns
ditto :)
> My preference would probably go to counting the code points, regardless of their composition because that's the number of "items" stored in the file
But we are talking about the selection, not anything in a file
> but all these pieces of information (bytes, code points, columns, composited characters) are useful in some situations and not in others.
Yep, which is the obvious problem with this sort of thing: too many possibilities. But for the selection I'm doubtful any of them are really useful for any common use-case.
> Geany uses UTF-8 internally, but that does not have to be the file's encoding
or the encoding of whatever you paste the selection into. IIUC it can be re-encoded when pasted, particularly on Windows.
And counting code points is fine, but what does it give you? Don't forget those combining characters Europeans like to use for their accented characters, two code points for one glyph :)
But if @andreysm has a specific use-case that needs one or another count, then it should be no trouble to accept a well-written pull request from somebody to add another % code to the status bar; nobody needs to show it if they don't want to.
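The "two code points for one glyph" remark about combining characters can be checked with Python's standard `unicodedata` module. This is a rough sketch: `count_without_combining` is a hypothetical helper that simply skips combining marks, which only approximates what a user perceives as one character and does not implement full grapheme cluster segmentation.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one glyph.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def count_without_combining(text: str) -> int:
    """Count code points whose canonical combining class is 0.

    Approximation only: this ignores other cluster rules (e.g. joiners).
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(decomposed), len(composed), count_without_combining(decomposed))
```

Both strings render the same way, yet a plain code point count differs between them, so any such counter would have to pick one definition and document it.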
github-comments@lists.geany.org