On 14 November 2013 19:07, janc@janc.es <janc@janc.es> wrote:
Hi, friends.

I found that when selecting characters in a Linux UTF-8 text, the status
bar counts double for an extended character, I mean one outside the ASCII table.

Actually it's counting octets in the underlying UTF-8 encoding that the buffer uses, so it can count as high as four for a single code point, or higher still when a glyph is made up of a base character plus one or more combining characters.
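
To make the octet counts concrete, here is a small Python sketch (an illustration only, not Geany code; the helper name utf8_octets is made up for this example) that prints how many octets the UTF-8 encoding uses for a few sample characters:

```python
# Illustration only: octets used by the UTF-8 encoding of sample text.
# utf8_octets is a hypothetical helper, not part of Geany.

def utf8_octets(text: str) -> int:
    """Return the number of octets in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))

samples = {
    "ASCII a": "a",                       # U+0061
    "e with acute": "\u00e9",             # U+00E9, one precomposed code point
    "euro sign": "\u20ac",                # U+20AC
    "emoji": "\U0001F600",                # U+1F600, outside the BMP
    "e + combining acute": "e\u0301",     # two code points, one glyph
}

for name, text in samples.items():
    print(f"{name}: {utf8_octets(text)} octet(s), {len(text)} code point(s)")
```

The last sample is the combining-character case: it normally displays as a single glyph but occupies more octets than the precomposed form.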
 

I was using that selection to format the output of a script.

I think it should be the number of 'text chars' not bytes.

The difficulty is, as I alluded to above, what is a "text char"?  Depending on the use-case it could be the octets, the Unicode code points or the glyphs shown on the screen.
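
As a rough sketch of how the three answers can differ for the same text (Python, for illustration only; the function names are invented here, and the glyph count is merely an approximation that skips combining marks rather than doing full grapheme-cluster segmentation or asking the GUI component what it drew):

```python
import unicodedata

# Hypothetical helpers for illustration; none of these exist in Geany.

def count_octets(text: str) -> int:
    """Octets in the UTF-8 encoding."""
    return len(text.encode("utf-8"))

def count_code_points(text: str) -> int:
    """Unicode code points (a Python str is a sequence of code points)."""
    return len(text)

def approx_glyphs(text: str) -> int:
    """Approximation: count code points that are not combining marks.

    This ignores many cases (joiners, emoji sequences, ligatures), so it
    is not guaranteed to match what is shown on the screen.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

text = "cafe\u0301"  # "café" written with a combining acute accent
print(count_octets(text), count_code_points(text), approx_glyphs(text))
```

For this sample the three counts are all different, which is the heart of the problem: any one of them could be the "right" number depending on what the user wants it for.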

Octets are what the GUI editing component returns, which is why they are what is shown.

It would be technically possible to scan the selection and count the Unicode code points in it, but it would have performance implications if the selection is large, for example if the user selected the whole document.
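
A minimal sketch of such a scan, assuming the selection is available as valid UTF-8 bytes (Python for illustration; count_code_points_utf8 is a hypothetical name, not an existing Geany or editing-component function). Every octet has to be visited, so the cost grows linearly with the size of the selection:

```python
def count_code_points_utf8(buf: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation octets.

    Continuation octets have the bit pattern 10xxxxxx; every other octet
    starts a new code point. Assumes buf is well-formed UTF-8.
    """
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

selection = "cafe\u0301 \u20ac".encode("utf-8")
print(len(selection), "octets,", count_code_points_utf8(selection), "code points")
```

Selecting the whole document would mean running this over the entire buffer each time the selection changes.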

There is currently no way of knowing how many glyphs the GUI component used to display a sequence of octets, so the counted code points may not match what you see on the screen anyway.

So I don't think it's worth changing. The only option available, scanning the selection to count code points each time it changes, is potentially slow and may not do what you expect in any case.

Cheers
Lex

 

What do you think?

Cheers.

--
Jose Angel Navarro Cortes
email: janc@janc.es
web: http://janc.es/
Usuario Linux: #49178

_______________________________________________
Users mailing list
Users@lists.geany.org
https://lists.geany.org/cgi-bin/mailman/listinfo/users