[Geany-devel] Use of Scintilla word boundaries for word searches

Mon Aug 22 17:22:09 UTC 2011

On 22/08/2011 19:16, Dimitar Zhekov wrote:
> On Mon, 22 Aug 2011 14:43:35 +0200
> Colomban Wendling <lists.ban at herbesfolles.org> wrote:
> 
>>> Uhm, I mean for FIF grep decides about the word boundaries, which may be
>>> different to GEANY_WORDCHARS and everything discussed here, no?
>>
>> Yeah, yet another definition. Though this one is, according to the manual:
>>
>>> Word-constituent characters are letters, digits, and the underscore.
>>
>> And it doesn't include any non-ASCII characters in the algorithm, making
>> e.g. word search "hé" match "héhé" (second byte of the first "é" being
>> treated as a separator).
> 
> grep uses plain char and doesn't support UTF-8. But if your "héhé"
> fits in an 8-bit code page, and you have the proper LC_CTYPE set, it
> works. I checked this with cp1251 "боза" earlier when we discussed
> word finding (though I was 99% sure it would work).
> 
> echo '@@боза@@' | grep -w '@боза@' works too, but echo 'а@боза@@' does
> not, and neither does '9@...' or '_@...'. So it checks the characters
> before and after the match for not being isalnum() or underscore.

...but б@боза@@ will match (at least using UTF-8), because б is a two-byte
UTF-8 character, so neither of its bytes matches (isalnum() || '_'): both
are > 0x80, namely 0xd0 0xb1.
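
To make the byte-wise behaviour concrete, here is a rough Python model of
the check described above. It is only a sketch, not grep's actual code: the
helper names is_word_byte() and word_match() are made up for illustration,
and it assumes an ASCII-only isalnum() (as in the C locale) applied to raw
bytes on each side of the match.

```python
def is_word_byte(b: int) -> bool:
    """Model of (isalnum(b) || b == '_') with ASCII-only classification (assumption)."""
    return (
        0x30 <= b <= 0x39       # 0-9
        or 0x41 <= b <= 0x5A    # A-Z
        or 0x61 <= b <= 0x7A    # a-z
        or b == 0x5F            # underscore
    )


def word_match(haystack: bytes, needle: bytes) -> bool:
    """True if needle occurs with a non-word byte, or the line edge, on both sides."""
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before_ok = start == 0 or not is_word_byte(haystack[start - 1])
        after_ok = end == len(haystack) or not is_word_byte(haystack[end])
        if before_ok and after_ok:
            return True
        # Try the next occurrence, as a later one may sit on a boundary.
        start = haystack.find(needle, start + 1)
    return False


needle = "@боза@".encode("utf-8")

# '@' before and after the match: accepted.
print(word_match("@@боза@@".encode("utf-8"), needle))

# ASCII digit or underscore before the match: rejected.
print(word_match("9@боза@@".encode("utf-8"), needle))
print(word_match("_@боза@@".encode("utf-8"), needle))

# 'б' is 0xd0 0xb1 in UTF-8; the byte before the match is 0xb1,
# which is not an ASCII word byte, so the match is accepted.
print(word_match("б@боза@@".encode("utf-8"), needle))
```

The same model also reproduces the "hé" in "héhé" case from earlier in the
thread: the second "hé" is preceded by 0xa9, the trailing byte of the first
"é", which is not classified as a word character.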