[Geany-devel] Use of Scintilla word boundaries for word searches

Dimitar Zhekov dimitar.zhekov at xxxxx
Mon Aug 22 17:16:10 UTC 2011


On Mon, 22 Aug 2011 14:43:35 +0200
Colomban Wendling <lists.ban at herbesfolles.org> wrote:

> > Uhm, I mean for FIF grep decides about the word boundaries, which may be
> > different to GEANY_WORDCHARS and everything discussed here, no?
> 
> Yeah, yet another definition. Though this one is, according to the manual:
> 
> > Word-constituent characters are letters, digits, and the underscore.
> 
> And it doesn't include any non-ASCII characters in the algorithm, making
> e.g. word search "hé" match "héhé" (second byte of the first "é" being
> treated as a separator).

grep uses plain char and doesn't support UTF-8. But if your "héhé"
fits in an 8-bit code page, and you have the proper LC_CTYPE set, it
works. I checked this with cp1251 "боза" earlier when we discussed
word finding (though I was 99% sure it would work).

echo '@@боза@@' | grep -w '@боза@' works too, but echo 'а@боза@@' does
not, and neither does '9@...' or '_@...'. So it checks that the characters
immediately before and after the match are neither isalnum() nor underscore.
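
To make the observed rule concrete, here is a minimal Python sketch of that
boundary check. It is only an illustration of the behaviour described above,
not grep's actual code: the names is_word_char() and word_search() are
made up, and Python's str.isalnum() is Unicode-aware, so it merely stands in
for a locale-dependent isalnum() under a suitable LC_CTYPE.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: stand-in for "isalnum() or underscore" in the current locale.
    return ch.isalnum() or ch == "_"


def word_search(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack with no word character
    immediately before or immediately after the occurrence."""
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before_ok = start == 0 or not is_word_char(haystack[start - 1])
        after_ok = end == len(haystack) or not is_word_char(haystack[end])
        if before_ok and after_ok:
            return True
        # Boundary test failed here; try the next occurrence.
        start = haystack.find(needle, start + 1)
    return False


# The cases from the message above.
print(word_search("@@боза@@", "@боза@"))   # preceded and followed by '@'
print(word_search("а@боза@@", "@боза@"))   # preceded by a letter
print(word_search("9@боза@@", "@боза@"))   # preceded by a digit
print(word_search("_@боза@@", "@боза@"))   # preceded by an underscore
```

Note that only the neighbours of the match are tested; the pattern itself
may begin and end with non-word characters such as '@'.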

-- 
E-gards: Jimmy


