On 22/08/2011 19:16, Dimitar Zhekov wrote:
On Mon, 22 Aug 2011 14:43:35 +0200 Colomban Wendling <lists.ban@herbesfolles.org> wrote:
Uhm, I mean that for FIF (Find in Files), grep decides where the word boundaries are, which may differ from GEANY_WORDCHARS and everything discussed here, no?
Yeah, yet another definition. Though this one is, according to the manual:
Word-constituent characters are letters, digits, and the underscore.
And its algorithm doesn't handle non-ASCII characters at all, so e.g. a whole-word search for "hé" matches "héhé" (the second byte of the first "é" is treated as a separator).
grep uses plain char and doesn't support UTF-8. But if your "héhé" fits in an 8-bit code page and you have the proper LC_CTYPE set, it works. I checked this with cp1251 "боза" earlier, when we discussed word finding (though I was 99% sure it would work).
echo '@@боза@@' | grep -w '@боза@' works too, but echo 'а@боза@@' does not, and neither do '9@...' or '_@...'. So it checks that the characters before and after the match are neither isalnum() nor an underscore.
...but б@боза@@ will match (at least using UTF-8), because б is a two-byte character in UTF-8, so neither of its bytes satisfies (isalnum() || '_') (both are above 0x80: 0xd0 0xb1).