Re: [Geany-devel] Use of Scintilla word boundaries for word searches

22 Aug 2011


      On 22/08/2011 19:16, Dimitar Zhekov wrote:
...
On Mon, 22 Aug 2011 14:43:35 +0200
Colomban Wendling lists.ban@herbesfolles.org wrote:
...
...
Uhm, I mean that for FIF (Find in Files), grep decides on the word
boundaries, which may differ from GEANY_WORDCHARS and everything
discussed here, no?
Yeah, yet another definition. Though this one is, according to the manual:
...
Word-constituent characters are letters, digits, and the underscore.
And it doesn't include any non-ASCII characters in the algorithm, making
e.g. a word search for "hé" match "héhé" (the second byte of the first
"é" being treated as a separator).
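
To illustrate the "héhé" case, here is a minimal Python sketch of a byte-wise whole-word check. It is only a model of the behaviour described above, not grep's actual code: the helper names (`is_word_byte`, `whole_word_offsets`) are made up for this example, and it assumes that word constituents are exactly the ASCII letters, digits and underscore, tested one byte at a time.

```python
import string

# Assumption: only ASCII letters, digits and '_' count as word constituents,
# and the test is applied to single bytes (a model, not grep's implementation).
WORD_BYTES = (string.ascii_letters + string.digits + "_").encode("ascii")


def is_word_byte(b: int) -> bool:
    return b in WORD_BYTES


def whole_word_offsets(haystack: bytes, needle: bytes) -> list:
    """Byte offsets where needle occurs with a non-word byte (or the
    buffer edge) immediately before and after it."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before_ok = start == 0 or not is_word_byte(haystack[start - 1])
        after_ok = end == len(haystack) or not is_word_byte(haystack[end])
        if before_ok and after_ok:
            hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


text = "héhé".encode("utf-8")  # b'h\xc3\xa9h\xc3\xa9'
word = "hé".encode("utf-8")    # b'h\xc3\xa9'
print(whole_word_offsets(text, word))  # [3]: the second "hé" is accepted
```

The hit at offset 3 is preceded by 0xa9, the second byte of the first "é", which is not an ASCII word byte and so acts as a separator.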
grep uses plain char and doesn't support UTF-8. But if your "héhé"
fits in an 8-bit code page, and you have the proper LC_CTYPE set, it
works. I checked this with cp1251 "боза" earlier when we discussed
word finding (but was 99% sure it'll work).
echo '@@боза@@' | grep -w '@боза@' works too, but echo 'а@боза@@' does
not, and neither does '9@...' or '_@...' . So it checks the characters
before and after the match for not being isalnum() or underscore.
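
The check described above can be sketched in Python as follows. This is a model under stated assumptions, not grep's code: Python's Unicode-aware `str.isalnum()` stands in for `isalnum()` under a cp1251 LC_CTYPE, where every character is a single byte, and the function name `matches_as_word` is invented for the example. The elided '9@...' and '_@...' inputs are assumed to continue like the others.

```python
def matches_as_word(line: str, pattern: str) -> bool:
    """True if pattern occurs in line with no alphanumeric character or
    underscore immediately before or after the occurrence."""

    def is_word_char(c: str) -> bool:
        return c.isalnum() or c == "_"

    start = line.find(pattern)
    while start != -1:
        end = start + len(pattern)
        before_ok = start == 0 or not is_word_char(line[start - 1])
        after_ok = end == len(line) or not is_word_char(line[end])
        if before_ok and after_ok:
            return True
        start = line.find(pattern, start + 1)
    return False


# Assumed inputs: the last two fill in the "..." with the same tail.
for line in ["@@боза@@", "а@боза@@", "9@боза@@", "_@боза@@"]:
    print(line, matches_as_word(line, "@боза@"))
```

Only the first line is reported as a match; in the other three the character before the pattern is a letter, a digit or an underscore.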
...but б@боза@@ will match (at least using UTF-8), because б is a
two-byte UTF-8 character, so neither byte matches (isalnum() || '_')
(both are > 0x80, actually 0xd0 0xb1).
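
A short Python sketch of the UTF-8 case, again as a model rather than grep's code: it assumes the boundary test looks at single bytes and accepts only ASCII letters, digits and underscore as word constituents. The name `ASCII_WORD` is made up for the example.

```python
import string

# Assumption: byte-wise boundary test, ASCII word constituents only.
ASCII_WORD = (string.ascii_letters + string.digits + "_").encode("ascii")

line = "б@боза@@".encode("utf-8")
pattern = "@боза@".encode("utf-8")

# UTF-8 encoding of б (U+0431): two bytes, both above 0x80.
print("б".encode("utf-8").hex(" "))  # d0 b1

start = line.find(pattern)
end = start + len(pattern)
byte_before = line[start - 1]  # last byte of the leading б
byte_after = line[end]         # the trailing '@'

byte_level_match = (byte_before not in ASCII_WORD
                    and byte_after not in ASCII_WORD)
print(hex(byte_before), byte_level_match)  # 0xb1 True
```

The byte just before the pattern is 0xb1, which is not an ASCII word byte, so the byte-wise test accepts the match even though the character б is a letter.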
