[Geany-devel] Use of Scintilla word boundaries for word searches

Colomban Wendling lists.ban at xxxxx
Sat Aug 20 14:41:46 UTC 2011

Le 20/08/2011 01:33, Lex Trotman a écrit :
> On 20 August 2011 03:40, Dimitar Zhekov <dimitar.zhekov at gmail.com> wrote:
>> On Fri, 19 Aug 2011 18:10:42 +0200
>> Colomban Wendling <lists.ban at herbesfolles.org> wrote:
>>> Hi,
>>> I'm trying to address bug 3386129 [1], and I'd like comments & reviews
>>> about my fix, because the whole thing doesn't look obvious at all...
>>> We already have 2 ways of determining what a "word" is: a manual one
>>> using GEANY_WORDCHARS or a caller-given list of wordchars, and one that
>>> uses Scintilla's word boundaries.
>> 3? Shouldn't we have symbolchars for the current programming language
>> ([A-Za-z_] if unknown), and wordchars that match the current
>> locale? They don't have much in common.
> By wordchars we mean symbolchars, this confusion has existed from the
> beginnings of C at least, and we ain't gonna change it now. :-)
> Locale/human language word ends are not as simple as sets of
> characters so let's not go there, we would need something like ICU to
> do that.

Yeah, I didn't mean anything about locale characters.  However
Scintilla seems to "magically" handle them correctly... maybe because of

>>> The former seems to make more sense when the caller code knows the kind
>>> of characters it wants (e.g. tags lookups), but the latter is better
>>> when getting the word to search for.
> Shouldn't the tags be using the same definition of word chars as
> Scintilla's highlighting?  I don't trust "knowing" stuff in two
> places, they will never match :-) I understand that it might be a bit
> of work to hack tagmanager into line though.

I'm afraid it'd be more than "a bit of work": most lexers won't work
correctly if the symbolchars are not the expected ones.  See the example
of the bug I referred to in my first mail: the reporter wanted to add "$"
to the PHP wordchars so that it is selected as part of the variable.
Choosing whether to have the "$" in the symbolchars is NOT possible from a
parser's POV.  So either it's not configurable or it's not used for parsing.

Scintilla lexers are a bit less concerned about this than tagmanager
since they generally don't care much about the file content and only
take care of a few control sequences (blocks, comments, etc.).  Plus the
keywords, of course; those might use wordchars.

Actually, the rationale for using Scintilla's representation of a "word"
when we do a search is that if we do a "full word" search and the picked
"word" doesn't match the one Scintilla would have picked, the search
won't return the expected results.
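
To illustrate the mismatch, here is a minimal self-contained sketch.  It
is not Geany or Scintilla code: word_at(), whole_word_match_at() and the
two character sets are hypothetical names made up for this example, and
the real Scintilla word-boundary logic is only assumed to behave roughly
like this for ASCII text.

```c
#include <stddef.h>
#include <string.h>

#define ALNUM "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

/* Hypothetical sets: the caller does not treat '$' as a word character,
 * while the editor side (e.g. after a user tweak for PHP) does. */
static const char CALLER_CHARS[] = "_" ALNUM;
static const char EDITOR_CHARS[] = "_$" ALNUM;

/* Copy the run of `wordchars` characters surrounding byte offset `pos`
 * in `text` into `out` (truncated to out_size - 1 bytes).
 * Returns the number of bytes written, not counting the terminator. */
static size_t word_at(const char *text, size_t pos, const char *wordchars,
                      char *out, size_t out_size)
{
    size_t len = strlen(text);
    size_t start, end, n;

    if (out_size == 0)
        return 0;
    if (pos > len)
        pos = len;
    start = end = pos;
    /* grow the range to the left, then to the right */
    while (start > 0 && strchr(wordchars, text[start - 1]) != NULL)
        start--;
    while (end < len && strchr(wordchars, text[end]) != NULL)
        end++;
    n = end - start;
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, text + start, n);
    out[n] = '\0';
    return n;
}

/* Return 1 if `word` occurs at offset `at` in `text` and is not directly
 * preceded or followed by a character from `wordchars`, else 0. */
static int whole_word_match_at(const char *text, size_t at,
                               const char *word, const char *wordchars)
{
    size_t len = strlen(text);
    size_t wlen = strlen(word);

    if (wlen == 0 || at > len || wlen > len - at)
        return 0;
    if (strncmp(text + at, word, wlen) != 0)
        return 0;
    if (at > 0 && strchr(wordchars, text[at - 1]) != NULL)
        return 0;
    if (at + wlen < len && strchr(wordchars, text[at + wlen]) != NULL)
        return 0;
    return 1;
}
```

With the text "echo $name;" and the cursor inside "name", picking the
word with CALLER_CHARS yields "name", but a whole-word check done with
EDITOR_CHARS rejects it at that position because the preceding "$"
counts as a word character there.  Picking the word with the same set
the search uses avoids the problem.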

> [...]
>> There is always a SCI_SETWORDCHARS... Hmmm, we even use it to set the
>> sci wordchars to the filetype wordchars if we don't know the exact
>> lexer or something? Well, I guess it's really non-trivial.
> We should be always setting Scintilla's wordchars from the filetype
> file, although IIUC a few lexers think they know better and ignore
> them.

Not completely sure what Scintilla uses wordchars for, but in
combination with "whitespace characters" it seems to use them for
keyboard navigation [1].

>>> So in the attached patch, I added an alternative way to get the
>>> current word (that uses the same algorithm as the word selection) and
>>> tries to use it whenever the word was fetched for a search.
>> Makes sense to me. Though I'm not sure about that SCI_SETWORDCHARS we
>> use in highlighting:styleset_common().
> Required, to make highlighting match word definitions (assuming lexer
> cooperation).
>> [...] Nothing else suspicious, at least
>> from a first sight.
> +1
> Maybe everything should use the filetype wordchars definition, with
> GEANY_WORDCHARS moved to filetypes.common as the default.

Well, although it looks sensible at first sight (and probably would be
in most situations), we have a few places where we tune wordchars for
special cases, like:

callbacks.c:988: search for a color representation, adds # to wordchars
callbacks.c:1618: search for a filename

However, maybe these should use something specific rather than the other
way around.
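
As a rough idea of what "something specific" could look like for the
color case, here is a small sketch that derives a temporary wordchars
set from a base set plus a few extra characters.  wordchars_with_extra()
is a hypothetical helper written for this mail, not an existing Geany
function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write `base` into `out`, followed by every
 * character of `extra` that is not already present.
 * Returns 0 on success, -1 if `out` is too small. */
static int wordchars_with_extra(const char *base, const char *extra,
                                char *out, size_t out_size)
{
    size_t n = strlen(base);
    const char *p;

    if (n + 1 > out_size)
        return -1;
    memcpy(out, base, n + 1);   /* includes the terminating NUL */

    for (p = extra; *p != '\0'; p++) {
        if (strchr(out, *p) != NULL)
            continue;           /* already a word character */
        if (n + 2 > out_size)
            return -1;          /* no room for the char plus NUL */
        out[n++] = *p;
        out[n] = '\0';
    }
    return 0;
}
```

A caller looking for a color representation could build its set from
the default wordchars plus "#" and pass that to the lookup, leaving the
shared default untouched.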


[1] http://www.scintilla.org/ScintillaDoc.html#SCI_SETWORDCHARS
