[Geany-devel] Use of Scintilla word boundaries for word searches
Colomban Wendling
lists.ban at xxxxx
Sat Aug 20 14:41:46 UTC 2011
On 20/08/2011 01:33, Lex Trotman wrote:
> On 20 August 2011 03:40, Dimitar Zhekov <dimitar.zhekov at gmail.com> wrote:
>> On Fri, 19 Aug 2011 18:10:42 +0200
>> Colomban Wendling <lists.ban at herbesfolles.org> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to address bug 3386129 [1], and I'd like comments & reviews
>>> about my fix, because the whole thing doesn't look obvious at all...
>>>
>>> We already have 2 ways of determining what a "word" is: a manual one
>>> using GEANY_WORDCHARS or a caller-given list of wordchars, and one that
>>> uses Scintilla's word boundaries.
>>
>> 3? Shouldn't we have symbolchars for the current programming language
>> ([A-Za-z_] if unknown), and wordchars that match the current
>> locale? They don't have much in common.
>
> By wordchars we mean symbolchars, this confusion has existed from the
> beginnings of C at least, and we ain't gonna change it now. :-)
>
> Locale/human language word ends are not as simple as sets of
> characters so let's not go there, we would need something like ICU to
> do that.
Yeah, I didn't mean anything about locale characters. However
Scintilla seems to "magically" handle them correctly... maybe because of
whitespacechars.
>>> The former seems to make more sense when the caller code knows the kind
>>> of characters it wants (e.g. tags lookups), but the latter is better
>>> when getting the word to search for.
>
> Shouldn't the tags be using the same definition of word chars as
> Scintilla's highlighting? I don't trust "knowing" stuff in two
> places, they will never match :-) I understand that it might be a bit
> of work to hack tagmanager into line though.
I'm afraid it'd be more than "a bit of work": most lexers won't work
correctly if the symbolchars are not the expected ones. See the example
of the bug I referred to in my first mail: the reporter wanted to add "$"
to the PHP wordchars so that it gets selected as part of the variable.
Choosing whether to have "$" in the symbolchars is NOT possible from a
parser's POV. So either it's not configurable or it's not used for parsing.
Scintilla lexers are a bit less concerned about this than tagmanager
since they generally don't care much about the file content and only
take care of a few control sequences (blocks, comments, etc.), plus, of
course, the keywords, which might use wordchars.
Actually, the rationale for using Scintilla's representation of a
"word" when we do a search is that if we do a "full word" search and the
picked "word" doesn't match the one Scintilla would have picked, the
search won't match the expected results.
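To make that mismatch concrete, here is a minimal C sketch, not Geany's or Scintilla's actual code. pick_word(), find_whole_word() and SKETCH_WORDCHARS are hypothetical names; SKETCH_WORDCHARS is assumed to hold the same letters, digits and underscore as GEANY_WORDCHARS, and the whole-word rule is simplified to "no wordchar immediately before or after the match". If the word is picked without "$" but the search treats "$" as a wordchar, a whole-word search for the picked word finds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default set, assumed to mirror GEANY_WORDCHARS. */
#define SKETCH_WORDCHARS \
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

/* True if c belongs to the given wordchars set ('\0' never does). */
static bool is_word_char(char c, const char *wordchars)
{
    return c != '\0' && strchr(wordchars, c) != NULL;
}

/* Pick the word around byte offset pos by expanding left and right while
 * characters are in wordchars. Copies at most outlen - 1 bytes to out. */
static void pick_word(const char *text, size_t pos, const char *wordchars,
                      char *out, size_t outlen)
{
    size_t len = strlen(text);
    size_t start = pos, end = pos, n;

    assert(pos <= len && outlen > 0);
    while (start > 0 && is_word_char(text[start - 1], wordchars))
        start--;
    while (end < len && is_word_char(text[end], wordchars))
        end++;
    n = end - start;
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, text + start, n);
    out[n] = '\0';
}

/* Simplified whole-word search: needle must not be directly preceded or
 * followed by a wordchar. Returns the byte offset of the match, or -1. */
static long find_whole_word(const char *text, const char *needle,
                            const char *wordchars)
{
    size_t nlen = strlen(needle);
    const char *p = text;

    assert(nlen > 0);
    while ((p = strstr(p, needle)) != NULL) {
        bool left_ok = (p == text) || !is_word_char(p[-1], wordchars);
        bool right_ok = !is_word_char(p[nlen], wordchars);
        if (left_ok && right_ok)
            return (long) (p - text);
        p++; /* keep scanning after this rejected occurrence */
    }
    return -1;
}
```

For "echo $foo;" with the cursor on the "o" after "f", picking with SKETCH_WORDCHARS yields "foo", yet find_whole_word(text, "foo", SKETCH_WORDCHARS "$") returns -1 because "$" now counts as part of the word. Using one definition for both steps avoids this.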
> [...]
>> There is always a SCI_SETWORDCHARS... Hmmm, we even use it to set the
>> sci wordchars to the filetype wordchars if we don't know the exact
>> lexer or something? Well, I guess it's really non-trivial.
>
> We should be always setting Scintilla's wordchars from the filetype
> file, although IIUC a few lexers think they know better and ignore
> them.
Not completely sure what Scintilla uses wordchars for, but in
combination with "whitespace characters" it seems to use them for
keyboard navigation [1].
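Here is a rough sketch of how wordchars and whitespace chars could drive word-by-word caret movement. It is loosely modelled on the word/whitespace/punctuation categories the Scintilla documentation describes, not on Scintilla's source. Each character is classified, and the caret skips the run sharing the class of the current character, then any whitespace. classify(), next_word_start(), SKETCH_WORDCHARS and SKETCH_SPACECHARS are hypothetical. The whitespace set shown is an assumption, not Scintilla's default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sets; SKETCH_WORDCHARS is assumed to mirror GEANY_WORDCHARS. */
#define SKETCH_WORDCHARS \
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
#define SKETCH_SPACECHARS " \t\r\n"

/* Three categories, loosely following Scintilla's documented split. */
enum char_class { CC_SPACE, CC_WORD, CC_PUNCT };

/* Whitespace wins, then wordchars; anything else counts as punctuation. */
static enum char_class classify(char c, const char *wordchars,
                                const char *spacechars)
{
    if (c != '\0' && strchr(spacechars, c) != NULL)
        return CC_SPACE;
    if (c != '\0' && strchr(wordchars, c) != NULL)
        return CC_WORD;
    return CC_PUNCT;
}

/* Hypothetical "word right": skip the run of characters sharing the class
 * of text[pos], then any whitespace, and return the new offset. */
static size_t next_word_start(const char *text, size_t pos,
                              const char *wordchars, const char *spacechars)
{
    size_t len = strlen(text);
    enum char_class start_class;

    assert(pos <= len);
    if (pos == len)
        return len;
    start_class = classify(text[pos], wordchars, spacechars);
    while (pos < len && classify(text[pos], wordchars, spacechars) == start_class)
        pos++;
    while (pos < len && classify(text[pos], wordchars, spacechars) == CC_SPACE)
        pos++;
    return pos;
}
```

With "$foo = 1;" and the caret at offset 0, next_word_start() stops at 1 when "$" is punctuation, but jumps to 5 (the "=") when "$" is added to the wordchars. So the same setting changes both navigation and what counts as a word.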
>>> So in the attached patch, I added an alternative way to get the
>>> current word (that uses the same algorithm as the word selection) and
>>> try to use it whenever the word is fetched for a search.
>>
>> Makes sense to me. Though I'm not sure about that SCI_SETWORDCHARS we
>> use in highlighting:styleset_common().
>>
>
> Required, to make highlighting match word definitions (assuming lexer
> cooperation).
>
> [...] Nothing else suspicious, at least
>> at first sight.
>>
>
> +1
>
> Maybe everything should use the filetype wordchars definition, with
> GEANY_WORDCHARS moved to filetypes.common as the default.
Well, although it looks sensible at first sight (and probably would be
in most situations), we have a few places where we tune wordchars for
special cases, like:
callbacks.c:988: search for a color representation, adds # to wordchars
callbacks.c:1618: search for a filename
However, maybe these should use something specific rather than the other way around.
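As a sketch of the "something specific" alternative, a dedicated extractor for color literals could recognise "#rgb"/"#rrggbb" around the cursor without touching any wordchars set. find_color_at() is hypothetical and is not what callbacks.c currently does. Accepting only 3 or 6 hex digits is an assumption made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dedicated extractor: if pos lies on or inside a "#rgb" or
 * "#rrggbb" literal, copy it (including '#') to out and return true.
 * No wordchars set is consulted. */
static bool find_color_at(const char *text, size_t pos, char out[8])
{
    size_t len = strlen(text);
    size_t start, end, ndigits;

    assert(pos <= len);
    if (pos < len && text[pos] == '#') {
        start = pos;
    } else {
        /* walk back over hex digits; a '#' must come right before them */
        start = pos;
        while (start > 0 && isxdigit((unsigned char) text[start - 1]))
            start--;
        if (start == 0 || text[start - 1] != '#')
            return false;
        start--; /* include the '#' */
    }
    end = start + 1;
    while (end < len && isxdigit((unsigned char) text[end]))
        end++;
    /* reject things like "#abcdefg" where the literal runs into a word */
    if (end < len && (isalnum((unsigned char) text[end]) || text[end] == '_'))
        return false;
    ndigits = end - start - 1;
    if (ndigits != 3 && ndigits != 6)
        return false;
    memcpy(out, text + start, end - start);
    out[end - start] = '\0';
    return true;
}
```

This keeps the default wordchars untouched, so the color lookup no longer needs a one-off tweak such as adding "#".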
Cheers,
Colomban
[1] http://www.scintilla.org/ScintillaDoc.html#SCI_SETWORDCHARS