Re: [Geany-devel] Use of Scintilla word boundaries for word searches

20 Aug 2011


      On 20/08/2011 01:33, Lex Trotman wrote:
...
On 20 August 2011 03:40, Dimitar Zhekov dimitar.zhekov@gmail.com wrote:
...
On Fri, 19 Aug 2011 18:10:42 +0200
Colomban Wendling lists.ban@herbesfolles.org wrote:
...
Hi,
I'm trying to address bug 3386129 [1], and I'd like comments & reviews
about my fix, because the whole thing doesn't look obvious at all...
We already have 2 ways of determining what a "word" is: a manual one
using GEANY_WORDCHARS or a caller-given list of wordchars, and one that
uses Scintilla's word boundaries.
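To make the first ("manual") approach concrete, here is a minimal sketch of wordchars-based extraction. The names `sketch_read_word` and `SKETCH_WORDCHARS` are hypothetical, and the default set below is only an illustration, not Geany's actual GEANY_WORDCHARS value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default set, for illustration only. */
#define SKETCH_WORDCHARS \
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

/* Copy the word around byte offset `pos` of `text` into `out`.
 * A "word" is a maximal run of bytes listed in `wordchars`
 * (SKETCH_WORDCHARS when `wordchars` is NULL).
 * Returns the number of bytes copied, 0 if there is no word at `pos`. */
static size_t sketch_read_word(const char *text, size_t pos,
                               const char *wordchars,
                               char *out, size_t out_size)
{
    size_t len, start, end, n;

    assert(text != NULL && out != NULL && out_size > 0);
    if (wordchars == NULL)
        wordchars = SKETCH_WORDCHARS;

    len = strlen(text);
    if (pos > len)
        pos = len;

    /* Walk left while the previous byte is a word character. */
    start = pos;
    while (start > 0 && strchr(wordchars, text[start - 1]) != NULL)
        start--;

    /* Walk right while the current byte is a word character. */
    end = pos;
    while (end < len && strchr(wordchars, text[end]) != NULL)
        end++;

    n = end - start;
    if (n >= out_size)
        n = out_size - 1;   /* truncate to fit the buffer */
    memcpy(out, text + start, n);
    out[n] = '\0';
    return n;
}
```

With the text `color: #ff0000;` and the caret at offset 9, the default set yields `ff0000`, while a caller-given list that also contains `#` yields `#ff0000`.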
3? Shouldn't we have symbolchars for the current programming language
([A-Za-z_] if unknown), and wordchars that match the current
locale? They don't have much in common.
By wordchars we mean symbolchars, this confusion has existed from the
beginnings of C at least, and we ain't gonna change it now. :-)
Locale/human language word ends are not as simple as sets of
characters so let's not go there; we would need something like ICU to
do that.
Yeah, I didn't mean anything about locale characters.  However
Scintilla seems to "magically" handle them correctly... maybe because of
whitespacechars.
...
...
...
The former seems to make more sense when the caller code knows the kind
of characters it wants (e.g. tags lookups), but the latter is better
when getting the word to search for.
Shouldn't the tags be using the same definition of word chars as
Scintilla's highlighting?  I don't trust "knowing" stuff in two
places, they will never match :-) I understand that it might be a bit
of work to hack tagmanager into line though.
I'm afraid it'd be more than "a bit of work": most lexers won't work
correctly if the symbolchars are not the expected ones.  See the example
of the bug I referred to in my first mail: the reporter wanted to add "$" to
the PHP wordchars for it to be selected as part of the variable.
Choosing whether to have the "$" in the symbolchars is NOT possible from a
parser's POV.  So either it's not configurable or it's not used for parsing.
Scintilla lexers are a bit less concerned about this than tagmanager
since they generally don't mind much about the file content and only
take care of a few control sequences (blocks, comments, etc.).  Plus, of
course, the keywords; those might use wordchars.
Actually the rationale for using Scintilla's representation of a
"word" when we do a search is that if we do a "full word" search and the
picked "word" doesn't match the one Scintilla would have picked, the
search won't return the expected results.
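The mismatch can be illustrated with a small sketch. It is a simplified model of a whole-word search, not Scintilla's actual implementation; `sketch_find_whole_word` and `SKETCH_WORDCHARS` are hypothetical names, and the rule used here is assumed to be "the match must not touch a word character on either side".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical base set, for illustration only. */
#define SKETCH_WORDCHARS \
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

/* True if byte `c` is one of the search engine's word characters. */
static bool sketch_is_wordchar(const char *wordchars, char c)
{
    return c != '\0' && strchr(wordchars, c) != NULL;
}

/* Simplified "whole word" search: return the offset of the first
 * occurrence of `needle` in `text` that is neither preceded nor
 * followed by a word character, or -1 if there is none. */
static long sketch_find_whole_word(const char *text, const char *needle,
                                   const char *wordchars)
{
    size_t nlen;
    const char *hit;

    assert(text != NULL && needle != NULL && wordchars != NULL);
    nlen = strlen(needle);
    if (nlen == 0)
        return -1;

    for (hit = strstr(text, needle); hit != NULL;
         hit = strstr(hit + 1, needle))
    {
        bool before = hit > text && sketch_is_wordchar(wordchars, hit[-1]);
        bool after = sketch_is_wordchar(wordchars, hit[nlen]);

        if (!before && !after)
            return (long) (hit - text);
    }
    return -1;
}
```

Under these assumptions, if the search side treats "$" as a word character but the word was picked without it, searching `echo $name;` for the whole word `name` finds nothing, whereas `$name` is found at offset 5.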
...
[...]
...
There is always a SCI_SETWORDCHARS... Hmmm, we even use it to set the
sci wordchars to the filetype wordchars if we don't know the exact
lexer or something? Well, I guess it's really non-trivial.
We should be always setting Scintilla's wordchars from the filetype
file, although IIUC a few lexers think they know better and ignore
them.
Not completely sure what Scintilla uses wordchars for, but in
combination with "whitespace characters" it seems to use them for
keyboard navigation [1].
...
...
...
So in the attached patch, I added an alternative way to get the
current word (one that uses the same algorithm as the word selection) and
try to use it whenever the word is fetched for a search.
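As a rough idea of what "the same algorithm as the word selection" could look like, here is a sketch that extends from the caret over bytes of the same character class. This is a simplified model and not the attached patch: the three classes and the names `SketchClass` and `sketch_word_range` are assumptions. In real code one would presumably ask Scintilla itself, which exposes the SCI_WORDSTARTPOSITION and SCI_WORDENDPOSITION messages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical base set, for illustration only. */
#define SKETCH_WORDCHARS \
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

typedef enum { SKETCH_SPACE, SKETCH_WORD, SKETCH_PUNCT } SketchClass;

/* Classify one byte: whitespace, word character, or anything else. */
static SketchClass sketch_classify(const char *wordchars, char c)
{
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
        return SKETCH_SPACE;
    if (strchr(wordchars, c) != NULL)
        return SKETCH_WORD;
    return SKETCH_PUNCT;
}

/* Store in [*start, *end) the run of bytes around `pos` that share
 * the class of the byte at `pos`.  An empty text gives an empty range. */
static void sketch_word_range(const char *text, size_t pos,
                              const char *wordchars,
                              size_t *start, size_t *end)
{
    size_t len;
    SketchClass cls;

    assert(text != NULL && wordchars != NULL);
    assert(start != NULL && end != NULL);

    len = strlen(text);
    if (len == 0)
    {
        *start = *end = 0;
        return;
    }
    if (pos >= len)
        pos = len - 1;

    cls = sketch_classify(wordchars, text[pos]);

    /* Extend left, then right, while the class stays the same. */
    *start = pos;
    while (*start > 0 && sketch_classify(wordchars, text[*start - 1]) == cls)
        (*start)--;
    *end = pos + 1;
    while (*end < len && sketch_classify(wordchars, text[*end]) == cls)
        (*end)++;
}
```

The point of the design is that picking and searching share one definition of a word: with `echo $name;` and the caret at offset 7, the range covers `name`, and it grows to `$name` only when "$" is in the same wordchars list the search uses.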
Makes sense to me. Though I'm not sure about that SCI_SETWORDCHARS we
use in highlighting:styleset_common().
Required, to make highlighting match word definitions (assuming lexer
cooperation).
[...] Nothing else suspicious, at least
at first sight.
+1
Maybe everything should use the filetype wordchars definition, with
GEANY_WORDCHARS moved to filetypes.common as the default.
Well, although it looks sensible at first sight (and probably would be
in most situations), we have a few places where we tune wordchars for
special cases, like:
callbacks.c:988: search for a color representation, adds # to wordchars
callbacks.c:1618: search for a filename
However maybe these should use something specific rather than the contrary.
Cheers,
Colomban
[1] http://www.scintilla.org/ScintillaDoc.html#SCI_SETWORDCHARS
