[Geany-devel] Use of Scintilla word boundaries for word searches

Lex Trotman elextr at xxxxx
Mon Aug 22 01:14:43 UTC 2011

> That's not (completely?) true however.  GLib has at least a part of the
> Unicode tables, and both Scintilla and Geany depend on Pango through
> GTK anyway.  However, I doubt Geany can do something on its side, and I
> doubt Scintilla wants to have a hard dependency on Pango for non-GTK
> platforms, nor to behave differently depending on the platform.


> So basically yes, 1 is unlikely to be really fixed.
>> As Colomban said, it is too much work to make tagmanager and lexer use
>> the same data, but IMHO if they disagree then it's a bug, since
>> programming languages clearly define what Dimitar called
>> symbols.
> Yes, if tagmanager and SCI lexer disagree on what is a "symbol", there
> is probably a bug.  However as I already noted, I think SCI lexers don't
> need to be as precise as tagmanager's parsers since they probably don't
> care about words except for highlighting keywords.

The lexers care about words if they use them as the definition of
symbols.  At least the C/C++ lexer needs to know symbols to highlight

> If you allow the user to change the wordchars of Scintilla (what the
> settings wordchars and whitespace_chars allow), then it's likely not to
> match tagmanager's one.   And here comes the problem when looking for a
> tagmanager symbol from a Scintilla word.
> So either we require the filetype to keep the wordchars to fit
> tagmanager's ones (and more or less what the language sees as a symbol),
> and thus don't allow what the user wanted to do in the first place (make
> "$" to be part of a Scintilla word for navigation and selection facilities).

I don't know PHP, but it seems like the $ is part of the symbol, so it
should be part of the word.  Users can always break things by editing
config files so I wouldn't worry about them changing wordchars, so
long as the comment in the config mentions it affects Scintilla only.
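
To make the effect of wordchars concrete, here is a minimal sketch of
picking the word under the caret from a configurable character set.  It
is only a model of the idea, not Scintilla's or Geany's code; word_at and
DEFAULT_WORDCHARS are hypothetical names made up for the illustration.

```python
import string

# Hypothetical default set: ASCII letters, digits and underscore.
DEFAULT_WORDCHARS = string.ascii_letters + string.digits + "_"


def word_at(text, pos, wordchars=DEFAULT_WORDCHARS):
    """Return the run of wordchars around index pos, or "" if pos is not on one."""
    chars = set(wordchars)
    if pos < 0 or pos >= len(text) or text[pos] not in chars:
        return ""
    # Walk left while the previous character is still a wordchar.
    start = pos
    while start > 0 and text[start - 1] in chars:
        start -= 1
    # Walk right while the next character is still a wordchar.
    end = pos + 1
    while end < len(text) and text[end] in chars:
        end += 1
    return text[start:end]


line = "echo $total;"
without_dollar = word_at(line, 7)                        # caret inside "total"
with_dollar = word_at(line, 7, DEFAULT_WORDCHARS + "$")  # "$" added to wordchars
```

With the default set the word is "total"; with "$" added it is "$total",
which is what a search for the PHP symbol would need.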

Lots of confusion seems to be caused by the fact that wordchars are
being used as symbolchars...

>> So then we have to ensure that the filetype wordchars lists are right,
>> and that tagmanager and lexers have no obvious bugs. That's all :-)
> ...and we don't support what the user wanted to do?

IIUC the user wanted it to be consistent, ie $ is part of the symbol.

If $ is part of the PHP filetype's wordchars it would be consistent if
we used Scintilla word to select the usage search pattern as


PS I just realised that some languages, e.g. Lua, allow any Unicode
letters in symbols.  The Scintilla definition can't be right for them
since it treats all characters >= 0x80 as wordchars, and a quick look
at the Lua lexer seems to show it also treats anything >0x80 as a
symbolchar.  But the Lua tagmanager parser uses any non-blank sequence
before '=' or '('.  No match!!  Essentially we can't win for some
languages, so just make the minimal change that doesn't break things.
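
A rough sketch of the mismatch, using deliberately simplified models of
the two rules described above.  Neither function is the real Scintilla or
tagmanager code; scintilla_like_word and tag_like_name are hypothetical
names, and the regex is just my reading of "any non-blank sequence before
'=' or '('".

```python
import re


def is_wordchar(ch):
    # Simplified model: ASCII letters, digits and "_", plus everything >= 0x80.
    return ch == "_" or ord(ch) >= 0x80 or (ord(ch) < 0x80 and ch.isalnum())


def scintilla_like_word(text, pos):
    """Word around pos under the simplified wordchar rule above."""
    if pos < 0 or pos >= len(text) or not is_wordchar(text[pos]):
        return ""
    start = pos
    while start > 0 and is_wordchar(text[start - 1]):
        start -= 1
    end = pos + 1
    while end < len(text) and is_wordchar(text[end]):
        end += 1
    return text[start:end]


def tag_like_name(line):
    """First non-blank sequence that is followed by '=' or '(' (simplified)."""
    match = re.search(r"(\S+?)\s*[=(]", line)
    return match.group(1) if match else ""


line = "obj.field = 1"
editor_word = scintilla_like_word(line, 5)  # caret inside "field"
tag_name = tag_like_name(line)
```

Here the editor-side word is "field" while the tag-side name is
"obj.field", so a lookup of one in a list of the other finds nothing.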
