[Geany-devel] Use of Scintilla word boundaries for word searches

Colomban Wendling lists.ban at xxxxx
Mon Aug 22 12:34:54 UTC 2011


On 22/08/2011 03:14, Lex Trotman wrote:
> [...]
>>> As Colomban said, it is too much work to make tagmanager and the lexer
>>> use the same data, but IMHO if they disagree then it's a bug, since
>>> programming languages clearly define what Dimitar called symbols.
>>
>> Yes, if tagmanager and the SCI lexer disagree on what a "symbol" is,
>> there is probably a bug.  However, as I already noted, I think SCI
>> lexers don't need to be as precise as tagmanager, since they probably
>> don't care about words except for highlighting keywords.
> 
> The lexers care about words if they use them as the definition of
> symbols.  At least the C/C++ lexer needs to know symbols to highlight
> typenames.

True.

> [...]
>>
>> If you allow the user to change Scintilla's wordchars (which is what
>> the wordchars and whitespace_chars settings allow), then they are
>> likely not to match tagmanager's.  And here comes the problem when
>> looking up a tagmanager symbol from a Scintilla word.
>>
>> So either we require the filetype to keep its wordchars matching
>> tagmanager's (and more or less what the language sees as a symbol),
>> and thus don't allow what the user wanted to do in the first place (make
>> "$" part of a Scintilla word for navigation and selection facilities).
> 
> I don't know PHP, but it seems like the $ is part of the symbol, so it
> should be part of the word.  Users can always break things by editing
> config files, so I wouldn't worry about them changing wordchars, as
> long as the comment in the config mentions it affects Scintilla only.
> 
> [...]
>
>>> So then we have to ensure that the filetype wordchars lists are right,
>>> and that tagmanager and lexers have no obvious bugs. That's all :-)
>>
>> ...and we don't support what the user wanted to do?
> 
> IIUC the user wanted it to be consistent, ie $ is part of the symbol.
> 
> If $ is part of the PHP filetype's wordchars, wouldn't it be consistent
> to use the Scintilla word to select the usage search pattern, as
> suggested?

But then tagmanager's and Scintilla's definitions of a PHP symbol would
differ, since tagmanager doesn't include $ in symbols.  And then if we
used the Scintilla definition of a word to do a tag search, it would fail
whenever the word included a $ :(

So yeah, it is better for searching (which is what my patch [1] tried
to implement), but it wouldn't be for tag lookup.
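A minimal C sketch of the mismatch, assuming a simplified model: a "word" is the run of bytes around the cursor that appear in a wordchars string, and a tag lookup is an exact string match against the names a parser stored. The helpers `is_word_char`, `word_at` and `tag_exists` are hypothetical and are not Geany, Scintilla or tagmanager API; the PHP tag parser is assumed to store variables without the leading '$'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a byte is a word character iff it appears in wordchars. */
static bool is_word_char(unsigned char c, const char *wordchars)
{
    return c != '\0' && strchr(wordchars, c) != NULL;
}

/* Hypothetical: copy the word surrounding text[pos] into out (outlen > 0).
 * Returns the word length, or 0 if text[pos] is not a word character. */
static size_t word_at(const char *text, size_t pos, const char *wordchars,
                      char *out, size_t outlen)
{
    size_t len = strlen(text);
    size_t start = pos, end = pos;

    out[0] = '\0';
    if (pos >= len || !is_word_char((unsigned char)text[pos], wordchars))
        return 0;

    /* Extend left and right while the neighbouring bytes are word chars. */
    while (start > 0 && is_word_char((unsigned char)text[start - 1], wordchars))
        start--;
    while (end < len && is_word_char((unsigned char)text[end], wordchars))
        end++;

    size_t n = end - start;
    if (n >= outlen)
        n = outlen - 1; /* truncate to fit the buffer */
    memcpy(out, text + start, n);
    out[n] = '\0';
    return n;
}

/* Hypothetical tag lookup: exact match against the names a parser stored. */
static bool tag_exists(const char *name, const char *const *tags, size_t ntags)
{
    for (size_t i = 0; i < ntags; i++)
        if (strcmp(name, tags[i]) == 0)
            return true;
    return false;
}
```

With the cursor on `count` in `echo $count;`, a wordchars set without `$` yields `count`, which matches a tag stored as `count`; adding `$` to the set yields `$count`, which is what you want as a search pattern but no longer matches the tag.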


> PS I just realised that some languages, e.g. Lua, allow any Unicode
> letters in symbols; the Scintilla definition can't be right for them
> since it treats all characters >= 0x80 as wordchars, and a quick look
> at the Lua lexer seems to show it also treats anything >= 0x80 as a
> symbol char.  But Lua's tagmanager parser uses any non-blank sequence
> before '=' or '('.  No match!!  Essentially we can't win for some
> languages, so just make the minimal change that doesn't break things.

Well... either fix one or the other, or assume it's not important.  I
don't know Lua, but I'd be tempted to guess that both definitions are
actually fine in a well-formed file (i.e. that the differences between
them only involve invalid syntax).
But of course I could be completely wrong, since I don't know Lua.


Cheers,
Colomban


[1] applied as r5895


