On 22/08/2011 03:14, Lex Trotman wrote:
[...]
As Colomban said, it is too much work to make tagmanager and the lexer use the same data, but IMHO if they disagree then it's a bug, since programming languages clearly define what Dimitar called symbols.
Yes, if tagmanager and the SCI lexer disagree on what a "symbol" is, there is probably a bug. However, as I already noted, I think the SCI lexers don't need to be as precise as tagmanager, since they probably don't care about words except for highlighting keywords.
The lexers care about words if they use them as the definition of symbols. At least the C/C++ lexer needs to know symbols to highlight typenames.
True.
[...]
If you allow the user to change Scintilla's wordchars (which is what the wordchars and whitespace_chars settings allow), then they are likely not to match tagmanager's. And here comes the problem when looking up a tagmanager symbol from a Scintilla word.
So either we require the filetype to keep the wordchars matching tagmanager's (and more or less what the language sees as a symbol), and thus don't allow what the user wanted to do in the first place (making "$" part of a Scintilla word for navigation and selection facilities).
I don't know PHP, but it seems like the $ is part of the symbol, so it should be part of the word. Users can always break things by editing config files so I wouldn't worry about them changing wordchars, so long as the comment in the config mentions it affects Scintilla only.
[...]
So then we have to ensure that the filetype wordchars lists are right, and that tagmanager and the lexers have no obvious bugs. That's all :-)
...and we don't support what the user wanted to do?
IIUC the user wanted it to be consistent, i.e. $ is part of the symbol.
If $ is part of the PHP filetype's wordchars, it would be consistent if we used the Scintilla word to select the usage search pattern, as suggested?
But then tagmanager's and Scintilla's definitions of a PHP symbol would differ, since tagmanager doesn't include $ in the symbols. And then if we used the Scintilla definition of a word to do a tag search, it would fail if the word included a $ :(
So yeah, for searching it is better (what my patch [1] tried to implement), but it wouldn't be for tag lookup.
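To make the mismatch concrete, here is a minimal Python sketch. It is only an illustration, not Geany or Scintilla code: `word_at`, `DEFAULT_WORDCHARS` and the `tags` set are hypothetical names, and the tag table simply assumes, as described above, that PHP names are stored without the "$".

```python
import string

# Assumed default word characters; the real filetype setting may differ.
DEFAULT_WORDCHARS = string.ascii_letters + string.digits + "_"


def word_at(line, pos, wordchars):
    """Return the maximal run of wordchars around index pos ('' if none)."""
    if pos >= len(line) or line[pos] not in wordchars:
        return ""
    start = pos
    while start > 0 and line[start - 1] in wordchars:
        start -= 1
    end = pos
    while end < len(line) and line[end] in wordchars:
        end += 1
    return line[start:end]


# Hypothetical tag table: names stored without the leading "$".
tags = {"count", "total"}

line = "$total = $count + 1;"
cursor = 10  # index of the "c" in "$count"

plain = word_at(line, cursor, DEFAULT_WORDCHARS)
with_dollar = word_at(line, cursor, DEFAULT_WORDCHARS + "$")

print(plain, plain in tags)
print(with_dollar, with_dollar in tags)
```

With "$" added to the word characters, the selected word is convenient for a text search but no longer matches any entry in the tag table, which is the failure mode described above.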
PS: I just realised that some languages, e.g. Lua, allow any Unicode letters in symbols. The Scintilla definition can't be right for them since it treats all characters >= 0x80 as wordchars, and a quick look at the Lua lexer seems to show it also treats anything >0x80 as a symbolchar. But the Lua tagmanager uses any non-blank sequence before '=' or '('. No match!! Essentially we can't win for some languages, so just make the minimal change that doesn't break things.
Well... either fix one or the other, or assume it's not important. I don't know Lua, but I'd be tempted to guess both definitions will actually be OK in a well-formed file (i.e. that the two differ only on invalid syntax). But of course I can be completely wrong, since I don't know Lua.
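One way to check that guess is to model both rules and compare them on sample lines. The sketch below is a toy model of the two rules exactly as worded in this thread, not the actual Scintilla lexer or tagmanager parser: `SCI_WORD`, `TAG_NAME`, `sci_words` and `tag_name` are hypothetical names, and the regular expressions are assumptions.

```python
import re

# Toy "Scintilla word": ASCII letters, digits, underscore, plus every
# character >= 0x80 (the rule as described in this thread).
SCI_WORD = re.compile(r"[A-Za-z0-9_\u0080-\U0010ffff]+")

# Toy "tagmanager name": any non-blank sequence before '=' or '('.
TAG_NAME = re.compile(r"([^\s=(]+)\s*[=(]")


def sci_words(line):
    """All word runs the toy Scintilla rule finds in the line."""
    return SCI_WORD.findall(line)


def tag_name(line):
    """The name the toy tagmanager rule extracts, or None."""
    match = TAG_NAME.search(line)
    return match.group(1) if match else None


plain = "na\u00efve = 1"   # a name containing a non-ASCII letter
dotted = "t.x = 1"         # a name containing a '.'

for sample in (plain, dotted):
    name = tag_name(sample)
    print(repr(sample), name, name in sci_words(sample))
```

In this toy model the two rules agree on the plain name but not on the dotted one, so whether the difference matters in practice depends on which constructs the real parser actually tags.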
Cheers, Colomban
[1] applied as r5895