[...]
That's not (completely?) true, however. GLib has at least part of the Unicode tables, and both Scintilla and Geany depend on Pango through GTK anyway. However, I doubt Geany can do anything on its side, and I doubt Scintilla wants to have a hard dependency on Pango for non-GTK platforms, or to behave differently depending on the platform.
True.
So basically yes, 1 is unlikely to be really fixed.
[...]
As Colomban said, it is too much work to make tagmanager and the lexers use the same data, but IMHO if they disagree then it's a bug, since programming languages clearly define what Dimitar called symbols.
Yes, if tagmanager and the SCI lexer disagree on what a "symbol" is, there is probably a bug. However, as I already noted, I think the SCI lexers don't need to be as precise as tagmanager, since they probably don't care about words except for highlighting keywords.
The lexers care about words if they use them as the definition of symbols. At least the C/C++ lexer needs to know symbols to highlight typenames.
[...]
If you allow the user to change Scintilla's wordchars (which the wordchars and whitespace_chars settings allow), then they are likely not to match tagmanager's. And here comes the problem when looking up a tagmanager symbol from a Scintilla word.
So either we require the filetype to keep its wordchars matching tagmanager's (and more or less what the language sees as a symbol), and thus don't allow what the user wanted to do in the first place (make "$" part of a Scintilla word for navigation and selection).
I don't know PHP, but it seems like the $ is part of the symbol, so it should be part of the word. Users can always break things by editing config files so I wouldn't worry about them changing wordchars, so long as the comment in the config mentions it affects Scintilla only.
Lots of confusion seems to be caused by the fact that wordchars are being used as symbolchars...
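To make the wordchars point concrete, here is a minimal Python sketch of how the "word under the cursor" changes with the wordchars set. It is only an illustration: `word_at` and `DEFAULT_WORDCHARS` are made-up names, not Geany or Scintilla API, and the default set is assumed to be ASCII letters, digits and "_".

```python
import string

# Assumed default: ASCII letters, digits and underscore (hypothetical, for illustration).
DEFAULT_WORDCHARS = set(string.ascii_letters + string.digits + "_")

def word_at(text, pos, wordchars):
    """Return the run of wordchars containing index pos, or '' if there is none."""
    if pos >= len(text) or text[pos] not in wordchars:
        return ""
    start = pos
    while start > 0 and text[start - 1] in wordchars:
        start -= 1
    end = pos
    while end < len(text) and text[end] in wordchars:
        end += 1
    return text[start:end]

line = "echo $name;"
without_dollar = word_at(line, 7, DEFAULT_WORDCHARS)
with_dollar = word_at(line, 7, DEFAULT_WORDCHARS | {"$"})
print(without_dollar)  # name
print(with_dollar)     # $name
```

Whichever of the two strings is then used as the search pattern has to be the one the tag list stores, otherwise the lookup misses.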
[...]
So then we have to ensure that the filetype wordchars lists are right, and that tagmanager and the lexers have no obvious bugs. That's all :-)
...and we don't support what the user wanted to do?
IIUC the user wanted it to be consistent, ie $ is part of the symbol.
If $ is part of the PHP filetype's wordchars, wouldn't it be consistent if we used the Scintilla word to select the usage search pattern, as suggested?
Cheers Lex
PS I just realised that some languages, e.g. Lua, allow any Unicode letters in symbols. The Scintilla definition can't be right for them, since it treats all characters >= 0x80 as wordchars, and a quick look at the Lua lexer seems to show that it also treats anything > 0x80 as a symbolchar. But the Lua tagmanager parser uses any non-blank sequence before '=' or '('. No match!! Essentially we can't win for some languages, so just make the minimal change that doesn't break things.
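A rough Python sketch of that mismatch, under the assumptions stated in the PS only (it does not reproduce the real Scintilla or tagmanager code, and both function names are made up): one function splits a line into words the Scintilla-like way, the other takes the non-blank run before '=' or '(' the way the Lua tag parser is described above.

```python
import re
from itertools import groupby

def is_scintilla_like_wordchar(ch):
    # Assumption: ASCII letters, digits, '_' and every character >= 0x80 are wordchars.
    return ch == "_" or (ch.isascii() and ch.isalnum()) or ord(ch) >= 0x80

def scintilla_like_words(line):
    """Split a line into maximal runs of Scintilla-like wordchars."""
    return ["".join(run)
            for is_word, run in groupby(line, key=is_scintilla_like_wordchar)
            if is_word]

def tagmanager_like_name(line):
    """Assumption: the tag name is the non-blank run just before '=' or '('."""
    match = re.search(r"(\S+)\s*[=(]", line)
    return match.group(1) if match else None

line = "a.b = 1"
print(scintilla_like_words(line))   # ['a', 'b', '1']
print(tagmanager_like_name(line))   # a.b
```

Here the tag is "a.b" while the editor only ever offers "a" or "b" as the word, so a search built from the Scintilla word cannot match the tag exactly.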