On 20/08/2011 01:33, Lex Trotman wrote:
On 20 August 2011 03:40, Dimitar Zhekov dimitar.zhekov@gmail.com wrote:
On Fri, 19 Aug 2011 18:10:42 +0200 Colomban Wendling lists.ban@herbesfolles.org wrote:
Hi,
I'm trying to address bug 3386129 [1], and I'd like comments & reviews on my fix, because the whole thing doesn't look obvious at all...
We already have 2 ways of determining what a "word" is: a manual one using GEANY_WORDCHARS or a caller-given list of wordchars, and one that uses Scintilla's word boundaries.
3? Shouldn't we have symbolchars for the current programming language ([A-Za-z_] if unknown), and wordchars that match the current locale? They don't have much in common.
By wordchars we mean symbolchars, this confusion has existed from the beginnings of C at least, and we ain't gonna change it now. :-)
Locale/human-language word ends are not as simple as sets of characters, so let's not go there; we would need something like ICU to do that.
Yeah, I didn't mean anything about locale characters. However, Scintilla seems to "magically" handle them correctly... maybe because of whitespacechars.
The former seems to make more sense when the caller code knows the kind of characters it wants (e.g. tags lookups), but the latter is better when getting the word to search for.
Shouldn't the tags be using the same definition of wordchars as Scintilla's highlighting? I don't trust "knowing" stuff in two places; they will never match :-) I understand that it might be a bit of work to hack tagmanager into line, though.
I'm afraid it'd be more than "a bit of work": most lexers won't work correctly if the symbolchars are not the expected ones. See the example in the bug I referred to in my first mail: the reporter wanted to add "$" to the PHP wordchars so that it would be selected as part of the variable. Choosing whether to have "$" in the symbolchars is NOT possible from a parser's POV. So either it's not configurable or it's not used for parsing.
Scintilla lexers are a bit less concerned about this than tagmanager, since they generally don't care much about the file content and only take care of a few control sequences (blocks, comments, etc.). Plus, of course, the keywords, which might use wordchars.
Actually, the rationale for using Scintilla's representation of a "word" when we do a search is that if we do a "full word" search and the picked "word" doesn't match the one Scintilla would have picked, the search won't give the expected results.
[...]
There is always SCI_SETWORDCHARS... Hmmm, we even use it to set Scintilla's wordchars to the filetype wordchars if we don't know the exact lexer or something? Well, I guess it's really non-trivial.
We should be always setting Scintilla's wordchars from the filetype file, although IIUC a few lexers think they know better and ignore them.
I'm not completely sure what Scintilla uses wordchars for, but in combination with "whitespace characters" it seems to use them for keyboard navigation [1].
So in the attached patch, I added an alternative way to get the current word (using the same algorithm as word selection) and tried to use it wherever the word is fetched for a search.
Makes sense to me. Though I'm not sure about the SCI_SETWORDCHARS call we make in highlighting:styleset_common().
It's required to make highlighting match the word definitions (assuming the lexer cooperates).
[...] Nothing else suspicious, at least at first sight.
+1
Maybe everything should use the filetype wordchars definition, with GEANY_WORDCHARS moved to filetypes.common as the default.
Well, although it looks sensible at first sight (and probably would be in most situations), we have a few places where we tune wordchars for special cases, like:
callbacks.c:988: search for a color representation, adds # to wordchars
callbacks.c:1618: search for a filename
However, maybe these should use something specific instead, rather than the other way around.
Cheers, Colomban
[1] http://www.scintilla.org/ScintillaDoc.html#SCI_SETWORDCHARS