Title says it all: instead of fetching the name from the deprecated ID, directly use the lexer name. This removes a deprecation warning, and probably aligns us for the future. You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3668
-- Commit Summary --
* Use Scintilla lexer names rather than deprecated IDs
-- File Changes --
M src/highlighting.c (6) M src/highlightingmappings.h (114) M src/sciwrappers.c (17) M src/sciwrappers.h (2)
-- Patch Links --
https://github.com/geany/geany/pull/3668.patch https://github.com/geany/geany/pull/3668.diff
@techee commented on this pull request.
Thumbs up for fixing the only warning that we, ordinary humans with no special compiler options, see during Geany compilation :-).
Looks good _but_ I didn't check if the actual lexer names are correct - but see the related comment in the review.
@@ -696,23 +696,24 @@ gint sci_get_lexer(ScintillaObject *sci)
}
-void sci_set_lexer(ScintillaObject *sci, guint lexer_id) +void sci_set_lexer_name(ScintillaObject *sci, const gchar *lexer_name)
This sounds like the function sets the name of the lexer but it just sets it based on the name. Wouldn't it be better to keep it named as before?
{
- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);
I was wondering also in https://github.com/geany/geany/pull/3616 if it would be possible to do something similar like we do for the TM mappings checks - when starting Geany, going through all the lexers we use and check if we manage to create them; if not, hard-crash with `g_error()`.
This would make errors like https://github.com/geany/geany/pull/3616 impossible. But this depends on two things: 1. `CreateLexer()` shouldn't be too expensive to call, I haven't checked what exactly it does. 2. We'd need to iterate through all lexers. Right now all lexers have to be mentioned explicitly at 2 different places in switches in `highlighting.c` and we probably don't want to add one more. So we'd have to create a table containing all the lexers, possibly in a similar way like in `filetypes.c`. As a side-effect one could then maybe avoid the warning from https://github.com/geany/geany/pull/2398 (which is produced by clang by default).
Thoughts?
@b4n commented on this pull request.
{
- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);
I don't know how expansive it'd be indeed, so whether or not it's reasonable to do it, but we could check. From a quick look, it'll perform a linear search for the lexer of that name, and call Create() on the module, which either calls the lexer's `LexerFactoryFunction` (calling lexer code, which normally would instantiate an instance of the relevant lexer), or instantiate a new `LexerSimple`. So… it seems to be in between cheap and expansive, but I didn't profile anything.
2. We'd need to iterate through all lexers. Right now all lexers have to be mentioned explicitly at 2 different places in switches in `highlighting.c` and we probably don't want to add one more. So we'd have to create a table containing all the lexers, possibly in a similar way like in `filetypes.c`.
I was wondering whether, now that we use a name for the lexers, if we'd want to avoid hard-coding things in highlightingmappings.h and move that to the filetypes files à-la-SciTE? E.g. maybe something like this:
``` [lexer] # could also be e.g. in [settings] as lexer_name name = cpp # style[lexer style number] = style name for use in [styling] styles[0] = default styles[1] = comment styles[2] = commentline styles[3] = commentdoc # … # keywors, those could possibly use GetWordListDescription() or similar to have actual names keywords[0] = primary keywords[1] = secondary keywords[2] = docComment # … ```
or if we were to break compatibility, we could avoid adding a proxy and just use the style/keyword numbers directly instead of mapping them twice..
I don't particularly like the idea of having arbitrary numbers there, but in the end, is it worse than the internal tables? And maybe we could use `NamedStyles()`/`NameOfStyle()` and `DescribeWordListSets()` if enough lexers we need support that.
Anyway, it might not be relevant as we're not loading all filetype files anyway.
But note that any list to check would have to *actually* be synchronized with what is used; the issue in #3616 would only be found if checking the ID we *used*, not the list of lexers we *had*, as it was missing from the `AddLexerModules()` call.
@b4n commented on this pull request.
@@ -696,23 +696,24 @@ gint sci_get_lexer(ScintillaObject *sci)
}
-void sci_set_lexer(ScintillaObject *sci, guint lexer_id) +void sci_set_lexer_name(ScintillaObject *sci, const gchar *lexer_name)
Fair enough, as indeed I ended up removing the old one. Will do.
@b4n pushed 1 commit.
1009e021c39bb116e77f007ec4e59d7fb64fc8a6 Use Scintilla lexer names rather than deprecated IDs
Looks good _but_ I didn't check if the actual lexer names are correct - but see the related comment in the review.
Yeah that's the tricky/boring part.
What I did was based on the results from `git grep -h 'LexerModule.*SCLEX_' | sed -r 's/^.*(SCLEX_\w*)\W[^"]*("[^"]*").*$/\1: \2/' | sort -u` (which misses Erlang which has its definition span multiple lines, but that's the only one), and then do it manually.
I validated part of it by loading all ctags test files `geany -vc /tmp/tempconf/ $(find tests/ctags/ -type f ! -name '*.tags')` (this command is not perfect and load TRS/LOG files as well, but meh), and verify that nothing crashes, and that everything is styled some way or another (e.g. has colors), and get worried each time it didn't work only to see we don't have a lexer for that language.
…and you could check this patch using this: ```bash #!/bin/bash
diff -u \ <( git show | grep -P '#define highlighting_lexer_' | awk '{print $3}' | \ while read -r lexer_id; do read -r lexer_name test "${lexer_name%_*}" = highlighting_lexer && continue echo "$lexer_id $lexer_name" done | sort -u ) \ <( git grep -h 'LexerModule.*SCLEX_' | sed -r 's/^.*(SCLEX_\w*)\W[^"]*("[^"]*").*$/\1 \2/' | sort -u ) ```
which should give you a diff between the changes in the patch (`LEXER_ID "name"`) and the `LexerModule()` calls with their names. What I see myself: ```diff $ bash check-lexer-names.sh --- /dev/fd/63 2023-11-02 22:53:41.310224390 +0100 +++ /dev/fd/62 2023-11-02 22:53:41.310224390 +0100 @@ -1,19 +1,21 @@ SCLEX_ABAQUS "abaqus" SCLEX_ADA "ada" +SCLEX_AS "as" SCLEX_ASCIIDOC "asciidoc" SCLEX_ASM "asm" SCLEX_AU3 "au3" SCLEX_BASH "bash" SCLEX_BATCH "batch" +SCLEX_BLITZBASIC "blitzbasic" SCLEX_CAML "caml" SCLEX_CMAKE "cmake" SCLEX_COBOL "COBOL" SCLEX_COFFEESCRIPT "coffeescript" SCLEX_CPP "cpp" +SCLEX_CPPNOCASE "cppnocase" SCLEX_CSS "css" SCLEX_D "d" SCLEX_DIFF "diff" -SCLEX_ERLANG "erlang" SCLEX_F77 "f77" SCLEX_FORTH "forth" SCLEX_FORTRAN "fortran" @@ -24,10 +26,13 @@ SCLEX_JULIA "julia" SCLEX_LATEX "latex" SCLEX_LISP "lisp" +SCLEX_LITERATEHASKELL "literatehaskell" SCLEX_LUA "lua" SCLEX_MAKEFILE "makefile" SCLEX_MARKDOWN "markdown" +SCLEX_MATLAB "matlab" SCLEX_NSIS "nsis" +SCLEX_NULL "null" SCLEX_OCTAVE "octave" SCLEX_PASCAL "pascal" SCLEX_PERL "perl" @@ -35,6 +40,7 @@ SCLEX_PO "po" SCLEX_POWERSHELL "powershell" SCLEX_PROPERTIES "props" +SCLEX_PUREBASIC "purebasic" SCLEX_PYTHON "python" SCLEX_R "r" SCLEX_RUBY "ruby" ``` Tells me the patch doesn't change `SCLEX_AS`, `SCLEX_BLITZBASIC`, `SCLEX_CPPNOCASE`, `SCLEX_LITERATEHASKELL`, `SCLEX_MATLAB`, `SCLEX_NULL` and `SCLEX_PUREBASIC`. This looks fine, as we don't use these lexers -- and if I failed to transform an integer to a string I'd have a nasty warning (buried in the middle of the non-fixed ones, though :wink:). It also tells me I transformed `SCLEX_ERLANG` that didn't exist, but that's a glitch in the grep as mentioned above. Basically, not seeing any `-`/`+` pairs is a good sign :slightly_smiling_face:
@elextr commented on this pull request.
{
- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);
going through all the lexers we use and check if we manage to create them;
Making startup slower, would have to be avoided if geany is only being run to send a filename over the socket.
Better if all lexers are tested by CI not internal to Geany.
@techee commented on this pull request.
{
- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);
To avoid speculations regarding the speed of creation of all lexers, I created #3673 testing it (less than 2 ms on Raspberry Pi 4). So I don't think we have to worry about speed here.
@techee commented on this pull request.
{
- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);
I was wondering whether, now that we use a name for the lexers, if we'd want to avoid hard-coding things in highlightingmappings.h and move that to the filetypes files à-la-SciTE? E.g. maybe something like this:
That would of course be great. No "built-in" or "external" filetypes, just builtin lexers and builtin ctags parsers which are configured dynamically in config files.
The other things like the indexed style definitions would be possible too, it would just be a little too magic for ordinary users as they don't know what's at the given index and would have to check the lexer sources.
But note that any list to check would have to actually be synchronized with what is used; the issue in https://github.com/geany/geany/pull/3616 would only be found if checking the ID we used, not the list of lexers we had, as it was missing from the AddLexerModules() call.
Yes, sure.
…and you could check this patch using this:
Wizardry :-)
@kugel- commented on this pull request.
SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);
- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))
We should probably deprecate `sci_get_lexer()` and stop using it internally.
@b4n commented on this pull request.
SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);
- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))
I'm not sure what is the recommendation on this: the [documentation](https://scintilla.org/ScintillaDoc.html#SCI_SETILEXER) says to use this for checking whether setting the new lexer worked. OTOH, there is [SCI_GETLEXERLANGUAGE](https://scintilla.org/ScintillaDoc.html#SCI_GETLEXERLANGUAGE) which probably would work as well (albeit a bit more expansive).
@kugel- commented on this pull request.
SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);
- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))
Must be out-dated. Clearly, because of
Some lexers may not have a lexer ID, just a lexer name in which case 0 is returned.
`SCI_GETLEXER` cannot be reliably used anymore, if using any recent lexer anyway.
@b4n commented on this pull request.
SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);
- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))
OK. I can easily use this here (seems to work), but porting every use in Geany is gonna be more tricky, especially for cases like `highlighting_is_*_style()`
To avoid speculations regarding the speed of creation of all lexers, I created https://github.com/geany/geany/pull/3673 testing it (less than 2 ms on Raspberry Pi 4). So I don't think we have to worry about speed here.
Thank you, but my intended point was that it should be done offline before release (if not in CI), not when being used by a user, thats too late.
I didn't read the discussion above, but I totally support replacing builtins with configurability.
Thank you, but my intended point was that it should be done offline before release (if not in CI), not when being used by a user, thats too late.
Yeah, but that would happen. If we do it every time Geany starts, it will also happen for unit tests because these too launch the Geany binary, and of course the developer who is updating to a new version of Scintilla will see it too because Geany will just refuse to start.
@kugel- commented on this pull request.
SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);
- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))
Well we'll have to start somewhere. Other places can be fixed incrementally I would say.
github-comments@lists.geany.org