[geany/geany] Use Scintilla lexer names rather than deprecated IDs (PR #3668)

List overview All Threads

newer

older

[geany/geany] Update pt.po (PR...

[geany/plugins.geany.org] Update...

Colomban Wendling

31 Oct 2023 31 Oct '23

1:36 p.m.

Title says it all: instead of fetching the name from the deprecated ID, directly use the lexer name. This removes a deprecation warning, and probably aligns us for the future. You can view, comment on, or merge this pull request online at:

https://github.com/geany/geany/pull/3668

-- Commit Summary --

* Use Scintilla lexer names rather than deprecated IDs

-- File Changes --

M src/highlighting.c (6) M src/highlightingmappings.h (114) M src/sciwrappers.c (17) M src/sciwrappers.h (2)

-- Patch Links --

https://github.com/geany/geany/pull/3668.patch https://github.com/geany/geany/pull/3668.diff

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668 You are receiving this because you are subscribed to this thread. Message ID: <geany/geany/pull/3668@github.com>

Attachments:

attachment.html (text/html — 3.2 KB)

Show replies by date

Jiří Techet

2 Nov 2 Nov

8:06 p.m.

@techee commented on this pull request.

Thumbs up for fixing the only warning that we, ordinary humans with no special compiler options, see during Geany compilation :-).

Looks good _but_ I didn't check if the actual lexer names are correct - but see the related comment in the review.

...

@@ -696,23 +696,24 @@ gint sci_get_lexer(ScintillaObject *sci)

}

-void sci_set_lexer(ScintillaObject *sci, guint lexer_id) +void sci_set_lexer_name(ScintillaObject *sci, const gchar *lexer_name)

This sounds like the function sets the name of the lexer but it just sets it based on the name. Wouldn't it be better to keep it named as before?

...

{

- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);

I was wondering also in https://github.com/geany/geany/pull/3616 if it would be possible to do something similar like we do for the TM mappings checks - when starting Geany, going through all the lexers we use and check if we manage to create them; if not, hard-crash with `g_error()`.

This would make errors like https://github.com/geany/geany/pull/3616 impossible. But this depends on two things: 1. `CreateLexer()` shouldn't be too expensive to call, I haven't checked what exactly it does. 2. We'd need to iterate through all lexers. Right now all lexers have to be mentioned explicitly at 2 different places in switches in `highlighting.c` and we probably don't want to add one more. So we'd have to create a table containing all the lexers, possibly in a similar way like in `filetypes.c`. As a side-effect one could then maybe avoid the warning from https://github.com/geany/geany/pull/2398 (which is produced by clang by default).

Thoughts?

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#pullrequestreview-1711028659 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1711028659@github.com

Colomban Wendling

10:37 p.m.

@b4n commented on this pull request.

...

{

- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);

I don't know how expansive it'd be indeed, so whether or not it's reasonable to do it, but we could check. From a quick look, it'll perform a linear search for the lexer of that name, and call Create() on the module, which either calls the lexer's `LexerFactoryFunction` (calling lexer code, which normally would instantiate an instance of the relevant lexer), or instantiate a new `LexerSimple`. So… it seems to be in between cheap and expansive, but I didn't profile anything.

...

2. We'd need to iterate through all lexers. Right now all lexers have to be mentioned explicitly at 2 different places in switches in `highlighting.c` and we probably don't want to add one more. So we'd have to create a table containing all the lexers, possibly in a similar way like in `filetypes.c`.

I was wondering whether, now that we use a name for the lexers, if we'd want to avoid hard-coding things in highlightingmappings.h and move that to the filetypes files à-la-SciTE? E.g. maybe something like this:

``` [lexer] # could also be e.g. in [settings] as lexer_name name = cpp # style[lexer style number] = style name for use in [styling] styles[0] = default styles[1] = comment styles[2] = commentline styles[3] = commentdoc # … # keywors, those could possibly use GetWordListDescription() or similar to have actual names keywords[0] = primary keywords[1] = secondary keywords[2] = docComment # … ```

or if we were to break compatibility, we could avoid adding a proxy and just use the style/keyword numbers directly instead of mapping them twice..

I don't particularly like the idea of having arbitrary numbers there, but in the end, is it worse than the internal tables? And maybe we could use `NamedStyles()`/`NameOfStyle()` and `DescribeWordListSets()` if enough lexers we need support that.

Anyway, it might not be relevant as we're not loading all filetype files anyway.

But note that any list to check would have to *actually* be synchronized with what is used; the issue in #3616 would only be found if checking the ID we *used*, not the list of lexers we *had*, as it was missing from the `AddLexerModules()` call.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1380806370 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1711282555@github.com

Colomban Wendling

10:37 p.m.

@b4n commented on this pull request.

...

@@ -696,23 +696,24 @@ gint sci_get_lexer(ScintillaObject *sci)

}

-void sci_set_lexer(ScintillaObject *sci, guint lexer_id) +void sci_set_lexer_name(ScintillaObject *sci, const gchar *lexer_name)

Fair enough, as indeed I ended up removing the old one. Will do.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1380806643 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1711282979@github.com

Colomban Wendling

10:42 p.m.

@b4n pushed 1 commit.

1009e021c39bb116e77f007ec4e59d7fb64fc8a6 Use Scintilla lexer names rather than deprecated IDs

-- View it on GitHub: https://github.com/geany/geany/pull/3668/files/28fb2d044defbfe62bb29464f3e96... You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/push/15674220437@github.com

Colomban Wendling

10:47 p.m.

...

Looks good _but_ I didn't check if the actual lexer names are correct - but see the related comment in the review.

Yeah that's the tricky/boring part.

What I did was based on the results from `git grep -h 'LexerModule.*SCLEX_' | sed -r 's/^.*(SCLEX_\w*)\W[^"]*("[^"]*").*$/\1: \2/' | sort -u` (which misses Erlang which has its definition span multiple lines, but that's the only one), and then do it manually.

I validated part of it by loading all ctags test files `geany -vc /tmp/tempconf/ $(find tests/ctags/ -type f ! -name '*.tags')` (this command is not perfect and load TRS/LOG files as well, but meh), and verify that nothing crashes, and that everything is styled some way or another (e.g. has colors), and get worried each time it didn't work only to see we don't have a lexer for that language.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#issuecomment-1791583382 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/c1791583382@github.com

Colomban Wendling

11:01 p.m.

…and you could check this patch using this: ```bash #!/bin/bash

diff -u \ <( git show | grep -P '#define highlighting_lexer_' | awk '{print $3}' | \ while read -r lexer_id; do read -r lexer_name test "${lexer_name%_*}" = highlighting_lexer && continue echo "$lexer_id $lexer_name" done | sort -u ) \ <( git grep -h 'LexerModule.*SCLEX_' | sed -r 's/^.*(SCLEX_\w*)\W[^"]*("[^"]*").*$/\1 \2/' | sort -u ) ```

which should give you a diff between the changes in the patch (`LEXER_ID "name"`) and the `LexerModule()` calls with their names. What I see myself: ```diff $ bash check-lexer-names.sh --- /dev/fd/63 2023-11-02 22:53:41.310224390 +0100 +++ /dev/fd/62 2023-11-02 22:53:41.310224390 +0100 @@ -1,19 +1,21 @@ SCLEX_ABAQUS "abaqus" SCLEX_ADA "ada" +SCLEX_AS "as" SCLEX_ASCIIDOC "asciidoc" SCLEX_ASM "asm" SCLEX_AU3 "au3" SCLEX_BASH "bash" SCLEX_BATCH "batch" +SCLEX_BLITZBASIC "blitzbasic" SCLEX_CAML "caml" SCLEX_CMAKE "cmake" SCLEX_COBOL "COBOL" SCLEX_COFFEESCRIPT "coffeescript" SCLEX_CPP "cpp" +SCLEX_CPPNOCASE "cppnocase" SCLEX_CSS "css" SCLEX_D "d" SCLEX_DIFF "diff" -SCLEX_ERLANG "erlang" SCLEX_F77 "f77" SCLEX_FORTH "forth" SCLEX_FORTRAN "fortran" @@ -24,10 +26,13 @@ SCLEX_JULIA "julia" SCLEX_LATEX "latex" SCLEX_LISP "lisp" +SCLEX_LITERATEHASKELL "literatehaskell" SCLEX_LUA "lua" SCLEX_MAKEFILE "makefile" SCLEX_MARKDOWN "markdown" +SCLEX_MATLAB "matlab" SCLEX_NSIS "nsis" +SCLEX_NULL "null" SCLEX_OCTAVE "octave" SCLEX_PASCAL "pascal" SCLEX_PERL "perl" @@ -35,6 +40,7 @@ SCLEX_PO "po" SCLEX_POWERSHELL "powershell" SCLEX_PROPERTIES "props" +SCLEX_PUREBASIC "purebasic" SCLEX_PYTHON "python" SCLEX_R "r" SCLEX_RUBY "ruby" ``` Tells me the patch doesn't change `SCLEX_AS`, `SCLEX_BLITZBASIC`, `SCLEX_CPPNOCASE`, `SCLEX_LITERATEHASKELL`, `SCLEX_MATLAB`, `SCLEX_NULL` and `SCLEX_PUREBASIC`. This looks fine, as we don't use these lexers -- and if I failed to transform an integer to a string I'd have a nasty warning (buried in the middle of the non-fixed ones, though :wink:). It also tells me I transformed `SCLEX_ERLANG` that didn't exist, but that's a glitch in the grep as mentioned above. Basically, not seeing any `-`/`+` pairs is a good sign :slightly_smiling_face:

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#issuecomment-1791600374 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/c1791600374@github.com

elextr

3 Nov 3 Nov

1:15 a.m.

@elextr commented on this pull request.

...

{

- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);

...

going through all the lexers we use and check if we manage to create them;

Making startup slower, would have to be avoided if geany is only being run to send a filename over the socket.

Better if all lexers are tested by CI not internal to Geany.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1380912804 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1711449679@github.com

Jiří Techet

6:36 p.m.

@techee commented on this pull request.

...

{

- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);

To avoid speculations regarding the speed of creation of all lexers, I created #3673 testing it (less than 2 ms on Raspberry Pi 4). So I don't think we have to worry about speed here.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382019208 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713219452@github.com

Jiří Techet

6:47 p.m.

@techee commented on this pull request.

...

{

- g_warning("Failed to find lexer for ID %u", lexer_id); + g_warning("Failed to find lexer for name %s", lexer_name);

...

I was wondering whether, now that we use a name for the lexers, if we'd want to avoid hard-coding things in highlightingmappings.h and move that to the filetypes files à-la-SciTE? E.g. maybe something like this:

That would of course be great. No "built-in" or "external" filetypes, just builtin lexers and builtin ctags parsers which are configured dynamically in config files.

The other things like the indexed style definitions would be possible too, it would just be a little too magic for ordinary users as they don't know what's at the given index and would have to check the lexer sources.

...

But note that any list to check would have to actually be synchronized with what is used; the issue in https://github.com/geany/geany/pull/3616 would only be found if checking the ID we used, not the list of lexers we had, as it was missing from the AddLexerModules() call.

Yes, sure.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382030236 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713237285@github.com

Jiří Techet

6:50 p.m.

...

…and you could check this patch using this:

Wizardry :-)

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#issuecomment-1792883141 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/c1792883141@github.com

Thomas Martitz

9:20 p.m.

@kugel- commented on this pull request.

...

SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);

- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))

We should probably deprecate `sci_get_lexer()` and stop using it internally.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#pullrequestreview-1713475789 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713475789@github.com

Colomban Wendling

10:03 p.m.

@b4n commented on this pull request.

...

SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);

- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))

I'm not sure what is the recommendation on this: the [documentation](https://scintilla.org/ScintillaDoc.html#SCI_SETILEXER) says to use this for checking whether setting the new lexer worked. OTOH, there is [SCI_GETLEXERLANGUAGE](https://scintilla.org/ScintillaDoc.html#SCI_GETLEXERLANGUAGE) which probably would work as well (albeit a bit more expansive).

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382201059 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713529687@github.com

Thomas Martitz

10:07 p.m.

@kugel- commented on this pull request.

...

SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);

- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))

Must be out-dated. Clearly, because of

...

Some lexers may not have a lexer ID, just a lexer name in which case 0 is returned.

`SCI_GETLEXER` cannot be reliably used anymore, if using any recent lexer anyway.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382203937 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713533346@github.com

Colomban Wendling

10:38 p.m.

@b4n commented on this pull request.

...

SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);

- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))

OK. I can easily use this here (seems to work), but porting every use in Geany is gonna be more tricky, especially for cases like `highlighting_is_*_style()`

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382225682 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713565445@github.com

elextr

4 Nov 4 Nov

2:25 a.m.

...

To avoid speculations regarding the speed of creation of all lexers, I created https://github.com/geany/geany/pull/3673 testing it (less than 2 ms on Raspberry Pi 4). So I don't think we have to worry about speed here.

Thank you, but my intended point was that it should be done offline before release (if not in CI), not when being used by a user, thats too late.

I didn't read the discussion above, but I totally support replacing builtins with configurability.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#issuecomment-1793286047 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/c1793286047@github.com

Jiří Techet

6:55 p.m.

...

Thank you, but my intended point was that it should be done offline before release (if not in CI), not when being used by a user, thats too late.

Yeah, but that would happen. If we do it every time Geany starts, it will also happen for unit tests because these too launch the Geany binary, and of course the developer who is updating to a new version of Scintilla will see it too because Geany will just refuse to start.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#issuecomment-1793507837 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/c1793507837@github.com

Thomas Martitz

5 Nov 5 Nov

2:20 p.m.

@kugel- commented on this pull request.

...

SSM(sci, SCI_SETILEXER, 0, (uintptr_t) lexer);

- if (old != (gint)lexer_id) + if (old != sci_get_lexer(sci))

Well we'll have to start somewhere. Other places can be fixed incrementally I would say.

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3668#discussion_r1382575002 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/pull/3668/review/1713984778@github.com

426

Age (days ago)

431

Last active (days ago)

github-comments@lists.geany.org

17 comments

4 participants

tags (0)

participants (4)

Colomban Wendling
elextr
Jiří Techet
Thomas Martitz