Hard-coded ```C #define GEANY_WORDCHARS "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" ``` is used at many places in the code instead of the user-defined `wordchars` from the config file. Some filetypes e.g. allow `$` or `-` inside identifiers and this breaks things for them.
Did you identify the specific issues? I agree it's sometimes an issue and I believe I've suffered from it with e.g. CSS, bit also that unconditionally using the filetype value everywhere might cause other issues -- or maybe it wouldn't?
Anyway, identifying the actual problems that this causes would.be great :)
> Did you identify the specific issues?
Yes, I noticed that when implementing the LSP plugin, see below.
> I agree it's sometimes an issue and I believe I've suffered from it with e.g. CSS, bit also that unconditionally using the filetype value everywhere might cause other issues -- or maybe it wouldn't?
I think we should make sure that user-specified `wordchars` always contain all characters from `GEANY_WORDCHARS` no matter what crazy thing the users define. Apart from that it should be fine IMO _except_ one thing - some languages allow unicode characters to be part of identifiers too and we should possibly allow them as well.
> Anyway, identifying the actual problems that this causes would.be great :)
What's below are only the problematic cases - I'm skipping those that seem to be OK. The biggest offender is `read_current_word()` from which it propagates to other functions.
**1st level** - direct usage of `GEANY_WORDCHARS`: https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - with NULL `wc` argument, it uses `GEANY_WORDCHARS`.
**2nd level** - usage of `read_current_word()`: https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - uses `GEANY_WORDCHARS` when wc == NULL
https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - propagates the problem to plugins which, when `wordchars` isn't specified, uses `GEANY_WORDCHARS`. Note that plugins don't have access to `wordchars` from filetype config file and have to hard-code something for every language or add some configuration (see https://github.com/techee/geany-lsp/blob/3a31ec9be8323c668299d1c292b7c401f2e... in geany-lsp)
https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - uses `GEANY_WORDCHARS` for everything except CSS and Latex and breaks autocompletion of unicode identifiers
**3rd level** - usage of `editor_get_word_at_pos()`: https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - may break scope autocompletion
**3rd level** - usage of `editor_find_current_word()`: https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - current word stored (end emitted with signal) with mouse right-click - I haven't investigated where exactly this leads
https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - calltips probably don't work for unicode symbols or symbols containing some special characters
https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - again, may break symbol goto
https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... - I haven't checked in detail what this one might cause
In addition, there's also https://github.com/geany/geany/blob/c043996a9a3dc1b0aa36571cd5f5d61fd8d2eb59... I haven't checked the places where this function is used and why it's used instead of `editor_find_current_word()` and whether this is correct.
Actually I didn't mean *where the code is using it*, but rather what are the user-visible symptoms of the code not doing it correctly :)
Actually I didn't mean where the code is using it, but rather what are the user-visible symptoms of the code not doing it correctly :)
Well, those functions above aren't where the code is using `GEANY_WORDCHARS`, but rather "where the code is using it incorrectly". So the user-visible symptoms follow from these. Now to test all the situations we would need a ctags parser supporting all the features and some extra identifier character such as `$`, `-`, etc. I've been lazy to search for such a parser and simulated this on C by removing `_` from `GEANY_WORDCHARS` and considering this the "missing identifier character" (such as `$` or `-` for other languages):
`editor_start_auto_complete()` - check the screenshot below - the word boundary isn't determined correctly and offers autocompletion only for the sequence following the last `_`. I'm pretty sure you'll run into the same thing with Verilog which allows `$` inside identifiers. <img width="440" alt="Screenshot 2024-11-17 at 22 57 38" src="https://github.com/user-attachments/assets/6b3c8073-88c4-432b-851b-967f0c34d440">
`autocomplete_scope()` - scope autocompletion works for variables without `_` but not for variables with `_`. <img width="458" alt="Screenshot 2024-11-17 at 22 59 28" src="https://github.com/user-attachments/assets/8e62aede-fe30-4242-a588-8c8c48b6d6ba">
`editor_show_calltip()` - no calltip for functions containing `_` because of incorrect word boundaries <img width="621" alt="Screenshot 2024-11-17 at 23 00 26" src="https://github.com/user-attachments/assets/b2b8a92e-d533-4949-bc51-b285de1b2207">
`symbols_goto_tag()` - again, because of incorrect word boundaries the goto attempt is made for an incorrect word. For a Verilog example see https://github.com/geany/geany/pull/4037#discussion_r1845239877 <img width="303" alt="Screenshot 2024-11-17 at 22 56 12" src="https://github.com/user-attachments/assets/fbd6fdf0-da35-46e9-bf81-f22b1da883dc">
`editor_get_word_at_pos()` - I have no example but clearly this function propagates this error to plugins using it.
In any case, `read_current_word()` is the source of all these problems and if it were modified to use `wordchars`, users could modify this behavior for the specific needs of the languages they use (or better, we'd provide the right `wordchars` in the default configuration). We could avoid the hacks in `editor_start_auto_complete()` which handles CSS and Latex but nothing else. We could also make sure that `wordchars` contains at least `GEANY_WORDCHARS` to harden against bad user configurations.
One good thing I was wrong about is unicode characters - those `!IS_ASCII()` checks seem to do the right thing in `read_current_word()` and all the simple unicode cases with Czech I tried worked fine for me.
Just from the topo of my head: PHP has `$` for variables, and it makes sense to select it with the variable name, but it's not part of the characters allowed in an identifier and ctags don't put iti in tags, so that's still gonna be a weird special case.
Depends how you look at it - to me it's more like `$variable_name` than "variable name starting with $" so selection of `variable_name` only doesn't seem like a too special case (also in some cases like when you e.g. want to rename a variable, you want to keep the first `$` and I think this is something users are used to).
By the way is there any need for `editor_find_current_word_sciwc()`? I think it uses some hard-coded Scintilla word boundary rules and it just is another source of possible source of word boundary inconsistencies.
Depends how you look at it - to me it's more like `$variable_name` than "variable name starting with $" so selection of `variable_name` only doesn't seem like a too special case (also in some cases like when you e.g. want to rename a variable, you want to keep the first `$` and I think this is something users are used to).
Well, when renaming a variable you usually want to avoid touching non-variables so including the $ seems reasonable to me, and I'm used to it, but I see your way of looking at it.
Related issue: https://github.com/geany/geany/issues/2679
github-comments@lists.geany.org