I've just noticed that typenames don't get colorized for newly opened documents (this is from editor.h):
<img width="467" alt="Screenshot 2023-09-01 at 20 49 40" src="https://github.com/geany/geany/assets/713965/8255c38b-096b-400b-a035-f772cc504566">
However, when Geany gets restarted and this file is opened from the session list (and not manually), colorization works:
<img width="467" alt="Screenshot 2023-09-01 at 20 50 16" src="https://github.com/geany/geany/assets/713965/1c614c29-bd50-4566-8f01-4c7b8de8f7d9">
I've bisected this to the commit updating Scintilla to version 5 just after the 1.38 release.
Current git WFM. Can you give a more precise reproducer, ie exact steps?
OK, it appears to be some problem when ProjectOrganizer is enabled - I'll investigate more.
Are you SURE you don't have one of your developmental ctags/tagmangler patches present?
Yeah, pretty sure, tag manager is for losers. I now have a [LSP](https://microsoft.github.io/language-server-protocol/) plugin + related Geany patches in the works ;-).
LSP is old school, should interface to ChatGPT to have it write the code for you :stuck_out_tongue_winking_eye:
OK, to trigger the problem, one has to change the filetype to a compatible type (which currently is just C and C++). So when you change the filetype from C to C++ (and then possibly back), the highlighting disappears. See #3553.
LSP is old school, should interface to ChatGPT to have it write the code for you 😜
Yeah, but coding will be trivial then. I'll just start with `write a better editor than Geany` followed by a sequence of `write a better editor than the one you have just written`. Can't go wrong ;-).
Interesting, so IIUC, "compatible" is really only one way, C++ can read C, especially with the ctags parser, so it probably has the same type keyword string, so same hash. But changing C++ to C will almost certainly fail quickly (at the first occurrence of "class" for eg) but (IIUC) if the parser fails Geany leaves the typenames the same so as to try to minimise flashing and changing of the symbols pane as the user types, so again hash is the same.
Confirmed by adding `typedef foo int;` to change the hash and get colouring again.
But since 0 is a valid hash value maybe `keyword_hash` should be incremented to get a different hash value so the comparison fails, otherwise the problem might still appear one in every 2^32 times the user swaps C and C++ types on the file :stuck_out_tongue_winking_eye: .
But since 0 is a valid hash value maybe keyword_hash should be incremented to get a different hash value so the comparison fails, otherwise the problem might still appear one in every 2^32 times the user swaps C and C++ types on the file 😜 .
There's already this "problem" where two different sets of keywords may have the same hash. But probably not a problem in practice.
@techee the code [here](https://github.com/geany/geany/blob/3787677de43204d1396c8f5439e82e90fe61b8a4...) is fundamentally flawed, it only compares the hashes, not the strings, so any hash clash between different strings will result in the keywords not being loaded.
Personally I would just keep the string and do a compare, its unlikely to be slower than calculating a hash. That would mean keeping the string, but the memory usage is unlikely to be significant compared to the tagmanager structs it is created from.
Edit: Yes, the situation won't happen often, but it comes under the heading of "Why do the theoretically wrong thing when the right thing is probably not materially worse"
Yes, but does it really happen in practice? This isn't like a hash table where you have say 1000 entries so collisions are frequent. Here you should really have something much more close to the ideal case 2^32. At least I haven't experienced any problems with it yet.
Also the string with all typenames can get quite big when you index the whole project like GTK and you'd have to store it for every open document.
Also the string with all typenames can get quite big when you index the whole project like GTK and you'd have to store it for every open document.
Surely its still only a fraction of the tagmanager structures that has generated?
Oh wait, its the "every file can see every other files symbols even if they are not used" problem isn't it, so its `the number of open C/C++ files * all C/C++ symbols` size.
Sigh, ok, I guess the wrong way might have to do, and any clashes will be guaranteed to be "unable to reproduce" :-)
Oh wait, its the "every file can see every other files symbols even if they are not used" problem isn't it
Yes.
I think this should actually be configurable because the colorization of types is kind of random based on what files you have open (unless you let the tag manager index all project files plus all the dependency files).
It also has a non-zero performance cost because Scintilla goes through the supplied keywords one by one for every word in the document to check whether the word should be colorized. I remember submitting a patch to Scintilla so it at least uses binary search to check whether the word is in the keyword list but Neil wanted to have this optimized for smaller number of keywords where the linear search won so the patch wasn't accepted.
so its the number of open C/C++ files * all C/C++ symbols size
Not all symbols size but only those which define types. And only their names concatenated together into a single string.
Sigh, ok, I guess the wrong way might have to do, and any clashes will be guaranteed to be "unable to reproduce" :-)
Yeah, exactly, that's the approach. Users are just wrong ;-).
PS I don't think the problem is anything to do with Scintilla 5, its just that its the first time anyone noticed, after all who would change filetypes from C to C++?
It is. I tried previous versions and it worked. The problem is now more visible with the new ProjectOrganizer feature that automatically sets header's filetype based on the source file (e.g. for `my_header.h` and `my_source.c` in the project it sets `C` as the header's filetype from the original `C++` filetype).
unless you let the tag manager index all project files plus all the dependency files
What about loaded global tags types, won't they get coloured?
I remember submitting a patch to Scintilla so it at least uses binary search to check whether the word is in the keyword list but Neil wanted to have this optimized for smaller number of keywords where the linear search won so the patch wasn't accepted.
Apply the Geany solution (make it an option :grin:) to put the keywords in a `std::set`, then Neil can have his fast when small and you can have your fast(ish) when large. But I would benchmark because of the cost of recreating the set each time we send new typenames.
I remember submitting a patch to Scintilla so it at least uses binary search to check whether the word is in the keyword list but Neil wanted to have this optimized for smaller number of keywords where the linear search won so the patch wasn't accepted.
It's https://sourceforge.net/p/scintilla/feature-requests/1312/, here is the implementation for Notepad2 https://github.com/zufuliu/notepad2/blob/main/scintilla/lexlib/WordList.cxx#...
Maybe Geany can switch to sub styles (which use `std::map` to map word to style) for tag highlighting: https://github.com/ScintillaOrg/lexilla/blob/master/lexlib/SubStyles.h#L17 https://github.com/ScintillaOrg/lexilla/blob/master/lexers/LexCPP.cxx#L922
It's https://sourceforge.net/p/scintilla/feature-requests/1312/, here is the implementation for Notepad2
https://github.com/zufuliu/notepad2/blob/main/scintilla/lexlib/WordList.cxx#...
The one I was talking about was
https://sourceforge.net/p/scintilla/bugs/1734/
I don't completely agree with Neil's argument regarding optimization of short lists (which IMO doesn't require any optimization and is fast enough by itself) but it's not up to me to decide.
Anyway, we don't have any terrible performance problems with this - I was just mentioning it as an extra argument for possible typename colorization disable switch.
Closed #3550 as completed via #3553.
github-comments@lists.geany.org