After spending a lot of time on things like tag manager where nobody sees any difference, I had the quite radical idea that it would be nice to start working on things that somebody actually wants ;-). Assuming that "thumbs up" is a good proxy for "I want this", this translates to:
<img width="765" alt="Screenshot 2024-08-23 at 12 33 41" src="https://github.com/user-attachments/assets/258915a2-b32d-4782-bbe4-e3942dcc198f">
I think we are actually in a very good position to address most of these in the next release(s). This meta-issue tries to summarize where we stand on addressing them:
- [ ] #905 - pretty much addressed by @LiquidCake's #3911 - the discussion in the issue also mentions things like hot exit, but what the OP asked for was really the Notepad++ feature of automatically saving/restoring untitled documents.
- [ ] #2184 - hopefully there - I have to test a few more things (https://github.com/techee/geany-lsp)
- [ ] #2750 - I asked Neil to increase it, so it's now 60 (up from the previous 20), which is reasonably "unlimited". Will be fixed once we update Scintilla.
- [ ] #1141 - addressed by #3899 (the feature itself is in Scintilla; we just have to make the click modifiers configurable because, depending on the desktop, some configurations work and some don't)
- [ ] #1268 - fixed by #3934
- [ ] #872 - I guess most users wanted it for editor tabs, which is fixed already, but there is also Nick's #3469 that could be applied to the sidebar and message window
- [ ] #1192 - no work in progress yet. The only issue here is that the popup window is shown using `dialogs_show_question_full()`, which is used by other popups with possibly destructive actions where a default of "Cancel" is the right choice. So either this function has to take an extra parameter specifying which of the buttons should be the default, or a new dialog has to be created specifically for this situation.
- [ ] #371 - see https://github.com/ScintillaOrg/lexilla/pull/265 which I'll eventually port to Geany
- [ ] #3724 - slightly lower in the thumbs-up list, but I think it would also be nice to add, see https://github.com/ScintillaOrg/lexilla/pull/267
Fear not, I'm not adding #2061 to this list :-).
it would be nice to start working on things that somebody wants ;-)
Don't worry, just sit quietly in a dark room and the feeling will soon pass :stuck_out_tongue_winking_eye:
On your sadly unnumbered list:
The "persistent temp files" (PTF) is rather disappointing to me having been saved by nice vscode when there was a power cut. What vscode provides is intermittent saves of __all modified buffers__ (and not over the top of the original files like autosave does), and the ability to restore buffers to the same modified state they were. This would cover the limited PTF feature as well. Safety _and_ a new feature, what more could you want?
LSP agree basically there, and thanks again.
Zoom, whatever Scintilla supports is fine.
Multiple carets, no right answers about clashes of keycodes on different platforms and desktops yet.
TOML, done
Mouse wheel tabs, don't know why GTK removed it, probably because it was useful, but since it seems to work on the editor why not all notebooks indeed?
Wrap search on not found, there is a preference to automatically wrap search, so AFAICT #1192 (ignoring the hijacks) is done. The only argument is whether that pref should default to true not false, but myeh, you only have to set it once.
Only one query for Dart since 2014 doesn't seem like much support for adding yet another built-in language. Just because you added it to Lexilla isn't an argument to build a wasted language into Geany :stuck_out_tongue_winking_eye: .
Same goes for zig.
What would be better than keeping on adding built-in stuff would be a @techee minor 100000000 or so line change that allows Lexilla lexers and ctags parsers and the language configuration to be in plugins so the cost of low traction languages isn't imposed on everybody (or dlls with a different API if appropriate, eg IIUC Lexilla allows for that) ... now _that's_ something for you to get your teeth into :grin:
What vscode provides is intermittent saves of **all modified buffers** (and not over the top of the original files like autosave does), and the ability to restore buffers to the same modified state they were.
I think part of this is already present in some plugin ("auto save" or something like that, I'm typing this from phone). It saves modified files periodically in configured directory. It just lacks a nice way to restore the previous states. And yes, I'd love to have this feature 👍
Don't worry, just sit quietly in a dark room and the feeling will soon pass 😜
I feel I need to lock myself in the cellar for one year to achieve this :-)
The "persistent temp files" (PTF) is rather disappointing to me having been saved by nice vscode when there was a power cut. What vscode provides is intermittent saves of all modified buffers (and not over the top of the original files like autosave does), and the ability to restore buffers to the same modified state they were. This would cover the limited PTF feature as well. Safety and a new feature, what more could you want?
Sure, would be nice to have but it's a big project that would mean rewriting the majority of session handling in Geany and I'm not sure if anyone is ever going to do that. So I'd go for what we can have now easily - autosave pretty much simulates the file restore by overwriting original files (IMO doesn't matter much for properly VCS'd files) and the "persistent temp files" feature handles unnamed documents.
Multiple carets, no right answers about clashes of keycodes on different platforms and desktops yet.
IMO we have the right answer - configurable click-bindings which for symbol goto and rectangular selection default to the same values as before, and for multiple cursors just something reasonable that can be changed when it conflicts with something else.
Wrap search on not found, there is a preference to automatically wrap search, so AFAICT https://github.com/geany/geany/issues/1192 (ignoring the hijacks) is done.
When you look at it, it's about something different - it's about the case when you see the popup dialog. In that dialog, Cancel is the default button that gets activated when pressing Enter - and it would make more sense if Find were the default. I can imagine that people want to be informed that they reached the end of the document and see this popup, but be able to quickly wrap by just pressing Enter.
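For illustration, a minimal sketch (this is not Geany's actual `dialogs_show_question_full()` implementation, and the labels and response IDs are assumptions) of how a wrap-search dialog can make "Find" react to Enter using plain GTK:

```c
#include <gtk/gtk.h>

/* Hypothetical helper: ask whether to wrap the search, with "Find" as the
 * default response so a plain Enter press wraps immediately. */
static gboolean ask_wrap_search(GtkWindow *parent, const gchar *needle)
{
    GtkWidget *dialog = gtk_message_dialog_new(parent,
        GTK_DIALOG_MODAL | GTK_DIALOG_DESTROY_WITH_PARENT,
        GTK_MESSAGE_QUESTION, GTK_BUTTONS_NONE,
        "\"%s\" was not found. Wrap search and find again?", needle);

    gtk_dialog_add_button(GTK_DIALOG(dialog), "_Cancel", GTK_RESPONSE_CANCEL);
    gtk_dialog_add_button(GTK_DIALOG(dialog), "_Find", GTK_RESPONSE_ACCEPT);

    /* The key line: make "Find" (not "Cancel") the default button. */
    gtk_dialog_set_default_response(GTK_DIALOG(dialog), GTK_RESPONSE_ACCEPT);

    gint response = gtk_dialog_run(GTK_DIALOG(dialog));
    gtk_widget_destroy(dialog);
    return response == GTK_RESPONSE_ACCEPT;
}
```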
Only one query for dart since 2014 doesn't seem much support for adding yet another built in language. Just because you added it to Lexilla isn't an argument to build a wasted language to Geany 😜 .
See #371 (which has 5 thumbs up), #3313, and pull requests #379, #3372.
In Scintilla, issues sorted by thumbs up reaction are https://github.com/ScintillaOrg/lexilla/issues?q=is%3Aissue+is%3Aopen+sort%3...
In Notepad++ it's the most requested language: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/651
Dart seems to be number 17 language based on the PR number metric: https://madnight.github.io/githut/#/pull_requests/2024/1
So I simply think that Geany should support it.
Same goes for zig.
Zig is indeed probably a different case - this is the first time I've heard about it - but it appears to be quite a nice language, there seem to be relatively big projects using Zig on GitHub, and I don't see a reason why it shouldn't be supported by Geany.
What would be better than keeping adding built in stuff would be a @techee minor 100000000 or so line change that allows lexillas and ctags parsers and the language configuration to be in plugins so the cost of low traction languages isn't imposed on everybody (or dlls with different API if appropriate, eg IIUC Lexilla allows for that) ... now thats something for you to get your teeth into 😁
This is exactly the opposite of what this meta-issue tries to solve - real users' problems. I haven't seen a single user asking: "Please, modify Geany so I have to download language support from all around github and compile it myself" ;-). I think this is a fundamentally wrong approach.
What would be much better instead is to have all the Lexilla configuration (that we currently hard-code) defined dynamically in configuration files, and possibly link in all the lexers present in Scintilla even if we don't use them right now. This would allow users to add support for their languages just by modifying the config files and referencing the linked lexers by ID. The benefits would be:
1. Editing config files is much easier, especially for beginning programmers and people using something other than C (there's a kind of chicken-and-egg problem where we know C but don't know the language whose support is being added, and the person wanting support for the language he knows doesn't know C). We could get more contributions this way and do less work ourselves.
2. Given our frequency of releases, users could be advised to update the config files in a certain way if there are some problems, or to grab some config files from the Geany repository when support for a new language is added.
3. New/updated lexers would be properly upstreamed to Lexilla (because we'd only support those) instead of living in forks at random places on GitHub, and the filetype config files would be sent to Geany.
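To make the idea concrete, here is a minimal, hypothetical sketch of reading such a config with GKeyFile - the file name, group and key names are all invented for illustration and are not an existing Geany format:

```c
#include <glib.h>
#include <stdio.h>

/* Hypothetical: read a lexer name for a filetype from a config file such as
 * ~/.config/geany/filedefs/filetypes.zag (file, group and key names are
 * made up; this is not an existing Geany config format). */
static gchar *lookup_lexer_name(const gchar *conf_path)
{
    GKeyFile *kf = g_key_file_new();
    gchar *lexer = NULL;
    GError *error = NULL;

    if (g_key_file_load_from_file(kf, conf_path, G_KEY_FILE_NONE, &error))
        lexer = g_key_file_get_string(kf, "styling", "lexer_name", NULL);
    else
    {
        g_printerr("failed to load %s: %s\n", conf_path, error->message);
        g_error_free(error);
    }

    g_key_file_unref(kf);
    return lexer;  /* e.g. "cpp"; the caller would hand it to Lexilla's CreateLexer() */
}
```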
@dolik-rce :
I think part of this is already present in some plugin ("auto save" or something like that, I'm typing this from phone). It saves modified files periodically in configured directory. It just lacks a nice way to restore the previous states. And yes, I'd love to have this feature 👍
AFAIK the `save_actions` core plugin saves a backup in the configured directory, but it still saves over the file as well, and it doesn't save the path of the file or any metadata, and it doesn't save buffers that are not file backed.
@techee :
mean rewriting the majority of session handling in Geany
yeah, but it might make it waaaaaay simpler; anyway, ATM if you aren't gonna do it it probably won't happen :disappointed:
As I said I'm disappointed, not against PTF, but it does make one more thing to remove when it is done right.
Oh, and "proper" VCS-ing does not mean committing intermediate changes for backups, unlike Geany's git a "proper" VCS is buildable at every commit and preferably even runnable, that way `git blame` will work. So saving backups is back in Geany's court.
IMO we have the right answer
We should be able to distinguish platforms, so can set win and mac, but there is still conflict between Linux desktops, and common ones like Gnome and Cinnamon not just obscure ones. But the UI can be merged and the default argument can continue (so long as _my_ default is used ;-).
When you look at it, it's about something different
Well, the preference means the user never sees the popup, but anyway changing the popup default button should be fine.
Dart seems to be number 17 language
That's a good reason if we had someone who knew about the language, or has your gopher been stabbed by a dart? :wink:
What we should have is a language champion for each language supported by Geany. @b4n for C. @techee for Go. @elextr for C++. Who else? This is not for coding, but for advice on language issues, sensible mappings for Lexilla, LSP etc.
there's a kind of chicken and egg problem where we know C but don't know the language whose support is being added and the person wanting the support of the language he knows doesn't know C
:egg:zactly.
But that's a problem everywhere: for vscode they need typescript, for emacs they need lisp, for others they need C or C++. That's where all the regexen lexers come from, they don't need code, just some arcane weird version of regex plus the proprietary extensions because pure regex isn't enough. But as you say, at least it's only editing a text file, not rebuilding the source. But then the speed issue exists. And parsers are not supportable purely in configuration, but maybe you could invent a JIT PEG compiler?
I don't see a reason why it shouldn't be supported by Geany.
Because __every__ user has to pay the price of all the languages they do not use.
Maybe you want to support some more "nice" languages https://en.wikipedia.org/wiki/List_of_programming_languages? :grin:
real user's problems. I haven't seen a single user asking: "Please, modify Geany so I have to download language support from all around github and compile it myself" ;-). I think this is a fundamentally wrong approach.
What are you talking about? Where did I say anything like that? You seem to have fundamentally misunderstood the suggestion and projected your worst nightmare onto it.
It is the job of the software engineers of a general tool to convert user requests into sensible implementations that make the application suitable for a range of use-cases. No user says "I want a Lexilla highlighter and a ctags parser and highlightingmappings and ..." - that's our job, the user just says "I want Zag[^1]".
All I am suggesting is that the Lexilla highlighter and the ctags parser and the mappings be provided in a dll instead of built in like now. They won't be "from all around github" but from the Geany project.
lexilla configuration (that we currently hard-code) defined dynamically in configuration files
Totally agree, I just said "in the dll" so this change doesn't have to be done as well although it could be. It is something that should be done separately in any case. But no user will ask for it :smiling_imp:
But if it is done the config can also say `lexerparser=geanyzag.so` to load the lexer and parser. As I keep saying it is not good software design to compile everything in, making all users, even those on Raspberry Pi, pay for lexers and parsers that are not used by them.
[^1]: fictitious example language
As I keep saying it is not good software design to compile everything in, making all users, even those on Raspberry Pi, pay for lexers and parsers that are not used by them.
Honest question here: would they pay in binary size or in memory consumption? I know that Notepad++ statically builds the entirety of (Sci/Lex)illa into every release [since 2021][0] — yes, *every* lexer module is in there, whether end users know it or not. But the only degradation from the older "SciLexer.dll" was put down to an unoptimized build of [Boost.Regex][1].
[0]: https://github.com/notepad-plus-plus/notepad-plus-plus/commit/ab58c8ee3ed1b8...
[1]: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9975#issuecomm...
@rdipardo well, all of the above. But in truth there probably needs to be some benchmarking done, and don't forget I said ctags too. On this X86-64 system the current liblexilla is 56Mb, I wonder how big all lexers would be. But libctags is only 10Mb, I guess that's due to many languages skipping statements and expressions that Lexilla has to understand. Not sure what the mappings would cost, and as for memory usage, at least some configuration isn't loaded until the filetype is needed, but all the Scintilla mapping is hard-coded ATM, and ditto ctags.
And of course the UI is cluttered by all filetypes ATM, and there have been reports of the filetype menus being too big for small screens, which is unfixable since menus can't be scrolled onto the screen. It has been suggested to fold those menus alphabetically; see my link to Wikipedia above to see why that would be better.
Some of the paranoia was triggered by the request to include the uctags PEG parsers for TOML and Kotlin: TOML compiles into 5000 lines of C but Kotlin compiles into over 20000 lines, but I guess again we need benchmarks of how big they are when compiled.
Also note Notepad++ does not run on Raspberry Pis.
Ultimately I can't make up my mind if it's really an issue or not, because on this development workstation it definitely is not :-) but I can't tell about smaller systems. So since it's not undoable once a language is added, I tend towards minimising languages that don't have lots of contributors, since we don't have the expertise or resources to support many languages (including current ones :-).
On this X86-64 system the current liblexilla is 56Mb
I commonly see 3.7 MB release and 11 MB debug for liblexilla.so with all lexers on X86-64.
~~~
Debug:
-rwxrwxr-x 1 neil neil 15811640 Aug 24 21:37 libscintilla.so
-rwxrwxr-x 1 neil neil 11288216 Aug 24 21:37 liblexilla.so

Release:
-rwxrwxr-x 1 neil neil  2011048 Aug 24 21:41 libscintilla.so
-rwxrwxr-x 1 neil neil  3737824 Aug 24 21:41 liblexilla.so
~~~
Hmmm, Geany doesn't build lexilla.so, only lexilla.a, which was the size (48Mb) I quoted. But geany.so, which has Geany and ctags and Scintilla and Lexilla, is 47MB < 48MB, so I guess the .a format has a lot of overhead and 4MB is closer to the real value. In which case that's really not much overhead, even on an RPi. So possibly we could simplify the importing of Lexilla by not limiting it. Thanks @nyamatongwe, that's one less complication.
Now, it's just uctags and mappings and the UI and the rest.
The effect of extra lexers on RAM use is often very small since most of the code just sits on disk and isn't paged in. It's mostly just the lexers that you actually use that are loaded into memory. There's some other aspects like static lexer data but that is quite small.
The Linux builds could probably be made smaller. On Windows, Link Time Code Generation is used to strongly deduplicate code so a release Lexilla.DLL is less than a megabyte.
Oh, and "proper" VCS-ing does not mean committing intermediate changes for backups, unlike Geany's git a "proper" VCS is buildable at every commit and preferably even runnable, that way git blame will work. So saving backups is back in Geany's court.
I didn't mean to use it for back up - I meant it as a possibility to get back to the previous version when the autosave feature saves something you don't want to have saved under some special occasions like the power cut you mentioned.
But the UI can be merged and the default argument can continue (so long as my default is used ;-).
That's the way to go :-)
Thats a good reason if we had someone who knew about the language, or has your gopher been stabbed by a dart? 😉
Not using a language ourselves doesn't mean we cannot support it. Nobody knows all the languages there are in Geany - but we have access to the documentation of the languages, we know C/C++ (somewhat in my case), we know Geany internals and if we want to develop a multi-language editor which Geany is, it's more up to us to add support for new languages than someone who has no idea about C and Geany.
Or look at it another way: let's say we want an editor that supports 95% of the most used languages. Then by doing nothing this goal starts regressing as new languages are introduced and gain some traction. So I think from time to time it is inevitable to add some new language if we want Geany to be an up-to-date multi-language editor.
Because every user has to pay the price of all the languages they do not use.
Hmm, what price is it? If it's about the binary size, the Dart lexer is about 24KB which I think doesn't matter. And to sum up the binary size discussions above, our lexilla is 1.3MB, the full Scintilla lexilla is 3.6MB, so even including all language support would add just about 2.3MB (to our 4MB `libgeany.so`, so the total size would be about 6.3MB). I'm all for minimal binary sizes and against bloated software but I think this is totally acceptable.
Some notes about the binaries to avoid confusion: @elextr the sizes you report are binaries with debugging symbols in them - to remove debugging symbols, run `strip my_binary.so/a/o`. To debug exceptions, templates and other stuff, C++ debugging symbols take up more space than C which is why you see the unstripped ctags binaries much smaller than for Scintilla. Second, as Neil said, code pages are loaded lazily only when needed so the code that isn't executed just sits on the disk. And the same applies also for the huge binaries with debugging symbols where the debugging symbols just sit on the disk until you start debugging the application.
All I am suggesting is that the Lexilla highlighter and the ctags parser and the mappings be provided in a dll instead of built in like now. They won't be "from all around github" but from the Geany project.
Then I miss the point totally. What's the point of splitting `libgeany.so` into tons of smaller `so`s? The only thing I can imagine that could potentially be slow is dynamic linking when the application starts but it matters only for libraries with huge API surfaces (and Scintilla's API pretty much consists of a single function to which you only pass different arguments).
But if it is done the config can also say lexerparser=geanyzag.so to load the lexer and parser. As I keep saying it is not good software design to compile everything in, making all users, even those on Raspberry Pi, pay for lexers and parsers that are not used by them.
Again, code pages are loaded lazily so not the whole binary is loaded into RAM. But even if the whole code had to be loaded, it's just 6.3MB with all the lexers. (And I really do care about Raspberry Pi performance and test on it - Geany should run fast there, but this is just a marginal amount of code and doesn't really matter.)
Edit: to be clear, binary size == memory to load it into, plus there is heap memory used as well when a filetype and lexer and parser are created, but really all that needs actual benchmarking.
To be clear, not ;-). See above.
And of course the UI is cluttered by all filetypes ATM, and there have been reports of the filetype menus being too big for small screens, which is unfixable since menus can't be scrolled onto the screen. It has been suggested to fold those menus alphabetically, see my link to wikipedia above to see that would be better.
I agree about the file type menu size problem (Geany seems to support too many "Programming languages" ;-) but this cannot be an argument for stopping support for new languages. Personally, I'd split the menu into alphabetic groups - I don't see any better option there.
Some of the paranoia was triggered by the request to include the uctags peg parser for TOML and kotlin, TOML compiles into 5000 lines of C but kotlin compiles into over 20000 lines, but I guess again we need benchmarks how big they are compiled.
For me the "paranoia" was not about "20000 lines of code" but about "unreviewable, autogenerated, impossible to debug, impossible to say what performance characteristics it has 20000 lines of code". Not about binary size at all on my side.
I didn't mean to use it for back up - I meant it as a possibility to get back to the previous version when the autosave feature saves something you don't want to have saved under some special occasions like the power cut you mentioned.
What I meant is git has nothing to do with it - almost every file gets saved so the compiler can complain about the errors, or some other tool can complain, and that's not something to commit to git ;-)
The thing the buffer (not file) save provides is letting you go through the process of making changes without either autosave writing an unknown partially-changed state to the file or the whole lot being lost on a crash (and remember it's not just power failures, which are thankfully rare; there are regular opportunities for me to accuse users of "crashing" Geany by logging out or shutting down without closing it, a bad habit they got into with VS).
Anyway as I said I'm disappointed that a better solution is not gonna happen, but since I don't have time, and nobody else is willing to do it, then I'm neither for nor against PTF.
Not using a language ourselves doesn't mean we cannot support it. Nobody knows all the languages there are in Geany
You don't??? ;-)
Geany devs can create any filetype, but supporting it well is different. There have been some atrocious highlighting mappings when the person who made them didn't understand the language, similarly there has been some poor categorisation of symbols (usually to C categories, not all languages are C you know ;-), and then there are all the functions hidden away that decide things based on the filetype. Of course Geany devs can create a filetype for any language, but my point is we should have someone who knows the language to provide input to it rather than treat them all as C.
Hmm, what price is it? If it's about the binary size, the Dart lexer has about 24KB which I think doesn't matter. And to sum up the binary size discussions above, our lexilla has 1.3MB, full Scintilla's lexilla has 3.6MB so even including all language support would add just about 2.3MB (to our 4MB libgeany.so so the total size would be about 6.3MB). I'm all for minimal binary sizes and against bloated software but I think this is totally acceptable.
Some notes about the binaries to avoid confusion: @elextr the sizes you report are binaries with debugging symbols in them - to remove debugging symbols, run strip my_binary.so/a/o. To debug exceptions, templates and other stuff, C++ debugging symbols take up more space than C which is why you see the unstripped ctags binaries much smaller than for Scintilla. Second, as Neil said, code pages are loaded lazily only when needed so the code that isn't executed just sits on the disk. And the same applies also for the huge binaries with debugging symbols where the debugging symbols just sit on the disk until you start debugging the application.
I already said "So possibly we could simplify the importing of Lexilla by not limiting it.", you are pushing on an open door (or maybe it says 'pull') :-)
And as I said, now I just need the actual evidence for "Now, its just uctags and mappings and the UI and the rest." and I will stop caring about how many filetypes we add.
Then I miss the point totally. What's the point of splitting libgeany.so into tons of smaller sos?
(Note to self, stop thinking onto the page before finishing idea, and to be clear this is nothing to do with size specifically)
My suggestion was that each filetype has a dll; sure, the biggest part of them is the lexer and the parser, but also all the various functions hidden around Geany that depend on filetype can either be in a config file, or in the dll.
That means everything for a filetype is in one code file plus lexer and parser and a config file, so things don't get missed and checking filetypes is easier. In fact more per-filetype features could be supported without adding support overhead; as a simple example, a per-filetype "autocomplete trigger" function would mean autocomplete would no longer be limited to C/C++ '.' or '->'.
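Purely as an illustration of that idea (none of these types or functions exist in Geany; all names are invented), the sort of thing a per-filetype module could export is a small hook table:

```c
#include <glib.h>

/* Hypothetical per-filetype hook table - invented names, not Geany API. */
typedef struct
{
    const gchar *name;                               /* e.g. "Zag" */
    /* Return TRUE if the text just typed should trigger autocompletion
     * (instead of hard-coding '.' and '->' in the core). */
    gboolean     (*autocomplete_trigger)(const gchar *text_before_cursor);
    const gchar *(*comment_prefix)(void);            /* e.g. "//" or "#" */
} FiletypeHooks;

/* Example implementation a hypothetical "Zag" filetype dll could provide. */
static gboolean zag_autocomplete_trigger(const gchar *text_before_cursor)
{
    return g_str_has_suffix(text_before_cursor, ".") ||
           g_str_has_suffix(text_before_cursor, "::");
}

static const gchar *zag_comment_prefix(void) { return "//"; }

const FiletypeHooks zag_hooks = { "Zag", zag_autocomplete_trigger, zag_comment_prefix };
```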
I agree about the file type menu size problem (Geany seems to support too many "Programming languages" ;-) but this cannot be an argument for stopping support for new languages. Personally, I'd split the menu into alphabetic groups - I don't see any better option there.
Agreed, and as I said above a quick scan of the Wikipedia alphabetical list of programming languages shows a better spread that way. And the top level doesn't have to be 26 one letter menus, it could be ranges "a-c" "d-i" etc so the range is longer for letters that have few if any Geany filetypes in them.
For me the "paranoia" was not about "20000 lines of code" but about "unreviewable, autogenerated, impossible to debug, impossible to say what performance characteristics it has 20000 lines of code". Not about binary size at all on my side.
But reviewing and debugging should be happening in uctags, not in the import to Geany? If it's reviewed and benchmarked there (size and speed) then many client projects will benefit.
Geany devs can create any filetype, but supporting it well is different. There have been some atrocious highlighting mappings when the person who made them didn't understand the language, similarly there have been some poor categorisation of symbols (usually to C categories, not all languages are C you know ;-), and then there are all the functions hidden away that decide things based on the filetype. Of course Geany devs can create a filetype for any language, but my point is we should have someone who knows the language to provide an input to it rather than treat them all as C.
Yes, but things can be fixed over time. If we make some mistake, someone knowing the language will point it out or will create a pull request and it gets fixed in the next release - pretty typical open source development. So by the means of bug reports and pull requests, we actually have people knowing the language.
My suggestion was that each filetype has a dll, sure the biggest part of them is the lexer and the parser, but also all the various functions hidden around Geany that depend on filetype can either be in config file, or in the dll.
Well, we can organize our sources whichever way we want and it can still be a single libgeany.so like now, so it's not really an argument for many shared libraries. The only valid argument I see is if we want to do something like vscode where e.g. support for some language is loadable from some external source, but I think it's overkill.
But reviewing and debugging should be happening in uctags, not in the import to Geany? If its reviewed and benchmarked there (size and speed) then many client projects will benefit.
I tried it a few years back and it was just too slow for my taste (barely usable on a Raspberry Pi with a 400 LOC file). And uctags doesn't have the same "real time" performance requirements as Geany, so the fact that it works for ctags doesn't necessarily mean it's suitable for Geany. Also, I can't imagine anyone is able to review the generated code - I'm most worried about some pathological case that you don't hit normally but only for some language construct where certain grammar rules cause much slower parsing - we can't try all the possible inputs to be sure there's nothing like that and we can't inspect the code because it's too big and unreadable. For normal parsers we can be pretty sure that parsing happens more or less in linear time, but not here.
To be clear, I'm not saying we cannot merge it - it's just a completely different kind of parser than we have with some drawbacks and someone with a higher karma level than me (@b4n) should decide if we want such parsers or not.
Yes, but things can be fixed over time. If we make some mistake, someone knowing the language will point it out or will create a pull request and it gets fixed in the next release - pretty typical open source development. So by the means of bug reports and pull requests, we actually have people knowing the language.
Ok, so let's muddle along in our current way? I don't understand what you have against having nominated supporters of languages? Does the world fall down if we have someone to ping when an issue about filetype X is raised? Otherwise, if it's not in the mapping, all we can do is tell the user to raise it upstream because we don't know if they have the language wrong or not. Improving support doesn't seem to me to be a negative, especially if more languages are going to get added as below. We might not get volunteers for all languages, but at least it's possible support will improve for those that have backers.
support for some language is loadable from some external source but I think it's an overkill.
Yes, that's exactly what I mean. If the dll is only loaded when the filetype is first used (i.e. it's not linked in) it doesn't slow startup. It's not loaded until a file of that type is opened. And then all those parts of Geany that have built-in decisions based on filetype can be exported to the dll instead of being forgotten to be updated when languages are added. And also all those assumptions based on "all languages are C" can instead be provided by the dlls. That is __why__ Vscode and almost all other IDEs/editors do it that way.
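For illustration only, a minimal sketch of that lazy loading using GLib's GModule - the file name and the symbol name are invented, and this is not how Geany loads anything today:

```c
#include <gmodule.h>
#include <stdio.h>

/* Hypothetical: load "geanyzag.so" the first time a Zag file is opened and
 * look up an entry point in it. File and symbol names are made up. */
static gpointer load_filetype_module(const gchar *module_path)
{
    GModule *module;
    gpointer init_func = NULL;

    if (!g_module_supported())
        return NULL;

    /* G_MODULE_BIND_LAZY: resolve symbols only when they are first used. */
    module = g_module_open(module_path, G_MODULE_BIND_LAZY);
    if (module == NULL)
    {
        g_printerr("could not load %s: %s\n", module_path, g_module_error());
        return NULL;
    }

    if (!g_module_symbol(module, "filetype_init", &init_func))
    {
        g_printerr("%s has no filetype_init(): %s\n", module_path, g_module_error());
        g_module_close(module);
        return NULL;
    }

    return init_func;  /* cast to the expected function type and call it */
}
```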
For normal parsers we can be pretty sure that parsing happens more or less in linear time but not here.
It happens in that way because most parsers are able to skip statements and expressions and only parse declarations, and for most languages only global declarations are parsed. Only the recentish C/C++ parser does locals AFAIK. As was discussed on https://github.com/universal-ctags/ctags/discussions/4053, Kotlin needs to parse statements and expressions because it can have declarations in them. So it's going to be slower because of the language, however the parser is implemented.
Lexilla is C++, and it's been my experience that nearly all of a C++ codebase gets paged into the working set because of constructors being spread through the code and being called when global objects are being created, objects like lexers. I haven't specifically evaluated Lexilla, but I don't see why it would be any different in general. C parsers might be less paged, as you argue, but it will depend on the order it's loaded into libgeany.so.
But as I said, when actual numbers for size were provided for Lexilla, even if all of the 4Mb is loaded, it's pretty immaterial.
There needs to be numbers provided for the parsers: you claim the Kotlin one is "slow", but no numbers are forthcoming. Evidence is the way to persuade me. How big is the code memory for libctags? If it's like Lexilla and it's small (including Kotlin and TOML) I don't care, but I need numbers. And how "slow" is the Kotlin one?
I'm most worried about about some pathologic case that you don't hit normally but only for some language construct where certain grammar rules cause much slower parsing
That makes you sound very contradictory: on one hand you are saying "let's have more languages" but on the other hand "not languages that have specific characteristics" because they might in theory be slow and you can't analyse the code.
Also you claim that additional parsers have little effect on the working set and add little overhead for those that don't use them. Therefore, if you don't use it, what do you care if it's slow, or if there is some characteristic that you can't glean from the source?
If we are accepting all languages that are PRed as you propose, the Kotlinists are the ones that need to make the judgement, not you, or me, or @b4n. A user of the Kotlin parser has said it is acceptable in practice https://github.com/geany/geany/pull/3034#issuecomment-987693940 (oh, and that provides one of my numbers, 150k, that's not terrible, so I am now happy about it), so why are you worrying?
Either your claim that adding languages doesn't cost non-users is wrong, or you should not care about theoretical costs that only Kotlin users pay. If the slowness becomes an issue then take the Geany approach, add an option :grin: to allow built-in parsers to be disabled so Kotlin users can still edit their code, at least until the Kotlin LSP is improved and parsing moves to being asynchronous.
Lexilla is C++, and its been my experience that nearly all of a C++ codebase gets paged into the working set because of constructors being spread through the code and being called when global objects are being created, objects like lexers.
I haven't examined the Linux shared library closely but on Windows I have checked the map and also used the debugger to see how memory is used.
Lexilla doesn't create lexers itself at load time, it creates `LexerModule` objects which are simple and similar to C structs. There are simple initializers which means that the `LexerModule`s won't be in a read-only segment. The initialization code for all the lexer modules should be collected into one initialization segment or similar (likely `.init_array` for Linux) so that relatively few pages are loaded at startup. Put break points on the 2 `LexerModule` constructors and check down-stack for the calling code.
Other module level data in lexers is mostly simpler and initialized without any C++ constructors called.
If an application asks for many lexers to be created at startup (perhaps to access metadata) then there will be more overhead but that is the application's choice.
Lexilla doesn't create lexers itself at load time
AFAICT from a quick look Geany only calls `CreateLexer()` when the filetype of a Scintilla editor widget is set, so only as needed. So if Lexilla has no globals (or static members, same thing for these purposes) with non-trivial constructors then little or no code will be paged in at startup to initialise them.
The initialization code for all the lexer modules should be collected into one initialization segment or similar
And as you say it's all dependent on what the compiler/linker does, which may have changed since I looked in detail. To be clear, my "experience" of this level of detail is only from one case where it was a problem; in all other cases it has never mattered so it hasn't been examined in detail.
But the point is to note that it is never safe to assume it works like C!!!! :grin:
I don't understand what you have against having nominated supporters of languages? Does the world fall down if we have someone to ping when an issue about filetype X is raised?
OMG, this is getting absurd. Don't you see the reality? Geany is a really niche editor which man-power-wise cannot compare to vscode or even vim. If you have a look at the activity, most Geany developers don't have time for the project. Would I like to have an expert for every language? Sure, bring them here! I'm all for it! I'm sure Neil would love to have an expert for every lexer too and Masatake in uctags as well.
But until you assemble a board of experts, I'll keep living in reality and try to improve language support in Geany despite my limited knowledge of all the languages (and in fact, learning a bit about new languages is one of the things that make developing an editor fun).
Yes, thats exactly what I mean. If the dll is only loaded when the filetype is first used (ie its not linked in) it doesn't slow startup.
It just doesn't happen - if I'm not mistaken, SciTE uses the full Lexilla and it starts really fast (@nyamatongwe am I right?). I'd suggest we spend our time on real problems and not invent non-existent ones.
That is why Vscode and almost all other IDE/editors do it that way.
Yes, but these are giant plugins here that possibly contain whole compilers, language servers, etc. with 100MB sizes or more. They tend to do crazy things in the background and I'm really happy nothing like that is in Geany - you don't really want to have all these built in. But with Lexilla it's just a few "simple" lexers whose code won't get normally executed.
It happens in that way because most parsers are able to skip statements and expressions and only parse declarations, and for most languages only global declarations are parsed.
Skipping statements speeds up parsing but doesn't change the complexity of the algorithm - it's pretty much linear. The new C++ ctags parser that parses function bodies in a limited way is, say, twice as slow as the original one (in fact a bit faster), but the Kotlin parser is 100x slower, which I assume is because of backtracking. With PEGs you have unlimited lookahead which can cause exponential complexity in the worst case because of backtracking, and at least I am not able to say whether this "worst case" happens in the 20000-line code base and what effect it will have on performance.
There needs to be numbers provided for the parsers, you claim the Kotlin one is "slow", but no numbers are forthcoming.
You have numbers in https://github.com/geany/geany/pull/3034 in which you were involved so you should know the facts - the PEG parser is in the order of magnitude 100x slower than normal parsers.
Evidence is the way to persuade me, how big is the code memory for libctags
I'm not sure all ctags parsers should be made part of Geany - it's something you started suggesting. I don't think normal users will know how to do the mappings and there are quite a few specific settings that they probably won't get right - this is more about tag manager knowledge than language knowledge so we are better people for it. But ctags parsing is a bonus feature - the basic one is syntax highlighting and general filetype specification. When that is present, one can start using a LSP server if there's one and in such a case ctags isn't needed at all.
**In fact, I'm not even remotely planning anything like rewriting Geany configuration and including all the lexers.** I mentioned it as a reaction to this post of yours
What would be better than keeping adding built in stuff would be a @techee minor 100000000 or so line change that allows lexillas and ctags parsers and the language configuration to be in plugins so the cost of low traction languages isn't imposed on everybody (or dlls with different API if appropriate, eg IIUC Lexilla allows for that) ... now thats something for you to get your teeth into
where you claim something about "cost of low traction languages isn't imposed on everybody" which you have failed to prove so far. If you don't believe me or Neil, _you_ (not me) should come up with numbers and prove such a problem exists. Nobody has reported such a problem, I haven't seen such a problem either on my development machine or on a Raspberry Pi, and when you want someone to develop such a feature, you should come with convincing arguments for it.
That makes you sound very contradictory, on one hand you are saying "lets have more languages" but on the other hand "not languages that have specific characteristics" because they might in theory be slow and you can't analyse the code.
Please just read what I say. I'm not saying "languages that have specific characteristics" but "PARSERS that have specific characteristics" - it's not about Kotlin, it's about PEG parsers in general, see above.
Also you claim that additional parsers have little effect on the working set and add little overhead for those that don't use it. Therefore, if you don't use it, what do you care if its slow, or if there is some characteristic that you can't glean from the source.
Because even though I personally don't use Kotlin, it should be usable for other users even on slower hardware. If Geany blocks for 1s every time I type a letter in a 400 LOC file on a Raspberry Pi, it's just unacceptably slow, for me at least. Others may have a different opinion and feel free to outvote me.
If we are accepting all languages that are PRed as you propose, the Kotlinists are the ones that need to make the judgement, not you, or me, or @b4n. A user of the Kotlin parser has said it is acceptable in practice https://github.com/geany/geany/pull/3034#issuecomment-987693940 (oh, and that provides one of my numbers, 150k, thats not terrible, so I am now happy about it) so why are you worrying?
I'm not saying we should accept just anything anyone ever proposes (@elextr please stop putting words into my mouth that I don't say, I really don't like that) - like every PR, language support PRs undergo a review and if they don't meet certain criteria, they aren't merged. For me, here, the performance criterion isn't met, so I personally won't merge it and I feel obliged to point out the possible problems and not be silent about them. If you, Colomban or Enrico have a different opinion and want to merge it, I'm completely fine with it.
Either your claim that adding languages doesn't cost non-users is wrong or you should not care about theoretical costs that only Kotlin users pay.
Wow, what a garbage argument this is! How does it become either the first or the second?
1. Yes, I do claim that non-Kotlin users don't have to care at all.
2. But I do care if Kotlin is usable in Geany on low performance machines even though I personally don't use it - should we just say "f*** you" to users who use different languages than us? What an idiocy!
If the slowness becomes an issue then take the Geany approach, add an option 😁 to allow built-in parsers to be disabled so Kotlin users can still edit their code
Alright, **this** is actually the only useful input in the discussion (which I'm ending on my side because the above is not leading anywhere, is hugely off-topic and has nothing to do with what I plan to do). We already have such a setting - it's
```ini
[settings]
tag_parser=
```
and when left empty, it already disables the built-in parser. So in case the Kotlin parser gets merged (which I think really requires Colomban's approval), this could be left empty by default with a comment that the parser is slow but that it can be enabled by removing this line.
OMG, this is getting absurd. Don't you see the reality?
Statements like this are unnecessary and do not contribute to useful discussion. There should be no reason that suggestions for improvements cannot be made, be they to processes or code or the whole application. Until it is actually tried we don't know how "impossible" something is. For example, in the past everybody was exorcising demons :imp: every time LSP was mentioned, but in the end it cost a few hundred lines in Geany because all the complexity was external - but nobody knew that until you actually implemented it. It is inappropriate to try to shut down suggestions for alternative program organisations or processes before anyone has tried to implement them.
For the filetype support, all it takes is a list of contributors who are willing and able to comment on filetype issues so they can be pinged when appropriate, eg C - @b4n, C++ - @elextr, Go - @techee, Kotlin - @dolik-rce?, Python - @eht16, if nobody is willing to look at topics relating to a filetype then its users get whatever is made for them, or someone steps up to help. But at the moment there is no way of someone doing that.
It just doesn't happen - if I'm not mistaken, SciTE uses full lexilla and it starts really fast (@nyamatongwe am I right?). I'd suggest to spend our time on real problems and not inventing non-existing ones.
Possibly you have misunderstood the reason behind the suggestion. It is nothing to do with Lexilla specifically, it is to do with combining all the code for a filetype in one place (which may include the Lexilla lexer and uctags parser) so it can be found and checked and modified easily. The point is to make it easy to add flexibility for language-specific features that are currently coded into Geany and spread throughout the code. And it would be easier to add features that should be language specific but that nobody has done because it's too much bother finding all the places to code them into Geany. Geany itself should provide filetype-independent fallbacks, not try to handle per-filetype specifics spread throughout.
I have been calling it a DLL because it might need a different API, but this could be a standard Geany plugin, though the current plugin API does not really support plugins providing functionality back to Geany - that's why LSP has to have specific tests where it hooks in. But I suppose that mechanism could be extended to per-filetype functions as well.
Yes this is a re-organisation of Geanys structure. Why is it impossible to suggest such a thing? If it is not mentioned it will not happen, ever.
[end gripe]
Now to specifics.
but the Kotlin parser is 100x slower which I assume is because of backtracking
The problem with PEG is the potential infinite lookahead as you said, and yes, that can cause exponential performance, but the point of packrat is to use memoization to linearise performance. So your assumption should not be true. That does not mean that the packcc implementation is fast of course; it could be slow linear performance, or the implementation of memoization could simply be wrong. I agree that it's not likely that a human can assess that from the packcc output code.
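For anyone not familiar with packrat, here is a tiny self-contained illustration of the memoization idea (this is hand-written for this comment and has nothing to do with packcc's generated code): each (rule, position) result is cached, so backtracking never re-runs a rule at the same input position.

```c
#include <stdio.h>
#include <string.h>

#define UNKNOWN -2   /* not tried yet */
#define FAIL    -1   /* rule failed at this position */

static const char *input;
static int memo_number[64];   /* memo_number[pos] = end position, or FAIL */

/* PEG rule: Number <- [0-9]+  (returns end position, or FAIL) */
static int parse_number(int pos)
{
    if (memo_number[pos] != UNKNOWN)       /* packrat: reuse the earlier result */
        return memo_number[pos];

    int end = pos;
    while (input[end] >= '0' && input[end] <= '9')
        end++;

    return memo_number[pos] = (end > pos) ? end : FAIL;
}

/* PEG rule with backtracking: Sum <- Number '+' Sum / Number */
static int parse_sum(int pos)
{
    int n = parse_number(pos);
    if (n != FAIL && input[n] == '+')
    {
        int rest = parse_sum(n + 1);
        if (rest != FAIL)
            return rest;
        /* first alternative failed: backtrack; the parse_number(pos) call
         * below is answered from the memo table instead of being re-run */
    }
    return parse_number(pos);
}

int main(void)
{
    input = "1+2+42";
    for (size_t i = 0; i < sizeof memo_number / sizeof memo_number[0]; i++)
        memo_number[i] = UNKNOWN;
    printf("parsed %d characters of \"%s\"\n", parse_sum(0), input);
    return 0;
}
```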
the PEG parser is in the order of magnitude 100x slower than normal parsers.
None of those parsers are parsing the same code and giving the same result, so it's totally inappropriate to compare them like that. Also, so what? The Kotlin user has stated that the performance is acceptable in practice even though you are unhappy at the performance of a single example of Kotlin code on an RPi. Are you really saying that _nobody_ can have Kotlin symbols because it might not be fast enough on a slow machine with a specific example?
Please just read what I say. I'm not saying " languages that have specific characteristics" but "PARSERS that have specific characteristics" - it's not about Kotlin, it's about PEG parsers in general, see above.
As I pointed out above packrat PEG parsers do not in general have any characteristics that should make them unacceptable. And _I_ am saying it depends on the language, because if a language allows declarations anywhere, then it requires all of its code to be parsed to locate the declarations. For languages that allow declarations to be made inside statements and expressions they will naturally be slower to parse than languages where it is easy to decide declaration vs statement vs expression and skip parsing the latter two.
I'm not sure all ctags parsers should be made part of Geany - it's something you started suggesting.
I did not intend to suggest that, and I don't believe I suggested that, I only suggested all of Lexilla which, as you noted, has precedent in other applications. And I didn't say all lexed languages need to be a filetype, it could just be dead code (gasp!!! horror!!!). And in fact if a lexer is not referred to from Geany I suspect a smart enough linker will omit it anyway. And the same for ctags parsers if we ever moved to simply linking all of them. This is suggested to simplify the process of importing Lexilla and Uctags.
For the specific Kotlin parser, size was hidden in a previous comment and when I found it I pointed out it was small enough to stop me worrying, so its only the possible performance that is an issue.
Benchmarking a variety of Kotlin code and plotting speed vs size would be useful guidance since the packcc output is not amenable to analysis, that will show linearity or not, what the coefficient is, and if there is a constant or not.
But ctags parsing is a bonus feature
True, but if there is a parser available there should be no reason not to add it to a built-in filetype if "somebody" makes a PR.
I'm not saying we should accept just anything anyone ever proposes (@elextr please stop putting words into my mouth that I don't say, I really don't like that)
I apologise if I misunderstood, but that was what I thought you meant. But the point was the rest of the paragraph: that the trade-off of feature vs performance is something that users need to make. And my suggestion that making it possible for configuration to remove built-in parsers mitigates performance issues if users find them in real life means we don't have to worry so much.
And I presume you apologise for the misrepresentations you have made due to your misunderstandings of my arguments in the next part of the post.
We already have such a setting
I didn't think `tag_parser=` worked on built-in parsers because they were hard coded (Edit: actually now you mention it, I have a vague memory that it was added in the LSP merge); if it does then great, now if any language is too slow on a user's platform (eg your RPi) they can disable it.
That means there is even less argument that we need to police parser performance if it works acceptably for some users.
@techee SciTE uses full lexilla and it starts really fast (@nyamatongwe am I right?)
SciTE starts up fairly fast but is slowed down by loading and evaluating (and re-evaluating) a large number of poorly-partitioned settings. Loading Scintilla and Lexilla has never been a prominent performance problem and this dates back to the early days when computers and spinning disks were much slower.
This seems to have gone very quiet for a while, so given that the discussion has wandered over many topics it seems worth summarising before it's all forgotten.
1. Persistent temp files: to me it would have been nice if it was subsumed by the "backup modified buffers" suggestion, but as that is unlikely to happen "soon", no objections to it.
2. LSP is in Geany, only the plugin https://github.com/geany/geany-plugins/pull/1331 is waiting for input, should probably be merged.
3. Zoom, awaiting Scintilla upgrade, but no objections AFAICT.
4. Multi cursor support, set keyboard shortcuts on Linux with my desktop Cinnamon compatible defaults ;-P, no other problems delaying it AFAIK.
5. TOML, #3934 needs to be merged, further improvements can be subsequent.
6. Consistent notebook wheel tab change behaviour, needs #3469 merged, no other objections AFAIK.
7. Change the default button on the search wrap dialog, no objections but nothing done yet; since the dialog can be avoided by selecting "always wrap" it's not high importance IMO.
8. Dart, either merge #3968 which is a custom filetype, or wait for the Scintilla update that includes the Dart lexer and make a built-in filetype.
9. Zig, waiting for updated Scintilla.
The questions about having more and more filetypes, overhead, menu, filetype organisation, and Geany organisation are woven through the issue posts.
10. Overhead, actual data changed my view:
    - for me Lexilla is no longer a problem (in fact it could even be possible to include all of Lexilla) after the size number I was using was corrected by @nyamatongwe
    - for me ctags size so far is no problem; @techee raised issues about the size and speed of the PEG parsers for TOML and Kotlin, but data indicates that the compiled size is not excessive IMO, @rdipardo has indicated that the Kotlin parser is ok for their usage, and I am less concerned about speed now that the built-in parser can be disabled from the filetype file if a user finds it too slow
11. Menu: this is the only remaining problem with adding more filetypes AFAICT, everything goes in the "Programming" submenu which is already too large; there seems to be general agreement that the menu should be arranged by name in alphabetic ranges. The ranges can be chosen to make each sub-menu about the same size. __Question: should this change happen before more filetypes are added, to avoid making the menu even bigger?__
12. Filetype organisation: there appear to have been many misunderstandings about the proposal of filetype DLLs and the discussion; it is not to be implemented as part of this, but the more filetypes are added the more is subject to any re-organisation and that's why it was raised here. Will raise a new issue.
13. Geany organisation: it was suggested that there be a method of recording contributors who know a filetype and who volunteer to advise and suggest for its support. That way when something is raised by filetype users, Geany devs have someone to ping for their expertise. I think this may have been overinterpreted; just a file such as `filetype_support.md` in the top level was all that was intended, but probably badly communicated. Will raise a new issue.
@rdipardo has indicated that the Kotlin parser is ok for their usage ...
Not that I recall. This is probably a reference to @dolik-rce's pull request comment:
- https://github.com/geany/geany/pull/3034#issuecomment-987693940
- Menu: this is the only remaining problem with adding more filetypes AFAICT, everything goes in the "Programming" submenu which is already too large, seems general agreement that the menu should be arranged by name in alphabetic ranges. The ranges can be chosen to make each sub-menu about the same size. Question should this change happen before more filetypes are added to avoid making the menu even bigger?
I was thinking about how best to implement this. I remember there were objections against alphabetic naming of the groups, as it might be hard to figure out quickly whether the language you are looking for is within the letter range.
I think we might make all the groups configurable - we do it only for custom filetypes now:
```ini
[Groups]
Programming=Arduino;Clojure;CUDA;Cython;Genie;Groovy;Kotlin;Scala;Swift;
#Script=Graphviz;TypeScript;Meson;
#Markup=
#Misc=JSON;
#None=
```
but we could extend it to all filetypes so the default config would look something like this:
```ini
[Groups]
group_name_1=A-F
group_langs_1=Abc;Arduino;...
group_name_2=G-L
group_langs_2=Groovy;...
...
```
This would allow users to create their own groups like
```ini
[Groups]
group_name_1=My Languages
group_langs_1=Python;C;C++;Go;JSON
```
for quick access to the languages they use. There would just have to be some implicit "Other" group for cases when users redefine the groups in such a way that some languages aren't listed at all - such languages would be placed into this "Other" group so they are not gone forever.
Thoughts?
@techee yep, that sounds like the sort of thing.
A couple of comments:
1. Since `filetype_extensions.conf` is not versionable, better to use a completely different name than `[Groups]`, maybe `[Menu]`, because in existing files `[Groups]` will not have `group_name_*` and `group_lang_*` in it.
2. How will it interact with users' existing `filetype_extensions.conf` that uses the existing group names?
3. I would leave out the `group_` or any other prefix, it's in that section and doesn't need it.
since filetypes_extensions.conf is not versionable better to use a completely different [Groups] name, maybe [Menu] because in existing files [Groups] will not have group_name_* and group_lang_* in it.
No problem with having `[Menu]`, but the problem shouldn't exist, as an existing `filetype_extensions.conf` under `.config` without these settings would just inherit them from the default `filetype_extensions.conf`.
how will it interact with users existing filetype_extensions.conf that uses the existing group names?
Those will be ignored. Custom user-defined filetypes will appear in the "Other" fallback group unless assigned explicitly to some of the groups.
I would leave out the group_ or any other prefix, its in that section and doesn't need it.
Yeah, whatever, this was just a quick example.
Unless there is some reason the filetype is named differently to the programming language (Fortran/F77 is all I could find, but both are F) it should not be a problem, I would have thought.
But when you have
```
A-G
H-L
M-Q
R-Z
```
and now say "quick, in which group is Python?", I'll know it's somewhere in the second half but will have to start going through the letters of the alphabet to be sure where `P` is exactly.
and now say "quick, in which group is Python?", I'll know it's somewhere in the second half but will have to start going through the letters of the alphabet to be sure where P is exactly.
Oh, I see what you mean, maybe make the menu labels `A-B-C-D-E-F-G`, `H-I-J-K-L`, `M-N-O-P-Q`, and `R-S-T-U-V-W-X-Y-Z`. The last one is a bit long but can be split, or the letters not used can be left out.
I did a quick prototype (+Dart & Zig, see I'm not against them ;-)
- A-B: Abaqus, ABC, Actionscript, Ada, Arduino, Asciidoc, Asm, Autoit, Batch, Bibtex
- C: C, Caml, CIL, Clojure, Cmake, COBOL, Coffeescript, Conf, Cpp, CS, CUDA, Cython
- D-E-F: D, Dart, Diff, Docbook, Dockerfile, Erlang, F77, Fortran, Freebasic
- G-H: Gdscript, Genie, GLSL, Go, Graphviz, Groovy, Haskell, Haxe, HTML
- J-L: Java, Javascript, JSON, Julia, Kotlin, Latex, Lisp, Lua
- M-N-O: Makefile, Markdown, Matlab, Meson, Nim, NSIS, Objectivec
- P: Pascal, Perl, PHP, PO, Powershell, Prolog, Python
- R-S: R, Raku, Restructuredtext, Ruby, Rust, Scala, Sh, Smalltalk, SQL, Swift
- T-V-X-Y-Z: TCL, Txt2tags, Typescript, Vala, Verilog, VHDL, XML, Yaml, Zephir, Zig
There are no filetype names starting with `IQUW` so I left those out.
I did a quick prototype (+Dart & Zig, see I'm not against them ;-)
Yeah, this is definitely better. So maybe it's not even necessary to make these configurable and they can just be hard-coded. I would personally just include those `IQUW` so one does not have to remember to add them when a language starting with that letter is added.
I'll try to have a look at it and prepare something.
@elextr Done in #3977 - it turned out to be pretty straightforward. Let me know what you think.