After the update to Scintilla 5.1.5, I started getting memory corruption crashes when saving some files. (Sorry for not noticing this before, I thought the Scintilla update didn't contain any changes on the Geany side.) I may be wrong but I think the problem is in
https://github.com/geany/geany/pull/3046/commits/d7c985e47412f8a53c5a8202a0f...
and misunderstanding if the following Scintilla API change
When calling SCI_GETTEXT, SCI_GETSELTEXT, and SCI_GETCURLINE with a NULL buffer argument to discover the length that should be allocated, do not include the terminating NUL in the returned value. The value returned is 1 less than previous versions of Scintilla. Applications should allocate a buffer 1 more than this to accommodate the NUL. The wParam (length) argument to SCI_GETTEXT and SCI_GETCURLINE also omits the NUL. This is more consistent with other APIs.
This part
When calling SCI_GETTEXT, SCI_GETSELTEXT, and SCI_GETCURLINE with a NULL buffer argument to discover the length that should be allocated, do not include the terminating NUL in the returned value...
...The wParam (length) argument to SCI_GETTEXT and SCI_GETCURLINE also omits the NUL.
says that that the returned __length__ doesn't include NUL, not that the returned string misses the NUL character. The rest of the change description says that we should allocate `+1` buffer to accommodate for that.
I believe we should revert the above patch and then go through all the uses of `SCI_GETTEXT`, `SCI_GETSELTEXT`, and `SCI_GETCURLINE` and corresponding `sci_` wrappers and update the code where needed (including plugins). I don't think it's a good idea to update the `sci_` wrappers alone to mirror previous behavior since the whole Scintilla API is part of Geany's API so this is a API change anyway and making `sci_` functions do something else than direct Scintilla calls is confusing.
Rant: I really don't understand why Neil made this change - even though it fixes some inconsistency in Scintilla API, this is a hard to spot backwards incompatible change and such things shouldn't be made in minor releases of a library.
@kugel- @eht16 @elextr What do you think?
For reference, I just created this issue against Scintilla asking about whether this API change is a good idea:
https://sourceforge.net/p/scintilla/bugs/2308/
Weird I never saw a problem, just lucky I guess, although I did say not heavily tested.
I did think when I saw the change in Scintilla "thats gonna be a problem".
I think we are better off to just go to 5.1.4 for now because 5.1.5 is gonna be an ABI break because existing functions will return different values from the current ones. All plugins which use any of these will now only work with the matching Geany so better to release a version with 5.1.4 and then later 5.1.5 with the ABI break.
@techee, you might also point out to Neil that although we can fix Geany we (and any other app with plugins) have no means of ensuring all plugins that access the Scintilla buffer have been corrected.
Perhaps if it causes crashes on Geany master then #3046 should be reverted for now, no reflection on @kugel-, but nobody is going to be able to use master much if it intermittently crashes on save (and maybe other things too).
Well I would rather see a band aid workaround than a revert, then its easier to consider and test real solutions. Or is the crash caused by a plugin?
master is a development branch, nothing that has any guarantees for stability. Unless the bug remains unfixed for a long time its IMO ok to work on fixing it without hysterically reverting changes. Anyone can checkout the commit before the crash is introduced locally and work from that.
Of course if the fix takes a long time or is really complicated a revert might be justified but I don't think we're there yet.
That said, I hope I can look into it this evening.
@techee, You may also want to leave a comment on the mailing list thread where the revised API was announced:
There have been some inconsistencies in the way that string returning APIs were handled by Scintilla, particularly SCI_GETTEXT, SCI_GETSELTEXT, and SCI_GETCURLINE. Most APIs return the length of the string excluding any NUL byte terminator but the mentioned APIs returned one more, including the NUL.
It is likely best to change these 3 APIs to be consistent with others. However, __*this could break applications that will no longer allocate enough memory to include the NUL terminator*__. Downstream projects should check their use of these APIs.
https://groups.google.com/g/scintilla-interest/c/DoRE5t2vihE (emphasis added)
Weird I never saw a problem, just lucky I guess, although I did say not heavily tested.
I saw it because I tested the GDScript filetype with the example from here
https://docs.godotengine.org/en/stable/getting_started/scripting/gdscript/gd...
and playing with it (adding, removing and saving the result). I first thought it was something in the GDScript parser or lexer but then I reverted back to master and saw it too. Note that I'm running Geany in a arm64 virtual machine on a macBook and memory management can differ from the Intel one.
It is likely best to change these 3 APIs to be consistent with others. However, this could break applications that will no longer allocate enough memory to include the NUL terminator. Downstream projects should check their use of these APIs.
@rdipardo Thanks for the pointer. This is why I'd first want to hear from Neil if he really considered it well or not. If it's meant to be this way, we'll have to deal with it.
master is a development branch, nothing that has any guarantees for stability.
Agree, the doomsday hasn't come yet :-).
I went through the plugins and these plugins use some of the affected Scintilla API changes (I didn't go in detail through them and maybe not all of them need to be updated): ``` SCI_GETTEXT: vimode, geanypg sci_get_text: geanygendoc sci_get_contents: pretty-printer, geniuspaste, geanypy
SCI_GETSELTEXT: geanypg sci_get_selected_text_length: pretty-printer, scope, addons, geanypy, geanyctags sci_get_selected_text: geanyctags ```
I'm now leaning towards actually really reverting the Scintilla update, waiting for Neil's feedback and if this change is going to stay in Scintilla, updating both Geany and g-p really carefully. Still hoping this was a not well considered change in Scintilla as I think it will affect most editors based on Scintilla, including their plugins.
@techee so which part of d7c985e4741 do you think is faulty, exactly?
I changed `sci_get_text()` to account for scintilla writing 1 extra NUL byte, therefore I now pass `len - 1` to the API. The only in-core caller already allocates `sci_get_length() + 1` bytes. Also be aware that `sci_get_text()` is deprecated.
I think I made a mistake in `sci_get_contents()` for the `buffer_len < 0` case. `sci_get_string()` won't do the right thing. But correcting that does not fix the crash. The only 2 in-core callers do pass `-1` for `buffer_len` so the other case doesn't matter for the crash.
I'm not yet clear what exactly is causing the crash. At least the recipe provided by @elextr also crashes for me regularly.
I found the problem, will open a PR in a few minutes. The problem is the explicit NUL termination that I added to `sci_get_text()` which is superfluous anyway (so the fix removes it). But my change to `sci_get_contents()` is flawed as well.
My opinion regarding the scintilla change itself and how we proceed: 1) I think the change makes generally sense and I appreciate making APIs similar to each other so. It's unfortunate that it's a breaking change but it was communicated well enough, at least I was aware and tried to make the necessary changes (and failed ;-) I wouldn't necessary say it's something that deserves a major version increment, I don't even know if the Scintilla project uses version numbers like that.
2) Assuming Neil leaves the change as is, I think we should just get along and make the fixes in g-p. On our side we should definitely increment the plugin API so that plugins can adapt to the behavior (or reject running on older Geany). Incrementing the ABI, on the other hand, does not make sense. It cannot fix the plugin code. All it does is giving the false idea that recompiling the affected plugin resolves the situation.
I found the problem, will open a PR in a few minutes. The problem is the explicit NUL termination that I added to sci_get_text() which is superfluous anyway (so the fix removes it).
Yep.
I think the change makes generally sense and I appreciate making APIs similar to each other so. It's unfortunate that it's a breaking change but it was communicated well enough, at least I was aware and tried to make the necessary changes (and failed ;-) I wouldn't necessary say it's something that deserves a major version increment, I don't even know if the Scintilla project uses version numbers like that.
To me this change isn't a good idea. Scintilla has been very stable API-wise. When you consider that we migrated from Scintilla 3 to Scintilla 5 which is something like a 4 year difference without any change like that needed and then there comes Scintilla 5.1.5 after Scintilla 5.1.4 one month apart that requires that all editors and their plugins have to be updated, it's quite a difference and IMO a major underestimate of what this change would cause. I think someone who updates Scintilla once a year will just miss this change. This isn't just Geany but all the other editors that use Scintilla and this is the kind of API change I'd expect from a 0.x release where API isn't stable but not from a component that has been here for years. Yes, the API isn't consistent but this is just part of historical heritage.
Assuming Neil leaves the change as is, I think we should just get along and make the fixes in g-p.
On our side we should definitely increment the plugin API so that plugins can adapt to the behavior (or reject running on older Geany).
Of course, then we have to deal with it, no question about that.
Closed #3095 via #3098.
Reopened #3095.
master is a development branch, nothing that has any guarantees for stability. Unless the bug remains unfixed for a long time its IMO ok to work on fixing it without hysterically reverting changes. Anyone can checkout the commit before the crash is introduced locally and work from that.
The problem with this is it is not just a problem with Geany, but plugins too, and those not in G-P could take an unknown amount of time to be fixed.
Geany is too scared of reverting, as I said its not a poor reflection on the person who merged the change, its just pragmatic.
On our side we should definitely increment the plugin API so that plugins can adapt to the behavior (or reject running on older Geany).
Of course, then we have to deal with it, no question about that.
The problem is that incrementing the API does nothing to stop plugins that have not been modified from running. Geany only tests its API is greater than that the plugin asks for.
Incrementing the ABI, on the other hand, does not make sense. It cannot fix the plugin code. All it does is giving the false idea that recompiling the affected plugin resolves the situation.
Yes, agreed, but at least it stops plugins that have not been recompiled from running and possibly crashing.
Like I said above currently we can't stop an unfixed plugin from being run, so if Geany is going to start making changes to the plugin interface with non-backward compatible crashing changes we need to introduce a test to reject plugins that say they need an API _less than_ the new number. At least that will require the plugin author to manually change something, and might prompt them to look why its needed.
Have reopened since #3098 (thanks @kugel- for the quick turnaround) does not address the plugin issue.
The problem is that incrementing the API does nothing to stop plugins that have not been modified from running. Geany only tests its API is greater than that the plugin asks for.
That's fine. The majority of plugins is not affected. What the API increment achieves is that the plugin can adapt itself depending on the Geany version.
Yes, agreed, but at least it stops plugins that have not been recompiled from running and possibly crashing.
We communicate ABI changes such that "you only need to recompile the plugins", without involving the author. That's not the case and a false signal.
Like I said above currently we can't stop an unfixed plugin from being run, so if Geany is going to start making changes to the plugin interface with non-backward compatible crashing changes we need to introduce a test to reject plugins that say they need an API less than the new number. At least that will require the plugin author to manually change something, might prompt them to look why its needed, and will protect users from unchanged plugins crashing.
Poor users that lose all their plugins because the authors can't be bothered to bump their target API version. Because that would be necessary, even for plugins that are not affected. The only possible solution would be to check per API (e.g. in `sci_get_text()` if the calling plugin has at least API XY, and `abort()` if not) but most APIs don't even know if they're called from the core or a plugin. Outright rejecting all "old" plugins is too overarching as it hits plugins that don't use the affected APIs.
Most people will get Geany 1.39/2.0 and along with that G-P 1.39/2.0 (which is going to be fixed), so very few people will even notice. Please, don't make the problem of a few plugins to a problem for each and everyone.
Poor users that lose all their plugins because the authors can't be bothered to bump their target API version.
Don't be stupid, having plugin maintainer delay being a user crash is not a professional approach. Plugin updates may be delayed for any number of reasons.
Outright rejecting all "old" plugins is too overarching as it hits plugins that don't use the affected APIs.
Agree its not ideal, and agree that having Geany functions checking is impractical. But the alternative is that we set users up for loss of work when Geany crashes.
Please, don't make the problem of a few plugins to a problem for each and everyone.
Knowingly releasing a version of Geany that will cause plugin crashes for some unmodified plugins people is not professional. Not all plugins are part of G-P, including your ones.
It is unacceptable to crash users because a busy plugin maintainer did not notice they needed to change something. Its not the plugin maintainers problem, you set them up to fail!!!
Well, it's Neils change, not mine. And he already indicated that he's not going to revert months after it was made. Dont shoot at me.
You seem to suggest to ever stick with scintilla 5.1.4. That's not acceptable either.
We have to do *something* with the incompatible change that was forced upon us. Best we can do is getting G-P fixed and hope for the best.
Also we're not professionals, we're all volunteers. Spare time is also a relevant factor for this mess.
Now that I have a coffee and calmed down :smile:
You are correct its an annoying situation that is forced on us unfortunately.
The point I am making is that the current plugin interface mechanism does not handle backward incompatible changes like this. So we need to consider how to handle it going forward. We really do need to be able to make breaking changes, being tied in the current straight jacket is too limiting.
Maybe functions that are not backward compatible have to have a name change (see sqlite xxx1 xxx2 as an example of this method).
Geany developers may be volunteers, not programming professionals, but that does not say they should behave unprofessionally.
Hang on, why do the scintilla wrapper functions need to be incompatible? There should be no harm in them returning what they did before even if scintilla changed.
AFAICT plugins really should not be jumping into scintilla directly, is it really exposed in the API?
Hang on, why do the scintilla wrapper functions need to be incompatible? There should be no harm in them returning what they did before even if scintilla changed.
I originally thought it would be confusing if direct Scintilla call were different from the wrapper functions. But now after looking at the plugins I think it would really be best to keep the wrappers working as before. @kugel- did it in https://github.com/geany/geany/pull/3098 for `sci_get_text()` and `sci_get_contents()` but `sci_get_selected_text_length()` wasn't updated this way and I think it should be done too.
AFAICT plugins really should not be jumping into scintilla directly, is it really exposed in the API?
It is, wrappers cover only the most essential functionality but when you need something more special, you have to use the Scintilla API directly. I use it in the vimode plugin this way and in fact only use Scnitilla because mixing wrappers and scintilla looked strange in the code.
I wouldn't do anything crazy personally to mitigate the current problem, just bumping API. The remaining non-wrapper calls in g-p are just 2 plugins ``` SCI_GETTEXT: vimode, geanypg SCI_GETSELTEXT: geanypg ``` and in general plugins really use wrappers.
I just created #3099 in Geany that updates `sci_get_selected_text_length()` to return the same value like in Scintilla 5.1.4 and earlier and also created https://github.com/geany/geany-plugins/pull/1154. I believe this should be all that's needed to fix the problem (in code known to us).
Luckily `sci_get_text()` and `sci_get_contents()` already required to pass buffers large enough for the NUL byte and they don't use the return value of the `SCI_GETTEXT` api so they didn't actually change. The crash is just a bad bug introduced by me.
AFAICT plugins really should not be jumping into scintilla directly, is it really exposed in the API?
It is, wrappers cover only the most essential functionality but when you need something more special, you have to use the Scintilla API directly. I use it in the vimode plugin this way and in fact only use Scnitilla because mixing wrappers and scintilla looked strange in the code.
Maybe the wrappers need expanding, I know Geany mixes it too, but we have (possibly re-)learned a lesson, the wrappers are more than convenience, they allow hiding changes to Scintilla APIs. I don't think we should rush in and change Geany or vimode, but progressively over time move that way and put a note in HACKING and plugin API docs to that effect.
Geany has 182 uses of SSM outside `sci_wrappers.c` 91 in editor.c, 76 in highlighting.c, and the other 15 scattered about. I bet a few strategic wrappers would get rid of the majority leaving only specialised functions accessing Scintilla directly.
The crash is just a bad bug introduced by me.
Oh well, it happens.
And it has been useful in exposing a fragility of our API mechanism.
Specifically that we can't handle changes to the behaviour of existing functions, and need to introduce new names if the behaviour changes (as sci_wrappers did changing `sci_get_selected_text()` to `sci_get_selection_contents()`). And if the old function cannot be changed to remain backward compatible then it should be removed, then the recompile with an ABI change will also fail.
Hum, has this been resolved? I admittedly didn't read everything, but we definitely at the very least need to bump ABI (even if indeed it's not a perfect solution).
IMO, @elextr's idea of rejecting older APIs altogether is not a bad one. It's a bit annoying on plugin authors, but I totally agree with @elextr that it's **entirely unacceptable** that we might happily load plugins that are buggy not because they are not written correctly, but because they have not been updated to be compatible. I think we ought to find a fix, even if it's inconvenient. (and yes, I know bumping ABI isn't actually enough to be safe). Also, really concerned authors could likely even `#ifdef` their way out of it for supporting older APIs, if they really want to go to the trouble.
Actually, one (very annoying) fix could be overriding `scintilla_send_message()` to patch up the calls so it works as it used to. It's not as efficient, and it's a pain to maintain, but at least it would fix it. But then, we could just as well "unpatch" Scintilla itself and live forever with that (and bear in mind that it makes the Scintilla documentation slightly off regarding our version).
A probably better option is to scratch our heads a couple times as to why Neil introduced such a dramatic change in a point release of an otherwise hugely stable component, and then come to terms with it. Then, maybe we cant take the opportunity and decide on the N+1 version number to be 2.0 indeed, make a big bold announcement, and require *all* plugins to be *manually* patched, even if that means annoying authors. It's not actually an uncommon practice that to require authors to declare the range of versions their plugin is supporting, even if it means authors have to get ahead of new versions.
Maybe the wrappers need expanding, I know Geany mixes it too, but we have (possibly re-)learned a lesson, the wrappers are more than convenience, they allow hiding changes to Scintilla APIs. I don't think we should rush in and change Geany or vimode, but progressively over time move that way and put a note in HACKING and plugin API docs to that effect.
Well, for one thing, it won't really scale (lots of APIs, new APIs, etc.). Secondly, we'd just move the burden of backward compatibility on Geany, whereas we until then have had no trouble trusting Scintilla for this. And at some point if either Geany or Scintilla evolve into *requiring* some API-facing changes, we'll have the same problem again, *and* a million wrappers to update.
A possible precaution could be reviewing all of *Scintilla*'s API and suggest Neil changing the "bad" ones **now** so we don't know we're carrying more bad APIs that somebody will want to change in the future :slightly_smiling_face:
What about plugins that work fine (because they don't use affected APIs) but won't be updated in months (or ever) because the authors have run away?
Somebody who cares update them, or they don't work Or you find a way to lookup whether the plugin's SO is *not* calling `scintilla_send_message()`, in which case it's (probably?) safe.
My understanding is that the Geany API has been made safe (I'm not sure if sometimes it might result in an additional byte as well as the nul, but that is safe). Not sure about G-P.
I like the idea of making a big message about updating plugins, and we can do an audit of G-P fairly simply, so broken plugins can be excluded or fixed by interested parties, ie you want your plugin back fix it or persuade somebody to do it for you.
The change to Scintilla should have been the other way to be safe, even if unchanged code might get an extra byte used (quelle horreur), and its unfortunate that Neil didn't take that route. Sadly I think the chance of reverting Scintillas unsafe changes has flown, other users will have adapted.
There are only a few Scintilla messages that are a problem, maybe `ScintillaGTK.cxx` scintilla_send_message() can be patched to check for the bad message numbers and silently increment the count safely, we already patch that file to add GEANY_API_SYMBOL to scintilla_send_message(), a slightly bigger patch won't hurt. And remember bad plugins crash Geany, a few bytes don't matter.
Well, for one thing, it won't really scale (lots of APIs, new APIs, etc.).
Yes, but I bet there is a lot of duplication, so _most_ of the direct SSM calls can be removed with only a few extra wrappers. That reduces the surface area for problems. Anyway I prevaricated sufficiently by saying "over time" etc etc to allow flexibility.
Hum, has this been resolved? I admittedly didn't read everything, but we definitely at the very least need to bump ABI (even if indeed it's not a perfect solution).
Agree we should bump ABI, it's better than nothing.
IMO, @elextr's idea of rejecting older APIs altogether is not a bad one. It's a bit annoying on plugin authors, but I totally agree with @elextr that it's entirely unacceptable that we might happily load plugins that are buggy not because they are not written correctly, but because they have not been updated to be compatible. I think we ought to find a fix, even if it's inconvenient. (and yes, I know bumping ABI isn't actually enough to be safe).
This is something I would suggest doing if the problem affected many plugins. But the only plugin affected by this change (after we made scintilla wrappers behave the same way as before) in geany-plugins can be fixed by this simple patch:
https://github.com/geany/geany-plugins/pull/1154
Of course, there may be other plugins in the wild we don't know about but I don't think many of them use this particular API or they use the `sci_` wrappers.
So IMO, while not perfect - making the `sci_` wrappers behave as before (done) - applying https://github.com/geany/geany-plugins/pull/1154 - bumping ABI
should be sufficient and I don't think we'll be bombed by bug reports about crashing plugins because of this problem.
I think we have all things fixed here - I just merged https://github.com/geany/geany-plugins/pull/1154 and the ABI bump was merged in https://github.com/geany/geany/pull/3541.
So I'm closing this issue.
Closed #3095 as completed.
github-comments@lists.geany.org