I did think of this a little. I think mostly other code related to Scintilla will work fine since Scintilla doesn't usually assume that a NUL character will terminate a string, it just treats it like any other valid UTF-8 character, except displays a special character for control characters[1]. See below for specifics.
Yes, I expect Scintilla to work, but I expect problems with Geany code as things stop at NULs. I would hope that nothing will crash, but I expect lots of truncation, which means that it doesn't work. And that leads to user complaints no matter how much you document that it doesn't work, thats why I suggested that some things might need disabling, eg by forcing the file readonly (and preventing enabling changes).
- editor.c and others including plugins use sci_add_text a number of
times, will any of that break? If so it needs to be disabled or fixed.
Using the old function will just lop off the end of a string with embedded NULs because it uses strlen(), but it will still add the truncated text fine.
Well truncated isn't right, so its not fine from a user standpoint.
- do searches work with files containing NULs?
Seem to work as expected.
Hmmm, wonder how, the regex engine takes a NUL terminated string. I would expect the Scintilla searches to work, but not regexes (well not past the first NUL anyway), so maybe they need disabling or need to be made to continue past the NUL. Since one of the main reasons put forward in the bugs you listed for opening files with NULs is lightly corrupted log files, being able to search them is important.
- How do selections work with embedded NULs, AFAICT all the selection
calls use NUL termination?
Selecting with the mouse/keyboard works the same. As far as selecting programatically, assuming the Scintilla wrappers are more or less directly wrapping the Scintilla messages, there shouldn't be a problem. I'll check these out shortly, but at glance, they look fine.
But what we select gets truncated when pasted.
- All the places sci_get_string, sci_get_line, sci_get_text,
sci_get_contents and any others that get/set text using NUL termination need to be disabled or fixed.
The sci_get_string() function properly finds out the size of the string using a Scintilla message and then ensuring a final NUL-terminator, so it should work fine. Any code calling these functions will either truncate the returned string at the first NUL or handle it completely. Same goes for the other sci_get_*() functions.
Yes, but what does the code that used the call do, it has to stop at the first NUL in the file since it doesn't have any other length info. Therefore it likely doesn't do all it should.
- All of the above goes for plugins too.
Aside from the issues identified with the API breakage, the worst case scenario should be truncation of the string I think.
Which again means that it is likely to not work right.
- Do lexers work with embedded NULs or does such a file have to be
filetype None, if so it needs to be enforced.
At least the CPP lexer works fine, others to be tested, though I don't think they rely too much (or at all) on NUL termination.
On further thought I guess if they work with Scite they will be ok.
Probably a good idea would be to improve the warning dialog that pops up when you open a file with embedded NULs that explains that some functions may not work properly.
Yes, I think that there would definitely need to be some warning. That means that you have to check the file for NUL, can this be coded in your encoding coding? :-D So the file doesn't need scanning again.
If we let the user edit the file, the main thing that must work is finding and deleting the NULs so the user can get the file back to being a text file and the search capabilities.
Cheers Lex