IMO, it's far superior to the current way where it opens the file but chops off everything after the first NUL character. At least with the changes, most things will work and the whole text file can be loaded. I think this will cause less bug reports than there are currently on this NUL truncation bug.
I agree its better to load the file, but I don't know that it will get less bug reports if I know users (being one) :-)
My main concern with all these comments is to get a situation where there are as few surprises as possible, surprises are what confuses and upsets users.
- editor.c and others including plugins use sci_add_text a number of
times, will any of that break? If so it needs to be disabled or fixed.
Using the old function will just lop off the end of a string with embedded NULs because it uses strlen(), but it will still add the truncated text fine.
Well truncated isn't right, so its not fine from a user standpoint.
Agreed that truncation isn't right, but it's how Geany is currently handling everything with NUL characters, my changes only improve this situation to some degree.
And I might add that even though we see it as an improvement and are thankful, users just take it as a given and go on to the next issue. Sigh...
- do searches work with files containing NULs?
Seem to work as expected.
Hmmm, wonder how, the regex engine takes a NUL terminated string. I would expect the Scintilla searches to work, but not regexes (well not past the first NUL anyway), so maybe they need disabling or need to be made to continue past the NUL. Since one of the main reasons put forward in the bugs you listed for opening files with NULs is lightly corrupted log files, being able to search them is important.
Just tested searching with a regex, it works up until the first NUL character. I'll have to see if it's something in the regex implementation or an assumption in Geany that NULs don't exist. The regex search in SciTE works fine in this case.
Scite works because it uses Scintilla's built in but limited regex engine, we use the superior system or GNU regex engine, but it only takes a NUL terminated string to search. I don't think this one is fixable, lets put it on the list of things that don't work to be documented. Search is limited to text only.
- How do selections work with embedded NULs, AFAICT all the selection
calls use NUL termination?
Selecting with the mouse/keyboard works the same. As far as selecting programatically, assuming the Scintilla wrappers are more or less directly wrapping the Scintilla messages, there shouldn't be a problem. I'll check these out shortly, but at glance, they look fine.
But what we select gets truncated when pasted.
Just tested this, and when pasting text selected/copied with embedded NUL, it does truncate at the first NUL. Again, I'll need to see if this is something can be fixed from the Geany side. SciTE behaves the same way currently. In theory, we know where the selection starts and ends, so it should be possible copy the text and know the length.
No, cut/paste happens in Scintilla we can't do anything about it, but my guess would be that it uses the GTK clipboard, and that only takes NUL terminated strings. Again I think, document and leave.
- All the places sci_get_string, sci_get_line, sci_get_text,
sci_get_contents and any others that get/set text using NUL termination need to be disabled or fixed.
The sci_get_string() function properly finds out the size of the string using a Scintilla message and then ensuring a final NUL-terminator, so it should work fine. Any code calling these functions will either truncate the returned string at the first NUL or handle it completely. Same goes for the other sci_get_*() functions.
Yes, but what does the code that used the call do, it has to stop at the first NUL in the file since it doesn't have any other length info. Therefore it likely doesn't do all it should.
Agreed, but still an improvement over the current handling. Functions like sci_get_contents() and sci_get_text() require a length parameter, so the calling code does have length info already.
So long as truncation doesn't do something destructive.
- All of the above goes for plugins too.
Aside from the issues identified with the API breakage, the worst case scenario should be truncation of the string I think.
Which again means that it is likely to not work right.
In the same way as Geany, probably yes.
So long as they don't do something unexpectedly destructive or crash. You've identified the VCS plugin already.
- Do lexers work with embedded NULs or does such a file have to be
filetype None, if so it needs to be enforced.
At least the CPP lexer works fine, others to be tested, though I don't think they rely too much (or at all) on NUL termination.
On further thought I guess if they work with Scite they will be ok.
SciTE actually handles text files with NULs quite nicely, it doesn't even mark them as read-only or anything and it saves them no problems.
Well the main reason for the read-only is the truncation, so you don't write a short file over the long one. Now you've fixed that, but I still think its worth leaving it read-only.
Probably a good idea would be to improve the warning dialog that pops up when you open a file with embedded NULs that explains that some functions may not work properly.
Yes, I think that there would definitely need to be some warning. That means that you have to check the file for NUL, can this be coded in your encoding coding? :-D So the file doesn't need scanning again.
It's done and working[1] in a similar fashion as before (sans the truncation), where it compares the full data size with strlen() of the data and if they're different, passes read-only back as true, which causes the warning dialog to show and the doc to be made read-only.
Ok.
If we let the user edit the file, the main thing that must work is finding and deleting the NULs so the user can get the file back to being a text file and the search capabilities.
Already works fine, you just turn read-only off, select the NUL character and delete it. One note about "get the file back to being a text file" is that a file with embedded NULs is actually a perfectly valid text file.
Agree, I meant C's definition of "text file".
The
problem in Geany, is that it assumes that a text file is the same thing as a C string. Any file that contains all valid characters in it's encoding (of which NUL is one of), is a valid text file.
Agree.
Cheers Lex