[Geany-devel] [CODE REVIEW] Changes to encodings
elextr at xxxxx
Tue May 17 01:05:32 UTC 2011
> IMO, it's far superior to the current way where it opens the file but chops
> off everything after the first NUL character. At least with the changes,
> most things will work and the whole text file can be loaded. I think this
> will cause less bug reports than there are currently on this NUL truncation
I agree its better to load the file, but I don't know that it will get
less bug reports if I know users (being one) :-)
My main concern with all these comments is to get a situation where
there are as few surprises as possible, surprises are what confuses
and upsets users.
>>>> 1. editor.c and others including plugins use sci_add_text a number of
>>>> times, will any of that break? If so it needs to be disabled or fixed.
>>> Using the old function will just lop off the end of a string with
>>> NULs because it uses strlen(), but it will still add the truncated text
>> Well truncated isn't right, so its not fine from a user standpoint.
> Agreed that truncation isn't right, but it's how Geany is currently handling
> everything with NUL characters, my changes only improve this situation to
> some degree.
And I might add that even though we see it as an improvement and are
thankful, users just take it as a given and go on to the next issue.
>>>> 2. do searches work with files containing NULs?
>>> Seem to work as expected.
>> Hmmm, wonder how, the regex engine takes a NUL terminated string. I
>> would expect the Scintilla searches to work, but not regexes (well not
>> past the first NUL anyway), so maybe they need disabling or need to be
>> made to continue past the NUL. Since one of the main reasons put
>> forward in the bugs you listed for opening files with NULs is lightly
>> corrupted log files, being able to search them is important.
> Just tested searching with a regex, it works up until the first NUL
> character. I'll have to see if it's something in the regex implementation
> or an assumption in Geany that NULs don't exist. The regex search in SciTE
> works fine in this case.
Scite works because it uses Scintilla's built in but limited regex
engine, we use the superior system or GNU regex engine, but it only
takes a NUL terminated string to search.
I don't think this one is fixable, lets put it on the list of things
that don't work to be documented. Search is limited to text only.
>>>> 3. How do selections work with embedded NULs, AFAICT all the selection
>>>> calls use NUL termination?
>>> Selecting with the mouse/keyboard works the same. As far as selecting
>>> programatically, assuming the Scintilla wrappers are more or less
>>> wrapping the Scintilla messages, there shouldn't be a problem. I'll
>>> these out shortly, but at glance, they look fine.
>> But what we select gets truncated when pasted.
> Just tested this, and when pasting text selected/copied with embedded NUL,
> it does truncate at the first NUL. Again, I'll need to see if this is
> something can be fixed from the Geany side. SciTE behaves the same way
> currently. In theory, we know where the selection starts and ends, so it
> should be possible copy the text and know the length.
No, cut/paste happens in Scintilla we can't do anything about it, but
my guess would be that it uses the GTK clipboard, and that only takes
NUL terminated strings.
Again I think, document and leave.
>>>> 4. All the places sci_get_string, sci_get_line, sci_get_text,
>>>> sci_get_contents and any others that get/set text using NUL
>>>> termination need to be disabled or fixed.
>>> The sci_get_string() function properly finds out the size of the string
>>> using a Scintilla message and then ensuring a final NUL-terminator, so it
>>> should work fine. Any code calling these functions will either truncate
>>> returned string at the first NUL or handle it completely. Same goes for
>>> other sci_get_*() functions.
>> Yes, but what does the code that used the call do, it has to stop at
>> the first NUL in the file since it doesn't have any other length info.
>> Therefore it likely doesn't do all it should.
> Agreed, but still an improvement over the current handling. Functions like
> sci_get_contents() and sci_get_text() require a length parameter, so the
> calling code does have length info already.
So long as truncation doesn't do something destructive.
>>>> 5. All of the above goes for plugins too.
>>> Aside from the issues identified with the API breakage, the worst case
>>> scenario should be truncation of the string I think.
>> Which again means that it is likely to not work right.
> In the same way as Geany, probably yes.
So long as they don't do something unexpectedly destructive or crash.
You've identified the VCS plugin already.
>>>> 6. Do lexers work with embedded NULs or does such a file have to be
>>>> filetype None, if so it needs to be enforced.
>>> At least the CPP lexer works fine, others to be tested, though I don't
>>> they rely too much (or at all) on NUL termination.
>> On further thought I guess if they work with Scite they will be ok.
> SciTE actually handles text files with NULs quite nicely, it doesn't even
> mark them as read-only or anything and it saves them no problems.
Well the main reason for the read-only is the truncation, so you don't
write a short file over the long one. Now you've fixed that, but I
still think its worth leaving it read-only.
>>> Probably a good idea would be to improve the warning dialog that pops up
>>> when you open a file with embedded NULs that explains that some functions
>>> may not work properly.
>> Yes, I think that there would definitely need to be some warning.
>> That means that you have to check the file for NUL, can this be coded
>> in your encoding coding? :-D So the file doesn't need scanning again.
> It's done and working in a similar fashion as before (sans the
> truncation), where it compares the full data size with strlen() of the data
> and if they're different, passes read-only back as true, which causes the
> warning dialog to show and the doc to be made read-only.
>> If we let the user edit the file, the main thing that must work is
>> finding and deleting the NULs so the user can get the file back to
>> being a text file and the search capabilities.
> Already works fine, you just turn read-only off, select the NUL character
> and delete it. One note about "get the file back to being a text file" is
> that a file with embedded NULs is actually a perfectly valid text file.
Agree, I meant C's definition of "text file".
> problem in Geany, is that it assumes that a text file is the same thing as a
> C string. Any file that contains all valid characters in it's encoding (of
> which NUL is one of), is a valid text file.
More information about the Devel