[Geany-devel] [CODE REVIEW] Changes to encodings

Lex Trotman elextr at xxxxx
Tue May 17 01:05:32 UTC 2011


>
> IMO, it's far superior to the current way where it opens the file but chops
> off everything after the first NUL character.  At least with the changes,
> most things will work and the whole text file can be loaded.  I think this
> will cause less bug reports than there are currently on this NUL truncation
> bug.
>

I agree its better to load the file, but I don't know that it will get
less bug reports if I know users (being one) :-)

My main concern with all these comments is to get a situation where
there are as few surprises as possible, surprises are what confuses
and upsets users.

>>
>>>>
>>>> 1. editor.c and others including plugins use sci_add_text a number of
>>>> times, will any of that break? If so it needs to be disabled or fixed.
>>>
>>> Using the old function will just lop off the end of a string with
>>> embedded
>>> NULs because it uses strlen(), but it will still add the truncated text
>>> fine.
>>
>> Well truncated isn't right, so its not fine from a user standpoint.
>
> Agreed that truncation isn't right, but it's how Geany is currently handling
> everything with NUL characters, my changes only improve this situation to
> some degree.
>

And I might add that even though we see it as an improvement and are
thankful, users just take it as a given and go on to the next issue.
Sigh...

>>
>>>
>>>> 2. do searches work with files containing NULs?
>>>
>>> Seem to work as expected.
>>
>> Hmmm, wonder how, the regex engine takes a NUL terminated string.  I
>> would expect the Scintilla searches to work, but not regexes (well not
>> past the first NUL anyway), so maybe they need disabling or need to be
>> made to continue past the NUL.  Since one of the main reasons put
>> forward in the bugs you listed for opening files with NULs is lightly
>> corrupted log files, being able to search them is important.
>
> Just tested searching with a regex, it works up until the first NUL
> character.  I'll have to see if it's something in the regex implementation
> or an assumption in Geany that NULs don't exist.  The regex search in SciTE
> works fine in this case.
>

Scite works because it uses Scintilla's built in but limited regex
engine, we use the superior system or GNU regex engine, but it only
takes a NUL terminated string to search.
I don't think this one is fixable, lets put it on the list of things
that don't work to be documented.  Search is limited to text only.

>>
>>>
>>>> 3. How do selections work with embedded NULs, AFAICT all the selection
>>>> calls use NUL termination?
>>>
>>> Selecting with the mouse/keyboard works the same.  As far as selecting
>>> programatically, assuming the Scintilla wrappers are more or less
>>> directly
>>> wrapping the Scintilla messages, there shouldn't be a problem.  I'll
>>> check
>>> these out shortly, but at glance, they look fine.
>>
>> But what we select gets truncated when pasted.
>
> Just tested this, and when pasting text selected/copied with embedded NUL,
> it does truncate at the first NUL.  Again, I'll need to see if this is
> something can be fixed from the Geany side.  SciTE behaves the same way
> currently.  In theory, we know where the selection starts and ends, so it
> should be possible copy the text and know the length.
>

No, cut/paste happens in Scintilla we can't do anything about it, but
my guess would be that it uses the GTK clipboard, and that only takes
NUL terminated strings.
Again I think, document and leave.

>>
>>>
>>>> 4. All the places sci_get_string, sci_get_line, sci_get_text,
>>>> sci_get_contents and any others that get/set text using NUL
>>>> termination need to be disabled or fixed.
>>>
>>> The sci_get_string() function properly finds out the size of the string
>>> using a Scintilla message and then ensuring a final NUL-terminator, so it
>>> should work fine.  Any code calling these functions will either truncate
>>> the
>>> returned string at the first NUL or handle it completely. Same goes for
>>> the
>>> other sci_get_*() functions.
>>
>> Yes, but what does the code that used the call do, it has to stop at
>> the first NUL in the file since it doesn't have any other length info.
>>  Therefore it likely doesn't do all it should.
>
> Agreed, but still an improvement over the current handling.  Functions like
> sci_get_contents() and sci_get_text() require a length parameter, so the
> calling code does have length info already.
>

So long as truncation doesn't do something destructive.

>>
>>>
>>>> 5. All of the above goes for plugins too.
>>>
>>> Aside from the issues identified with the API breakage, the worst case
>>> scenario should be truncation of the string I think.
>>
>> Which again means that it is likely to not work right.
>
> In the same way as Geany, probably yes.

So long as they don't do something unexpectedly destructive or crash.
You've identified the VCS plugin already.

>
>>
>>>
>>>> 6. Do lexers work with embedded NULs or does such a file have to be
>>>> filetype None, if so it needs to be enforced.
>>>
>>> At least the CPP lexer works fine, others to be tested, though I don't
>>> think
>>> they rely too much (or at all) on NUL termination.
>>
>> On further thought I guess if they work with Scite they will be ok.
>
> SciTE actually handles text files with NULs quite nicely, it doesn't even
> mark them as read-only or anything and it saves them no problems.
>

Well the main reason for the read-only is the truncation, so you don't
write a short file over the long one.  Now you've fixed that, but I
still think its worth leaving it read-only.

>>
>>>
>>> Probably a good idea would be to improve the warning dialog that pops up
>>> when you open a file with embedded NULs that explains that some functions
>>> may not work properly.
>>
>> Yes, I think that there would definitely need to be some warning.
>> That means that you have to check the file for NUL, can this be coded
>> in your encoding coding? :-D  So the file doesn't need scanning again.
>
> It's done and working[1] in a similar fashion as before (sans the
> truncation), where it compares the full data size with strlen() of the data
> and if they're different, passes read-only back as true, which causes the
> warning dialog to show and the doc to be made read-only.
>

Ok.

>>
>> If we let the user edit the file, the main thing that must work is
>> finding and deleting the NULs so the user can get the file back to
>> being a text file and the search capabilities.
>>
>
> Already works fine, you just turn read-only off, select the NUL character
> and delete it.  One note about "get the file back to being a text file" is
> that a file with embedded NULs is actually a perfectly valid text file.

Agree, I meant C's definition of "text file".

The
> problem in Geany, is that it assumes that a text file is the same thing as a
> C string.  Any file that contains all valid characters in it's encoding (of
> which NUL is one of), is a valid text file.

Agree.

Cheers
Lex



More information about the Devel mailing list