[Geany-devel] [CODE REVIEW] Changes to encodings

Lex Trotman elextr at xxxxx
Mon May 16 07:18:21 UTC 2011


> I did think of this a little.  I think mostly other code related to
> Scintilla will work fine since Scintilla doesn't usually assume that a NUL
> character will terminate a string, it just treats it like any other valid
> UTF-8 character, except displays a special character for control
> characters[1].  See below for specifics.
>

Yes, I expect Scintilla to work, but I expect problems with Geany code
as things stop at NULs.  I would hope that nothing will crash, but I
expect lots of truncation, which means that it doesn't work.  And that
leads to user complaints no matter how much you document that it
doesn't work, thats why I suggested that some things might need
disabling, eg by forcing the file readonly (and preventing enabling
changes).

>>
>> 1. editor.c and others including plugins use sci_add_text a number of
>> times, will any of that break? If so it needs to be disabled or fixed.
>
> Using the old function will just lop off the end of a string with embedded
> NULs because it uses strlen(), but it will still add the truncated text
> fine.

Well truncated isn't right, so its not fine from a user standpoint.

>
>> 2. do searches work with files containing NULs?
>
> Seem to work as expected.

Hmmm, wonder how, the regex engine takes a NUL terminated string.  I
would expect the Scintilla searches to work, but not regexes (well not
past the first NUL anyway), so maybe they need disabling or need to be
made to continue past the NUL.  Since one of the main reasons put
forward in the bugs you listed for opening files with NULs is lightly
corrupted log files, being able to search them is important.

>
>> 3. How do selections work with embedded NULs, AFAICT all the selection
>> calls use NUL termination?
>
> Selecting with the mouse/keyboard works the same.  As far as selecting
> programatically, assuming the Scintilla wrappers are more or less directly
> wrapping the Scintilla messages, there shouldn't be a problem.  I'll check
> these out shortly, but at glance, they look fine.

But what we select gets truncated when pasted.

>
>> 4. All the places sci_get_string, sci_get_line, sci_get_text,
>> sci_get_contents and any others that get/set text using NUL
>> termination need to be disabled or fixed.
>
> The sci_get_string() function properly finds out the size of the string
> using a Scintilla message and then ensuring a final NUL-terminator, so it
> should work fine.  Any code calling these functions will either truncate the
> returned string at the first NUL or handle it completely. Same goes for the
> other sci_get_*() functions.

Yes, but what does the code that used the call do, it has to stop at
the first NUL in the file since it doesn't have any other length info.
 Therefore it likely doesn't do all it should.

>
>> 5. All of the above goes for plugins too.
>
> Aside from the issues identified with the API breakage, the worst case
> scenario should be truncation of the string I think.

Which again means that it is likely to not work right.

>
>> 6. Do lexers work with embedded NULs or does such a file have to be
>> filetype None, if so it needs to be enforced.
>
> At least the CPP lexer works fine, others to be tested, though I don't think
> they rely too much (or at all) on NUL termination.

On further thought I guess if they work with Scite they will be ok.

>
> Probably a good idea would be to improve the warning dialog that pops up
> when you open a file with embedded NULs that explains that some functions
> may not work properly.

Yes, I think that there would definitely need to be some warning.
That means that you have to check the file for NUL, can this be coded
in your encoding coding? :-D  So the file doesn't need scanning again.

If we let the user edit the file, the main thing that must work is
finding and deleting the NULs so the user can get the file back to
being a text file and the search capabilities.

Cheers
Lex



More information about the Devel mailing list