[Geany-devel] [CODE REVIEW] Changes to encodings

Matthew Brush mbrush at xxxxx
Mon May 16 06:20:44 UTC 2011


On 05/15/11 22:09, Lex Trotman wrote:
> Hi Matthew,
>
> Comments from inspection only as I am still development machine
> challenged.  Also only on the NUL handling, encodings are not my
> specialty (as in I avoid them like the plague).

Thanks for taking a look!

>
> The problem I see with opening files with embedded NULs is then
> ensuring that it doesn't break when code that expects NUL termination
> accesses the buffer.

I did think of this a little.  I think mostly other code related to 
Scintilla will work fine since Scintilla doesn't usually assume that a 
NUL character will terminate a string, it just treats it like any other 
valid UTF-8 character, except displays a special character for control 
characters[1].  See below for specifics.

>
> 1. editor.c and others including plugins use sci_add_text a number of
> times, will any of that break? If so it needs to be disabled or fixed.

Using the old function will just lop off the end of a string with 
embedded NULs because it uses strlen(), but it will still add the 
truncated text fine.

> 2. do searches work with files containing NULs?

Seem to work as expected.

> 3. How do selections work with embedded NULs, AFAICT all the selection
> calls use NUL termination?

Selecting with the mouse/keyboard works the same.  As far as selecting 
programatically, assuming the Scintilla wrappers are more or less 
directly wrapping the Scintilla messages, there shouldn't be a problem. 
  I'll check these out shortly, but at glance, they look fine.

> 4. All the places sci_get_string, sci_get_line, sci_get_text,
> sci_get_contents and any others that get/set text using NUL
> termination need to be disabled or fixed.

The sci_get_string() function properly finds out the size of the string 
using a Scintilla message and then ensuring a final NUL-terminator, so 
it should work fine.  Any code calling these functions will either 
truncate the returned string at the first NUL or handle it completely. 
Same goes for the other sci_get_*() functions.

> 5. All of the above goes for plugins too.

Aside from the issues identified with the API breakage, the worst case 
scenario should be truncation of the string I think.

> 6. Do lexers work with embedded NULs or does such a file have to be
> filetype None, if so it needs to be enforced.

At least the CPP lexer works fine, others to be tested, though I don't 
think they rely too much (or at all) on NUL termination.

Probably a good idea would be to improve the warning dialog that pops up 
when you open a file with embedded NULs that explains that some 
functions may not work properly.

Cheers,
Matthew Brush

[1] http://www.scintilla.org/ScintillaDoc.html#SCI_SETCONTROLCHARSYMBOL



More information about the Devel mailing list