[Geany-devel] [CODE REVIEW] Changes to encodings

Colomban Wendling lists.ban at xxxxx
Tue May 17 15:30:30 UTC 2011

Le 16/05/2011 05:45, Matthew Brush a écrit :
> Hi all,

Hey. A bit later, but I finally managed to find some time reading your
Below is basically only some comments on the patches, you already
discussed some points with Lex, I won't add anything on what was said
when I don't think I have something to add.

> I was wondering if anyone had time to review some changes I've been
> working on?

No really as you see, times is rare these days :/
But I'm trying to do my best...

> There's two commits involved (attached patches and linked below):
> 1) Allow embedded NUL chars in files (without truncating).

I did a fast and inaccurate review of the patch, see my comments on it:

yeah, I'm trying GitHub to see what it actually gives. But I'm afraid
it'll make my comments less visible for the ML readers, so unlikely fr
them to comment back…

> 2) Add more regex patterns for guessing encoding.

This one too, see

> The first one is kind of large and affects several files/functions.  One
> thing I did which should change is in a few places I added a 'gsize
> tmp_size' var to pass to the encodings_convert_to_utf8* functions, where
> a simple NULL size parameter would've sufficed, in these cases, the
> string is known to be NUL-terminated so this parameter isn't required.
> Unfortunately I made a commit after this so fixing this will come
> afterwards (though it's not such a big deal).  Also, I'm not completely
> satisfied with the name of the sci_add_text2() function added to the
> sciwrappers files, but there was already a function named sci_add_text()
> which truncates the text at the first NUL, and the new function is a
> more direct wrapper of the SCI_ADDTEXT message.

sci_add_text_full() ?

having counterpart for set_text() would probably be good, too.

> These changes break the
> plugin API, and would require two small changes to the geany-plugins
> GeanyVC: geanyvc.c:480 and geanyvc.c:498 as well as GeanyGenDoc:
> ggd-plugin.c:343.  It should be obvious why this breakage was required I
> hope.

yeah, sounds reasonable for such a change. However, prefer breaking the
API once (or at least it a single row), so I wouldn't see these
committed before further checking and other checks to whether other APIs
have to be updated.

> The following bug reports are related to the first set of changes:
> [1][2][3][4].  A few of the comments seem to suggest that it's not
> desired to support text files containing NUL control bytes, but IMO,
> it's nice to be able to show these anyway, especially since Scintilla
> has a nice way of dealing with these.  IMO, as long as the file gets
> marked read-only, displaying as much as possible is better than
> truncating at the first NUL byte.

As you discussed with Lex, it's something that may please some, but
supporting this partially may simply move the user complaints :D

> The second one is a simple refactoring of the code around the
> regex-based encoding detection and adding a few more patterns to detect
> a few more types of files.  Especially needing review are the actual
> pattern strings I used since, quite frankly, I suck at regexps.  One
> thing I noticed looking at the bug report here[5], after creating these
> patches/commits, is that it seems the <?xml ... encoding="" ?> is
> already supported by the CODING regex, so you can ignore the new XML one
> I added if that truly is the case (I'll remove it from the proper patch
> I send in the future).  Like I said, my regex skills aren't great :)

As said on GitHub, I'm puzzled with this one.

> Any feedback would be much appreciated as I plan to submit these changes
> as proper patches once I've ensured they are good enough.
> In the meanwhile, I will continue testing...

That's the most important part: daily testing ^^


More information about the Devel mailing list