[Geany-devel] [CODE REVIEW] Changes to encodings

Tue May 17 15:30:30 UTC 2011

Le 16/05/2011 05:45, Matthew Brush a écrit :
> Hi all,

Hey. A bit later, but I finally managed to find some time reading your
patches.
Below is basically only some comments on the patches, you already
discussed some points with Lex, I won't add anything on what was said
when I don't think I have something to add.

> I was wondering if anyone had time to review some changes I've been
> working on?

No really as you see, times is rare these days :/
But I'm trying to do my best...

> There's two commits involved (attached patches and linked below):
> 
> 1) Allow embedded NUL chars in files (without truncating).

I did a fast and inaccurate review of the patch, see my comments on it:
https://github.com/codebrainz/geany/commit/81c89cce736c88f235761ce5765f2361fe2bedc8

yeah, I'm trying GitHub to see what it actually gives. But I'm afraid
it'll make my comments less visible for the ML readers, so unlikely fr
them to comment back…

> 2) Add more regex patterns for guessing encoding.

This one too, see
https://github.com/codebrainz/geany/commit/5dec6db43b498378be3496d4e8949683544c1044

> The first one is kind of large and affects several files/functions.  One
> thing I did which should change is in a few places I added a 'gsize
> tmp_size' var to pass to the encodings_convert_to_utf8* functions, where
> a simple NULL size parameter would've sufficed, in these cases, the
> string is known to be NUL-terminated so this parameter isn't required.
> Unfortunately I made a commit after this so fixing this will come
> afterwards (though it's not such a big deal).  Also, I'm not completely
> satisfied with the name of the sci_add_text2() function added to the
> sciwrappers files, but there was already a function named sci_add_text()
> which truncates the text at the first NUL, and the new function is a
> more direct wrapper of the SCI_ADDTEXT message.

sci_add_text_full() ?

having counterpart for set_text() would probably be good, too.

> These changes break the
> plugin API, and would require two small changes to the geany-plugins
> GeanyVC: geanyvc.c:480 and geanyvc.c:498 as well as GeanyGenDoc:
> ggd-plugin.c:343.  It should be obvious why this breakage was required I
> hope.

yeah, sounds reasonable for such a change. However, prefer breaking the
API once (or at least it a single row), so I wouldn't see these
committed before further checking and other checks to whether other APIs
have to be updated.

> The following bug reports are related to the first set of changes:
> [1][2][3][4].  A few of the comments seem to suggest that it's not
> desired to support text files containing NUL control bytes, but IMO,
> it's nice to be able to show these anyway, especially since Scintilla
> has a nice way of dealing with these.  IMO, as long as the file gets
> marked read-only, displaying as much as possible is better than
> truncating at the first NUL byte.

As you discussed with Lex, it's something that may please some, but
supporting this partially may simply move the user complaints :D

> The second one is a simple refactoring of the code around the
> regex-based encoding detection and adding a few more patterns to detect
> a few more types of files.  Especially needing review are the actual
> pattern strings I used since, quite frankly, I suck at regexps.  One
> thing I noticed looking at the bug report here[5], after creating these
> patches/commits, is that it seems the <?xml ... encoding="" ?> is
> already supported by the CODING regex, so you can ignore the new XML one
> I added if that truly is the case (I'll remove it from the proper patch
> I send in the future).  Like I said, my regex skills aren't great :)

As said on GitHub, I'm puzzled with this one.

> Any feedback would be much appreciated as I plan to submit these changes
> as proper patches once I've ensured they are good enough.
> 
> In the meanwhile, I will continue testing...

That's the most important part: daily testing ^^

Cheers,
Colomban