[Geany-devel] [CODE REVIEW] Changes to encodings

Matthew Brush mbrush at xxxxx
Mon May 16 03:45:13 UTC 2011

Hi all,

I was wondering if anyone had time to review some changes I've been 
working on?

There's two commits involved (attached patches and linked below):

1) Allow embedded NUL chars in files (without truncating).
2) Add more regex patterns for guessing encoding.

The first one is kind of large and affects several files/functions.  One 
thing I did which should change is in a few places I added a 'gsize 
tmp_size' var to pass to the encodings_convert_to_utf8* functions, where 
a simple NULL size parameter would've sufficed, in these cases, the 
string is known to be NUL-terminated so this parameter isn't required. 
Unfortunately I made a commit after this so fixing this will come 
afterwards (though it's not such a big deal).  Also, I'm not completely 
satisfied with the name of the sci_add_text2() function added to the 
sciwrappers files, but there was already a function named sci_add_text() 
which truncates the text at the first NUL, and the new function is a 
more direct wrapper of the SCI_ADDTEXT message.  These changes break the 
plugin API, and would require two small changes to the geany-plugins 
GeanyVC: geanyvc.c:480 and geanyvc.c:498 as well as GeanyGenDoc: 
ggd-plugin.c:343.  It should be obvious why this breakage was required I 

The following bug reports are related to the first set of changes: 
[1][2][3][4].  A few of the comments seem to suggest that it's not 
desired to support text files containing NUL control bytes, but IMO, 
it's nice to be able to show these anyway, especially since Scintilla 
has a nice way of dealing with these.  IMO, as long as the file gets 
marked read-only, displaying as much as possible is better than 
truncating at the first NUL byte.

The second one is a simple refactoring of the code around the 
regex-based encoding detection and adding a few more patterns to detect 
a few more types of files.  Especially needing review are the actual 
pattern strings I used since, quite frankly, I suck at regexps.  One 
thing I noticed looking at the bug report here[5], after creating these 
patches/commits, is that it seems the <?xml ... encoding="" ?> is 
already supported by the CODING regex, so you can ignore the new XML one 
I added if that truly is the case (I'll remove it from the proper patch 
I send in the future).  Like I said, my regex skills aren't great :)

If you want to use the GitHub UI to review the commits [6][7] otherwise, 
patches are attached.  If you have a GitHub account, you can leave 
comments on the commits and/or specific lines of the commits.

Any feedback would be much appreciated as I plan to submit these changes 
as proper patches once I've ensured they are good enough.

In the meanwhile, I will continue testing...

Matthew Brush

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Make-encoding-not-care-about-embedded-NUL-control-ch.patch
Type: text/x-patch
Size: 19517 bytes
Desc: not available
URL: <http://lists.geany.org/pipermail/devel/attachments/20110515/ba21be5f/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-regex-patterns-for-XML-CSS-and-HTTP-Mail-headers.patch
Type: text/x-patch
Size: 2908 bytes
Desc: not available
URL: <http://lists.geany.org/pipermail/devel/attachments/20110515/ba21be5f/attachment-0001.bin>

More information about the Devel mailing list