[Geany-devel] [CODE REVIEW] Changes to encodings
Matthew Brush
mbrush at xxxxx
Mon May 16 03:45:13 UTC 2011
Hi all,
I was wondering if anyone had time to review some changes I've been
working on?
There's two commits involved (attached patches and linked below):
1) Allow embedded NUL chars in files (without truncating).
2) Add more regex patterns for guessing encoding.
The first one is kind of large and affects several files/functions. One
thing I did which should change is in a few places I added a 'gsize
tmp_size' var to pass to the encodings_convert_to_utf8* functions, where
a simple NULL size parameter would've sufficed, in these cases, the
string is known to be NUL-terminated so this parameter isn't required.
Unfortunately I made a commit after this so fixing this will come
afterwards (though it's not such a big deal). Also, I'm not completely
satisfied with the name of the sci_add_text2() function added to the
sciwrappers files, but there was already a function named sci_add_text()
which truncates the text at the first NUL, and the new function is a
more direct wrapper of the SCI_ADDTEXT message. These changes break the
plugin API, and would require two small changes to the geany-plugins
GeanyVC: geanyvc.c:480 and geanyvc.c:498 as well as GeanyGenDoc:
ggd-plugin.c:343. It should be obvious why this breakage was required I
hope.
The following bug reports are related to the first set of changes:
[1][2][3][4]. A few of the comments seem to suggest that it's not
desired to support text files containing NUL control bytes, but IMO,
it's nice to be able to show these anyway, especially since Scintilla
has a nice way of dealing with these. IMO, as long as the file gets
marked read-only, displaying as much as possible is better than
truncating at the first NUL byte.
The second one is a simple refactoring of the code around the
regex-based encoding detection and adding a few more patterns to detect
a few more types of files. Especially needing review are the actual
pattern strings I used since, quite frankly, I suck at regexps. One
thing I noticed looking at the bug report here[5], after creating these
patches/commits, is that it seems the <?xml ... encoding="" ?> is
already supported by the CODING regex, so you can ignore the new XML one
I added if that truly is the case (I'll remove it from the proper patch
I send in the future). Like I said, my regex skills aren't great :)
If you want to use the GitHub UI to review the commits [6][7] otherwise,
patches are attached. If you have a GitHub account, you can leave
comments on the commits and/or specific lines of the commits.
Any feedback would be much appreciated as I plan to submit these changes
as proper patches once I've ensured they are good enough.
In the meanwhile, I will continue testing...
Cheers,
Matthew Brush
[1]
https://sourceforge.net/tracker/index.php?func=detail&aid=3232135&group_id=153444&atid=787791
[2]
https://sourceforge.net/tracker/index.php?func=detail&aid=2893180&group_id=153444&atid=787791
[3]
https://sourceforge.net/tracker/index.php?func=detail&aid=3232135&group_id=153444&atid=787791
[4]
https://sourceforge.net/tracker/index.php?func=detail&aid=2695290&group_id=153444&atid=787791
[5]
https://sourceforge.net/tracker/index.php?func=detail&aid=3183506&group_id=153444&atid=787791
[6]
https://github.com/codebrainz/geany/commit/81c89cce736c88f235761ce5765f2361fe2bedc8
[7]
https://github.com/codebrainz/geany/commit/5dec6db43b498378be3496d4e8949683544c1044
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Make-encoding-not-care-about-embedded-NUL-control-ch.patch
Type: text/x-patch
Size: 19517 bytes
Desc: not available
URL: <http://lists.geany.org/pipermail/devel/attachments/20110515/ba21be5f/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-regex-patterns-for-XML-CSS-and-HTTP-Mail-headers.patch
Type: text/x-patch
Size: 2908 bytes
Desc: not available
URL: <http://lists.geany.org/pipermail/devel/attachments/20110515/ba21be5f/attachment-0001.bin>
More information about the Devel
mailing list