Hi all,
I was wondering if anyone had time to review some changes I've been working on?
There's two commits involved (attached patches and linked below):
1) Allow embedded NUL chars in files (without truncating). 2) Add more regex patterns for guessing encoding.
The first one is kind of large and affects several files/functions. One thing I did which should change is in a few places I added a 'gsize tmp_size' var to pass to the encodings_convert_to_utf8* functions, where a simple NULL size parameter would've sufficed, in these cases, the string is known to be NUL-terminated so this parameter isn't required. Unfortunately I made a commit after this so fixing this will come afterwards (though it's not such a big deal). Also, I'm not completely satisfied with the name of the sci_add_text2() function added to the sciwrappers files, but there was already a function named sci_add_text() which truncates the text at the first NUL, and the new function is a more direct wrapper of the SCI_ADDTEXT message. These changes break the plugin API, and would require two small changes to the geany-plugins GeanyVC: geanyvc.c:480 and geanyvc.c:498 as well as GeanyGenDoc: ggd-plugin.c:343. It should be obvious why this breakage was required I hope.
The following bug reports are related to the first set of changes: [1][2][3][4]. A few of the comments seem to suggest that it's not desired to support text files containing NUL control bytes, but IMO, it's nice to be able to show these anyway, especially since Scintilla has a nice way of dealing with these. IMO, as long as the file gets marked read-only, displaying as much as possible is better than truncating at the first NUL byte.
The second one is a simple refactoring of the code around the regex-based encoding detection and adding a few more patterns to detect a few more types of files. Especially needing review are the actual pattern strings I used since, quite frankly, I suck at regexps. One thing I noticed looking at the bug report here[5], after creating these patches/commits, is that it seems the <?xml ... encoding="" ?> is already supported by the CODING regex, so you can ignore the new XML one I added if that truly is the case (I'll remove it from the proper patch I send in the future). Like I said, my regex skills aren't great :)
If you want to use the GitHub UI to review the commits [6][7] otherwise, patches are attached. If you have a GitHub account, you can leave comments on the commits and/or specific lines of the commits.
Any feedback would be much appreciated as I plan to submit these changes as proper patches once I've ensured they are good enough.
In the meanwhile, I will continue testing...
Cheers, Matthew Brush
[1] https://sourceforge.net/tracker/index.php?func=detail&aid=3232135&gr... [2] https://sourceforge.net/tracker/index.php?func=detail&aid=2893180&gr... [3] https://sourceforge.net/tracker/index.php?func=detail&aid=3232135&gr... [4] https://sourceforge.net/tracker/index.php?func=detail&aid=2695290&gr... [5] https://sourceforge.net/tracker/index.php?func=detail&aid=3183506&gr... [6] https://github.com/codebrainz/geany/commit/81c89cce736c88f235761ce5765f2361f... [7] https://github.com/codebrainz/geany/commit/5dec6db43b498378be3496d4e89496835...