[Github-comments] [geany] Use multi-byte variant of CreateProcess on Windows (#809)

zhekov notifications at xxxxx
Mon Dec 14 18:32:26 UTC 2015


glib uses CreateProcessW, and because of that, or because of some improper character handling, Find in Files (FiF) with non-UTF-8 text often fails with "Invalid byte sequence in conversion input" (this is documented in spawn.c). So my first suggestion for this PR would be to test FiF with text in various encodings. The most likely reason for the failure is this:

CreateProcessA simply passes a sequence of bytes, and grep (make, gcc, whatever) receives them as-is.

For CreateProcessW, the chars are first converted to UNICODE, and then back to 8-bit, since grep (and most console programs, especially UNIX tools) expects char **argv, not wchar_t **argv. Either conversion may fail if even one byte is not part of the current 8-bit code page.
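To make the first conversion failure concrete, here is a minimal, portable sketch of the kind of check a UTF-8 to UNICODE conversion has to make. is_valid_utf8() is a hypothetical helper written for illustration, not glib or Geany code. It only shows why a search text in an 8-bit encoding can be rejected before grep is even started.

```c
#include <stddef.h>

/* Hypothetical helper (not from glib or Geany): returns 1 if buf[0..len)
 * is well-formed UTF-8, 0 otherwise. A simplified stand-in for the check
 * a UTF-8 to UTF-16 converter performs on its input. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t extra, k;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { /* lead byte of a 2-byte sequence */
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) { /* lead byte of a 3-byte sequence */
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) { /* lead byte of a 4-byte sequence */
            extra = 3; cp = c & 0x07;
        } else {
            return 0;                   /* stray continuation or invalid lead */
        }

        if (extra > len - i - 1)
            return 0;                   /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

With this helper, the bytes "caf\xC3\xA9" (UTF-8) pass, while "caf\xE9" (the same word in an 8-bit Latin encoding) is rejected: 0xE9 looks like the start of a 3-byte sequence that never arrives. CreateProcessA would have handed those same bytes to grep untouched.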

So, W is not a magic bullet.

For #784: the FiF UTF-8 directory is "converted" to the locale encoding with Geany's utils_get_locale_from_utf8(), which is specifically defined to do nothing under Win32. Unless the directory name is in UTF-8, rather than "ANSI" or "UNICODE-not-convertible-to-ANSI" (the standard encodings under Win32), I suspect this PR will not fix the problem.

g_spawn*() specifically expects all text to be passed as UTF-8. That seems like a good idea, but as soon as you pass any text that is convertible to UNICODE but not to the current code page, the second W conversion (UNICODE to the char **argv of grep etc.) will fail, or maybe produce strings with some replacement characters, so that grep (and the other tools) will be run after all. Which is even worse.

TL;DR:

The quick fix would be to convert the directory with g_locale_from_utf8(), detect any unconvertible directory names (should be rare), and fail early.
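A rough sketch of the "convert, detect, fail early" idea, in plain C so it can be tried anywhere. utf8_to_latin1_strict() is a hypothetical helper, and ISO-8859-1 is only an assumed stand-in for "the current code page". The real fix would call g_locale_from_utf8() and check its error result instead.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): convert a NUL-terminated UTF-8
 * string to ISO-8859-1, used here as a stand-in for the current 8-bit code
 * page. Returns 0 on success, -1 if the input is malformed, contains a
 * character the code page cannot represent, or does not fit in out.
 * No replacement characters are ever written. */
static int utf8_to_latin1_strict(const char *utf8, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    if (out_size == 0)
        return -1;

    while (*p != '\0') {
        unsigned char ch;

        if (*p < 0x80) {
            /* ASCII maps to itself. */
            ch = *p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF: the only non-ASCII range Latin-1 can hold. */
            ch = (unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            /* Anything else is unconvertible: fail early. */
            return -1;
        }

        if (n + 1 >= out_size)
            return -1;              /* keep room for the terminating NUL */
        out[n++] = (char)ch;
    }

    out[n] = '\0';
    return 0;
}
```

The caller would report an error and skip spawning grep when this returns -1, rather than run the tool on a mangled directory name. For example, "caf\xC3\xA9" converts to "caf\xE9", while "\xD0\x96" (U+0416) is refused.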

The full fix... That would be to implement proper locale file name support, as far as that is possible under glib (our locale support under POSIX is badly broken). But that is not easy, and will require a lot of work.

As long as most of the tools started by Geany use char **argv and not wchar_t **argv, it would be better to avoid UNICODE, and stick to 8-bit.

---

For fun - a nice excerpt from the g_spawn*() documentation:

"For C programs built with Microsoft's tools it is enough to make the [spawned] program have a wmain() instead of main() [1]. wmain() has a wide character argument vector as parameter. At least currently, mingw doesn't support wmain(), so if you use mingw to develop the spawned program, it should call g_win32_get_command_line() to get arguments in UTF-8."

Lovely. :) Especially considering that 99.9% of the development tools do not use glib.

[1] completely non-standard, of course


---
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/809#issuecomment-164519344