[geany/geany] Provide utf-8 command line arguments on Windows (#1258)

Re: [Github-comments]...

cshu

6 Oct 2016 6 Oct '16

9:45 a.m.

Windows filenames (NTFS) may contain unicode characters. Passing the filename to `geany.exe` as command line argument does not open the file.

At startup, it can get all arguments in UTF-16, and convert them to UTF-8. So filenames containing unicode characters can be passed to `geany.exe`. You can view, comment on, or merge this pull request online at:

https://github.com/geany/geany/pull/1258

-- Commit Summary --

* Provide utf-8 command line arguments on Windows

-- File Changes --

M src/main.c (36)

-- Patch Links --

https://github.com/geany/geany/pull/1258.patch https://github.com/geany/geany/pull/1258.diff

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Attachments:

attachment.htm (text/html — 2.7 KB)

Show replies by date

elextr

6 Oct 6 Oct

10:36 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

Does this work if the locale is a non-unicode code page?

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-251900323

cshu

11:17 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

Should work, regardless of locale settings.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-251909482

Stefan Oberhumer

13 Oct 13 Oct

7:25 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

I'm not sure if it's necessary but have you checked you don't have to handle argv[0] the same way ? (Eg geany is in an directory containing unicode characters)

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-253580245

cshu

14 Oct 14 Oct

4:02 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

I created the pull request mainly for the convenience of "open with" Geany on Windows

On the other hand it should be very rare for Geany's installation path to contain unicode character. And I'd like to let geany launch as fast as possible when I press Win+R and type `geany`. So I avoided the conversion logic when `argc == 0`.

I guess maintainers can make the final decision.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-253808396

Enrico Tröger

16 Oct 16 Oct

5:36 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

I think we should handle the argv[0] issue as well. But only if it is necessary at all, see below.

Can you provide a test case for this? I can open files on a NTFS drive with non-ASCII characters in their path in Geany either from the command line (cmd.exe) as well as using the "Open With" Explorer menu item.

So, there must be some more criteria which involve the behaviour.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-254053933

cshu

17 Oct 17 Oct

2:10 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

This can be reproduced when filename character is not supported by system locale's encoding. E.g. When system locale is "English (United States)", double clicking `read我.txt` opens an empty tab in Geany with name `read?.txt`. The file is not really opened.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-254087072

Enrico Tröger

25 Oct 25 Oct

12:08 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

In the initial description you talk about "unicode". Unicode characters can be saved using different encodings, this can be UTF-8, UTF-16 and so on. We cannot know which encoding the filename of a file on disk was used.

Yes, we can assume UTF-16 or we can assume UTF-8, maybe even both by trying. But then the next user wants UTF-32 BE, then UTF-32 LE and whatever else.

The behaviour is probably very similar on non-Windows platforms. I think usually filenames should follow the system's locale. Mixed charsets are never a good idea.

Anyway, I don't know how I could create a file on Windows with a filename not in the system's locale and so cannot really test it. I just tested your changes with non-ASCII filenames in the system's locale and it still works.

Before this could get merged, two remarks: - could you try to handle argv[0] as well as mentioned above - it'd be nice if you could move the Windows specific code into a new function in src/win32.c, then src/main.c gets less cluttered. Usually we try to have the weird Windows-specific ~~hidden~~abstracted in src/win32.c :)

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-255880669

cshu

28 Oct 28 Oct

7:31 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

@cshu pushed 1 commit.

e9e7c11 most-logic-in-win32

-- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/geany/geany/pull/1258/files/1dd0769d7ea77eb54af0a61238370...

cshu

8:36 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

I moved most code to win32.c, and included argv[0] in conversion.

Different file systems can use various encoding for filenames. (NTFS, FAT32, etc.) But `CommandLineToArgvW` always provide filenames in UTF-16. So in the case of windows api functions, underlying file system usually should not be a problem.

It's easy to get some random CJK characters from typing code unit. (Ctrl+Shift+I on firefox/chrome, in the console, type '\u6666')

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-256847252

Enrico Tröger

9 Nov 9 Nov

12:28 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

eht16 commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

+ +void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Maybe `win32_convert_argv_to_utf8()` would be a better name.

...

+{

+ int num_arg; + LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), &num_arg); + char **utf8argv = g_new0(char *, num_arg + 1); + int i = num_arg; + while(i) + { + i--; + utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, NULL, NULL, NULL); + } + *pargc = num_arg; + *pargv = utf8argv; + LocalFree(szarglist); +} + +void win32_free_argv_made_in_utf8(gint argc, gchar **argv)

The code is quite generic and not Windows-specific, so it could go into src/utils.c as `utils_free_argv()`, similar to `utils_free_pointers()`.

...

@@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)

#endif

gtk_main(); +#ifdef G_OS_WIN32 + win32_free_argv_made_in_utf8(argc, argv);

Maybe the call including the G_OS_WIN32 guard could go into a seperate function in src/libmain.c as it should be also in https://github.com/geany/geany/blob/master/src/libmain.c#L1090. Strictly there are more places where it should be called, e.g. every place where we call `exit()` but this would require some refactoring before which is out of scope of this PR (and technically it is not very necessary as the memory in cleaned up anyway on `exit()`).

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#pullrequestreview-7732004

Enrico Tröger

12:30 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

eht16 requested changes on this pull request.

Thanks for the hints for reproducing. After being able to reproduce the issue and testing your changes, it worked fine.

Thanks.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#pullrequestreview-7733367

Colomban Wendling

10:04 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

b4n requested changes on this pull request.

The idea seems good, and without having actually tested the PR I can confirm from other tests I did yetserday that the combination of `GetCommandLineW()` and `CommandLineToArgvW()` seems to work just fine.

...

+{

should actually use `g_strfreev()` as the `argv[]` array ends with a `NULL` element.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

+ +void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv) +{ + int num_arg; + LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), &num_arg); + char **utf8argv = g_new0(char *, num_arg + 1); + int i = num_arg; + while(i)

I'd prefer a `for()` loop like ```C for (int i = 0; i < num_arg; i++) utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, NULL, NULL, NULL); ```

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

Checking for error might be needed, as [Windows seems to allows invalid UTF-16 in filenames](https://en.wikipedia.org/wiki/UTF-8#WTF-8), and I'm afraid `g_utf16_to_utf8()` doesn't allow this. Another solution might be a custom "broken UTF-16 to WTF-8" converter, like I played with in https://github.com/b4n/wtf8tools. It seems most places in GLib don't have problem with WTF8, as it's structurally fine and "just" encodes weird stuff.

Anyway, something should be done because only the last element of `argv[]` should ever be `NULL`, and I wouldn't be surprised it'd crash somewhere if it wasn't respected.

OK, this is a low-risk issue I guess (invalid UTF-16 filenames), but still a theoretically possible one.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

+ +void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Agreed

...

@@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)

#endif

gtk_main(); +#ifdef G_OS_WIN32 + win32_free_argv_made_in_utf8(argc, argv);

I'm not sure it's worth having a function which is a no-op on non-Windows just for not duplicating one guarding of a free; if we really don't want duplication here we find a way to only have one exit path :)

...

@@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)

#endif

gtk_main(); +#ifdef G_OS_WIN32 + win32_free_argv_made_in_utf8(argc, argv);

Alternatively the conversion and freeing could be done in `main()` itself (from *main.c*), so `main_lib()` simply receives a nice argument list directly:

```C int main(int argc, char **argv) { int ret; #ifdef G_OS_WIN32 /* on Windows, get the Unicode argulment list and convert it to UTF-8 into argv[] */ LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), &argc);

argv = g_new(char *, argc + 1); for (int i = 0; i < argc; i++) argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, NULL, NULL, NULL); argv[argc] = NULL; LocalFree(szarglist); #endif

ret = main_lib(argc, argv);

#ifdef G_OS_WIN32 g_strfreev(argv); #endif

return ret; } ```

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#pullrequestreview-7773217

Colomban Wendling

10:16 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

b4n commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

+ +void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Actually recent GLib (2.40+) have `g_win32_get_command_line()` which basically does just that.

(and doesn't bother with checking for error in the conversion, so it might be "alright" -- or maybe they find it OK because they don't return arg count and expect callers to use `g_strv_length()` which that would simply truncate the list, not sure)

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Colomban Wendling

10:26 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

Hum, the [docs on `g_win32_get_command_line()`](https://developer.gnome.org/glib/unstable/glib-Windows-Compatibility-Functio...) mention that [`g_option_context_parse()`](https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.htm...) (that we use) expects locale-encoded strings, not UTF-8 ones. GLib 2.40+ has [`g_option_context_parse_strv()`](https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.htm...) which expects *GLib filename encoding* (that is, UTF-8 on Windows), but it's fairly recent.

(Or we could please @codebrainz and rewrite using GApplication which does this for us… but that would be a lot of changes :))

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-259368928

elextr

12:09 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

...

(Or we could please @codebrainz and rewrite using GApplication which does this for us… but that would be a lot of changes :))

or maybe qapplication ;-P

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-259389716

Matthew Brush

12:22 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

codebrainz commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

IMO, it would be cleaner if it didn't hardcode UTF-16 assumptions. While it's highly unlikely to change any time soon, Win32 does provide functions to convert to/from various encodings to its native "wide" encoding without hard-coding what that is.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Colomban Wendling

12:55 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

b4n commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

@codebrainz myeah, theoretically it might, but in practice it won't change (nobody can afford that honestly), and that Windows API is a lot more annoying to use for us. But if it does handle the invalid Unicode filenames and `g_utf16_to_utf8()` doesn't, it might be worth it.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Matthew Brush

4:33 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

codebrainz commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

...

that Windows API is a lot more annoying to use for us

It's not _a lot_ more annoying, just `WideCharToMultiByte` with `CPUTF8`. Could even write a little function like `win32_wide_to_utf8` that hides the pre-flight pass and allocates the memory with GLib's allocator. It's maybe 5-10 lines of code to write that function.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Colomban Wendling

5:16 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

b4n commented on this pull request.

...

@@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)

return g_build_filename(g_get_user_config_dir(), "geany", NULL); }

Well it's totally doable, but it involves calling `WideCharToMultiByte()` twice (or over-alloc `len*4+1` bytes), allocating manually and the like. Indeed nothing complicated, but at least 4 lines or so worth of WINAPI code, where we basically have access to a 1-line thing. So if it gives us something, yeah sure, otherwise I'm not really sure it's worth it.

Hell, let's see how it'd look: ```C static gchar *wcstr_to_utf8(wchar_t *wcstr) { int len = WideCharToMultiByte(CP_UTF8, 0, wcstr, -1, NULL, 0, NULL, NULL); gchar *utf8str = g_malloc(len + 1); WideCharToMultiByte(CP_UTF8, 0, wcstr, -1, utf8str, len + 1, NULL, NULL); utf8str[len] = 0; // FIXME: is that useful? return utf8str; } ``` And according to the [docs](https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%...) it probably doesn't really help:

...

**WC_ERR_INVALID_CHARS**: **Windows Vista and later:** Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Note that this flag only applies when CodePage is specified as CP_UTF8 or 54936 (for Windows Vista and later). It cannot be used with other code page values.

So my understanding would be that in case of invalid UTF-16 (as it is possible in filenames), it would result in uninteresting UTF-8, while we'd actually want "WTF-8".

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Enrico Tröger

10:28 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

eht16 commented on this pull request.

...

@@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)

#endif

gtk_main(); +#ifdef G_OS_WIN32 + win32_free_argv_made_in_utf8(argc, argv);

Yeah, also nice. But I'd still vote to put the Windows specific code in the first G_OS_WIN32 block into src/win32.c.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Enrico Tröger

10:31 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

eht16 commented on this pull request.

...

+{

Absolutely, I missed the point that the array is NULL-terminated. Sorry.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258

Enrico Tröger

4 Dec 4 Dec

10:58 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

@cshu still interested in this PR? I think it would be cool and the sooner we get it merged (regarding the next release) the better it is.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-264694525

cshu

3:16 p.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

@eht16 No, I leave it to you. This PR is a small change.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-264706407

Thomas Martitz

20 Apr 20 Apr

9:17 a.m.

New subject: [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

I wonder, what makes this Windows-specific? On-disk file names can be arbitrary encoding on all platforms, right?

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/1258#issuecomment-295607028

2804

Age (days ago)

3000

Last active (days ago)

github-comments@lists.geany.org

24 comments

7 participants

tags (0)

participants (7)

Colomban Wendling
cshu
elextr
Enrico Tröger
Matthew Brush
Stefan Oberhumer
Thomas Martitz