Hi,
To offload the discussion from PR#441 [1] from this partly off-topic discussion and give it more visibility, I'm moving it here to the ML. There already was a thread on the subject that got mostly forgotten, see [2].
I apologize for the long and complex email, but I don't know how to present that in a more concise way without losing important information or clarity.
Note: I will use *NIX-style command lines as examples. The ones for Windows are slightly different, but the problematic is basically the same (though we don't use the shell there so it's a bit simpler).
1. So, what are we talking about?
Some of our commands can contain placeholders. The most important ones are the build commands, but also e.g. the terminal tool has '%c'. These placeholders are replaced by Geany with various things, like file names, paths, line numbers etc. Some of the commands are executed by a shell (build commands) and some aren't (terminal command).
2. What is the problem?
The replacement for these placeholders can be anything, so they might contain characters that have a meaning in a command. The most obvious example is quotes: imagine a file named `foo "bar.c`. If we just replace the placeholder with the raw value (as we currently do) in a build command, e.g. `gcc -c -o %e.o "%f"`, it gives this:
gcc -c -o foo "bar.o "foo "bar.c"
which is obviously incorrect (see [1] or [2] if it's not obvious for you ;) We need to escape (or quote) replacements in some way or another.
3. So, how can we fix this?
Several solutions have been suggested (well, actually 3.5 is new!), each with pros and cons:
3.1. Insert quoted replacement as needed where they appear
This is what I first implemented: track the quotes in the user command, close them before inserting a quoted/escaped replacement, and reopen it right after. With the above command, it would give this:
gcc -c -o 'foo "bar'.o ""'foo "bar.c'""
which is valid and as we want it (again, see [1] and [2] if it's not clear that it's valid).
This is what the patch at [3] (and [2], but the version there is outdated) does.
3.1.1. Pros
* compatible with any current user commands (no breakage of the user configuration, and fixes replacement in them); * the user doesn't have to worry about the issue.
3.1.2. Cons
* requires understanding of the shell quoting rules (including sub-shell like `` and $()), which might be non-trivial. * if it gets something wrong, the users are mostly screwed (the only thing they could do would be try to adapt the command in a way for Geany to get it right)
3.2. Insert quoted replacements everywhere blindly
Replace blindly each placeholder with an escaped/quoted version of itself. This is like 3.1 but doesn't try to manage quotes in the input command. It would give:
gcc -c -o 'foo "bar'.o "'foo "bar.c'"
(which is incorrect, see the cons below)
3.2.1. Pros
* easy to implement; * simple, the user can easily know exactly what happens.
3.2.2. Cons:
* Incompatible with (some) current user commands: as the placeholder is replaced without care, the placeholders need to appear outside other quotes. E.g. the above example would have to be altered to remove quoting around the %f placeholder not to be quoted twice: `gcc -c -o %e.o %f`.
3.3. Provide format modifier performing specific quoting
Dimitar suggested introducing format modifiers like %"f or %'f that would quote in various ways (see https://github.com/geany/geany/pull/441#issuecomment-87272057).
3.3.1. Pros
* The users can quote in various ways as they see fit; * Explicit control on how things are quoted/escaped; * Could support recursive quoting (quoting twice or even more), so Lex could use it in a `python -c` or similar called by the shell ;) e.g. `python -c "print(%""f.replace(' ', '_'))"` could end up in `python -c "print("foo\ \"bar.c".replace(' ', '_'))"` -- assuming we implement the escaping Python requires. happy? :)
3.3.2. Cons
* it doesn't fix existing user commands (needs to make use of the new format modifiers); * requires to implement various kind of quoting (which requires knowing several shell escaping rules); * complex and uncommon format modifiers (the users have to pick the right one, which might not be obvious); * the users can screw themselves up if they don't use the appropriate format (could work with some replacement but not others).
3.4. Parse command as an argument vector and replace in each argument
Instead of working on the command itself, use g_shell_parse_argv() and perform replacements in argv[1:] (each argument separately).
3.4.1. Pros
* no need for quoting or escaping the replacement (as each argument is done on its own).
3.4.2. Cons
* doesn't work with shell constructs, so it's not a solution for build commands (as an argv is an argv, which can't support sub-shell, piping and whatnot).
3.5. Use environment variables, not placeholders
Change the whole logic and use the environment to pass what currently uses placeholders. E.g. a command could be `gcc -c -o "${GEANY_FILENAME_NOEXT}.o" "$GEANY_FILENAME"`.
3.5.1. Pros
* quite easy to implement (but see cons); * no need for quoting or escaping replacements; * works the same as any environment or shell variable; * environment variables can then also be used by the called commands themselves, not only on their command line.
3.5.2. Cons
* it doesn't fix existing user commands (needs to make use of the new variables); * requires separate logic (e.g. 3.4) when the command is not run in a shell (as something has to read the replace on the command line itself); * the users can screw themselves up if they don't use the appropriate quoting around the variables (could work with some replacement but not others).
4. So, what?
I like 3.1 ("smart" replacement) and 3.5 (environment variables) mostly. 3.1 is the only solution here that doesn't require changes in user commands, so it is seamless on the user. 3.5 leaves the actual replacement to the shell (so the limitations are the ones of the shell), and also gives the information to the children, grandchildren, etc.
Note that we perhaps could use something like 3.1 simply to upgrade user commands to the format of one of the other solutions. This way an error in the implementation of 3.1 can be fixed by the user.
What do you think?
Regards, Colomban
[1] https://github.com/geany/geany/pull/441 [2] http://lists.geany.org/pipermail/devel/2012-December/007329.html [3] https://github.com/geany/geany/pull/441#issuecomment-87336184 and https://gist.github.com/b4n/4c100d6f1defd4751217