[Geany-Devel] Placeholder replacement in (build) commands

Colomban Wendling lists.ban at xxxxx
Mon Mar 30 17:18:16 UTC 2015


Hi,

To offload the discussion from PR#441 [1] from this partly off-topic
discussion and give it more visibility, I'm moving it here to the ML.
There already was a thread on the subject that got mostly forgotten, see
[2].

I apologize for the long and complex email, but I don't know how to
present that in a more concise way without losing important information
or clarity.

Note:  I will use *NIX-style command lines as examples.  The ones for
Windows are slightly different, but the problematic is basically the
same (though we don't use the shell there so it's a bit simpler).


1. So, what are we talking about?

Some of our commands can contain placeholders.  The most important ones
are the build commands, but also e.g. the terminal tool has '%c'.  These
placeholders are replaced by Geany with various things, like file names,
paths, line numbers etc.
Some of the commands are executed by a shell (build commands) and some
aren't (terminal command).


2. What is the problem?

The replacement for these placeholders can be anything, so they might
contain characters that have a meaning in a command.  The most obvious
example is quotes: imagine a file named `foo "bar.c`.  If we just
replace the placeholder with the raw value (as we currently do) in a
build command, e.g. `gcc -c -o %e.o "%f"`, it gives this:

	gcc -c -o foo "bar.o "foo "bar.c"

which is obviously incorrect (see [1] or [2] if it's not obvious for you ;)
We need to escape (or quote) replacements in some way or another.


3. So, how can we fix this?

Several solutions have been suggested (well, actually 3.5 is new!), each
with pros and cons:


3.1. Insert quoted replacement as needed where they appear

This is what I first implemented: track the quotes in the user command,
close them before inserting a quoted/escaped replacement, and reopen it
right after.  With the above command, it would give this:

	gcc -c -o 'foo "bar'.o ""'foo "bar.c'""

which is valid and as we want it (again, see [1] and [2] if it's not
clear that it's valid).

This is what the patch at [3] (and [2], but the version there is
outdated) does.

3.1.1. Pros

* compatible with any current user commands (no breakage of the user
configuration, and fixes replacement in them);
* the user doesn't have to worry about the issue.

3.1.2. Cons

* requires understanding of the shell quoting rules (including sub-shell
like `` and $()), which might be non-trivial.
* if it gets something wrong, the users are mostly screwed (the only
thing they could do would be try to adapt the command in a way for Geany
to get it right)


3.2. Insert quoted replacements everywhere blindly

Replace blindly each placeholder with an escaped/quoted version of
itself.  This is like 3.1 but doesn't try to manage quotes in the input
command.  It would give:

	gcc -c -o 'foo "bar'.o "'foo "bar.c'"

(which is incorrect, see the cons below)

3.2.1. Pros

* easy to implement;
* simple, the user can easily know exactly what happens.

3.2.2. Cons:

* Incompatible with (some) current user commands: as the placeholder is
replaced without care, the placeholders need to appear outside other
quotes.  E.g. the above example would have to be altered to remove
quoting around the %f placeholder not to be quoted twice: `gcc -c -o
%e.o %f`.


3.3. Provide format modifier performing specific quoting

Dimitar suggested introducing format modifiers like %"f or %'f that
would quote in various ways (see
https://github.com/geany/geany/pull/441#issuecomment-87272057).

3.3.1. Pros

* The users can quote in various ways as they see fit;
* Explicit control on how things are quoted/escaped;
* Could support recursive quoting (quoting twice or even more), so Lex
could use it in a `python -c` or similar called by the shell ;)  e.g.
`python -c "print(%\""f.replace(' ', '_'))"` could end up in `python -c
"print("foo\ \\\"bar.c".replace(' ', '_'))"` -- assuming we implement
the escaping Python requires.  happy? :)

3.3.2. Cons

* it doesn't fix existing user commands (needs to make use of the new
format modifiers);
* requires to implement various kind of quoting (which requires knowing
several shell escaping rules);
* complex and uncommon format modifiers (the users have to pick the
right one, which might not be obvious);
* the users can screw themselves up if they don't use the appropriate
format (could work with some replacement but not others).


3.4. Parse command as an argument vector and replace in each argument

Instead of working on the command itself, use g_shell_parse_argv() and
perform replacements in argv[1:] (each argument separately).

3.4.1. Pros

* no need for quoting or escaping the replacement (as each argument is
done on its own).

3.4.2. Cons

* doesn't work with shell constructs, so it's not a solution for build
commands (as an argv is an argv, which can't support sub-shell, piping
and whatnot).


3.5.  Use environment variables, not placeholders

Change the whole logic and use the environment to pass what currently
uses placeholders.  E.g. a command could be `gcc -c -o
"${GEANY_FILENAME_NOEXT}.o" "$GEANY_FILENAME"`.

3.5.1. Pros

* quite easy to implement (but see cons);
* no need for quoting or escaping replacements;
* works the same as any environment or shell variable;
* environment variables can then also be used by the called commands
themselves, not only on their command line.

3.5.2. Cons

* it doesn't fix existing user commands (needs to make use of the new
variables);
* requires separate logic (e.g. 3.4) when the command is not run in a
shell (as something has to read the replace on the command line itself);
* the users can screw themselves up if they don't use the appropriate
quoting around the variables (could work with some replacement but not
others).


4. So, what?

I like 3.1 ("smart" replacement) and 3.5 (environment variables) mostly.
 3.1 is the only solution here that doesn't require changes in user
commands, so it is seamless on the user.  3.5 leaves the actual
replacement to the shell (so the limitations are the ones of the shell),
and also gives the information to the children, grandchildren, etc.

Note that we perhaps could use something like 3.1 simply to upgrade user
commands to the format of one of the other solutions.  This way an error
in the implementation of 3.1 can be fixed by the user.


What do you think?

Regards,
Colomban


[1] https://github.com/geany/geany/pull/441
[2] http://lists.geany.org/pipermail/devel/2012-December/007329.html
[3] https://github.com/geany/geany/pull/441#issuecomment-87336184 and
https://gist.github.com/b4n/4c100d6f1defd4751217


More information about the Devel mailing list