[Geany-Devel] Escaping replacement for placeholder in build commands

Mon Dec 17 01:23:08 UTC 2012

[...]
>> Hi Colomban,
>>
>> Hehe, sorry, this email arrived just after another project just did
>> something similar to "sanitise" inputs and broke my setup, making me
>> very cranky, so I didn't want to have it happen to Geany :)
>
> Oh, ok, but we have you to make sure we don't brake anything,  you're
> Geany's regression test suite by yourself ;)

Heh, don't bet on it ;)

>
>> I agree that the security risk exists, but then it needs a peculiar
>> filename, the only way to fix that would be to ban any shell
>> metacharacters in filenames, but some users won't like that.
>
> No, the solution is to escape or quote the replacements, so that any
> shell metacharacter in them wouldn't be interpreted by the shell.
>

Ok, yes if done correctly.

>> As for spaces in filenames, see below.
>>
>>>> I think the idea is to just escape the filenames being replaced into %f
>>>> et al placeholders, not the whole command en masse. IIRC, the only thing
>>>> placed into the placeholders is filenames/paths, which should be able to
>>>> be safely escaped/sanitized without messing up the whole command.
>>>
>>> That's it.  Another example like Matthew's one:  what I want to do is
>>> only to make sure that e.g. "%f" is never replaced by something that
>>> will be interpreted as a command.  Those placeholders (%f, %d, %e and
>>> %p) represent filenames or paths and never user-typed commands, so they
>>> should never be interpreted as a command.
>>>
>>> For example, the simple command:
>>>
>>>         gcc "%f" -c -o "%e.o"
>>>
>>> Given the current file named:
>>>
>>>         holy "crap.c
>>>
>>> would currently expand to:
>>>
>>>         gcc "holy "crap.c" -c -o "holy "crap.o"
>>>
>>> Which, huh, doesn't mean what you want at all!  The commands will be
>>> understood as:
>>>
>>> argv[0] = gcc
>>> argv[1] = holy crap.c -c -o holy
>>> argv[2] = crap.o
>>>
>>> (if my shell unquoting skills are correct ;))
>>
>> Understand, I think thats right.
>>
>>>
>>> What the user actually expected was:
>>>
>>> argv[0] = gcc
>>> argv[1] = holy "crap.c
>>> argv[2] = -c
>>> argv[3] = -o
>>> argv[4] = holy "crap.o
>>>
>>> Obviously, it's not the same :(
>>
>> Thats obviously whats expected, and yes its not the same.  Although
>> inconvenient the above erroneous input would of course just give an
>> error, "holy crap.c -c -o holy" not found.  Well unless it is found :)
>
> It would probably just throw a "file not found" error in this case, but
> who know what's the filename and as you pointed out yourself, who knows
> what was the actual command.
>
>>> So again, what I want is to make sure the placeholders (%f & co.) are
>>> properly escaped/quoted/whatever so they never get misinterpreted like
>>> above.  And as I said, to actually achieve this we have to somewhat
>>> understand the quoting in the user command because we can't escape the
>>> same inside or outside shell quotes -- e.g. %f (without quotes) cannot
>>> be escaped the same way as "%f" (with the quotes).
>>
>> or %e.* for example?
>
> No, it's the same as unquoted %e, since it's outside a quote.
>
>>> But still nothing really clever trying to understand a meaning, just
>>> knowing the basics like '' and "" are quotes, and \ is an escape so we
>>> can know how to escape.  Actually the implementation I proposed simply
>>> closes any open quote before replacing a quoted placeholder, and then
>>> re-opens it.
>>>
>>> I.e. for the above example, it would replace as:
>>>
>>>         gcc ""'holy "crap.c'"" -c -o ""'holy "crap.o'""
>>>
>>> That's a bit ugly but you don't have to see it, and it's perfectly safe.
>>>  Also, just to make sure you see what I mean, if the user command didn't
>>> quote the %f and %e, and the filename simply contained a space:
>>>
>>> command:                gcc %f -c -o %e.o
>>> filename:               file name.c
>>> current replacement:    gcc file name.c -c -o file name.o
>>> fixed replacement:      gcc 'file name.c' -c -o 'file name'.o
>>>
>>> You see? :)
>>
>> Yeah, thats a normal command, but continuing my example from above,
>> what do you get if I have
>>
>> cmd %e.*
>
> It would give:
>
>         cmd 'xxx'.*

>
>> because if lets say the filename is xxx.c, if you get
>>
>> cmd "xxx.*"
>>
>> then the * won't be treated as a glob since its quoted, so my command
>> won't work.  It needs to be
>>
>> cmd "xxx."*
>
> Nope, it needs to be
>
>         cmd 'xxx'.*

yeah, ok the . is stripped from the %e.

>
> since we don't have to know what a . means, we just quote the
> placeholder replacement.
>
>> So is g_shell_quote smart enough for all possibilities of shell
>> metacharacters?  Including the full range of shell patterns, and
>> backquote command substitution and etc. Then again what if the
>> filename has a literal * in it, how does g_shell_quote know its
>> literal or if we meant a glob?
>
> `g_shell_quote()` escapes (or actually, quotes) the string we gives it,
> no matter what, we just don't pass it the user command.  I think we have
> a misunderstanding here, so let's be clear and since we all are
> programmers, let's write code :)   So here's a incredibly naive (and not
> correct enough, but see my real one for a less naive one) implementation
> in pseudo code:
>
>         def replace_placeholders(doc, cmd):
>                 f = os.path.basename(doc.filename)
>                 e = os.path.splitext(f)[0]
>                 d = os.path.dirname(doc.filename)
>                 p = app.project_path
>
>                 cmd.replace('%f', g_shell_quote(f))
>                 cmd.replace('%e', g_shell_quote(e))
>                 cmd.replace('%d', g_shell_quote(d))
>                 cmd.replace('%p', g_shell_quote(p))
>
>                 return cmd
>
> So you see that only the *replacement* is escaped/quoted, nothing else.
>  So nobody will touch your * literal or whatever.

Ok, I didn't get that before.  Pseudo code is good :)

So is it *allways* going to be correct to just quote part of the
command? (Apart from subcommands addressed below).

>
> OK, to chose appropriate quoting you may argue one needs to understand
> sub-command syntax.  Expanding the command:
>
>         cmd "$(cmd2 %f)"
>
> to
>
>         cmd "$(cmd2 "'xxx.c'")"
>
> is incorrect -- and that's what my patch would do, because it doesn't
> understand sub-shells.  We may or may not want to make sure to support
> sub-shells -- that's not that hard, but may also not be that useful, but
> whatever.

If we don't support any particular shell capabilities we need to be
very careful to document it, just so we have something to point to
when people complain, having not read the manual. :)

We do need to support backquote subcommands since that is needed for
GTK compilation to do pkg-config.

>
>>>>>> So, how to fix it?
>>>>>
>>>>> Don't, its not broke :)
>>>
>>> Yes it is, although it's not really annoying because commands have
>>> quotes around placeholders and filenames generally don't include "
>>> literals.  But try compiling a file whose name contains a " literal and
>>> you'll see it's broken ;)
>>
>> Heh, ok, but so is using a " literal in a filename in that case :)
>
> Agreed :)  However, I think one shouldn't need to explicitly quote the
> placeholders in the build commands (and I think we don't quote %e in
> `make %e.o` by default); so it also holds for spaces or any other shell
> metacharacter.
>

Heh, neither solution is always right, but the manual one is more
flexible, and uses much less code ;) for example you still have to add
code for the subshells and subcommands, but if its right then yes its
more convenient.

Cheers
Lex

>
> Cheers,
> Colomban
> _______________________________________________
> Devel mailing list
> Devel at lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/devel