On Sun, 19 Jan 2014 19:34:01 +0200 Arthur Rosenstein artros.misc@gmail.com wrote:
utils.c:643:33: error: expected ')' before 'xxx'
Double-clicking the message seeks to the first 'x', and Geany's status bar shows line 643, column 43 (the column number depends on the tab size).
But it *does* seek to 'а' if "боза" is in the locale encoding.
Indeed, my bad. You're not treating "column numbers" as column numbers but as UTF-8 byte offsets. That's definitely more common, but still not universal.
A truly universal solution would have to take into account whether the compiler emits UTF-8 character columns (Java? .NET? Python 3?) or, if not, what the file's encoding is, whether the compiler takes the system locale into account, and what the system locale is. Geany sets the GLib I/O channel encoding to NULL ("safe to use with binary data"), so at least that should have no effect.
echo "test.py:7: E202 whitespace before ']'" | egrep '([^:]+):([0-9]+):([0-9]+):|([^:]+):([0-9]+): '
test.py:7: E202 whitespace before ']'
Did you actually test it? You're using g_match_info_fetch() to read the submatches of the first three *capture groups*. Capture groups are numbered by their position in the pattern, independently of the string you're trying to match. So, even though the pattern does match the error message in the file:line case, you're reading the source-location info from capture groups that didn't match anything.
No, I assumed that Geany works as documented and reads the first two matches. Looking at regex(1), it probably worked like that before switching to GRegex.
Note that in this case the solution is simple enough: "([^:]+):([0-9]+)(?::([0-9]+))?: ", but the general limitation is still there.
It's not a limitation, but a bug in the current regex parser. We should either document that the first two capturing groups are used whether or not they match, or skip the non-matching groups (which is quite easy). After that, P11 can be updated; fixing the regex parser is outside its scope.