On 19/01/2014 14:51, Dimitar Zhekov wrote:
On Sat, 18 Jan 2014 17:10:37 +0200 Arthur Rosenstein artros.misc@gmail.com wrote:
I took a quick glance at patch 11 [...] It treats compiler "column numbers" as if they're actually column numbers, while in reality they almost never are. Which means that it's not going to work right as soon as the code contains tabs or multibyte characters.
Hmmm... :)
<tab><tab>(s - text == 10 && "боза" xxx ... [utf-8]
utils.c:643:33: error: expected ')' before 'xxx'
Double click on the message: seeks to the first 'x', Geany status bar shows line 643, column 43 (column # depends on the tab size).
But it *does* seek to 'а' if "боза" is locale.
Indeed, my bad. You're not treating "column numbers" as column numbers but rather UTF-8 byte offsets. That's definitely more common, but not universal still.
Since it doesn't use named capture groups for the message parsing, it's a little less flexible in the type of messages it can parse
line [column] filename, or filename line [column]
If no regex is specified (and most people leave it blank I think), the "fallback" Geany parser requires only one of filename or line (default filename == that of the current document).
With a regex, both the filename and line are mandatory (though the regex Geany parser can be modified to make the line optional).
P11 adds an optional column without altering the above behaviour.
I think it's a better idea to reduce the surface area by moving everything to the regex based parser and getting rid of the hard-coded one. I left it as a fallback just to be on the safe side, but if regex based parsing proves to work well for a release or two it can be safely dropped.
and it also cannot handle different styles of error messages in the same regex.
expr1|expr2|...
Won't work. See blow.
Or if you mean errors vs. warnings, Geany does not support such a distinction, and P11 only adds an optional column.
That's not what I mean.
The example given in the documentation is actually not going to work when the column number is missing.
echo "test.py:7:24: E202 whitespace before ']'" | egrep '([^:]+):([0-9] +):([0-9]+): |([^:]+):([0-9]+): ' test.py:7:24: E202 whitespace before ']'
echo "test.py:7: E202 whitespace before ']'" | egrep '([^:]+):([0-9]+): ([0-9]+): |([^:]+):([0-9]+): ' test.py:7: E202 whitespace before ']'
Did you actually test it? You're using g_match_info_fetch() to read the submatches of the first three *capture groups*. Those are indexed independently of the string you're trying to match. So, even though the pattern does match the error message in the file:line case, you're reading the source-location info from the capture groups that didn't match anything. Note that in this case the solution is simple enough: "([^:]+):([0-9]+)(?::([0-9]+))?: ", but the general limitation is still there.
Overall, P11 is a minimal implementation, which handles most of the cases, with very small regex changes (an optional 3rd match). PR 191 is the maximum implementation. Both have pros and cons.
I think this is a slight overstatement. PR191 is really not *that* much bigger or radical, and a lot of the code that it adds is modular and can be used for other things (which I do, FWIW, in the Compiler Errors window for example).