[Geany-devel] New messages/output parsing proposition

Sun Aug 28 05:32:19 UTC 2011

On 28 August 2011 02:05, Dimitar Zhekov <dimitar.zhekov at gmail.com> wrote:
> Version 0.2
>
> 1. A message is parsed using a GRegex with the following subpatterns:
>

This looks good, minor comment below.

> file    file name, must match a non-empty text if defined
> line    line #, must match a number if defined
> col     column #
> warn    Compiler only: if matched with non-empty text, the entire
>        message is a warning rather than an error
> anon    an error or warning message not related to a particular
>        file; must not be specified with file or line, of course

What's anon for? I guess it's nice to make the error red, but we can't
click on it to go anywhere since it doesn't have a file, so it's
different from all the other red lines.

> copy    if matched with non-empty text, context menu Copy will copy
>        that text instead of the entire message
>

An interesting idea.

> All subpatterns are optional.
>
> If none of file, line or anon is defined, Geany will read the first
> two (or three, if there is a third) matches, and check whether the
> first or second match is purely digits. If so, it'll try to use the
> matches as <line> [column] <file>, or as <file> <line> [column].
>
>
> 2. The following expressions are attempted:
>

If I understand it correctly then I strongly disagree with this
section.  Trying several expressions and choosing a "best" match is
going to make configuration very complex and cause unexpected
interactions.  For example changing a command resulting in some
slightly different output may cause unexpected changes in the "best"
match giving wrong parses and confusion about where it is happening.

Only the regex associated with the command should be used, that is
whatever is shown in the configuration GUI for the command.  There is
one for the filetype dependent commands and one for the filetype
independent commands.  The GUI has a priority based override system
which is the same as for the commands themselves and includes the
project settings (which you don't mention).  See
http://wiki.geany.org/howtos/configurebuildmenu for the details.  Just
use build_get_regex().  Using a regex that is hidden from the UI or is
not associated with the command is bound to cause confusion.

The filetype commands are only intended to run commands that address
one file, so they don't need to try any other regex anyway.

The filetype independent (make) commands are the ones that might run
into multiple languages.  Whilst it is a nice idea to try and identify
the language and therefore get a good decode, it is liable to
unexpected interactions as it looks at all filetypes and it assumes
that the tool being used is one of those in the filetype sections.
Better to assume developers working with mixed languages are competent
enough to select the right expressions and OR them together.

> On compile:
>
> The best match regex from the previous line, if any (see below).
> User regex for the file type.
> Default regex for the file type.
> User independent commands regex.
>
> On build:
>
> All compile regex-es.
> All user regex-es for other file types.
> All default regex-es for other file types.
>
> On messages tab (examples, subject to change):
>
> Any plugin-defined regex-es.
> (?<file>[^:]+):(?<line>[0-9]+)(?:[:,](?<col>[0-9]+))?\W
> (?<file>[^(]+)\((?<line>[0-9]+)(?:,(?<col>[0-9]+))?\)[\W\D]
>
>
> 3. Matching quality:
>

See 2. above.

> Best: file with one of the known extensions for the file type, or
> file, line and column. Once such a match is found, no more expressions
> are attempted.
>
> Average: file and line.
>
> Weak: one of file, line or anon. Be sure not to specify overly
> generic expressions.
>
> No match: none of file, line or anon.
>
>
> 4. Recommendations for writing regular expressions:
>
> Read GLib Reference Manual, Regular expression syntax.

Excellent advice!!

>
> For more precise parsing, try to match the start or end of line, or
> some unambiguous text.
>
> Exclude the delimiter following a subpattern: "[^:]+:" instead of
> ".+:" (note that Geany assumes colon-less document names), or use
> (?U) for non-greedy matching.

(?U) makes all matches non-greedy.  You can use +? *? and ?? to only
make one match non-greedy, but testing for anything but the delimiter
is better.

>
>
> 5. Initial implementation:
>
> The default compiler expressions will match the current behaviour
> of parse_compiler_error_line(), with some small improvements. More
> precise regex-es should be written.

Agree.

>
> The list model will be removed. Default Messages regex-es plus the
> plugin-defined ones should suffice.

Plugins can't set regexes ATM because none of build.h is in the API,
see other thread.  This is something to be re-visited.

>
> --
>
> On Fri, 26 Aug 2011 11:37:07 +1000
> Lex Trotman <elextr at gmail.com> wrote:
>
>> That assumes Geany uses a regex engine that supports named captures,
>> eg GRegex.
>
> Now we're talking! Thanks, Lex.
>
>> [:]+ S/B [^:]+  ie "one or more not a :" not "one or more :"
>
> Sure, lapsus keyboardus.
>
>> > "libtool *--mode=link| from ([:]+):([0-9]+)[,:]([0-9]+[,:])?$"
>
>> Needs to escape the * after libtool, plus several instances of the
>> problem in example 1.
>
> The command is "libtool --mode=link", but I've seen it with 2 spaces,
> so it should actually be " +".

Ah, lapsus understandus, you actually meant it to be a repeat, but
yeah + is better.

Cheers
Lex