[Geany-devel] New messages/output parsing proposition

Thu Aug 25 17:34:59 UTC 2011

Hi,

0. Not having Lex's imagination, and not wanting to integrate a Real
Parser into Geany and push it onto the users, I ended up with a pretty
simple universal parsing proposition - mostly a combination of what
we already have.

1. A message is parsed using:

regex: Extended syntax regular expression with subexpressions.

line_idx: 1-based index of the line number subexpression. If zero,
Geany will read the first two (or three, if there is a third) matches,
and check whether the first or second match is purely digits. If so,
it'll try to use the matches as <line> [column] <filename>, or as
<filename> <line> [column].

If line_idx is non-zero, the matching text must start with a number,
and the following additional 1-based subexpression indexes may be
specified:

file_idx - File name index. If 0 (not recommended), the current editor
file name will be used. If non-zero, the matching text must be
non-empty.

column_idx - Column # index, 0 = none. Ignored if the matching text
does not start with a number.

warn_idx (error_regex only) - Warning index, 0 = none. If the matching
text is not empty, the parsed message will be considered a warning
rather than an error.
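
To make the rules in section 1 concrete, here is a minimal sketch in
Python. It is not Geany code: Python's re module stands in for the
extended regex syntax, and the names Parsed, parse_message, autodetect,
leading_number and is_digits are made up for illustration. The
line_idx == 0 branch is only one possible reading of the auto-detection
rule.

```python
import re
from typing import NamedTuple, Optional

class Parsed(NamedTuple):
    filename: Optional[str]
    line: int
    column: Optional[int]
    is_warning: bool

def leading_number(text):
    """Return the number that text starts with, or None if there is none."""
    m = re.match(r"[0-9]+", text or "")
    return int(m.group()) if m else None

def is_digits(text):
    """True if text is non-empty and purely ASCII digits."""
    return bool(text) and re.fullmatch(r"[0-9]+", text) is not None

def autodetect(groups):
    """line_idx == 0: inspect the first two (or three) matches."""
    g = list(groups[:3])
    if len(g) < 2:
        return None
    if is_digits(g[0]):
        # <line> [column] <filename>
        column = int(g[1]) if len(g) == 3 and is_digits(g[1]) else None
        return Parsed(g[-1], int(g[0]), column, False) if g[-1] else None
    if is_digits(g[1]):
        # <filename> <line> [column]
        column = int(g[2]) if len(g) == 3 and is_digits(g[2]) else None
        return Parsed(g[0], int(g[1]), column, False) if g[0] else None
    return None

def parse_message(msg, regex, line_idx, file_idx=0, column_idx=0,
                  warn_idx=0, current_file=None):
    m = re.search(regex, msg)
    if m is None:
        return None
    if line_idx == 0:
        return autodetect(m.groups())
    # Explicit line_idx: the matching text must start with a number.
    line = leading_number(m.group(line_idx))
    if line is None:
        return None
    if file_idx:
        filename = m.group(file_idx)
        if not filename:          # must be non-empty when file_idx != 0
            return None
    else:
        filename = current_file   # file_idx == 0: current editor file
    # The column is ignored unless its text starts with a number.
    column = leading_number(m.group(column_idx)) if column_idx else None
    # A non-empty warn_idx match turns the message into a warning.
    is_warning = bool(warn_idx and m.group(warn_idx))
    return Parsed(filename, line, column, is_warning)
```

For example, parse_message("main.c:42: foo", r"^([^:]+):([0-9]+): ",
line_idx=2, file_idx=1) gives file "main.c" and line 42, and the same
call with line_idx=0 reaches the same result through auto-detection.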

2. Some examples:

"^([^:]+):([0-9]+): "
file_idx=1, line_idx=2
the standard grep-line syntax

"libtool *--mode=link| from ([^:]+):([0-9]+)[,:]([0-9]+[,:])?$" \
	"|^([^:]+):([0-9]+):([0-9]+:)? (warning: )?"
file_idx=1, line_idx=2, column_idx=3, warn_idx=4 :)
gcc/g++ syntax, ignoring libtool --mode=link (see msgwindow.c)
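
As a quick check of the gcc/g++ case, here is a small Python sketch (not
Geany code; parse_gcc and GCC_RE are hypothetical names). It uses only
the last alternative of the expression, with the file name written as
[^:]+, because on its own its groups 1 to 4 line up with the indexes
given above. Python's re module is assumed to behave like the extended
syntax for this pattern.

```python
import re

# Last alternative of the gcc/g++ expression:
# group 1 = file, 2 = line, 3 = column plus ':', 4 = "warning: "
GCC_RE = re.compile(r"^([^:]+):([0-9]+):([0-9]+:)? (warning: )?")

def parse_gcc(msg):
    m = GCC_RE.search(msg)
    if m is None:
        return None
    # The column text is e.g. "5:", so only its leading digits are used.
    col = re.match(r"[0-9]+", m.group(3) or "")
    return {
        "file": m.group(1),
        "line": int(m.group(2)),
        "column": int(col.group()) if col else None,
        "warning": bool(m.group(4)),   # non-empty match means warning
    }
```

With this, "foo.c:12:5: warning: unused variable 'x'" is parsed as a
warning at foo.c, line 12, column 5, while "foo.c:12: error: 'y'
undeclared" is an error with no column.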

3. Recommendations for writing regular expressions:

The default file_idx is 0, so if you specify line_idx > 0, be sure to
set file_idx > 0 too (unless the message does not contain a filename).

For more precise parsing, try to match the start or end of line, or
some unambiguous text.

Exclude the delimiter following a subexpression: "[^:]+:" instead of
".+:" (note that Geany assumes colon-less document names).

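The delimiter recommendation in section 3 can be seen with a short
Python sketch. The sample message is invented, and Python's re module is
assumed to treat these two patterns the same way the extended syntax
does: a greedy ".+" runs to the last possible colon, while "[^:]+" stops
at the first one.

```python
import re

# Invented message that mentions two file:line pairs.
msg = "src/main.c:42: note: see also util.c:7: here"

greedy = re.match(r"^(.+):([0-9]+): ", msg)
precise = re.match(r"^([^:]+):([0-9]+): ", msg)

# ".+" swallows everything up to the last "file:line: " candidate.
greedy_result = (greedy.group(1), greedy.group(2))
# "[^:]+" cannot cross a colon, so it keeps the first file name.
precise_result = (precise.group(1), precise.group(2))
```

Here greedy_result is ("src/main.c:42: note: see also util.c", "7") and
precise_result is ("src/main.c", "42").
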
4. Rationale for an explicitly specified line/column # to start with
a number instead of being purely digits:

You can always specify a purely-digit subexpression.

For some messages, it may be easier to define the line and column
including some trailing text in the subexpression.

5. Initial implementation:

The built-in parsing will match the current behaviour, with some
small improvements.

The list model should either be removed and left to filename-line-
[column] parsing, or used to store the parsing results. Personally I
prefer the former. Combining the parts into a string and parsing them
back seems a bit awkward, but such things happen in programming.

6. Possible extensions:

More than one user-defined regex and set of indexes: error_regex_1,
line_idx_1 etc.

When building, attempt all default expressions, and check if the
filename matches the well-known extensions for this expression. For
the 1st line in a build output, the current-file-type-expression is
tried first. For the next lines, the last successful (best-matching)
expression is tried first.
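
One way the ordering could work, as a rough Python sketch. The
expression table, the extensions and the BuildOutputParser class are all
hypothetical; the two patterns are just placeholders with the file name
in group 1 and the line in group 2.

```python
import re

# Hypothetical defaults: (name, regex, well-known extensions)
EXPRESSIONS = [
    ("c", re.compile(r"^([^:]+):([0-9]+):"), (".c", ".h")),
    ("python", re.compile(r'^  File "([^"]+)", line ([0-9]+)'), (".py",)),
]

class BuildOutputParser:
    def __init__(self, current_filetype):
        # 1st line: the current file type's expression is tried first.
        self.preferred = current_filetype

    def _ordered(self):
        first = [e for e in EXPRESSIONS if e[0] == self.preferred]
        rest = [e for e in EXPRESSIONS if e[0] != self.preferred]
        return first + rest

    def parse_line(self, text):
        for name, regex, extensions in self._ordered():
            m = regex.search(text)
            # Accept only if the file name has a well-known extension.
            if m and m.group(1).endswith(extensions):
                # Next lines: the last successful expression goes first.
                self.preferred = name
                return name, m.group(1), int(m.group(2))
        return None
```

Starting with BuildOutputParser("c"), a line like "main.c:3: error: x"
is matched by the "c" expression; once a Python traceback line matches,
"python" becomes the expression tried first for the following lines.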

copy_idx - Index of the text to be copied on right-click Copy, 0 if
none. If the matching text is empty, the entire line will be copied.
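
A tiny Python sketch of the copy_idx idea (text_to_copy is a made-up
name). It assumes that copy_idx = 0, or a line that does not match at
all, also falls back to copying the entire line; the proposal only
spells out the empty-match case.

```python
import re

def text_to_copy(msg, regex, copy_idx):
    m = re.search(regex, msg)
    # Assumption: no match or copy_idx == 0 copies the whole line.
    if m is None or copy_idx == 0:
        return msg
    # Empty (or non-participating) match: copy the entire line.
    return m.group(copy_idx) or msg
```

With the regex "^([^:]+):([0-9]+): (.*)$" and copy_idx=3, the line
"foo.c:12: error: bad thing" would copy just "error: bad thing".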

Allow the plugins to define their own expressions for the messages
tab (which will make the list model obsolete).

RFC.

--
E-gards: Jimmy