[Geany-devel] New messages/output parsing proposition

Dimitar Zhekov dimitar.zhekov at xxxxx
Thu Aug 25 17:34:59 UTC 2011


0. Not having Lex's imagination, and not wanting to integrate a Real
Parser into Geany and push it onto the users, I ended up with a pretty
simple universal parsing proposition - mostly a combination of what
we already have.

1. A message is parsed using:

regex: Extended syntax regular expression with subexpressions.

line_idx: 1-based index of the line number subexpression. If zero,
Geany will read the first two (or three, if there is a third) matches,
and check whether the first or second match is purely digits. If so,
it'll try to use the matches as <line> [column] <filename>, or as
<filename> <line> [column].

If line_idx is non-zero, the matching text must start with a number,
and the following additional 1-based subexpression indexes may be
specified:

file_idx - File name index. If 0 (not recommended), the current editor
file name will be used. If non-zero, the matching text must be

column_idx - Column # index, 0 = none. Ignored if the matching text
does not start with a number.

warn_idx (error_regex only) - Warning index, 0 = none. If the matching
text is not empty, the parsed message will be considered a warning
rather than an error.
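
To illustrate the line_idx = 0 case described above, here is a minimal
Python sketch of the auto-detection. It is not Geany code: the function
name autodetect() and the returned tuple are hypothetical, and Python's
re module stands in for the extended-syntax regex engine.

```python
import re

def is_digits(text):
    """True if text consists purely of ASCII digits."""
    return re.fullmatch(r"[0-9]+", text) is not None

def autodetect(regex, message):
    """Guess (filename, line, column) from the first two or three
    subexpressions, as proposed for line_idx = 0. Returns None if
    the message does not match or no line number is found."""
    m = re.search(regex, message)
    if m is None:
        return None
    # Only the first two (or three) subexpressions are considered;
    # an optional group that did not match is treated as empty.
    parts = [g or "" for g in m.groups()[:3]]
    if len(parts) < 2:
        return None
    if is_digits(parts[0]):
        # <line> [column] <filename>
        line = int(parts[0])
        if len(parts) == 3:
            column = int(parts[1]) if is_digits(parts[1]) else None
            filename = parts[2]
        else:
            column, filename = None, parts[1]
    elif is_digits(parts[1]):
        # <filename> <line> [column]
        filename, line = parts[0], int(parts[1])
        has_column = len(parts) == 3 and is_digits(parts[2])
        column = int(parts[2]) if has_column else None
    else:
        return None
    return filename, line, column
```

With this sketch, a file-first message and a line-first message are
both recognized without any explicit index.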

2. Some examples:

"^([^:]+):([0-9]+): "
file_idx=1, line_idx=2
the standard grep-line syntax

"libtool *--mode=link| from ([^:]+):([0-9]+)[,:]([0-9]+[,:])?$" \
	"|^([^:]+):([0-9]+):([0-9]+:)? (warning: )?"
file_idx=1, line_idx=2, column_idx=3, warn_idx=4 :)
gcc/g++ syntax, ignoring libtool --mode=link (see msgwindow.c)
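
As a rough illustration of how explicit indexes could be applied to
such expressions, here is a Python sketch. Everything in it is an
assumption made for the example: parse_message(), leading_number() and
the result dictionary are hypothetical names, the file name is written
as [^:]+, and only the last alternative of the gcc expression is used.

```python
import re

def leading_number(text):
    """Return the number the text starts with, or None."""
    m = re.match(r"[0-9]+", text or "")
    return int(m.group()) if m else None

def parse_message(message, regex, line_idx, file_idx=0,
                  column_idx=0, warn_idx=0, current_file="untitled"):
    """Parse one message using explicit 1-based subexpression indexes."""
    m = re.search(regex, message)
    if m is None:
        return None
    # The line subexpression must start with a number.
    line = leading_number(m.group(line_idx))
    if line is None:
        return None
    # file_idx = 0 falls back to the current editor file name.
    filename = m.group(file_idx) if file_idx else current_file
    # The column is ignored unless its text starts with a number.
    column = leading_number(m.group(column_idx)) if column_idx else None
    # A non-empty warning subexpression marks the message as a warning.
    is_warning = bool(warn_idx and m.group(warn_idx))
    return {"file": filename, "line": line,
            "column": column, "warning": is_warning}

GREP = r"^([^:]+):([0-9]+): "
GCC = r"^([^:]+):([0-9]+):([0-9]+:)? (warning: )?"

grep_hit = parse_message("src/main.c:42: int x;", GREP,
                         line_idx=2, file_idx=1)
gcc_warn = parse_message("foo.c:10:5: warning: unused variable 'x'", GCC,
                         line_idx=2, file_idx=1, column_idx=3, warn_idx=4)
gcc_err = parse_message("foo.c:10: error: oops", GCC,
                        line_idx=2, file_idx=1, column_idx=3, warn_idx=4)
```

Note that the column subexpression matches "5:" here, trailing colon
included, which is why only the leading number is read.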

3. Recommendations for writing regular expressions:

The default file_idx is 0, so if you specify line_idx > 0, be sure to
set file_idx > 0 too (unless the message does not contain a filename).

For more precise parsing, try to match the start or end of line, or
some unambiguous text.

Exclude the delimiter following a subexpression: "[^:]+:" instead of
".+:" (note that Geany assumes colon-less document names).

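The delimiter recommendation can be seen with a small Python check.
The sample message is made up for the illustration, and Python's re
module is assumed to behave like the extended syntax for these two
patterns.

```python
import re

# A message that happens to contain a second "name:number: " sequence.
message = "main.c:12: note: see foo.h:3: here"

# ".+" is greedy, so the file name swallows everything up to the
# LAST ":<digits>: " in the line.
greedy = re.match(r"(.+):([0-9]+): ", message)

# "[^:]+" cannot cross a colon, so the file name stops at the first one.
precise = re.match(r"([^:]+):([0-9]+): ", message)

print(greedy.groups())
print(precise.groups())
```

The greedy form reports line 3 of a "file" that is really most of the
message, while the colon-excluding form reports main.c, line 12.
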
4. Rationale for an explicitly specified line/column # to start with
a number instead of being purely digits:

You can always specify a purely-digit subexpression.

For some messages, it may be easier to define the line and column
including some trailing text in the subexpression.

5. Initial implementation:

The built-in parsing will match the current behaviour, with some
small improvements.

The list model should either be removed and left to filename-line-
[column] parsing, or used to store the parsing results. Personally I
prefer the former. Combining the parts into a string and parsing them
back seems a bit awkward, but such things happen in programming.

6. Possible extensions:

More than one user-defined regex and set of indexes: error_regex_1,
line_idx_1 etc.

When building, attempt all default expressions, and check if the
filename matches the well-known extensions for this expression. For
the 1st line in a build output, the current-file-type-expression is
tried first. For the next lines, the last successful (best-matching)
expression is tried first.

copy_idx - Index of the text to be copied on right-click Copy, 0 if
none. If the matching text is empty, the entire line will be copied.

Allow the plugins to define their own expressions for the messages
tab (which will make the list model obsolete).
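
The "last successful expression is tried first" extension above could
look roughly like the following Python sketch. The class name
ExpressionSet, the two sample expressions and their extension lists are
all hypothetical, and for brevity every expression is assumed to use
subexpression 1 for the file name and 2 for the line.

```python
import re

class ExpressionSet:
    """Tries a set of expressions, most recently successful one first."""

    def __init__(self, expressions, preferred):
        # expressions: name -> (regex, tuple of well-known extensions)
        self.expressions = expressions
        # For the 1st line, the current file type's expression goes first.
        self.first = preferred

    def parse(self, line):
        names = [self.first]
        names += [n for n in self.expressions if n != self.first]
        for name in names:
            regex, extensions = self.expressions[name]
            m = re.search(regex, line)
            # Accept the match only if the file name has one of the
            # extensions that are well-known for this expression.
            if m and m.group(1).endswith(extensions):
                self.first = name  # try this one first next time
                return name, m.group(1), int(m.group(2))
        return None

EXPRESSIONS = {
    "c": (r"^([^:]+):([0-9]+):", (".c", ".h")),
    "python": (r'File "([^"]+)", line ([0-9]+)', (".py",)),
}

build = ExpressionSet(EXPRESSIONS, preferred="python")
first_hit = build.parse("main.c:3: error: x")
order_after_first = build.first
second_hit = build.parse('  File "build.py", line 9, in <module>')
```

Keeping the last successful expression at the front means a long run of
messages from one tool costs a single match attempt per line.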


E-gards: Jimmy

