Hi,
0. Not having the Lex imagination, and not wanting to integrate a Real Parser into Geany and push it onto the users, I ended up with a pretty simple universal parsing proposition - mostly a combination of what we already have.
1. A message is parsed using:
regex: Extended syntax regular expression with subexpressions.
line_idx: 1-based index of the line number subexpression. If zero, Geany will read the first two (or three, if there is a third) matches, and check whether the first or second match is purely digits. If so, it'll try to use the matches as <line> [column] <filename>, or as <filename> <line> [column].
If line_idx is non-zero, the matching text must start with a number, and the following additional 1-based subexpression indexes may be specified:
file_idx - File name index. If 0 (not recommended), the current editor file name will be used. If non-zero, the matching text must be non-empty.
column_idx - Column # index, 0 = none. Ignored if the matching text does not start with a number.
warn_idx (error_regex only) - Warning index, 0 = none. If the matching text is not empty, the parsed message will be considered a warning rather than an error.
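The rules above can be sketched roughly as follows. This is only an illustration in Python (Geany itself is C), with Python's re module standing in for the extended regex engine. The names parse_message, ParsedMessage, leading_number and purely_digits are hypothetical, and rejecting an empty file name in the auto-detect case is an assumption, not part of the proposal.

```python
import re
from typing import NamedTuple, Optional


class ParsedMessage(NamedTuple):
    filename: Optional[str]  # None: fall back to the current editor file
    line: int
    column: Optional[int]
    is_warning: bool


def leading_number(text):
    """Return the number that text starts with, or None if it has none."""
    m = re.match(r"[0-9]+", text or "")
    return int(m.group()) if m else None


def purely_digits(text):
    return re.fullmatch(r"[0-9]+", text) is not None


def parse_message(message, regex, line_idx=0, file_idx=0, column_idx=0, warn_idx=0):
    """Hypothetical helper; returns None when the message cannot be parsed."""
    m = re.search(regex, message)
    if m is None:
        return None

    if line_idx == 0:
        # Auto-detect from the first two or three matches.
        parts = [g or "" for g in m.groups()[:3]]
        if len(parts) < 2:
            return None
        if purely_digits(parts[0]):      # <line> [column] <filename>
            line, filename, rest = parts[0], parts[-1], parts[1:-1]
        elif purely_digits(parts[1]):    # <filename> <line> [column]
            filename, line, rest = parts[0], parts[1], parts[2:]
        else:
            return None
        if not filename:
            return None  # assumption: an empty file name is rejected
        column = int(rest[0]) if rest and purely_digits(rest[0]) else None
        return ParsedMessage(filename, int(line), column, False)

    # Explicit indexes: the line match must start with a number.
    line = leading_number(m.group(line_idx))
    if line is None:
        return None
    filename = None
    if file_idx:
        filename = m.group(file_idx)
        if not filename:
            return None  # a requested file name must be non-empty
    # A column match that does not start with a number is ignored.
    column = leading_number(m.group(column_idx)) if column_idx else None
    is_warning = bool(warn_idx and m.group(warn_idx))
    return ParsedMessage(filename, line, column, is_warning)
```

For instance, parse_message("main.c:42: undefined", r"^([^:]+):([0-9]+): ") takes the auto-detect path and reads the two matches as <filename> <line>, while passing line_idx=2, file_idx=1 reaches the same result through the explicit path.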
2. Some examples:
"^([^:]+):([0-9]+): " file_idx=1, line_idx=2 the standard grep-line syntax
"libtool *--mode=link| from ([^:]+):([0-9]+)[,:]([0-9]+[,:])?$" \
"|^([^:]+):([0-9]+):([0-9]+:)? (warning: )?" file_idx=1, line_idx=2, column_idx=3, warn_idx=4 :) gcc/g++ syntax, ignoring libtool --mode=link (see msgwindow.c)
3. Recommendations for writing regular expressions:
The default file_idx is 0, so if you specify line_idx > 0, be sure to set file_idx > 0 too (unless the message does not contain a filename).
For more precise parsing, try to match the start or end of line, or some unambiguous text.
Exclude the delimiter following a subexpression: "[^:]+:" instead of ".+:" (note that Geany assumes colon-less document names).
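To see why excluding the delimiter matters, here is a small sketch using Python's re module as a stand-in for the regex engine. The sample message is made up for illustration.

```python
import re

line = "main.c:12:7: error: expected ';'"

# ".+" is greedy: it swallows "main.c:12" and leaves only "7" for the line number.
greedy = re.match(r"(.+):([0-9]+):", line)

# "[^:]+" cannot cross a colon, so it stops at the file name.
exact = re.match(r"([^:]+):([0-9]+):", line)

print(greedy.groups())
print(exact.groups())
```

With ".+:" the file name comes out as "main.c:12" and the line as 7. With "[^:]+:" the file name is "main.c" and the line is 12, which is what the message actually says.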
4. Rationale for requiring an explicitly specified line/column # to start with a number instead of being purely digits:
You can always specify a purely-digit subexpression.
For some messages, it may be easier to define the line and column including some trailing text in the subexpression.
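A sketch of the second point, again in Python for brevity. It uses only the second alternative of the gcc/g++ expression, so that subexpressions 1 to 4 line up with the indexes, and it assumes "[^:]+" for the file name. The helper name leading_number is hypothetical.

```python
import re

GCC_RE = re.compile(r"^([^:]+):([0-9]+):([0-9]+:)? (warning: )?")


def leading_number(text):
    """Return the integer that text starts with, or None (also for None or empty text)."""
    m = re.match(r"[0-9]+", text or "")
    return int(m.group()) if m else None


with_column = GCC_RE.search("src/main.c:10:5: error: 'x' undeclared")
raw_column = with_column.group(3)      # digits plus the trailing colon
column = leading_number(raw_column)    # only the leading digits are used

without_column = GCC_RE.search("src/main.c:12: warning: unused variable 'y'")
no_column = leading_number(without_column.group(3))  # optional group did not match
```

Here the column subexpression captures "5:" rather than "5", which keeps the expression simple. Requiring only a leading number lets that match be used directly instead of forcing a nested digits-only subexpression.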
5. Initial implementation:
The built-in parsing will match the current behaviour, with some small improvements.
The list model should either be removed and left to filename-line-[column] parsing, or used to store the parsing results. Personally I prefer the former. Combining the parts into a string and parsing them back seems a bit awkward, but such things happen in programming.
6. Possible extensions:
More than one user-defined regex and set of indexes: error_regex_1, line_idx_1 etc.
When building, attempt all default expressions, and check if the filename matches the well-known extensions for this expression. For the 1st line in a build output, the current-file-type-expression is tried first. For the next lines, the last successful (best-matching) expression is tried first.
copy_idx - Index of the text to be copied on right-click Copy, 0 if none. If the matching text is empty, the entire line will be copied.
Allow the plugins to define their own expressions for the messages tab (which will make the list model obsolete).
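The "last successful expression is tried first" idea could look something like the sketch below. It is Python and purely hypothetical: the class name ExpressionList and its methods are made up, and the file-type and extension checks are left out.

```python
import re


class ExpressionList:
    """Hypothetical container that tries the last successful expression first."""

    def __init__(self, patterns):
        # The caller decides the initial order, e.g. the current file type's
        # expression first.
        self._compiled = [re.compile(p) for p in patterns]

    def match(self, line):
        for i, rx in enumerate(self._compiled):
            m = rx.search(line)
            if m:
                # Move the winner to the front for the following lines.
                self._compiled.insert(0, self._compiled.pop(i))
                return m
        return None

    def order(self):
        return [rx.pattern for rx in self._compiled]


GREP_STYLE = r"^([^:]+):([0-9]+): "
LINE_OF_STYLE = r"^line ([0-9]+) of (.+)$"   # made-up second message format

exprs = ExpressionList([GREP_STYLE, LINE_OF_STYLE])
first = exprs.match("line 3 of foo.py")
```

After the first matching line, LINE_OF_STYLE sits at the front of the list, so the remaining build output in that format is matched on the first attempt.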
RFC.
-- E-gards: Jimmy