New messages/output parsing proposition


Dimitar Zhekov

25 Aug 2011

7:34 p.m.

Hi,

0. Not having the Lex imagination, and not wanting to integrate a Real Parser into Geany and push it onto the users, I ended up with a pretty simple universal parsing proposition - mostly a combination of what we already have.

1. A message is parsed using:

regex: Extended syntax regular expression with subexpressions.

line_idx: 1-based index of the line number subexpression. If zero, Geany will read the first two (or three, if there is a third) matches, and check whether the first or second match is purely digits. If so, it'll try to use the matches as <line> [column] <filename>, or as <filename> <line> [column].

If line_idx is non-zero, the matching text must start with a number, and the following additional 1-based subexpression indexes may be specified:

file_idx - File name index. If 0 (not recommended), the current editor file name will be used. If non-zero, the matching text must be non-empty.

column_idx - Column # index, 0 = none. Ignored if the matching text does not start with a number.

warn_idx (error_regex only) - Warning index, 0 = none. If the matching text is not empty, the parsed message will be considered a warning rather than an error.
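The index-based scheme above can be sketched in Python, with Python's `re` standing in for the extended regex engine. This is a hedged illustration only: `parse_by_index`, `leading_int` and the `current_file` default are hypothetical names, and the line_idx = 0 positional heuristic is left out.

```python
import re

def leading_int(text):
    """Return the number at the start of text, or None.

    The proposal only requires line/column text to *start* with a number,
    so "5:" is accepted and read as 5.
    """
    m = re.match(r"[0-9]+", text or "")
    return int(m.group()) if m else None

def parse_by_index(regex, message, file_idx=0, line_idx=0, column_idx=0,
                   warn_idx=0, current_file="current.c"):
    """Extract (file, line, column, is_warning) using 1-based group indexes."""
    m = re.search(regex, message)
    if m is None or line_idx == 0:
        return None  # line_idx == 0 selects the positional heuristic (not shown)
    line = leading_int(m.group(line_idx))
    if line is None:
        return None  # the line text must start with a number
    # file_idx == 0: use the current editor file (hypothetical default here).
    filename = m.group(file_idx) if file_idx else current_file
    if not filename:
        return None  # a non-zero file_idx must match non-empty text
    column = leading_int(m.group(column_idx)) if column_idx else None
    is_warning = bool(warn_idx and m.group(warn_idx))
    return filename, line, column, is_warning

# gcc-style message, using [^:]+ for the file name.
GCC = r"^([^:]+):([0-9]+):([0-9]+:)? (warning: )?"
print(parse_by_index(GCC, "main.c:12:5: warning: unused variable 'x'",
                     file_idx=1, line_idx=2, column_idx=3, warn_idx=4))
# ('main.c', 12, 5, True)
```

Note how the column group captures "5:" and still yields 5, which is the "starts with a number" rule at work.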

2. Some examples:

"^([:]+):([0-9]+): " file_idx=1, line_idx=2 the standard grep-line syntax

"libtool *--mode=link| from ([:]+):([0-9]+)[,:]([0-9]+[,:])?$" \
"|^([:]+):([0-9]+):([0-9]+:)? (warning: )?"
file_idx=1, line_idx=2, col_idx=3, warn_idx=4 :)
gcc/g++ syntax, ignoring libtool --mode=link (see msgwindow.c)

3. Recommendations for writing regular expressions:

The default file_idx is 0, so if you specify line_idx > 0, be sure to set file_idx > 0 too (unless the message does not contain a filename).

For more precise parsing, try to match the start or end of line, or some unambiguous text.

Exclude the delimiter following a subexpression: "[^:]+:" instead of ".+:" (note that Geany assumes colon-less document names).

4. Rationale for an explicitly specified line/column # to start with a number instead of being purely digits:

You can always specify a purely-digit subexpression.

For some messages, it may be easier to define the line and column including some trailing text in the subexpression.

5. Initial implementation:

The built-in parsing will match the current behaviour, with some small improvements.

The list model should either be removed and left to filename-line-[column] parsing, or used to store the parsing results. Personally I prefer the former. Combining the parts into a string and parsing them back seems a bit awkward, but such things happen in programming.

6. Possible extensions:

More than one user-defined regex and set of indexes: error_regex_1, line_idx_1 etc.

When building, attempt all default expressions, and check if the filename matches the well-known extensions for this expression. For the 1st line in a build output, the current-file-type-expression is tried first. For the next lines, the last successful (best-matching) expression is tried first.

copy_idx - Index of the text to be copied on right-click Copy, 0 if none. If the matching text is empty, the entire line will be copied.

Allow the plugins to define their own expressions for the messages tab (which will make the list model obsolete).

RFC.

-- E-gards: Jimmy


Lex Trotman

26 Aug 2011

3:37 a.m.

New subject: [Geany-devel] New messages/output parsing proposition

Hi Dimitar,

Some good ideas.

On 26 August 2011 03:34, Dimitar Zhekov dimitar.zhekov@gmail.com wrote:

...

Hi,

Not having the Lex imagination,

Dunno about imagination, it's more experience [1] :-)

and not wanting to integrate a Real Parser into Geany and push it onto the users, I ended up with a pretty simple universal parsing proposition - mostly a combination of what we already have.

In general, I like the idea of being able to explicitly specify which capture groups supply which information, but unfortunately a single number doesn't work in many cases, eg

a rough C and Python file and line number capturing regex is:

(^([^:]+):([0-9]+):)|(, \('([^']+)', ([0-9]+))

For C, file_idx = 2 and line_idx = 3; for Python, file_idx = 5 and line_idx = 6, so a single number won't work.
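To see why one file_idx/line_idx pair cannot cover this pattern, here is a small Python sketch (Python's `re` standing in for GRegex). The `(` before the quote is a literal, so it is escaped here; without that backslash the parentheses do not balance and the stated indexes 5 and 6 do not work out. The sample Python message is an assumed format, and `locate` is a hypothetical helper.

```python
import re

# Lex's rough C/Python regex, with the literal "(" before the quote escaped.
C_OR_PY = re.compile(r"(^([^:]+):([0-9]+):)|(, \('([^']+)', ([0-9]+))")

def locate(message):
    """Return (file, line), reading whichever groups the matching branch filled."""
    m = C_OR_PY.search(message)
    if m is None:
        return None
    if m.group(1) is not None:            # C branch: groups 2 and 3
        return m.group(2), int(m.group(3))
    return m.group(5), int(m.group(6))    # Python branch: groups 5 and 6

print(locate("foo.c:42: error: 'x' undeclared"))                        # ('foo.c', 42)
print(locate("SyntaxError: ('invalid syntax', ('bar.py', 7, 3, 'x ='))"))  # ('bar.py', 7)
```

The branch test on group 1 is exactly the per-branch knowledge that a single pair of indexes cannot express.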

Instead the capture groups should be identified by name (with the option for non-unique names) eg:

(?J)(^(?<file>[^:]+):(?<line>[0-9]+):)|(, \('(?<file>[^']+)', (?<line>[0-9]+))

That assumes Geany uses a regex engine that supports named captures, eg GRegex.
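PCRE, and therefore GRegex, accepts repeated names when `(?J)` or the `G_REGEX_DUPNAMES` compile flag is used. Python's `re` rejects them, so this sketch emulates Lex's pattern with a numeric-suffix convention (`file_2`, `line_2`) that is purely hypothetical, and merges the suffixed groups after matching.

```python
import re

# Python's re raises re.error for a repeated group name, so the second
# branch uses suffixed names that are folded back together below.
C_OR_PY = re.compile(
    r"(^(?P<file>[^:]+):(?P<line>[0-9]+):)"
    r"|(, \('(?P<file_2>[^']+)', (?P<line_2>[0-9]+))"
)

def named_fields(match):
    """Merge file/file_2/... into a single 'file' key (first matched wins)."""
    fields = {}
    for name, value in match.groupdict().items():
        base = re.sub(r"_[0-9]+$", "", name)  # "line_2" -> "line"
        if value is not None and base not in fields:
            fields[base] = value
    return fields

print(named_fields(C_OR_PY.search("foo.c:42: error")))
# {'file': 'foo.c', 'line': '42'}
print(named_fields(C_OR_PY.search("SyntaxError: ('invalid syntax', ('bar.py', 7, 3, 'x ='))")))
# {'file': 'bar.py', 'line': '7'}
```

Either way the caller asks for "file" and "line" by name and never needs to know which branch matched.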

Now, since Geany 0.21 will depend on GLib >= 2.16, and GRegex was added in GLib 2.14, after the 0.21 release we should remove the options for different regex engines and use only GRegex, making life simpler all round.

Also this allows the information to be specified in the regex, no extra fields needed in the UI dialogs to specify the indexes.

...

A message is parsed using:

regex: Extended syntax regular expression with subexpressions.

line_idx: 1-based index of the line number subexpression. If zero, Geany will read the first two (or three, if there is a third) matches, and check whether the first or second match is purely digits. If so, it'll try to use the matches as <line> [column] <filename>, or as <filename> <line> [column].

This can be the fallback if there are no named captures, so existing regexii will still work.

[...]

...

Some examples:

"^([:]+):([0-9]+): " file_idx=1, line_idx=2 the standard grep-line syntax

[:]+ should be [^:]+, i.e. "one or more non-colon characters", not "one or more colons".
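A quick check of the correction, using Python's `re` (bracket expressions behave the same way here as in POSIX ERE and GRegex):

```python
import re

TYPO = re.compile(r"^([:]+):([0-9]+): ")    # [:] is a class containing only ':'
FIXED = re.compile(r"^([^:]+):([0-9]+): ")  # [^:] is any character except ':'

line = "src/main.c:42: int main(void)"
print(TYPO.match(line))            # None
print(FIXED.match(line).groups())  # ('src/main.c', '42')
```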

...

"libtool *--mode=link| from ([:]+):([0-9]+)[,:]([0-9]+[,:])?$" \
"|^([:]+):([0-9]+):([0-9]+:)? (warning: )?"
file_idx=1, line_idx=2, col_idx=3, warn_idx=4 :)
gcc/g++ syntax, ignoring libtool --mode=link (see msgwindow.c)

Needs to escape the * after libtool, plus several instances of the problem in example 1.

...

Recommendations for writing regular expressions:

The default file_idx is 0, so if you specify line_idx > 0, be sure to set file_idx > 0 too (unless the message does not contain a filename).

For more precise parsing, try to match the start or end of line, or some unambiguous text.

Exclude the delimiter following a subexpression: "[^:]+:" instead of ".+:" (note that Geany assumes colon-less document names).

Colon-less filenames are a reasonable assumption, although some stupid tool keeps creating a directory called ":" on my system, which may cause problems, but I haven't figured out which tool it is.

...

Rationale for an explicitly specified line/column # to start with a number instead of being purely digits:

You can always specify a purely-digit subexpression.

For some messages, it may be easier to define the line and column including some trailing text in the subexpression.

Could use non-capturing groups for the stuff you don't want, just put the digits in a capturing group.
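Lex's suggestion, sketched with Python's `re`: move the trailing delimiter into a non-capturing group `(?:...)` so the capturing group holds only digits. The sample message is made up for illustration.

```python
import re

# Optional column captured together with its trailing ':' (as in the gcc example).
WITH_COLON = re.compile(r"^([^:]+):([0-9]+):([0-9]+:)?")
# Delimiter moved into a non-capturing group, so group 3 holds digits only.
DIGITS_ONLY = re.compile(r"^([^:]+):([0-9]+):(?:([0-9]+):)?")

msg = "main.c:12:5: error: expected ';'"
print(WITH_COLON.match(msg).group(3))   # 5:
print(DIGITS_ONLY.match(msg).group(3))  # 5
```

Non-capturing groups are not numbered, so both patterns still have three groups and the indexes do not shift.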

...

Initial implementation:

The built-in parsing will match the current behaviour, with some small improvements.

Yep, user defined regexen must still work, we don't want everybody to have to go back and edit all their regexen after they upgrade to 0.22.

...

The list model should either be removed and left to filename-line-[column] parsing, or used to store the parsing results. Personally I prefer the former. Combining the parts into a string and parsing them back seems a bit awkward, but such things happen in programming.

Possible extensions:

More than one user-defined regex and set of indexes: error_regex_1, line_idx_1 etc.

IMHO named is better, easier to understand and specify etc.

...

When building, attempt all default expressions, and check if the filename matches the well-known extensions for this expression. For the 1st line in a build output, the current-file-type-expression is tried first. For the next lines, the last successful (best-matching) expression is tried first.

To be exact, the decoding isn't filetype dependent, it's compiler/linker/tool dependent.

So extracting the filetype by consensus from the error message and using that to decode the message isn't necessarily right, eg when your makefile is using a different compiler to the one that the standard filetype is defined for. Checking with multiple compilers is a good idea, eg clang is (in my limited experience) more standards enforcing than gcc so both should be tried.

Tool dependence means it depends on the command, which is why I suggested in another post that per-command expressions might be needed. Whilst verbose, that's very simple :-)

...

copy_idx - Index of the text to be copied on right-click Copy, 0 if none. If the matching text is empty, the entire line will be copied.

Allow the plugins to define their own expressions for the messages tab (which will make the list model obsolete).

RFC.

Cheers Lex

[1] experience is a nice way of saying have fallen into that trap, made that mistake, implemented it the wrong way before etc.

Dimitar Zhekov

27 Aug 2011

6:05 p.m.


Version 0.2

1. A message is parsed using a GRegex with the following subpatterns:

file - file name; must match non-empty text if defined
line - line #; must match a number if defined
col - column #
warn - compiler only: if matched with non-empty text, the entire message is a warning rather than an error
anon - an error or warning message not related to a particular file; must not be specified with file or line, of course
copy - if matched with non-empty text, context menu Copy will copy that text instead of the entire message

All subpatterns are optional.

If none of file, line or anon is defined, Geany will read the first two (or three, if there is a third) matches, and check whether the first or second match is purely digits. If so, it'll try to use the matches as <line> [column] <file>, or as <file> <line> [column].
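The v0.2 rules, including the positional fallback, can be sketched in Python. This is a hedged illustration: Python's `re` stands in for GRegex (so names use `(?P<name>...)`), the result dictionary and helper names are hypothetical, and the fallback follows one reading of the "first two (or three) matches" rule.

```python
import re

def _number(text):
    """line/col must be a number; anything else counts as not matched."""
    return int(text) if text and re.fullmatch(r"[0-9]+", text) else None

def parse_message(regex, message):
    """Sketch of the v0.2 rules: named subpatterns, else positional fallback."""
    m = re.search(regex, message)
    if m is None:
        return None
    names = m.re.groupindex  # subpattern names defined in the regex

    def get(name):
        return m.group(name) if name in names else None

    if any(name in names for name in ("file", "line", "anon")):
        return {
            "file": get("file") or None,        # empty file text = no file
            "line": _number(get("line")),
            "col": _number(get("col")),
            "warning": bool(get("warn")),       # non-empty warn => warning
            "anon": bool(get("anon")),
            "copy": get("copy") or message,     # empty copy => whole message
        }

    # Fallback: first two (or three) groups as <line> [col] <file>
    # or <file> <line> [col], depending on which one is a number.
    groups = m.groups()[:3]
    if len(groups) < 2:
        return None
    if _number(groups[0]) is not None:
        line = _number(groups[0])
        if len(groups) == 3:
            col, filename = _number(groups[1]), groups[2]
        else:
            col, filename = None, groups[1]
    elif _number(groups[1]) is not None:
        filename, line = groups[0], _number(groups[1])
        col = _number(groups[2]) if len(groups) == 3 else None
    else:
        return None
    return {"file": filename, "line": line, "col": col,
            "warning": False, "anon": False, "copy": message}
```

For example, `parse_message(r"^([^:]+):([0-9]+):", "b.c:9: x")` has no named subpatterns and falls back to reading `b.c` as the file and `9` as the line.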

2. The following expressions are attempted:

On compile:

The best match regex from the previous line, if any (see below).
User regex for the file type.
Default regex for the file type.
User independent commands regex.

On build:

All compile regex-es.
All user regex-es for other file types.
All default regex-es for other file types.

On messages tab (examples, subject to change):

Any plugin-defined regex-es.
(?<file>[^:]+):(?<line>[0-9]+)(?:[:,](?<col>[0-9]+))?\W
(?<file>[^)]+)(?<line>[0-9]+)(?:,(?<col>[0-9]+))[\W\D]

3. Matching quality:

Best: file with one of the known extensions for the file type, or file, line and column. Once such a match is found, no more expressions are attempted.

Average: file and line.

Weak: one of file, line or anon. Be sure not to specify too generic expressions.

No match: none of file, line or anon.
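Sections 2 and 3 together describe an ordered search with an early exit on a Best match. A minimal Python sketch, assuming each expression is paired with the file extensions known for its filetype; `quality`, `best_parse` and the numeric ranks are hypothetical names, and Lex argues against this approach later in the thread.

```python
import re

BEST, AVERAGE, WEAK, NO_MATCH = 3, 2, 1, 0

def quality(fields, known_exts):
    """Classify a match by the v0.2 rules; fields come from groupdict()."""
    filename, line = fields.get("file"), fields.get("line")
    col, anon = fields.get("col"), fields.get("anon")
    if filename and (filename.endswith(tuple(known_exts)) or (line and col)):
        return BEST
    if filename and line:
        return AVERAGE
    if filename or line or anon:
        return WEAK
    return NO_MATCH

def best_parse(message, expressions, last_best=None):
    """expressions: list of (compiled regex, known extensions) pairs.

    The expression that won on the previous line is tried first, and the
    search stops as soon as a Best match is found.
    """
    # sorted() is stable, so only last_best moves to the front.
    ordered = sorted(expressions, key=lambda expr: expr is not last_best)
    winner, winner_quality = None, NO_MATCH
    for expr in ordered:
        regex, known_exts = expr
        m = regex.search(message)
        if m is None:
            continue
        q = quality(m.groupdict(), known_exts)
        if q > winner_quality:
            winner, winner_quality = expr, q
        if q == BEST:
            break
    return winner, winner_quality
```

The caller would keep the returned expression and pass it back as `last_best` for the next output line.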

4. Recommendations for writing regular expressions:

Read GLib Reference Manual, Regular expression syntax.

For more precise parsing, try to match the start or end of line, or some unambiguous text.

Exclude the delimiter following a subpattern: "[^:]+:" instead of ".+:" (note that Geany assumes colon-less document names), or use (?U) for non-greedy matching.

5. Initial implementation:

The default compiler expressions will match the current behaviour of parse_compiler_error_line(), with some small improvements. More precise regex-es should be written.

The list model will be removed. Default Messages regex-es plus the plugin defined ones should suffice.

On Fri, 26 Aug 2011 11:37:07 +1000 Lex Trotman elextr@gmail.com wrote:

...

That assumes Geany uses a regex engine that supports named captures, eg GRegex.

Now we're talking! Thanks, Lex.

...

[:]+ should be [^:]+, i.e. "one or more non-colon characters", not "one or more colons".

Sure, lapsus keyboardus.

...

...
"libtool *--mode=link| from ([:]+):([0-9]+)[,:]([0-9]+[,:])?$"

...

Needs to escape the * after libtool, plus several instances of the problem in example 1.

The command is "libtool --mode=link", but I've seen it with 2 spaces, so it should actually be " +".

-- E-gards: Jimmy

Lex Trotman

28 Aug 2011

7:32 a.m.


On 28 August 2011 02:05, Dimitar Zhekov dimitar.zhekov@gmail.com wrote:

...

Version 0.2

A message is parsed using a GRegex with the following subpatterns:

This looks good, minor comment below.

...

file - file name; must match non-empty text if defined
line - line #; must match a number if defined
col - column #
warn - compiler only: if matched with non-empty text, the entire message is a warning rather than an error
anon - an error or warning message not related to a particular file; must not be specified with file or line, of course

What's anon for? I guess it's nice to make the error red, but we can't click on it to go anywhere since it doesn't have a file, so it's different to all the other red lines.

...

copy if matched with non-empty text, context menu Copy will copy that text instead of the entire message

An interesting idea.

...

All subpatterns are optional.

If none of file, line or anon is defined, Geany will read the first two (or three, if there is a third) matches, and check whether the first or second match is purely digits. If so, it'll try to use the matches as <line> [column] <file>, or as <file> <line> [column].

The following expressions are attempted:

If I understand it correctly then I strongly disagree with this section. Trying several expressions and choosing a "best" match is going to make configuration very complex and cause unexpected interactions. For example changing a command resulting in some slightly different output may cause unexpected changes in the "best" match giving wrong parses and confusion about where it is happening.

Only the regex associated with the command should be used, that is whatever is shown in the configuration GUI for the command. There is one for the filetype dependent commands and one for the filetype independent commands. The GUI has a priority based override system which is the same as for the commands themselves and includes the project settings (which you don't mention). See http://wiki.geany.org/howtos/configurebuildmenu for the details. Just use build_get_regex(). Using a regex that is hidden from the UI or is not associated with the command is bound to cause confusion.

The filetype commands are only intended to run commands that address one file, so they don't need to try any other regex anyway.

The filetype independent (make) commands are the ones that might run into multiple languages. Whilst it is a nice idea to try and identify the language and therefore get a good decode, it is liable to unexpected interactions as it looks at all filetypes and it assumes that the tool being used is one of those in the filetype sections. Better to assume developers working with mixed languages are competent enough to select the right expressions and OR them together.

...

On compile:

The best match regex from the previous line, if any (see below).
User regex for the file type.
Default regex for the file type.
User independent commands regex.

On build:

All compile regex-es.
All user regex-es for other file types.
All default regex-es for other file types.

On messages tab (examples, subject to change):

Any plugin-defined regex-es.
(?<file>[^:]+):(?<line>[0-9]+)(?:[:,](?<col>[0-9]+))?\W
(?<file>[^)]+)(?<line>[0-9]+)(?:,(?<col>[0-9]+))[\W\D]

Matching quality:

See 2. above.

...

Best: file with one of the known extensions for the file type, or file, line and column. Once such a match is found, no more expressions are attempted.

Average: file and line.

Weak: one of file, line or anon. Be sure not to specify too generic expressions.

No match: none of file, line or anon.

Recommendations for writing regular expressions:

Read GLib Reference Manual, Regular expression syntax.

Excellent advice!!

...

For more precise parsing, try to match the start or end of line, or some unambiguous text.

Exclude the delimiter following a subpattern: "[^:]+:" instead of ".+:" (note that Geany assumes colon-less document names), or use (?U) for non-greedy matching.

(?U) makes all matches non-greedy. You can use +? *? and ?? to only make one match non-greedy, but testing for anything but the delimiter is better.
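Python's `re` has no counterpart to PCRE's global `(?U)` switch, so this sketch compares only the greedy form, the per-quantifier `+?` form Lex mentions, and the delimiter-excluding class. The message is a made-up gcc-style sample.

```python
import re

msg = "app.c:10:4: error: expected ';' before '}'"
greedy = re.match(r"^(.+):([0-9]+):", msg).groups()     # .+ runs as far as it can
lazy = re.match(r"^(.+?):([0-9]+):", msg).groups()      # +? stops at the first fit
exclude = re.match(r"^([^:]+):([0-9]+):", msg).groups()  # cannot cross a ':'
print(greedy)   # ('app.c:10', '4')
print(lazy)     # ('app.c', '10')
print(exclude)  # ('app.c', '10')
```

The greedy form silently swallows the line number into the file name, which is why testing for "anything but the delimiter" is the safer habit.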

...

Initial implementation:

The default compiler expressions will match the current behaviour of parse_compiler_error_line(), with some small improvements. More precise regex-es should be written.

Agree.

...

The list model will be removed. Default Messages regex-es plus the plugin defined ones should suffice.

Plugins can't set regexes ATM because none of build.h is in the API, see other thread. This is something to be re-visited.

...

--

On Fri, 26 Aug 2011 11:37:07 +1000 Lex Trotman elextr@gmail.com wrote:

...
That assumes Geany uses a regex engine that supports named captures, eg GRegex.

Now we're talking! Thanks, Lex.

...
[:]+ should be [^:]+, i.e. "one or more non-colon characters", not "one or more colons".

Sure, lapsus keyboardus.

...
...
"libtool *--mode=link| from ([:]+):([0-9]+)[,:]([0-9]+[,:])?$"

...
Needs to escape the * after libtool, plus several instances of the problem in example 1.

The command is "libtool --mode=link", but I've seen it with 2 spaces, so it should actually be " +".

Ah, lapsus understandus, you actually meant it to be a repeat, but yeah + is better.
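In a regex the `*` after "libtool " is a quantifier on the space, not a glob. A quick Python check of the two forms discussed:

```python
import re

STAR = re.compile(r"libtool *--mode=link")  # ' *': zero or more spaces
PLUS = re.compile(r"libtool +--mode=link")  # ' +': one or more spaces

for cmd in ("libtool --mode=link gcc", "libtool  --mode=link gcc", "libtool--mode=link"):
    print(repr(cmd), bool(STAR.search(cmd)), bool(PLUS.search(cmd)))
# 'libtool --mode=link gcc' True True
# 'libtool  --mode=link gcc' True True
# 'libtool--mode=link' True False
```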

Cheers Lex

Dimitar Zhekov

8:44 p.m.


On Sun, 28 Aug 2011 15:32:19 +1000 Lex Trotman elextr@gmail.com wrote:

...

What's anon for? I guess it's nice to make the error red, but we can't click on it to go anywhere since it doesn't have a file, so it's different to all the other red lines.

To have it colored as error or warning. A bit of FX. :)

...

...

The following expressions are attempted:

If I understand it correctly then I strongly disagree with this section. Trying several expressions and choosing a "best" match is going to make configuration very complex and cause unexpected interactions. For example changing a command resulting in some slightly different output may cause unexpected changes in the "best" match giving wrong parses and confusion about where it is happening.

Best ::= one of the matches defined as Best, which are unlikely to produce wrong results. Primary purpose: avoid matching against all filetype expressions on build.

An inflexible matching with "slightly" different output will give you the expected behaviour: worse parsing, or none at all.

...

Only the regex associated with the command should be used, that is whatever is shown in the configuration GUI for the command.

I know how the project / filetype / default hierarchy works, but should have been more clear:

User regex for the file type ::= project, fallback filetype file
User independent commands regex ::= project, fallback geany.conf

...

Just use build_get_regex().

For compile, that's exactly what I'm going to do.

...

Using a regex that is hidden from the UI or is not associated with the command is bound to cause confusion.

Currently msgwin_parse_compiler_error_line() attempts filetypes_parse_error_message(), which uses build_get_regex(), *and* then falls back to parse_compiler_error_line().

So in "On compile", I'm reproducing the present logic, extended with best match. The latter really seems like an overkill for a single file, and I'll remove it.

...

The filetype independent (make) commands are the ones that might run into multiple languages. [...] Better to assume developers working with mixed languages are competent enough to select the right expressions and OR them together.

And always make the proper changes for a slightly different output. Why even write a new parser then? With enough | separated patterns, even the current one will work just fine. It would be nice to have warn/anon/copy, but they are only extras.

...

(?U) makes all matches non-greedy.

(?option) affects the text to the end of the current subpattern, or to the entire pattern if specified outside a subpattern.

It may be better to set ungreedy by default, (?-U) is still available, but I didn't want to modify the default POSIX-compatible behaviour.

...

...
The list model will be removed. Default Messages regex-es plus the plugin defined ones should suffice.

Plugins can't set regexes ATM because none of build.h is in the API, see other thread. This is something to be re-visited.

That's for the Messages only, "On messages tab".

-- E-gards: Jimmy

Lex Trotman

29 Aug 2011

3:16 a.m.


Hi Dimitar,

I'm afraid I am still confused by a few of your answers.

...

...
What's anon for? I guess it's nice to make the error red, but we can't click on it to go anywhere since it doesn't have a file, so it's different to all the other red lines.

To have it colored as error or warning. A bit of FX. :)

Bet we get bug reports that users can't click on it :-)

...

...
...

The following expressions are attempted:

If I understand it correctly then I strongly disagree with this section. Trying several expressions and choosing a "best" match is going to make configuration very complex and cause unexpected interactions. For example changing a command resulting in some slightly different output may cause unexpected changes in the "best" match giving wrong parses and confusion about where it is happening.

Best ::= one of the matches defined as Best, which are unlikely to produce wrong results. Primary purpose: avoid matching against all filetype expressions on build.

That's easy to avoid, don't do it :)

...

An inflexible matching with "slightly" different output will give you the expected behaviour: worse parsing, or none at all.

1. And the fact that it isn't right will be visible to the user: black error messages.
2. Then they will know where to look to fix it, instead of wondering which regex might be being used.
3. Trying a set of random regexes from different languages and sources isn't the way to go:
a. you don't know which one it is going to use; remember these are user settings, they may not obey your listed rules for good regexes.
b. there is a performance hit testing multiple regexes.

...

...
Only the regex associated with the command should be used, that is whatever is shown in the configuration GUI for the command.

I know how the project / filetype / default hierarchy works, but should have been more clear:

User regex for the file type ::= project, fallback filetype file

Sorry, what's a fallback filetype file? The hierarchy is: project filetype setting, user filetype file setting, system filetype file setting, default (is currently blank).
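The priority order Lex lists amounts to a "first non-empty setting wins" lookup. A tiny Python sketch, assuming an unset or empty value means "not configured at this level"; `resolve_setting` and the variable names are hypothetical, not Geany's build_get_regex().

```python
def resolve_setting(*layers):
    """Return the first layer that has a value, highest priority first.

    Assumption: None or an empty string means "not set at this level".
    """
    for value in layers:
        if value:
            return value
    return None

project_ft = None                                  # nothing set in the project
user_ft = r"^(?P<file>[^:]+):(?P<line>[0-9]+):"     # user filetype file
system_ft = r"^([^:]+):([0-9]+):"                   # system filetype file
print(resolve_setting(project_ft, user_ft, system_ft) == user_ft)  # True
```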

...

User independent commands regex ::= project, fallback geany.conf

Do you mean project filetype independent setting, user geany.conf filetype independent setting, system geany.conf independent setting (currently blank)?

...

...
Just use build_get_regex().

For compile, that's exactly what I'm going to do.

...
Using a regex that is hidden from the UI or is not associated with the command is bound to cause confusion.

Currently msgwin_parse_compiler_error_line() attempts filetypes_parse_error_message(), which uses build_get_regex(), *and* then falls back to parse_compiler_error_line().

I should have been clear that I think that the current behavior is wrong, it should be: if there is a regex use it, otherwise use some default fallback.

If the user has set the regex to specify what is wanted as an error, they don't want Geany then second-guessing them as well and adding extra errors that are wrong. If they haven't specified a regex, then a default fallback is OK.
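The difference between the current behaviour Dimitar describes (the configured regex, then parse_compiler_error_line() as well) and the rule Lex proposes can be sketched in Python. `re` stands in for GRegex, and `builtin_fallback` is a hypothetical, much simplified stand-in for the hard-coded parser, not a port of it.

```python
import re

def builtin_fallback(line):
    """Hypothetical stand-in for Geany's hard-coded parse_compiler_error_line()."""
    m = re.search(r"([^:\s]+):([0-9]+)", line)
    return {"file": m.group(1), "line": m.group(2)} if m else None

def current_behaviour(line, regex):
    # Configured regex first, then the built-in parser as well.
    m = re.search(regex, line) if regex else None
    return m.groupdict() if m else builtin_fallback(line)

def proposed_behaviour(line, regex):
    # If the user configured a regex, it alone decides what is an error.
    if regex:
        m = re.search(regex, line)
        return m.groupdict() if m else None
    return builtin_fallback(line)

PY_REGEX = r"^(?P<file>[^:]+\.py):(?P<line>[0-9]+):"
make_error = "Makefile:12: *** missing separator"
print(current_behaviour(make_error, PY_REGEX))   # {'file': 'Makefile', 'line': '12'}
print(proposed_behaviour(make_error, PY_REGEX))  # None
```

With a Python-only regex configured, the make error is still reported under the current behaviour but ignored under the proposed one, which is the second-guessing Lex objects to.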

...

So in "On compile", I'm reproducing the present logic, extended with best match. The latter really seems like an overkill for a single file, and I'll remove it.

Agree, since filetype commands are, well, filetype specific, it's best not to confuse it.

...

...
The filetype independent (make) commands are the ones that might run into multiple languages. [...] Better to assume developers working with mixed languages are competent enough to select the right expressions and OR them together.

And always make the proper changes for a slightly different output.

users, you just can't trust 'em :-)

...

Why even write a new parser then? With enough | separated patterns, even the current one will work just fine. It would be nice to have warn/anon/copy, but they are only extras.

Well, I am coming to the conclusion after these discussions that you don't need to write a whole new parser (just saved you lots of work :), rather add the ability to use multiple occurrences of named captures to regexes. Without that the | can't work. Warn in a different colour is worthwhile, anon and copy are, as you say, extras which wouldn't take much to add after named captures are added.

Then if you really want the fun, the fallback when no regex is specified could be upgraded from the current hardcoded routine to use your multiple language search, best match, super dooper[1] technology, but with a preference to turn it on/off in case it produces unwanted results.

...

...
(?U) makes all matches non-greedy.

(?option) affects the text to the end of the current subpattern, or to the entire pattern if specified outside a subpattern.

Yes, I was just adding other options to what I thought was your prototype documentation.

...

It may be better to set ungreedy by default, (?-U) is still available, but I didn't want to modify the default POSIX-compatible behaviour.

Yes, it is best to leave it with the standard behavior, then it will work for those who don't read the manual (90%).

...

...
...
The list model will be removed. Default Messages regex-es plus the plugin defined ones should suffice.

Plugins can't set regexes ATM because none of build.h is in the API, see other thread. This is something to be re-visited.

That's for the Messages only, "On messages tab".

I don't understand what you mean, sorry.

Cheers Lex

[1] http://www.urbandictionary.com/define.php?term=super+dooper

Dimitar Zhekov

8:04 p.m.


On Mon, 29 Aug 2011 11:16:05 +1000 Lex Trotman elextr@gmail.com wrote:

...

...
...
What's anon for? [...]

To have it colored as error or warning. A bit of FX. :)

Bet we get bug reports that users can't click on it :-)

I can trust them that much... :)

...

...
Best ::= one of the matches defined as Best, which are unlikely to produce wrong results. Primary purpose: avoid matching against all filetype expressions on build.

That's easy to avoid, don't do it :)

Yes, it can be easily avoided by writing the expressions to require one of the known extensions for the file type, and classifying the filename-line-column match as "Good". So the "Best" becomes absolute.

...

...
...
Only the regex associated with the command should be used, that is whatever is shown in the configuration GUI for the command.

I know how the project / filetype / default hierarchy works, but should have been more clear:

User regex for the file type ::= project, fallback filetype file

Sorry, what's a fallback filetype file? The hierarchy is: project filetype setting, user filetype file setting, system filetype file setting, default (is currently blank).

The home filetype or the system filetype file.

What is the "default (is currently blank)"? I searched Geany for global fallback expression(s) a short time ago, but all I could find was parse_compiler_error_line().

...

I should have been clear that I think that the current behavior is wrong, it should be: if there is a regex use it, otherwise use some default fallback.

...

...
...
The filetype independent (make) commands are the ones that might run into multiple languages. [...] Better to assume developers working with mixed languages are competent enough to select the right expressions and OR them together.

I imagine myself navigating thru the build dialog, copy-pasting all filetype expressions that can be relevant to the project, for every project. And then changing all of them on each slightly different output... And then I'll finally sit back and say: "There is nobody to blameth, cause I brought this on myself". :)

...

...
Why even write a new parser then? With enough | separated patterns, even the current one will work just fine. It would be nice to have warn/anon/copy, but they are only extras.

Well, I am coming to the conclusion after these discussions that you don't need to write a whole new parser (just saved you lots of work :), rather add the ability to use multiple occurrences of named captures to regexes. Without that the | can't work.

With enough | expressions, anything will work. It can get very long, but the regex size limit is 64K...

...

Then if you really want the fun, the fallback when no regex is specified could be upgraded from the current hardcoded routine to use your multiple language search, best match, super dooper[1] technology, but with a preference to turn it on/off in case it produces unwanted results.

You saved me even more work. With a single regex defined in 54 system filetypes, and 4 mentions of the message parsing in 1320 tracker items, it doesn't look like anybody needs a new parser, much less that anyone is going to define complex new-style regex-es.

So I'd better head to more practical things: column support and optional underlining for the Messages tab, and optional mark all Find Usage matches.

BTW, how about closing 3039654, and writing some comment for 3156609?

...

...
...
...
The list model will be removed. Default Messages regex-es plus the plugin defined ones should soffice.

Plugins can't set regexes ATM because none of build.h is in the API, see other thread. This is something to be re-visited.

That's for the Messages only, "On messages tab".

I don't understand what you mean, sorry.

In the Message Window, the "Messages" tab on the left. Its parsing is completely different from the "Compiler" tab.

-- E-gards: Jimmy

Lex Trotman

30 Aug 2011

1:59 a.m.


...

...
That's easy to avoid, don't do it :)

Yes, it can be easily avoided by writing the expressions to require one of the known extensions for the file type, and classifying the filename-line-column match as "Good". So the "Best" becomes absolute.

Good point, the expression could handle multiple extensions too, eg

^(?<file>.+?(?:cpp|hpp|h|cxx|hxx)):(?<line>\d+):(?<col>\d+)?

This makes it deterministic.
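Lex's extension-anchored expression, translated to Python's `(?P<name>...)` syntax for a quick check. Without a literal `\.` before the extension, a name that merely ends in "h" also qualifies; the `\.` in the second pattern is an added assumption, not part of Lex's message.

```python
import re

# Lex's example in Python syntax.
AS_WRITTEN = re.compile(r"^(?P<file>.+?(?:cpp|hpp|h|cxx|hxx)):(?P<line>\d+):(?P<col>\d+)?")
# Same idea with a literal "\." before the extension (added assumption).
WITH_DOT = re.compile(r"^(?P<file>.+?\.(?:cpp|hpp|h|cxx|hxx)):(?P<line>\d+):(?P<col>\d+)?")

for text in ("widget.cpp:42:7: error: expected ';'", "graph:3: not a C++ file"):
    for regex in (AS_WRITTEN, WITH_DOT):
        m = regex.match(text)
        print(m.groupdict() if m else None)
# {'file': 'widget.cpp', 'line': '42', 'col': '7'}
# {'file': 'widget.cpp', 'line': '42', 'col': '7'}
# {'file': 'graph', 'line': '3', 'col': None}
# None
```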

[...]

...

What is the "default (is currently blank)"?.. I searched Geany for global fallback expression(s) short time ago, but all I could find was parse_compiler_error_line().

Sorry, build_get_regex() used to also look at default, but I forgot it got removed.

[...]

...

I imagine myself navigating thru the build dialog, copy-pasting all filetype expressions that can be relevant to the project, for every project. And then changing all of them on each slightly different output... And then I'll finally sit back and say: "There is nobody to blameth, cause I brought this on myself". :)

Indeed, but it's better than shouting, "Where the [long string of expletives deleted] is it getting that from ..." :-)

[...]

...

With enough | expressions, anything will work. It can get very long, but the regex size limit is 64K...

Just how many languages do you mix? You only need the ones you use in the project. Most sane people tend to limit themselves to only a few, one or two code plus one or two documentation. Like Geany does. People who don't limit themselves become insane :-D

...

...
Then if you really want the fun, the fallback when no regex is specified could be upgraded from the current hardcoded routine to use your multiple language search, best match, super dooper[1] technology, but with a preference to turn it on/off in case it produces unwanted results.

You saved me even more work. With a single regex defined in 54 system filetypes, and 4 mentions of the message parsing in 1320 tracker items, it doesn't look like anybody needs a new parser, much less that anyone is going to define complex new-style regex-es.

Good analysis. As I said in the past, there is only a single filetype regex because the hard-coded method came first and no-one has had time to change anything. It's not because they think the current system is best.

Of course we don't know how many user defined regexes are out there working perfectly and silently.

...

So I'd better head to more practical things: column support and optional underlining for the Messages tab, and optional mark all Find Usage matches.

Well, using named captures and so allowing multi-language regexes is a useful improvement. Having to select a C++ file before building is annoying (Python only throws one error at a time so its better to get the C++ errors parsed). The named captures also helps your column support be more robust.

...

BTW, how about closing 3039654, and writing some comment for 3156609?

Sure, I'm currently behind a corporate firewall which doesn't let me login to sourceforge (or the wiki) and looks like being another couple of weeks before I'm finished. So if someone else wants to close the first and note on the second that the documentation already says new entries override old ones, see http://www.geany.org/manual/current/index.html#old-settings then it won't be forgotten, thanks.

...

[...]

...

In the Message Window, the "Messages" tab on the left. Its parsing is completely different from the "Compiler" tab.

Ahhh, I had forgotten that the messages window parsed anything.

Cheers Lex


devel@lists.geany.org
