I had to create the new pull request quickly - not having a huge ctags-related pull request in Geany brings bad luck!
This pull request brings in the changes made in upstream ctags over the last 2 years so that our parsers are compatible with uctags parsers. It isn't a complete sync, to avoid an overly big pull request - just the minimal changes required so that syncing parsers isn't blocked. In addition, to verify that subparser support works, this pull request uses the html, javascript and css parsers from upstream and enables subparser support for HTML.
In general, most of the patches are fortunately boring; the "interesting" ones are:
8822e4e - use kindIndex instead of kindDefinition. This one took a bit of work and I hope I didn't introduce any problems along the way. In some functions that aren't relevant to Geany I left just function stubs to avoid extra syncing with upstream. This patch also disables regex parsing - the parsing is only disabled inside ctags; I kept the regex parsers there for now until we decide what we want to do about them.
72d4b5d - use the upstream javascript parser. I haven't completely verified that the generated tags are correct (they seem OK to someone who doesn't really know javascript), so it would be better if somebody else checked.
5de9b3e - support subparsers in Geany. The biggest part of the patch is the additional mapping of subparser tag types to the tag types used by Geany. This is necessary because we don't want subparser tag types to clash with the main parser's tag types.
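As a rough illustration of the clash-avoidance idea (a hypothetical sketch only - the tables, names and kind letters below are made up and are not Geany's actual code):

```c
#include <assert.h>

/* Hypothetical sketch: tag kinds of a subparser (e.g. JavaScript running
 * under the HTML parser) go through their own mapping table, so a
 * subparser kind index never collides with a main-parser kind index. */

enum GeanyTagType { TAG_UNDEF, TAG_FUNCTION, TAG_CLASS, TAG_NAMESPACE };

/* main-parser (HTML) kinds, indexed by the kind index ctags reports */
static const enum GeanyTagType html_map[] = {
	TAG_NAMESPACE,  /* 0: e.g. anchor */
	TAG_CLASS,      /* 1: e.g. class  */
};

/* subparser (JavaScript) kinds get a *separate* table */
static const enum GeanyTagType js_map[] = {
	TAG_FUNCTION,   /* 0: e.g. function */
	TAG_CLASS,      /* 1: e.g. class    */
};

static enum GeanyTagType map_kind(int is_subparser, unsigned kind_index)
{
	const enum GeanyTagType *map = is_subparser ? js_map : html_map;
	unsigned len = is_subparser
		? (unsigned)(sizeof js_map / sizeof js_map[0])
		: (unsigned)(sizeof html_map / sizeof html_map[0]);

	/* out-of-range indices fall back to an "undefined" tag type */
	return kind_index < len ? map[kind_index] : TAG_UNDEF;
}
```

The point is that kind index 0 means something different depending on which parser emitted the tag, so the lookup must be per-parser.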
I tried copying all the parsers from upstream into Geany to see if they compile (I only tried compilation, not whether the parsers actually work correctly), and the only ones that didn't compile because of upstream changes were:
1. asm - would need the upstream preprocessor.c/h implementation, but it probably wouldn't be hard to convert it to our lcpp.c/h
2. make and tcl - need subparser.c/h, which would require more syncs with upstream
3. c.c and the cxx parsers - I haven't actually tried these beasts

You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/2018
-- Commit Summary --
* Use latest version of htable
* Use latest version of mio
* Use latest version of objpool
* Use latest version of ptrarray
* Use latest version of vstring
* Rename fieldSpec to fieldDefinition
* Rename kindOption to kindDefinition
* Rename kinds field in parserDefinition to kindTable
* Rename structure fields about field in parserDefinition
* Use kindIndex instead of kindDefinition
* Rename roleDesc to roleDefinition
* Add XTAG_ANONYMOUS used by jscript
* Include stdint.h in entry.h
* Don't use hash value as an Anonymous field identifier
* Call anonReset in main part
* Use upstream javascript parser
* Use upstream css parser
* Create correctly sized MIO for 0 size
* Always enable promise API and subparsers for Geany
* Support subparsers in Geany and add HTML parser demonstrating this feature
* Rename truncateLine field of tagEntryInfo
* Add dummy mbcs.h and trace.h
* Introduce an accessor to `locate' field of `Option'
* Add numarray.c/h
* Add getLanguageForFilename() and getLanguageForCommand()
-- File Changes --
M ctags/Makefile.am (7)
M ctags/main/ctags-api.c (10)
M ctags/main/ctags-api.h (1)
M ctags/main/dependency.c (12)
M ctags/main/entry.c (86)
M ctags/main/entry.h (25)
M ctags/main/field.c (63)
M ctags/main/field.h (6)
M ctags/main/htable.c (71)
M ctags/main/htable.h (24)
A ctags/main/inline.h (26)
M ctags/main/kind.c (27)
M ctags/main/kind.h (27)
M ctags/main/lcpp.c (16)
M ctags/main/lcpp.h (2)
M ctags/main/lregex.c (34)
M ctags/main/lxcmd.c (36)
M ctags/main/lxpath.c (15)
A ctags/main/mbcs.h (1)
M ctags/main/mio.c (32)
M ctags/main/mio.h (2)
A ctags/main/numarray.c (175)
A ctags/main/numarray.h (48)
M ctags/main/objpool.c (7)
M ctags/main/objpool.h (5)
M ctags/main/options.c (11)
M ctags/main/options.h (5)
M ctags/main/parse.c (210)
M ctags/main/parse.h (26)
M ctags/main/promise.c (4)
M ctags/main/ptrarray.c (28)
M ctags/main/ptrarray.h (6)
M ctags/main/read.c (12)
M ctags/main/read.h (6)
A ctags/main/trace.h (1)
A ctags/main/trashbox.c (246)
A ctags/main/trashbox.h (41)
M ctags/main/types.h (8)
M ctags/main/vstring.c (59)
M ctags/main/vstring.h (31)
M ctags/main/xtag.c (2)
M ctags/main/xtag.h (1)
M ctags/parsers/abaqus.c (6)
M ctags/parsers/abc.c (6)
M ctags/parsers/asciidoc.c (8)
M ctags/parsers/asm.c (12)
M ctags/parsers/basic.c (12)
M ctags/parsers/c.c (50)
M ctags/parsers/conf.c (10)
M ctags/parsers/css.c (47)
M ctags/parsers/diff.c (6)
M ctags/parsers/docbook.c (6)
M ctags/parsers/erlang.c (12)
M ctags/parsers/fortran.c (14)
M ctags/parsers/go.c (8)
M ctags/parsers/haskell.c (8)
M ctags/parsers/haxe.c (16)
M ctags/parsers/html.c (540)
M ctags/parsers/jscript.c (1209)
M ctags/parsers/json.c (8)
M ctags/parsers/latex.c (10)
M ctags/parsers/lua.c (6)
M ctags/parsers/make.c (8)
M ctags/parsers/markdown.c (6)
M ctags/parsers/matlab.c (8)
M ctags/parsers/nsis.c (10)
M ctags/parsers/objc.c (8)
M ctags/parsers/pascal.c (10)
M ctags/parsers/perl.c (12)
M ctags/parsers/php.c (12)
M ctags/parsers/powershell.c (8)
M ctags/parsers/python.c (28)
M ctags/parsers/r.c (6)
M ctags/parsers/rest.c (8)
M ctags/parsers/ruby.c (14)
M ctags/parsers/rust.c (8)
M ctags/parsers/sh.c (6)
M ctags/parsers/sql.c (8)
M ctags/parsers/tcl.c (6)
M ctags/parsers/txt2tags.c (7)
M ctags/parsers/verilog.c (8)
M ctags/parsers/vhdl.c (10)
M src/tagmanager/tm_parser.c (85)
M src/tagmanager/tm_parser.h (2)
M src/tagmanager/tm_source_file.c (10)
M tests/ctags/Makefile.am (8)
M tests/ctags/complex-return.js.tags (2)
M tests/ctags/js-class-related-unterminated.js.tags (3)
M tests/ctags/js-let.js.tags (2)
M tests/ctags/js-string-continuation.js.tags (2)
M tests/ctags/jsFunc_tutorial.js.tags (10)
M tests/ctags/simple.html.tags (4)
M tests/ctags/simple.js.tags (3)
-- Patch Links --
https://github.com/geany/geany/pull/2018.patch
https://github.com/geany/geany/pull/2018.diff
Asciidoc parser is broken, it no longer shows the document outline, instead the headings are grouped by section levels. Same with ReST.
@elextr Thanks for noticing - apparently scope info is missing, this also affects current master. It seems we don't have asciidoc and rest unit tests :(.
elextr commented on this pull request.
```diff
@@ -78,7 +78,7 @@ static void makeAsciidocTag (const vString* const name, const int kind)
 {
 	tagEntryInfo e;
-	initTagEntry (&e, vStringValue (name), &(AsciidocKinds [kind]));
+	initTagEntry (&e, vStringValue (name), kind);
 	e.lineNumber--;	/* we want the line before the '---' underline chars */
```
I suspect the scope information that used to be stored in extensions about here is the cause of the breakage of the asciidoc structuring, with a similar change causing it for rest. This needs to be fixed; note that if upstream won't accept the fix then these two parsers will have to remain Geany-specific.
@techee overlapping comments :)
Would tests have picked up extension stuff?
techee commented on this pull request.
```diff
@@ -78,7 +78,7 @@ static void makeAsciidocTag (const vString* const name, const int kind)
 {
 	tagEntryInfo e;
-	initTagEntry (&e, vStringValue (name), &(AsciidocKinds [kind]));
+	initTagEntry (&e, vStringValue (name), kind);
 	e.lineNumber--;	/* we want the line before the '---' underline chars */
```
No, this is correct - it's the "Use kindIndex instead of kindDefinition" patch mentioned above. It's now enough to use the index (which is "kind" here) instead of the value from the structure.
But the problem seems to be this patch:
https://github.com/geany/geany/commit/3e8365c7f2285c32ef1372f45afd698449a01a...
Upstream there's an additional comment about why using cork isn't good in this case:
https://github.com/universal-ctags/ctags/blob/master/parsers/asciidoc.c
I'll prepare a patch tomorrow to fix that.
> Would tests have picked up extension stuff?
Yep, this would appear in the tags file. I'll grab some tests from uctags (and maybe check that we have at least one test for every file type) so problems like this don't appear in the future - this is especially important now that we're making many changes to our ctags implementation.
elextr commented on this pull request.
```diff
@@ -78,7 +78,7 @@ static void makeAsciidocTag (const vString* const name, const int kind)
 {
 	tagEntryInfo e;
-	initTagEntry (&e, vStringValue (name), &(AsciidocKinds [kind]));
+	initTagEntry (&e, vStringValue (name), kind);
 	e.lineNumber--;	/* we want the line before the '---' underline chars */
```
Yes, the lines that patch was removing were what I was referring to.
@techee since master is also broken please make your patch a separate PR on master so it can be fixed.
Still hijacking this PR for the moment:
Oops. Normally all we'd have to do is properly fill `extensionFields.scopeIndex`, like that:

```diff
diff --git a/ctags/parsers/rest.c b/ctags/parsers/rest.c
index 1a9dd2e22..a192babf3 100644
--- a/ctags/parsers/rest.c
+++ b/ctags/parsers/rest.c
@@ -34,11 +34,19 @@ typedef enum {
 	SECTION_COUNT
 } restKind;
 
+static scopeSeparator RestScopeSeparators [] = {
+	{ KIND_WILDCARD, ":::" }
+};
+
 static kindOption RestKinds[] = {
-	{ true, 'n', "namespace", "chapters"},
-	{ true, 'm', "member", "sections" },
-	{ true, 'd', "macro", "subsections" },
-	{ true, 'v', "variable", "subsubsections" }
+	{ true, 'n', "namespace", "chapters",
+	  ATTACH_SEPARATORS(RestScopeSeparators) },
+	{ true, 'm', "member", "sections",
+	  ATTACH_SEPARATORS(RestScopeSeparators) },
+	{ true, 'd', "macro", "subsections",
+	  ATTACH_SEPARATORS(RestScopeSeparators) },
+	{ true, 'v', "variable", "subsubsections",
+	  ATTACH_SEPARATORS(RestScopeSeparators) }
 };
 
 static char kindchars[SECTION_COUNT];
@@ -49,7 +57,7 @@ static NestingLevels *nestingLevels = NULL;
 *   FUNCTION DEFINITIONS
 */
 
-static void popNestingLevelToKind(const int kind)
+static NestingLevel *getNestingLevel(const int kind)
 {
 	NestingLevel *nl;
 	tagEntryInfo *e;
@@ -63,14 +71,14 @@ static NestingLevel *getNestingLevel(const int kind)
 		else
 			break;
 	}
+	return nl;
 }
 
 static void makeRestTag (const vString* const name, const int kind)
 {
+	const NestingLevel *nl = getNestingLevel (kind);
 	int r = CORK_NIL;
 
-	popNestingLevelToKind(kind);
-
 	if (vStringLength (name) > 0)
 	{
 		tagEntryInfo e;
@@ -79,6 +87,9 @@ static void makeRestTag (const vString* const name, const int kind)
 
 		e.lineNumber--;	/* we want the line before the '---' underline chars */
 
+		if (nl)
+			e.extensionFields.scopeIndex = nl->corkIndex;
+
 		r = makeTagEntry (&e);
 	}
 	nestingLevelsPush(nestingLevels, r);
```
Unfortunately, although this results in the correct scope being emitted, somehow the symbols tree fails to handle nesting when a separator occurs. I didn't dig in yet, but it seems like a bug in Geany.

As to whether it makes sense to emit more than one scope level for parsers like reST… well, IMO it does; the challenge is that scope separation is recorded with specific characters in the scope string, and "languages" like reST allow any character in a tag, making this a little fragile. We're using ":::" right now, and ETX (ASCII 03) for Asciidoc and Txt2Tags. The latter two seem less fragile, but still not fully error-proof. So in the end, even though I don't like it too much, it might indeed make sense to only keep one level in the scope here.
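To illustrate why the separator approach is fragile, here is a minimal sketch (a hypothetical helper, not Geany or ctags code) of recovering the nesting depth by counting separator occurrences in the scope string - a heading that happens to contain the separator sequence throws the count off:

```c
#include <assert.h>
#include <string.h>

/* Count nesting levels encoded in a scope string by a separator such as
 * ":::".  Each separator occurrence adds one level. */
static unsigned scope_depth(const char *scope, const char *sep)
{
	unsigned depth = 1;
	const char *p = scope;
	size_t sep_len = strlen(sep);

	while ((p = strstr(p, sep)) != NULL) {
		depth++;
		p += sep_len;
	}
	return depth;
}
```

A heading literally named `a ::: b` makes `a ::: b:::section` look three levels deep instead of two, which is exactly the kind of miscount a rarely-typed separator like ETX mitigates but does not fully prevent.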
I've opened a PR: #2019. It lacks test cases and should be tested by some Asciidoc guy :) @techee @elextr I likely won't have time to touch this before late evening at best, so feel free to work on improving this, test cases, fixes and stuff. I'll mention it if I'm back on it before then.
@b4n @techee I used the asciidoc.c from upstream ctags on top of this PR (and changed the `map_ASCIIDOC` in tm_parser.c to match) IT WORKS!!!! :1st_place_medal: :grinning:
So all we have to do is merge this quickly and import the upstream :smile: (well and @b4n make the change to tm_parser.c properly).
Sorry, I hoped to work on this earlier but didn't have time in the last two days (I will hopefully address the issues today).
> @b4n @techee I used the asciidoc.c from upstream ctags on top of this PR (and changed the map_ASCIIDOC in tm_parser.c to match) IT WORKS!!!! 🥇 😀
Cool! If that's the case (I haven't checked yet), I'd suggest using the upstream versions, and if there's a better way to implement this, we could go the (hopefully new standard) "implement in uctags, import to Geany" way. @b4n What do you think?
> So all we have to do is merge this quickly and import the upstream 😄 (well and @b4n make the change to tm_parser.c properly).
And the even cooler thing is having @elextr excited to get this merged into Geany soon :-).
@b4n Is it necessary to have a separate fix for master if it gets fixed in this branch? Master is mildly broken because of this now, but it's not something super-critical like a crash, and I think it could wait until this pull request is merged.
Why are the TxtToTags test cases removed? Or is t2t not Txt2Tags?
Ah, OK, the test got broken and without checking I somehow assumed it was ActionScript. It seems there's a problem with scope separators in this test - I'll look into it.
@techee pushed 1 commit.
1df0db9e555c69abc47b3e5071ede4981595b188 txt2tags: Fix scope separator definition and re-enable tests
@techee pushed 3 commits.
89ed899a7836e325c90ebf2b2f71362fa4ec1663 Rename rest.c to rst.c to match upstream filename
1cb2b2c464956129e666b6adffbd95068fdcbadb Use upstream asciidoc and rst parsers
aea36b485c0f5ff3d6007bdc0342298c20f30461 Add asciidoc and rst unit tests
The upstream rest and asciidoc parsers really seem to fix the problem. I've also added simple unit tests for them. I also ran these unit tests against the Geany 1.34 tag to be sure we get identical output with the new parsers - we do.
elextr commented on this pull request.
```diff
 static TMParserMapEntry map_ASCIIDOC[] = {
-	{'n', tm_tag_namespace_t},
-	{'m', tm_tag_member_t},
-	{'d', tm_tag_macro_t},
-	{'v', tm_tag_variable_t},
-	{'s', tm_tag_struct_t},
+	{'c', tm_tag_namespace_t},
+	{'s', tm_tag_member_t},
+	{'S', tm_tag_macro_t},
+	{'t', tm_tag_variable_t},
+	{'T', tm_tag_struct_t},
+	{'u', tm_tag_undef_t},
+	{'a', tm_tag_undef_t},
```
@techee the last two don't work; is that because you used `tm_tag_undef_t`, and if so, why not allow them to work?
OK, apart from the comment in the code, this works for me for Asciidoc, and Rest, and Python, and C, and C++ (as badly as ever :). That's all my common languages covered :tada:
Will keep using it and might try others when I get time.
techee commented on this pull request.
```diff
 static TMParserMapEntry map_ASCIIDOC[] = {
-	{'n', tm_tag_namespace_t},
-	{'m', tm_tag_member_t},
-	{'d', tm_tag_macro_t},
-	{'v', tm_tag_variable_t},
-	{'s', tm_tag_struct_t},
+	{'c', tm_tag_namespace_t},
+	{'s', tm_tag_member_t},
+	{'S', tm_tag_macro_t},
+	{'t', tm_tag_variable_t},
+	{'T', tm_tag_struct_t},
+	{'u', tm_tag_undef_t},
+	{'a', tm_tag_undef_t},
```
Yes, that's the reason - I disabled them intentionally for now. We didn't have these two in the parser before, and I didn't want to spend time exploring what exactly would happen in the sidebar if we mapped them to something existing. This can be improved in the future.
elextr commented on this pull request.
```diff
 static TMParserMapEntry map_ASCIIDOC[] = {
-	{'n', tm_tag_namespace_t},
-	{'m', tm_tag_member_t},
-	{'d', tm_tag_macro_t},
-	{'v', tm_tag_variable_t},
-	{'s', tm_tag_struct_t},
+	{'c', tm_tag_namespace_t},
+	{'s', tm_tag_member_t},
+	{'S', tm_tag_macro_t},
+	{'t', tm_tag_variable_t},
+	{'T', tm_tag_struct_t},
+	{'u', tm_tag_undef_t},
+	{'a', tm_tag_undef_t},
```
OK, leave them for now, but they actually worked for me when I mapped them to two random things - though I have deleted that experiment, so I don't know what I used.
Hint to everybody else, try this PR with a different language for xmas and new year, and be sure to report your results :)
@techee pushed 4 commits.
3bbb3aef5674a9732927b2e9f01faa11f7351678 Rename conf.c to iniconf.c to match upstream filename
8df51d5419f67ad5b46c32c99e252ffb600b558d Add tests of conf, diff, md parsers from universal ctags
eb7425baa2c2917dafd7707391a432202962d220 Add more ctags unit tests
b8aa20041af887c4c4566c028ffe48e468bcf14b Remove abc ctags parser
I have added some more unit tests - again, I made sure we get the same output when these tests are run on top of the Geany 1.34 tag, to be sure the recent ctags changes didn't break anything.
@techee pushed 2 commits.
13d78bf42a42598f131d33210175baa21019d6a8 Rename latex.c to tex.c to match upstream filename
c620252ccd54652e4b93fb88d3dd6aac6c5d2d55 Rename entry points of parsers to match upstream names
@techee pushed 1 commit.
608fe0f08e4fb2a00c5f5a5d42e8cd6725b120e4 Initialize trashbox
> Hint to everybody else, try this PR with a different language for xmas and new year, and be sure to report your results :)
Good hint - so I tried C/C++ to keep it simple ;-). Here's the result:
https://github.com/techee/geany/tree/cxx_parser
There are definitely some things that could be improved (e.g. anonymous names), but the parsing itself is about as good as the upstream ctags parser's.
@elextr Since you like finding all the possible cases where the current C parser fails, would you give it a try?
codebrainz commented on this pull request.
```diff
@@ -27,6 +27,7 @@
 #include "ptag.h"
 #include "read.h"
 #include "routines.h"
+#include "trashbox.h"
```
Hehe, in Canadian English, that is about the weirdest API name I've seen.
b4n commented on this pull request.
@@ -1,124 +0,0 @@
-/*
I'm not sure this is so much of a joke: yes, it doesn't record much, but an [ABC file](http://abcnotation.com/) can apparently contain several tunes, and I assume a subsequent title then starts a new tune, so this parser provides a TOC, making it somewhat useful.
b4n requested changes on this pull request.
Looks pretty good, but:

* the ABC parser might be useful, as mentioned in the previous comment;
* some parsers removed because they use regex support that no longer works might still be useful, and we should find a solution for this…
```diff
@@ -3484,3 +3484,8 @@ extern void verbose (const char *const format, ...)
 {
 }
 /* GEANY DIFF END */
+
+extern bool canUseLineNumberAsLocator (void)
+{
+	return (Option.locate != EX_PATTERN);
+}
```
missing newline
```diff
 {
-	const kindOption *kindOpt;
-
-	if (hasRegexKind (language, kind))
-		return isRegexKindEnabled (language, kind);
-	else if (hasXcmdKind (language, kind))
-		return isXcmdKindEnabled (language, kind);
```
Won't this still be relevant for RE if we keep it in the end?
```diff
@@ -2742,8 +2793,12 @@ extern void anonGenerate (vString *buffer, const char *prefix, int kind)
 	vStringCopyS(buffer, prefix);
 
-	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
-	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind);
+/* GEANY DIFF */
+/*	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
+	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind); */
+	sprintf(szNum,"%u", lang -> anonumousIdentiferId);
```
Why did you change this? The upstream anon ID logic seems more stable and unique, and I don't really see a problem with it?
@b4n whats the problem with ctags regex "parsers" again?
@elextr they are disabled by this patchset: from the OP, *“[…] This patch also disables regex parsing […]”*.
> they are disabled by this patchset: from the OP, “[…] This patch also disables regex parsing […]”.
Oh, hidden away in a commit message. @techee, if the regex parsers work in upstream and this is intended to bring Geany into sync with upstream, why won't they work here?
@elextr the annoying part is that it brings in an extra bundled copy of GNU Regex for platforms that don't have `regcomp()` (Windows being the most obvious candidate), yet GLib already does something very similar to provide GRegex. So in some situations we could end up with 2 copies of it :confused:
Currently we have a GRegex-based version of it, but it's not necessarily 100% compatible, and it certainly doesn't have all the new stuff from upstream. The question @techee raises is: is it worth maintaining this for the few parsers that use it? What should we do? To be fair, the most important use case upstream is dynamic parser definition (with various command-line switches), which allows creating parsers fairly easily for fairly simple things (although it has become more and more powerful lately) -- and we don't have support for this, at least not yet.
> GLib already does something very similar to provide GRegex. So on some situations we could end with 2 copies of it
So can't we just make it a shallow shim over glib instead of including the ctags one?
techee commented on this pull request.
@@ -1,124 +0,0 @@
-/*
Oh, that ABC! I thought it was [this ABC](https://en.wikipedia.org/wiki/ABC_(programming_language)) and looking for "T" at the beginning of a line made absolutely no sense. So yeah, it's a pretty esoteric language and the parser doesn't do much, but now that I know what the language is, it makes sense. I'll remove this commit.
techee commented on this pull request.
```diff
@@ -2742,8 +2793,12 @@ extern void anonGenerate (vString *buffer, const char *prefix, int kind)
 	vStringCopyS(buffer, prefix);
 
-	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
-	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind);
+/* GEANY DIFF */
+/*	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
+	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind); */
+	sprintf(szNum,"%u", lang -> anonumousIdentiferId);
```
The problem with it is that in the symbol tree you'll see anonymous types like:
```
anon_enum_162324ab457c
anon_enum_8734cb98f023
anon_enum_fc820958ba67
```
instead of
```
anon_enum_1
anon_enum_2
anon_enum_3
```
and the latter is nicer for users.
We already detect whether a tag is anonymous in tm_tag_is_anon() and don't perform autocompletion for it, so there's no problem with identical anonymous names coming from different files.
Even this is not perfect - `anonumousIdentiferId` is shared by all anonymous types, so we'll get things like
```
anon_enum_1
anon_struct_2
anon_enum_3
```
and it would be nicer to have
```
anon_enum_1
anon_struct_1
anon_enum_2
```
But the new cxx parser doesn't use a different anonymous name per type, so the above would be
```
Anonymous1
Anonymous2
Anonymous3
```
so at least for C and C++ this problem will disappear.
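The "nicer" per-kind numbering described here could be sketched like this (hypothetical code, not the actual ctags `anonGenerate` implementation - one counter per kind instead of a single shared counter, so each kind gets its own 1, 2, 3… sequence):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical kinds; real ctags kinds are per-language. */
enum { KIND_ENUM, KIND_STRUCT, KIND_COUNT };

/* one counter per kind, so sequences don't interleave across kinds */
static unsigned anon_counters[KIND_COUNT];

static void anon_name(char *buf, size_t size, int kind)
{
	static const char *const prefix[KIND_COUNT] = {
		"anon_enum_", "anon_struct_"
	};
	snprintf(buf, size, "%s%u", prefix[kind], ++anon_counters[kind]);
}
```

Generating enum, struct, enum in that order yields `anon_enum_1`, `anon_struct_1`, `anon_enum_2` rather than `anon_enum_1`, `anon_struct_2`, `anon_enum_3`.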
techee commented on this pull request.
```diff
 {
-	const kindOption *kindOpt;
-
-	if (hasRegexKind (language, kind))
-		return isRegexKindEnabled (language, kind);
-	else if (hasXcmdKind (language, kind))
-		return isXcmdKindEnabled (language, kind);
```
The new code is how it looks upstream. I don't think the original code is very relevant for us - it just checks some command-line options to see whether the parser is enabled.
b4n commented on this pull request.
```diff
@@ -2742,8 +2793,12 @@ extern void anonGenerate (vString *buffer, const char *prefix, int kind)
 	vStringCopyS(buffer, prefix);
 
-	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
-	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind);
+/* GEANY DIFF */
+/*	unsigned int uHash = anonHash((const unsigned char *)getInputFileName());
+	sprintf(szNum,"%08x%02x%02x",uHash,lang -> anonumousIdentiferId, kind); */
+	sprintf(szNum,"%u", lang -> anonumousIdentiferId);
```
OK then.
b4n commented on this pull request.
```diff
 {
-	const kindOption *kindOpt;
-
-	if (hasRegexKind (language, kind))
-		return isRegexKindEnabled (language, kind);
-	else if (hasXcmdKind (language, kind))
-		return isXcmdKindEnabled (language, kind);
```
Ah OK great then
b4n commented on this pull request.
@@ -1,124 +0,0 @@
-/*
Oh, I didn't know that other one :)
> So can't we just make it a shallow shim over glib instead of including the ctags one?
We could probably make some wrapper around it, but there are minor semantic differences between the APIs which are easy to miss, and which we probably won't notice in the future when something in upstream uctags changes and we just grab the sources from there.
All the upstream parsers in the "parsers" directory are non-regex ones; the only regex parsers are those from
https://github.com/universal-ctags/ctags/tree/master/optlib
but none of those languages seem too interesting for Geany (we have a hand-written markdown parser), so we won't miss much from upstream by killing regex parser support. The only remaining problem is the ActionScript and Cobol parsers. As I said, personally I would drop them, but the second best solution would be rewriting them as non-regex parsers (preferably not by me - @b4n, you said something like it would be fun rewriting those, and I would really hate myself for stealing the fun from you :-))
> @b4n you said something like it would be fun rewriting those - I would really hate myself for stealing the fun from you :-)
Hehe :smile: I gave COBOL a little try, but now it seems much more of a nightmare to do right (from what I gathered, not only does the official syntax use a pretty random notation, but it also seems close to impossible to parse without knowing every single bit of it -- e.g. it's not easy to locate "statements" or the like). But well… I'll still try and find a way.
> Hehe 😄 I gave COBOL a little try, but now it seems much more of a nightmare to do right (from what I gathered, not only does the official syntax use a pretty random notation, but it also seems close to impossible to parse without knowing every single bit of it -- e.g. it's not easy to locate "statements" or the like). But well… I'll still try and find a way.
If you really insist on having this fun, couldn't you just use the readLineFromInputFile() ctags interface and more or less only simulate the regular expressions from the existing regex parser (without actually having to learn what the language looks like)?
> Hehe 😄 I gave COBOL a little try, but now it seems much more of a nightmare to do right (from what I gathered, not only does the official syntax use a pretty random notation, but it also seems close to impossible to parse without knowing every single bit of it -- e.g. it's not easy to locate "statements" or the like). But well… I'll still try and find a way.
>
> If you really insist on having this fun, couldn't you just use the readLineFromInputFile() ctags interface and more or less only simulate the regular expressions from the existing regex parser (without actually having to learn what the language looks like)?
Or just use GLib regex, with the regexen from the ctags Cobol parser modified as needed for GLib?
techee commented on this pull request.
@@ -1,124 +0,0 @@
-/*
I didn't know either of them; I just searched for "ABC language" and this one came up. The parser is now back, together with a unit test.
@b4n I hope I've addressed all the minor problems (apart from the missing parsers) - let me know if I forgot something.
> @elextr Since you like finding all the possible cases where the current C parser fails, would you give it a try?
@techee ahh, no, not until this gets committed, _then_ you can make a PR to use the new C++ parser instead of the one in c.c. After all, once this is committed we will need another ctags PR to replace it :grin:
The latest commits don't seem to have broken anything, and using a rubbish example from the web, latex seems to be working (well, at least I have symbols).
It seems as though nobody else cares enough to check that their favourite language still works, so as soon as @b4n accepts, this should be merged to allow it to mature for a bit before the next release.
> @techee ahh, no, not until this gets committed, then you can make a PR to use the new C++ parser instead of the one in c.c.
Sure. I just thought you might be interested in trying it beforehand, to see if it fixes some of the problems the existing C/C++ parser has for you.
@techee
> In particular, the regex parser has been disabled (for now?) because its current implementation doesn't allow accessing kindDefinitions using index and allowing this would require big changes in its implementation. The affected parsers are Cobol, ActionScript and HTML. For HTML we can use the token-based parser from upstream, and we should consider whether Cobol and ActionScript are worth the effort to maintain a separate regex implementation using GRegex (IMO these languages are dead enough not to justify the extra effort).
Basically it looks like this is stalled because it removes features, so how are those languages handled upstream?
Merged #2018 into master.
Merged because:

1. It fixes three languages - Asciidoc, Rest (Geany's own documentation) and Latex - and progresses the process of making it easier to maintain parsers from upstream.
2. It has been working for me for months.
3. If we keep delaying every ctags update for a year, we will always have huge updates :(
The downside is that it loses Cobol, but that appears to require major work, which can be better provided as a pull request on top of this once agreement is reached on either making a non-regex parser or accepting two regex engines (since ctags uses a different variant than GLib).
Summary: we need to make progress, not block because it's not perfect.
I don't think we should release with this until the broken languages are fixed.
This PR shouldn't have been merged until that was done, as per @b4n's review requesting changes first, or at least until there was some kind of consensus that it should be merged.
See #2119. As @b4n refused a second regex engine, the "requested changes" amount to whole new parsers. That's not in the scope of this PR. Now they can be made as clean standalone PRs instead of being buried inside an already too-big PR.
@elextr please don't squash-merge this kind of big PR with great commit splitting - we've now lost the whole history here, and although yes, some commits were big already, most were very well split apart. It'll also make bisecting a potential issue a lot harder.
Anyway, as for the merge itself: I'm obviously not a fan, but I also admit we need to move forward at some point, and this kind of forces (my) hand. Sad it happens right when I'm back with some time for Geany dev… well, I guess then I should have *started* with this one yesterday instead of #2092.
> - it fixes three languages, Asciidoc, Rest (Geanys own documentation) and Latex and progresses the process of making it easier to maintain parsers from upstream.
Let's just be fair: I had already proposed a standalone fix for Asciidoc and Rest (#2019) without this PR, and you rejected it because merging this was a better choice -- which is fair enough, but then it's not much of an argument for pushing this one :) I wasn't aware that Latex was broken in master; that's probably down to my lack of Geany time lately.
> - It has been working for me for months
That's a great point :)
> - If we keep delaying every ctags update for a year we will always have huge updates :(
Yes, but with this PR, as long as it can still be merged into Geany, we should have a clean-ish diff from uctags, meaning updates should be fairly easy no matter how big they are, as most changes can just be applied as-is.
> The downside is it loses Cobol
And ActionScript as well. I don't think we can release in the current state without those, especially as we have at least one active COBOL user who even submitted changes recently.
Anyway, we now have to pull up our sleeves and fix what needs to be fixed before the release. Help is welcome.
Yes, I know there is one Cobol user, and I agree it's not nice for them, but keeping Cobol blocked everything else, so it was not nice for everyone else - not just the three languages that didn't work, but also all the improvements from upstream for many other languages.
And since it will be a separate PR, the Cobol/ActionScript fix (or whatever else) can potentially be a short-term one, and it will be easy to remove later when a better one comes along :)
> PRs with great commit splitting
Sorry. I did look, but maybe because I really don't understand the ctags code organisation, they seemed to just be progressive things - you could never revert just one, or the whole lot would stop working.
> happens right when I'm back with some time for Geany
Neat :)
Talking to @codebrainz in another forum, he suggested that the release can be delayed if needed, and I would agree - the changes (other than this :) are not so significant that it would hurt for them to wait. And besides, we missed the last one by several months, so why break a good record :)
> Anyway, we now have to pull up our sleeves and fix what needs to be fixed before the release. Help is welcome.
Let's discuss on #2119
> please don't squash-merge this kind of big PRs with great commit splitting
Couldn't the squashed merge be reverted and re-merged with the individual commits? (Not that I agree with merging this at all until it no longer causes serious regressions.)
I didn't port the regex parser from ctags because it seemed like too much work to me (for two not-so-important languages - at least to me), but it might not be that hard if I take the current version of the parser from uctags and apply the GRegex-related changes on top of it. I was just a little worried I might miss some semantic differences between the two regex engines.
I might have some time to look at it next week, but no guarantees.
I don't know where else to ask: what about "Hacking Geany" > "Adding a filetype"? Does this section need updating?
@Skif-off the part that's possibly affected is the `Adding a TagManager parser` section, but it's very general and looks like it's written for those who already know what they are doing (which is not me as far as adding ctags parsers goes, so I may be wrong).
Since it's so general, I don't think it's affected by any changes to the ctags layout. What it might need is an improvement, but that's something for a separate issue/pull request.
@elextr, yes, the `Adding a TagManager parser` section is the most interesting one (possible changes in updating `add_top_level_items()` and adding a `TMParserMapEntry`). OK, thanks, I thought so, but I decided to clarify just in case :)