Alright, hopefully my last Colomban-killer patchset. I tried to sync as much as possible from ctags/main here (no big parser changes at this point, only those caused by some change in main).
I'm not sure how much it's worth reviewing patch by patch (maybe except those patches that touch some Geany code or parsers) - it's much easier to diff this against universal-ctags and see the differences - there just a few of them. I made some minor changes to uctags as well so better to diff against https://github.com/techee/ctags/tree/geany_sync2 which contains those changes.
I tried to librarify ctags a bit by adding CTAGS_LIB macro by which I protect the library-specific stuff. There are however still some things which are either missing in uctags or which we should change in Geany:
* varType missing in uctags * GRegex vs GNU regex - I kept GRegex in lregex.c (but synced all the rest) but we should think whether not to convert back to GNU regex * isIgnoreToken() works differently than in uctags. If it simply just dropped certain tags, we could move the logic into TM but right now, it helps the C parser with parsing by suggesting whether the contents of braces should be dropped too. I'm not sure if we need to preserve this functionality (never used the ignored.tags file myself) but if we move to the new cxx parser we'd have to keep this diff unless there's some interest for this in uctags. * lcpp contains lots of changes as it's tightly related to c.c. However, as it's used by c.c only and quite independent of the rest, if we move to the cxx parser we could keep a separate copy of lcpp for the obsolete c.c parser.
Except 829ea7d I tried to do all the work piece by piece so nothing should be lost. There wasn't much work on parsers except the mentioned patch where parsers using nestlevel had to be updated to its new API and using cork (which works so every uctags parser could be now ported to Geany).
I think it's too late for Geany 1.29 but it would be good to have it ready for early 1.30 so it can be tested during the whole release cycle. You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/1263
-- Commit Summary --
* Grab the complete uctags vString implementation * Add generic pointer array * Eliminate uses of g_stat() * Drop some more unused functions from routines.c/h * Remove most uses of glib calls * Remove unused functions from options.c/h * Grab uctags implementation of strlist * Grab uctags keyword.c/h and add types.h with type declarations * Make tempFile() return MIO * Make combinePathAndFile() return char * instead of vString * Use parse separator utilities in routines.c * Make sure that file extension is taken from file (not preceding directory) * Use uctags implementation of strstr() * Don't initialize ExecutableProgram and ExecutableName * Sync the rest of routines.c/h * Grab uctags ctags.h and add repoinfo.c/h * Grab args.c/h uctags implementation * Define gcc attributes inside gcc-attr.h * Move debug related code to debug.c/h * Remove all unused code from main.c * Grab all MIO changes from uctags * Implement iFileGetLine() using gets() * Grab ctags version of general.h and make related changes * Grab ctags implementation of sort.c/h * Create geany.c/h and put isIgnoreToken() * Sync options.c/h (and introduce a lot of new garbage) * Grab uctags kind.c/h * Some sync of lcpp.c/h * Move eTagFile from entry.h to entry.c * Sync the beginning of entry.c/h * entry: Take over uctags tag writing code * entry: Some more or less formal changes * entry: make makeTakEntry() implementation closer to uctags * entry: Sync initTagEntry() code * entry: Add the remaining code from uctags * read: change macros into functions and move them into c * c.c, lcpp.c: Avoid using File.mio * read: move sInputFileInfo, sInputFile and File to read.c * read: Add some extra data structures * read: use makeFileTag() * read: Some function renaming * read: implement openBuffer() using openInputFile() * read: Add function to initialize common parts of input/source files and use it * read: introduce langStack for source files * read: add lineFposMap * read: add the remaining stuff from uctags * parse: remove things unused by Geany for simpler merge * parse: some simple syncs and include syncs * parse: other small syncs * lregex: sync includes * lregex: replace regexBroken with regexAvailable * lregex: some simple syncs * lregex: remove unused functions * lregex: Initialize lregex structures the universal-ctags way * lregex: Add missing fields to regexPattern * lregex: sync some function prototypes * lregex: introduce findRegexTagsMainloop() * lregex: some harmless syncs * lregex: add new flag processing * lregex: sync the remaining parts of the existing code * lregex: Add the remaining missing functions from uctags * Grab uctags version of parse.c/h and nestlevel.c/h plus additional fallout * grab uctags debug.c/h * Add typeRef to sTagEntryInfo * read: backport patch from uctags * Some minor syncs * Sync error.c completely and create a custom error printer * tmp * Some initial work on ctags as a library * Move ctags "API" from Geany to ctags * Make LanguageTable variable private in parse.c * Add the remaining missing files from uctags * Annotate geany-specific diffs
-- File Changes --
M configure.ac (9) M ctags/Makefile.am (42) M ctags/main/args.c (50) M ctags/main/args.h (6) A ctags/main/ctags-api.c (141) A ctags/main/ctags-api.h (56) M ctags/main/ctags.h (21) A ctags/main/debug.c (128) M ctags/main/debug.h (82) A ctags/main/dependency.c (102) A ctags/main/dependency.h (45) A ctags/main/e_msoft.h (71) M ctags/main/entry.c (1233) M ctags/main/entry.h (175) A ctags/main/error.c (59) A ctags/main/error.h (26) A ctags/main/field.c (921) A ctags/main/field.h (111) A ctags/main/flags.c (107) A ctags/main/flags.h (29) A ctags/main/fmt.c (354) A ctags/main/fmt.h (25) A ctags/main/gcc-attr.h (30) M ctags/main/general.h (112) A ctags/main/htable.c (271) A ctags/main/htable.h (47) M ctags/main/keyword.c (22) M ctags/main/keyword.h (4) A ctags/main/kind.c (158) M ctags/main/kind.h (73) M ctags/main/lcpp.c (254) M ctags/main/lcpp.h (42) M ctags/main/lregex.c (806) A ctags/main/lxcmd.c (1227) A ctags/main/lxpath.c (217) M ctags/main/main.c (464) A ctags/main/mbcs.c (112) A ctags/main/mbcs.h (22) M ctags/main/mio.c (382) M ctags/main/mio.h (40) M ctags/main/nestlevel.c (61) M ctags/main/nestlevel.h (19) A ctags/main/objpool.c (79) A ctags/main/objpool.h (39) M ctags/main/options.c (3474) M ctags/main/options.h (172) A ctags/main/output-ctags.c (224) A ctags/main/output-etags.c (107) A ctags/main/output-json.c (189) A ctags/main/output-xref.c (60) A ctags/main/output.h (50) M ctags/main/parse.c (2846) M ctags/main/parse.h (262) A ctags/main/pcoproc.c (296) A ctags/main/pcoproc.h (29) A ctags/main/promise.c (102) A ctags/main/promise.h (27) A ctags/main/ptag.c (234) A ctags/main/ptag.h (63) A ctags/main/ptrarray.c (152) A ctags/main/ptrarray.h (46) M ctags/main/read.c (962) M ctags/main/read.h (121) A ctags/main/repoinfo.c (12) A ctags/main/repoinfo.h (1) M ctags/main/routines.c (585) M ctags/main/routines.h (71) A ctags/main/selectors.c (382) A ctags/main/selectors.h (35) M ctags/main/sort.c (120) M ctags/main/sort.h (15) M ctags/main/strlist.c (167) M ctags/main/strlist.h (20) A ctags/main/types.h (29) M ctags/main/vstring.c (214) M ctags/main/vstring.h (29) M ctags/parsers/abc.c (1) M ctags/parsers/asciidoc.c (23) M ctags/parsers/c.c (39) M ctags/parsers/conf.c (1) M ctags/parsers/fortran.c (14) M ctags/parsers/lua.c (8) M ctags/parsers/markdown.c (1) M ctags/parsers/php.c (4) M ctags/parsers/powershell.c (1) M ctags/parsers/python.c (107) M ctags/parsers/rest.c (21) M ctags/parsers/ruby.c (38) M ctags/parsers/txt2tags.c (59) M src/symbols.c (2) M src/tagmanager/Makefile.am (4) D src/tagmanager/tm_ctags_wrappers.c (171) D src/tagmanager/tm_ctags_wrappers.h (65) M src/tagmanager/tm_parser.c (18) M src/tagmanager/tm_source_file.c (74) M src/tagmanager/tm_tag.c (2) M src/tagmanager/tm_workspace.c (4) M tests/ctags/bug1585745.cpp.tags (6) M tests/ctags/bug507864.c.tags (7) M tests/ctags/cpp_destructor.cpp.tags (2) M tests/ctags/simple.d.tags (2)
-- Patch Links --
https://github.com/geany/geany/pull/1263.patch https://github.com/geany/geany/pull/1263.diff
Will have a look at the travis failures - it may take a few rounds as I didn't test on Windows and there were too many changes.
@techee pushed 1 commit.
0633a28 Remove -Werror=aggregate-return from travis flags
@techee pushed 1 commit.
09e13d5 Don't compile bigger part of main for CTAGS_LIB
@techee pushed 1 commit.
0dd9f3a Eliminate use of regex from read.c
@techee pushed 1 commit.
b8df6e0 Eliminate use of fnmatch() which isn't present on MINGW
Alright, so it finally builds and it also builds and runs on macOS, I've just tried.
Suggest do this on a separate branch from master and do it in small chunks that humans can look at.
If I was @b4n it would be simply rejected outright.
And beside the process, what have uctags done that suddenly needs +11800 lines of code?
Agree the lines changed is massive, but if it brings the [TM Ctags fork](https://github.com/geany/geany/tree/master/ctags) inline with [upstream](https://github.com/universal-ctags/ctags) to make syncing easier, +1. I pity anyone who tries to review the entire changeset :)
@codebrainz @techee but what does all that code actually DO? Why has it accumulated over 10 years (if thats the reason)? What benefit does it bring to include it in Geany?
@elextr it minimizes the effort of maintaining a fork of CTags. The closer our fork is to upstream, the easier it is to backport changes from upstream into our fork. Even if there's a bunch of stuff we don't use, it doesn't matter, if we don't use it then it shouldn't affect our resulting binary.
@codebrainz @techee but what does all that code actually DO? Why has it accumulated over 10 years (if thats the reason)? What benefit does it bring to include it in Geany?
In fact I think it has accumulated over something like 15 years - it was first part of the tag manager (which is from 2001), then it was taken over and modified by Anjuta and then by Geany. Each project was adding its own stuff on top and the differences started getting bigger. I think the original fork also just deleted lots of lines from ctags which weren't used by TM. I could do the same now too but it makes the code harder to compare against uctags which is why I kept it and why there are 10000 more LOCs.
But there are also features we want - it's mostly things which can be used by parsers like a simpler mechanism to create scope information (called cork) and the possibility to run a subparser from a parser (I used this in the upstream HTML parser to use real parsers for embedded javascript and CSS).
During this year I started adding things we need in Geany into upstream ctags (like TM and other things) so there's not much missing there we would need in Geany. The ultimate goal is there's no Geany fork at all and that we just treat ctags similarly to Scintilla - development just upstream, taking over all the code once a while.
I pity anyone who tries to review the entire changeset :)
This is why I'd suggest reviewing it as if it was a single commit
"Take over upstream ctags and apply Geany changes on top"
which is how it will appear when you take meld and compare upstream ctags/main with geany-ctags/main. The diff is just a few hundreds LOCs.
I did it this painful way myself to make sure nothing what Geany needs was lost on the way.
The only changes worth reviewing in greater detail are those that affect parsers (mostly cork introduction), lcpp.c changes and that's about it.
@techee thanks for the explanation, can you explain more what is "cork" and what it provides? This is more for understanding than directly relevant to the PR.
You pointed out the concern, is there anything that is not needed in Geany, thats in that code because ctags needs it (like for instance the stuff that outputs ctags files). I know Geany abandoned the "small and lightweight" some time ago, but this is adding two extra `editor.c`s worth of code to the executable.
I don't suppose you could get uctags to provide a `libctags` like Geanys `libmain`?
@techee thanks for the explanation, can you explain more what is "cork" and what it provides? This is more for understanding than directly relevant to the PR.
The "cork" thing is something that helps constructing scope information by referencing the parent tag. You write something like:
``` parent_tag = makeTag(...); chlid_tag = makeTag(parent_tag, ...); ```
and arbitrary nesting of these and it automatically generates scope for you (you also have to statically specify scope separator before). Don't ask me why it's called cork though ;-).
You pointed out the concern, is there anything that is not needed in Geany, thats in that code because ctags needs it (like for instance the stuff that outputs ctags files). I know Geany abandoned the "small and lightweight" some time ago, but this is adding two extra editor.cs worth of code to the executable.
Binary-size wise libgeany.so size looks this way:
``` 17942824 before 18533832 after ```
which is 3.2% increase so nothing dramatic. While it brings more code, the point is we should have less code to maintain ourselves because we could rely on uctags maintaining the code.
I could probably use more ifdefs to strip more unnecessary code from the binary but first I'd like to do as little as possible and get some feedback both from Geany project and uctags (where I plan to publish the basic library support patches for further discussion which direction to take).
I don't suppose you could get uctags to provide a libctags like Geanys libmain?
That's the long-term plan but many things have to be resolved first - the diffs we need for Geany right now I mentioned in the first post, then probably ctags should be modularized a little because it doesn't expect that something like a library based on it could exist and you cannot easily remove code we don't care about right now.
Binary-size wise libgeany.so size looks this way:
17942824 before 18533832 after ```
I guess that's with default `-O2` and `-g` without stripping symbols?
Here's what I get with different options (always using `make install-strip`):
``` Master branch w/ '-O4': 2.8 MB This branch w/ '-O4': 3.0 MB This branch w/ '-O4 -flto': 2.8 MB This branch w/ '-O4 -flto -fwhole-program': 2.9 MB This branch w/ '-Os -flto': 2.6 MB ```
@codebrainz Yeah, thanks, I didn't notice it was an order of magnitude bigger. So it's something like 7% increase then.
The "cork" thing is something that helps constructing scope information by referencing the parent tag. You write something like:
parent_tag = makeTag(...); chlid_tag = makeTag(parent_tag, ...);
and arbitrary nesting of these and it automatically generates scope for you (you also have to statically specify scope separator before). Don't ask me why it's called cork though ;-).
So thinking about it, that means it still only handles named scopes (ie those that have parent tags), still not lexical scopes ie { to }? If so it still seems a lot of code.
So thinking about it, that means it still only handles named scopes (ie those that have parent tags), still not lexical scopes ie { to }?
Yep, no lexical support as far as I know.
If so it still seems a lot of code.
The great majority of the added code is nothing useful for Geany (processing command-line parameters of ctags, support of different output file formats, etc.). The only useful thing which this patch brings is the mentioned cork thing and the ability to run nested parsers from within another parser. Adding this would be less than 1000 LOCs (rough estimate).
But the point of this pull request isn't to bring anything "useful" - the point is to make ctags (eventually) truly external dependency we don't have to care about. Every time something changes upstream now, we have to backport the patch manually and it's painful (e.g. https://github.com/geany/geany/commit/662765852fb2f5ca72de8087646ab8bd6d82c8... and various renames of functions used by parsers we have to backport if we want to keep parsers compatible). The development of ctags is quite active now and changes like that happen quite often.
This pull request is just the first step - it will have to be followed by:
1. Syncing parsers. 2. If possible getting rid of Geany-specific code we still have. 3. Bringing the library support to ctags. 4. Refactoring ctags so only the necessary things compile for a library.
Once (1) and (2) is done, (3) and (4) can be done in the ctags project and we'll only have to take over the changes. I don't want to do (4) now as it will be better to discuss with the ctags guys. Also the current API is rather provisional and will need some discussion how to do it right in ctags. Still this pull request helps to make it clear what we need in Geany from ctags and makes it possible to do (1).
The great majority of the added code is nothing useful for Geany (processing command-line parameters of ctags, support of different output file formats, etc.). The only useful thing which this patch brings is the mentioned cork thing and the ability to run nested parsers from within another parser. Adding this would be less than 1000 LOCs (rough estimate).
That was what I suspected.
- Syncing parsers.
- If possible getting rid of Geany-specific code we still have.
- Bringing the library support to ctags.
- Refactoring ctags so only the necessary things compile for a library.
I suggest doing it in reverse order, wait until ctags is made into a library then do one change when it has a stable API. Bringing in all the unused crap and then removing it and interfacing Geany to whatever the ctags API is gonna be is wasted work IMO.
Personally, I don't see any problem here; the idea that we can get closer to upstream and have less of a fork to maintain is great and has been a long-time goal of the project and ongoing effort by @techee, and this big PR isn't a whole bunch of new code we have to maintain, it actually greatly reduces the amount of code we have to maintain since it becomes upstream's code, and most of it doesn't get emitted in the final binary anyway. It seems like a big win for Geany.
Waiting for UCtags to create a proper library, which is a long way off, just causes volunteers to do loads of extra work in the meantime for no good reason.
That said, if this PR includes any files that are not used in any way, it makes no sense to include them, we should do like we do with Scintilla and leave out completely unused files, IMO.
I suggest doing it in reverse order, wait until ctags is made into a library then do one change when it has a stable API. Bringing in all the unused crap and then removing it and interfacing Geany to whatever the ctags API is gonna be is wasted work IMO.
One of the important parts of this pull request was to learn what we actually need in the API which wasn't clear before and what the differences between Geany and uctags are (which wasn't clear at all before). I wouldn't know how to create the API without this.
There's nobody working on libraryfication of uctags upstream. Waiting for it to happen would be pointless - it won't happen by itself. The only project that uses uctags as a "library" is us and the and it's us (probably me) who will have to do it. Doing it properly will take some time - I'd most probably start doing it "improperly", i.e. bringing the API to uctags the way it's done in this pull request and then starting to refactor it to turn it into a proper library. This will take time however and having this pull request in will make it possible to start working on syncing parsers. In addition the state after this pull request is better than before - it's MUCH easier to bring changes to Geany from uctags than before. So even if it turns out I don't have time to work on uctags or it turns out the uctags project doesn't want the library change (or whatever else), it's better to have this in.
@techee pushed 3 commits.
40ed52c Remove output writer implementations efc40f4 #if 0 most of options.c 0a6acff Remove mbcs.c/h
That said, if this PR includes any files that are not used in any way, it makes no sense to include them, we should do like we do with Scintilla and leave out completely unused files, IMO.
I just did it for a few files (should be around -4000LOCs in total) but even though some more files look they might be removed too as we don't need their implementation, it's not so easy as they are called from other files (in functions which don't get called by Geany). But this is what would need quite some refactoring and it's something I'd leave for the uctags patches.
Just tried the binary sizes with the last 3 patches and got
``` 2696256 master 2776960 this branch ```
so about 3% binary increase which isn't too bad I think. Compilation time-wise it's something like this:
``` 0m33.961s master 0m35.677s this branch ```
This is real time on a quad-core machine.
b4n requested changes on this pull request.
Comments on the uctags/geany diff:
```diff +static bool createTagsWithFallback1 (const langType language, + passStartCallback passCallback, void *userData) +{ + int lastPromise = getLastPromise (); + unsigned int passCount = 0; + rescanReason whyRescan; + + if (LanguageTable [language]->useCork) + corkTagFile(); ``` Doesn't this miss `initializeParser (language);`?
--- In `runParserInNarrowedInputStream()`: shouldn't more be guarded? Here the meat of the function is disabled with `CTAGS_LIB`, but not all.
--- ```diff -static void attachEndFieldMaybe (int macroCorkIndex) -{ - if (macroCorkIndex != CORK_NIL) - { - tagEntryInfo *tag; - - tag = getEntryInCorkQueue (macroCorkIndex); - tag->extensionFields.endLine = getInputLineNumber (); - } -} - ``` Would be nice to have that :)
--- In `directiveDefine()`: Why use a completely different signature collection logic than CTags'? I get that might be related to c.c's handling of it, but not sure how what affects what.
@@ -38,7 +38,14 @@ AC_PROG_LN_S
# autoscan start
# Checks for header files. -AC_CHECK_HEADERS([fcntl.h fnmatch.h glob.h stdlib.h sys/time.h]) +AC_CHECK_HEADERS([fcntl.h glob.h stdlib.h sys/time.h errno.h limits.h]) + +# Checks for dependencies needed by ctags +AC_CHECK_HEADERS([fnmatch.h direct.h io.h sys/dir.h]) +AH_TEMPLATE([USE_STDBOOL_H], [whether or not to use <stdbool.h>.]) +AC_DEFINE([USE_STDBOOL_H]) +AH_TEMPLATE([CTAGS_LIB], [compile ctags as a library.]) +AC_DEFINE([CTAGS_LIB])
Should rather use the `AC_DEFINE(symbol, value, description)` than calling `AH_TEMPLATE` manually IMO: ```autoconf AC_DEFINE([USE_STDBOOL_H], [1], [whether or not to use <stdbool.h>.]) AC_DEFINE([CTAGS_LIB], [1], [compile ctags as a library.]) ```
while (1) { nl = nestingLevelsGetCurrent(nestingLevels); - if (nl && nl->type >= kind) + e = getEntryOfNestingLevel (nl); + if ((nl && (e == NULL)) || (e && (e->kind - AsciidocKinds) >= kind))
why do that when `e == NULL`?
{
- e.extensionFields.scopeKind = &(AsciidocKinds [nl->type]); - e.extensionFields.scopeName = vStringValue (nl->name); + e.extensionFields.scopeKind = &(AsciidocKinds [parent->kind - AsciidocKinds]); + e.extensionFields.scopeName = parent->name;
isn't the whole scope part dealt with by Cork anyway?
{
- e.extensionFields.scopeKind = &(AsciidocKinds [nl->type]); - e.extensionFields.scopeName = vStringValue (nl->name); + e.extensionFields.scopeKind = &(AsciidocKinds [parent->kind - AsciidocKinds]);
seems like a convoluted way to say `parent->kind` :)
@@ -290,12 +289,12 @@ static kindOption CKinds [] = {
{ true, 'g', "enum", "enumeration names"}, { true, 'm', "member", "class, struct, and union members"}, { true, 'n', "namespace", "namespaces"}, - { false, 'p', "prototype", "function prototypes"}, + { true, 'p', "prototype", "function prototypes"},
that's odd… but I would have imagine those would have been enabled before. Weird.
@@ -1221,11 +1219,10 @@ static void addOtherFields (tagEntryInfo* const tag, const tagType type,
{ tag->extensionFields.access = accessField (st); } - if ((true == st->gotArgs) && (true == Option.extensionFields.argList) &&
is that option gone? Not that we actually do need it, but well
contextual_fake_count = 0;
Assert (passCount < 3); - cppInit ((bool) (passCount > 1), isInputLanguage (Lang_csharp), isInputLanguage (Lang_cpp), &(CKinds [CK_DEFINE])); + + kind_for_define = CKinds+CK_DEFINE; + + cppInit ((bool) (passCount > 1), isInputLanguage (Lang_csharp), isInputLanguage(Lang_cpp), + kind_for_define);
why introduce the `kind_for_define` variable?
exception = (exception_t) setjmp (Exception); - retry = false;
shouldn't `rescan` be reset to `RESCAN_NONE` here? `setjmp()` might lead to jumping there directly.
vStringCatS(result, ".");
else vStringCatS(result, "/"); */ } - vStringCat(result, nl->name); - is_class = (nl->type == K_CLASS); + + e = getEntryOfNestingLevel (nl); + if (e) + { + vStringCatS(result, e->name); + is_class = ((e->kind - PythonKinds) == K_CLASS); + } + else + is_class = K_FUNCTION; /* ??? */
shouldn't that be merely `false`?
prev = nl;
} return is_class; }
/* Check indentation level and truncate nesting levels accordingly */ -static void checkIndent(NestingLevels *nls, int indent) +static void checkIndent(NestingLevels *nls, int indent, bool eof)
unused last (new) param
@@ -27,6 +27,17 @@
/* * DATA DEFINITIONS */ + +struct corkPair { + int index; +};
that struct looks useless.
@@ -825,5 +859,6 @@ extern parserDefinition *PythonParser (void)
def->kindCount = ARRAY_SIZE (PythonKinds); def->extensions = extensions; def->parser = findPythonTags; + def->useCork = true; return def; }
well, I guess all changes to *python.c* are not too relevant so long as the parser still works, as I plan to import my new rewritten parser prom UCtags once Cork is available on our side.
while (1) { nl = nestingLevelsGetCurrent(nestingLevels); - if (nl && nl->type >= kind) + e = getEntryOfNestingLevel (nl); + if ((nl && (e == NULL)) || (e && (e->kind - RestKinds) >= kind))
same question as why handling `e == NULL` on the left
{
- e.extensionFields.scopeKind = &(RestKinds [nl->type]); - e.extensionFields.scopeName = vStringValue (nl->name); + e.extensionFields.scopeKind = &(RestKinds [parent->kind - RestKinds]); + e.extensionFields.scopeName = parent->name;
again, same as for asciidoc
- {
- GStatBuf s; - - /* load file to memory and parse it from memory unless the file is too big */ - if (g_stat(file_name, &s) != 0 || s.st_size > 10*1024*1024) - parse_file = TRUE; - else - { - if (!g_file_get_contents(file_name, (gchar**)&text_buf, (gsize*)&buf_size, NULL)) - { - g_warning("Unable to open %s", file_name); - return FALSE; - } - free_buf = TRUE; - } - }
shouldn't we keep this?
@@ -1,4 +1,5 @@
# format=tagmanager -ENTSEQNO�16�(seq)�0�FUNCSTS -MEMTXT�16�(form_msg)�0� -MEMTXT�1024�(form_msg)�0� +ENTSEQNO�16�(eq)�0�FUNCSTS
should really be `(seq)` if anything
@@ -1,4 +1,5 @@
# format=tagmanager -ENTSEQNO�16�(seq)�0�FUNCSTS -MEMTXT�16�(form_msg)�0� -MEMTXT�1024�(form_msg)�0� +ENTSEQNO�16�(eq)�0�FUNCSTS +MEMTXT�16�(mail)�0� +MEMTXT�1024�(orm_msg)�0�FUNCSTS
here too, should be `(from_msg)`
@@ -20,5 +20,5 @@ obj
qar�64�Struct.Union�0�int quxx�64�Struct.Union�0�bool test.simple�256�0 -tfun�16�(T)�Class�0�auto +tfun�16�(T v)�Class�0�auto
not sure if it's better or worse. Source line is ```d auto tfun(T)(T v) { return v; } ```
@@ -114,6 +150,14 @@ extern const char *tagFileName (void)
* Pseudo tag support */
+extern void abort_if_ferror(MIO *const mio) +{ +#ifndef CTAGS_LIB + if (mio_error (mio)) + error (FATAL | PERROR, "cannot write tag file"); +#endif
It maybe worth still doing *something*? warn maybe?
- if (includeExtensionFlags ()
+ && isXtagEnabled (XTAG_QUALIFIED_TAGS) + && doesInputLanguageRequestAutomaticFQTag ()) + buildFqTagCache (tag); + +#ifdef CTAGS_LIB + getTagScopeInformation((tagEntryInfo *)tag, NULL, NULL); + + if (TagEntryFunction != NULL) + { + ctagsTag t; + + initCtagsTag(&t, tag); + length = TagEntryFunction(&t, TagEntryUserData); + } +#else
Couldn't we simply have our own writer? Might be more work, but maybe less divergence?
return NULL;
+ + start = strdup (vStringValue (signature));
not standard, maybe use `eStrdup()`?
extern bool isExcludedFile (const char* const name);
+extern bool isIncludeFile (const char *const fileName); +/* GEANY DIFF */ +/* extern const ignoredTokenInfo * isIgnoreToken (const char *const name); */ extern bool isIgnoreToken (const char *const name, bool *const pIgnoreParens, const char **const replacement);
why not use the new ctags impl? I mean, it seems to do the same, but differently, couldn't we use that?
Okay, first pass. I diffed against uctags/main as you suggested, plus the parsers diff.
- GRegex vs GNU regex - I kept GRegex in lregex.c (but synced all the rest) but we should think whether not to convert back to GNU regex
Well… I'd rather not bundle yet another library, especially as we have the same feature already available in a dependency.
- isIgnoreToken() works differently than in uctags.
Does it? Isn't just the API different?
If it simply just dropped certain tags, we could move the logic into TM but right now, it helps the C parser with parsing by suggesting whether the contents of braces should be dropped too. I'm not sure if we need to preserve this functionality (never used the ignored.tags file myself) but if we move to the new cxx parser we'd have to keep this diff unless there's some interest for this in uctags.
Well, the point of ignore tokens is to mitigate the confusion macros can cause on the C parser. For example, I have a custom ignore.tags with those: ``` G_GNUC_DEPRECATED_FOR+ G_GNUC_PRINTF+ G_GNUC_SCANF+ G_GNUC_FORMAT+ G_GNUC_* ... __STL_NOTHROW __THROW GIT_EXTERN=extern cairo_public GEANY_DEPRECATED GEANY_PRIVATE_FIELD GEANY_DEPRECATED_FOR+ ``` so various source files parse a lot better, because the parser just performs as if those weren't part of the source. See it as the poor man's C preprocessor.
Alright, finally I should have more time for Geany. If nobody noticed yet, I work like a switch - I'm terrible at doing things in parallel so either I work or Geany (which usually results in a flood of patches) or I do something else. The switch is in the Geany position now.
I'll go through all the comments, try to remember what I actually did 3 months ago and why and will address the issues in the next few days.
isIgnoreToken() works differently than in uctags.
Does it? Isn't just the API different?
One difference is in how isIgnoreToken() is used in c.c. In uctags parsers (or c.c in particular) don't call this function at all - this function is just called in main and based on it some tags are filtered-out. However, in Geany's c.c this function helps to the parser to determine what to do - if the ignored token is e.g. a macro with parameters, everything is skipped until ")".
This also translates to the ignore file format - in the Geany version there's the "+" which indicates that parentheses should be skipped and "=" which specifies with what the token should be replaced. There's nothing like that in uctags.
Personally I'd be for removing this functionality as if we want to preserve it, we'd have to add it to the new cpp uctags parser which doesn't use this. As I said, I don't use the ignored file myself so I'm not the right guy to ask whether it's something important for others.
techee commented on this pull request.
- if (includeExtensionFlags ()
+ && isXtagEnabled (XTAG_QUALIFIED_TAGS) + && doesInputLanguageRequestAutomaticFQTag ()) + buildFqTagCache (tag); + +#ifdef CTAGS_LIB + getTagScopeInformation((tagEntryInfo *)tag, NULL, NULL); + + if (TagEntryFunction != NULL) + { + ctagsTag t; + + initCtagsTag(&t, tag); + length = TagEntryFunction(&t, TagEntryUserData); + } +#else
This is something I was considering too. We could "write" tags into Geany's structures in the callback of the writer and use normal way of ctags initialization and parsing.
What however worries me is the MIO (TagFile.mio from entry.c) into which a writer is supposed to write - right now it's automatically opened/closed by ctags for any writer and that's not something we want. If we somehow avoid it and decide to keep it NULL, we have to be very careful that nothing from entry.c that refers it. That would be quite a big change and would make the diff bigger against uctags (and more fragile when something upstream changes).
I agree this is probably the right way it should be done and this is also the way it should probably be introduced into uctags. But I think having this shortcut in Geany which makes sure nothing with unexpected side-effects from ctags is called is better for now.
techee commented on this pull request.
while (1) { nl = nestingLevelsGetCurrent(nestingLevels); - if (nl && nl->type >= kind) + e = getEntryOfNestingLevel (nl); + if ((nl && (e == NULL)) || (e && (e->kind - AsciidocKinds) >= kind))
Because it can be set to CORK_NIL in makeAsciidocTag() when vStringLength (name) == 0.
techee commented on this pull request.
{
- e.extensionFields.scopeKind = &(AsciidocKinds [nl->type]); - e.extensionFields.scopeName = vStringValue (nl->name); + e.extensionFields.scopeKind = &(AsciidocKinds [parent->kind - AsciidocKinds]);
Indeed. Blame rst.c from uctags which I used as a template for these simple parsers. Don't blame me for not thinking while doing that :-).
techee commented on this pull request.
{
- e.extensionFields.scopeKind = &(AsciidocKinds [nl->type]); - e.extensionFields.scopeName = vStringValue (nl->name); + e.extensionFields.scopeKind = &(AsciidocKinds [parent->kind - AsciidocKinds]); + e.extensionFields.scopeName = parent->name;
Yeah, again copied from rst.c, probably unnecessary.
techee commented on this pull request.
@@ -290,12 +289,12 @@ static kindOption CKinds [] = {
{ true, 'g', "enum", "enumeration names"}, { true, 'm', "member", "class, struct, and union members"}, { true, 'n', "namespace", "namespaces"}, - { false, 'p', "prototype", "function prototypes"}, + { true, 'p', "prototype", "function prototypes"},
The thing is that the "enabled" field was completely ignored by ctags in Geany and everything was enabled.
Now thinking about it we should probably keep doing that - for a library it's more logical to get all tags and let the code decide what to do with them. Also if we migrate to the cpp parser, we'd have to introduce diffs like that there too.
techee commented on this pull request.
contextual_fake_count = 0;
Assert (passCount < 3); - cppInit ((bool) (passCount > 1), isInputLanguage (Lang_csharp), isInputLanguage (Lang_cpp), &(CKinds [CK_DEFINE])); + + kind_for_define = CKinds+CK_DEFINE; + + cppInit ((bool) (passCount > 1), isInputLanguage (Lang_csharp), isInputLanguage(Lang_cpp), + kind_for_define);
Yeah, not necessary for our c.c. It was taken from uctags where there was a branch where the value depended on whether parsing Vera or c.
techee commented on this pull request.
@@ -1221,11 +1219,10 @@ static void addOtherFields (tagEntryInfo* const tag, const tagType type,
{ tag->extensionFields.access = accessField (st); } - if ((true == st->gotArgs) && (true == Option.extensionFields.argList) &&
At least Option.extensionFields.argList doesn't exist. Since it was always TRUE for us, I didn't investigate more.
techee commented on this pull request.
@@ -825,5 +859,6 @@ extern parserDefinition *PythonParser (void)
def->kindCount = ARRAY_SIZE (PythonKinds); def->extensions = extensions; def->parser = findPythonTags; + def->useCork = true; return def; }
Yes, exactly, the new parser should be used. I just didn't want want to add one more big diff into this patch.
Will fix the above minor issues though.
techee commented on this pull request.
@@ -1,4 +1,5 @@
# format=tagmanager -ENTSEQNO�16�(seq)�0�FUNCSTS -MEMTXT�16�(form_msg)�0� -MEMTXT�1024�(form_msg)�0� +ENTSEQNO�16�(eq)�0�FUNCSTS +MEMTXT�16�(mail)�0� +MEMTXT�1024�(orm_msg)�0�FUNCSTS
Haven't noticed - will have a look at these.
techee commented on this pull request.
- {
- GStatBuf s; - - /* load file to memory and parse it from memory unless the file is too big */ - if (g_stat(file_name, &s) != 0 || s.st_size > 10*1024*1024) - parse_file = TRUE; - else - { - if (!g_file_get_contents(file_name, (gchar**)&text_buf, (gsize*)&buf_size, NULL)) - { - g_warning("Unable to open %s", file_name); - return FALSE; - } - free_buf = TRUE; - } - }
This is now done in ctags - see getMio(). The mem. size is just 1MB there instead of 10MB here but for normal sources it doesn't matter (and since ctags uses fgets() for input reading, even the file-based access is quite fast).
techee commented on this pull request.
exception = (exception_t) setjmp (Exception); - retry = false;
Ah, right. It's actually in the uctags version - I probably didn't notice this code path. Will add.
techee commented on this pull request.
@@ -114,6 +150,14 @@ extern const char *tagFileName (void)
* Pseudo tag support */
+extern void abort_if_ferror(MIO *const mio) +{ +#ifndef CTAGS_LIB + if (mio_error (mio)) + error (FATAL | PERROR, "cannot write tag file"); +#endif
We never write the tag file so this should never be called. I could probably remove the ifdefs around it, it was just I was worried someone else might decide to call it in the future in ctags since the function is exported.
techee commented on this pull request.
@@ -38,7 +38,14 @@ AC_PROG_LN_S
# autoscan start
# Checks for header files. -AC_CHECK_HEADERS([fcntl.h fnmatch.h glob.h stdlib.h sys/time.h]) +AC_CHECK_HEADERS([fcntl.h glob.h stdlib.h sys/time.h errno.h limits.h]) + +# Checks for dependencies needed by ctags +AC_CHECK_HEADERS([fnmatch.h direct.h io.h sys/dir.h]) +AH_TEMPLATE([USE_STDBOOL_H], [whether or not to use <stdbool.h>.]) +AC_DEFINE([USE_STDBOOL_H]) +AH_TEMPLATE([CTAGS_LIB], [compile ctags as a library.]) +AC_DEFINE([CTAGS_LIB])
OK
techee commented on this pull request.
return NULL;
+ + start = strdup (vStringValue (signature));
OK
techee commented on this pull request.
extern bool isExcludedFile (const char* const name);
+extern bool isIncludeFile (const char *const fileName); +/* GEANY DIFF */ +/* extern const ignoredTokenInfo * isIgnoreToken (const char *const name); */ extern bool isIgnoreToken (const char *const name, bool *const pIgnoreParens, const char **const replacement);
See my comment below. My preference would indeed be using the new implementation unless the loss of functionality is a big obstacle for someone.
static bool createTagsWithFallback1 (const langType language, ...
Doesn't this miss initializeParser (language);?
No. We call
``` /* make sure all parsers are initialized */ initializeParser (LANG_AUTO); ```
inside ctagsInit() in ctags-api.c for all languages only once and don't need to do it every time when parsing.
In runParserInNarrowedInputStream(): shouldn't more be guarded? Here the meat of the function is disabled with CTAGS_LIB, but not all.
What else do you think should be guarded? The rest seems to just manipulate the input file's MIO which shouldn't be a problem I think.
static void attachEndFieldMaybe (int macroCorkIndex) ... Would be nice to have that :)
Do you plan to use cork in c.c? I can add it (or it can be added later when cork in c.c is used).
I have been thinking how to proceed when we decide to switch to the uctags new cxx parser. Since we'll still need c.c for other languages and since the lcpp.c file differs significantly between Geany and uctags, it might be necessary to have lcpp-old.c (current Geany version) which would be used by c.c and lcpp.c (uctags version) which would be used by cxx. Then C/C++ could be removed from c.c and you suggested separating Vala to a new parser and then it might be easier to either merge both lcpp implementations.
In directiveDefine(): Why use a completely different signature collection logic than CTags'? I get that might be related to c.c's handling of it, but not sure how what affects what.
The difference is that in uctags the c.c parser collects the signature "manually" (its collection is spread across c.c and there are many calls of vStringPut (Signature, c)) while in our implementation signature collection is hidden from the parser inside lcpp.c and c.c just has to inform lcpp when to start and stop collecting. This is a bit nicer from the parser point of view because it means less lines in the parser itself.
techee commented on this pull request.
@@ -1221,11 +1219,10 @@ static void addOtherFields (tagEntryInfo* const tag, const tagType type,
{ tag->extensionFields.access = accessField (st); } - if ((true == st->gotArgs) && (true == Option.extensionFields.argList) &&
Just checked it an it's handled by renderExtensionFieldMaybe() inside main where isFieldEnabled(FIELD_SIGNATURE) is being checked. So there's no need for the check in the parser.
techee commented on this pull request.
@@ -20,5 +20,5 @@ obj
qar�64�Struct.Union�0�int quxx�64�Struct.Union�0�bool test.simple�256�0 -tfun�16�(T)�Class�0�auto +tfun�16�(T v)�Class�0�auto
This doesn't seem to be a valid D code (or valid code for any function-like declaration from c.c). The way signature collection works now it probably replaces the original signature "T" with the second signature "T v" when the second "(" appears. I could probably adjust the signature collection code but not sure if it's worth it as this shouldn't happen in a valid code.
@techee pushed 9 commits.
8455f8e lcpp, c: Fix signature collection d5823ce c.c: Reinitialize rescan variable in case longjmp() happens 634719a c.c: Eliminate useless kind_for_define variable cf6653a Don't check if kind is enabled for library 6176846 Remove unnecessary use of AH_TEMPLATE 3e8365c asciidoc, rest: Remove unnecessary scope assignments e5b5326 python: Rename corkPair struct to corkInfo df0ae9b python: minor cleanups bf4eb8f lcpp: use eStrdup() instead of strdup()
@b4n I think I've fixed all the issues from the comments - please let me know if I forgot about something.
techee commented on this pull request.
@@ -1,4 +1,5 @@
# format=tagmanager -ENTSEQNO�16�(seq)�0�FUNCSTS -MEMTXT�16�(form_msg)�0� -MEMTXT�1024�(form_msg)�0� +ENTSEQNO�16�(eq)�0�FUNCSTS +MEMTXT�16�(mail)�0� +MEMTXT�1024�(orm_msg)�0�FUNCSTS
I fixed the missing start letter (the collection didn't work correctly for ungetc()).
I'm not completely sure why the output is different than before - apparently the parser is terribly confused (both before and now too) and the output is strange.
Didn't mention it explicitly before but because of the sync with ctags, re-parsing now happens correctly in Geany (before wrong indexes were used so only one method was used for parsing). This is one change that might change the output but it could be something different too (haven't really checked as there's a difference just in this case which is wrong anyway so it doesn't matter much).
@techee awesome :) It'll take me a little time to also get back to that well enough to understand my questions and your answers, but I'll try and do that as soon as possible.
Great! Yeah, pure fear of the size of the patch made me not to touch it for such a long time (and it wasn't that bad at the end, thought it would take me a week to remember what it does).
By the way, has anything happened with travis? Seems no checks are running.
By the way, has anything happened with travis? Seems no checks are running.
@eht16 mode some changes to the organisation IIRC, did that break it?
techee commented on this pull request.
@@ -27,6 +27,17 @@
/* * DATA DEFINITIONS */ + +struct corkPair { + int index; +};
I kept the struct, just renamed it to corkInfo (int corkIndex is already used at various places in functions and I couldn't think of any good name). Will be replaced by the new parser anyway.
Travis seems to be triggered properly, I can see recent builds, e.g. https://travis-ci.org/geany/geany-plugins/jobs/202906210 and https://travis-ci.org/geany/geany/builds/202366300. The last build of this PR ran three days ago: https://travis-ci.org/geany/geany/builds/201830001.
For some reason, the build results are not shown here anymore. I don't see yet why.
@eht16 there have been no results displayed on the PRs since beginning of January. I checked other projects and they are all showing results fine, its just Geany. What changed on new years eve? ;)
On December 31st, 2016, I restricted third party access to the Geany Github organization because the previous settings were pretty much open (I noticed while coupling readthedocs.org with my personal Github account and got flooded with Geany repositories which was quite surprising).
I just re-granted organization access to the geanyadmin user which is used for the Travis integration. I'll try to trigger a build for this PR and ideally we will see the results then again.
@eht16 its already got the checks in progress message, thats better :)
Yeah, looking good. Travis integration should indeed be enabled for anyone because if some non-member creates a pull request, he should see the build status.
Woohoo, working again. Sorry for the trouble.
@eht16 any way of triggering the PRs that were committed to after Jan 01?
Sort of. In Travis (when logged in as the "geanyadmin" user), builds can be restarted. But we need to do this per build. If you want me to do, I could try to restart the last build of the recent PRs back to beginning of January.
Probably not worth it if its one at a time, was hoping for a bulk method, this was the biggest PR (by far).
End hijack of @techee PR.
@techee OK, I hope I'll have some time to try and tackle that one again. However, do you think I should review this as is, or first try and update it with more recent U-CTags? Basically it depends on how easy you think it'll be to sync with uctags master (hopefully easy enough as it's one of the goals :smile:), but if there's tricky part maybe it could be worth updating the year+ changes I caused and then tackle that monster.
@b4n Hey, cool! I would prefer if this pull request could be merged first and then I'd make another for the last 2 year upstream changes. This one is the hard part because it represents something like 12 years of upstream development and I'm worried I'd screw up if I started doing some changes on top of this.
b4n commented on this pull request.
{
- const char *p = parameter; - bool mode = true; - int c; + bool tagFileResized; + pushNarrowedInputStream (language, + startLine, startCharOffset, + endLine, endCharOffset, + sourceLineOffset); +#ifndef CTAGS_LIB + tagFileResized = createTagsWithFallback1 (language); +#endif + popNarrowedInputStream (); + return tagFileResized;
In runParserInNarrowedInputStream(): shouldn't more be guarded? Here the meat of the function is disabled with CTAGS_LIB, but not all.
What else do you think should be guarded? The rest seems to just manipulate the input file's MIO which shouldn't be a problem I think.
My point was that then the whole manipulation looks like a no-op, and so if guarding anything, the whole body could be guarded.
BTW, I see that now it returns a uninitialized value (`tagFileResized`).
{
- const char *p = parameter; - bool mode = true; - int c; + bool tagFileResized; + pushNarrowedInputStream (language, + startLine, startCharOffset, + endLine, endCharOffset, + sourceLineOffset); +#ifndef CTAGS_LIB + tagFileResized = createTagsWithFallback1 (language); +#endif + popNarrowedInputStream (); + return tagFileResized;
BTW-2, isn't this needed to run sub-parsers?
+#include <stdio.h> +#include <string.h> +#include <errno.h> + +static bool nofatalErrorPrinter (const errorSelection selection, + const char *const format, + va_list ap, void *data CTAGS_ATTR_UNUSED) +{ + fprintf (stderr, "%s: ", (selection & WARNING) ? "Warning: " : "Error"); + vfprintf (stderr, format, ap); + if (selection & PERROR) +#ifdef HAVE_STRERROR + fprintf (stderr, " : %s", strerror (errno)); +#else + perror (" ");
confusing indent for L36 and L34 as they are guarded by the `if` on L32
@@ -174,6 +175,23 @@ extern void cppUngetc (const int c)
Assert (Cpp.ungetch2 == '\0'); Cpp.ungetch2 = Cpp.ungetch; Cpp.ungetch = c; + if (collectingSignature) + vStringChop (signature); +} + +static inline int getcAndCollect ()
Picky: `(void)` for the arglist
if (! tag->kind->enabled)
return; +#endif
Maybe instead of removing this altogether the library consumer could have to enable kinds he cares about? Not sure, as I tend to agree with you that post-filtering seems better, but maybe filtering ahead has other advantages like performance. Anyway, looks good enough for us, but could make sense to keep in a final lib API.
@@ -20,5 +20,5 @@ obj
qar�64�Struct.Union�0�int quxx�64�Struct.Union�0�bool test.simple�256�0 -tfun�16�(T)�Class�0�auto +tfun�16�(T v)�Class�0�auto
The source is valid D code according to GDC. And from what I can guess, the signature is more likely to be `(T v)` than `(T)` (as `v` is used as a variable in the body); I'd guess the `(T)` part is similar to C++'s `<T>` in `tfun<T>(T v)`.
So I'd say this change is even better.
b4n requested changes on this pull request.
Okay, finally! Looks mostly good to me from reviewing it, minus the few things below. I'll have to test it, but if you could resolve the merge conflicts for me that'd be wonderful :]
@techee pushed 4 commits.
9b44e9f Fix parsing with sub-parsers 25e8ebc Fix indent 248c970 Add missing void 99e0f20 Merge branch 'master' into ctags_sync_main
techee commented on this pull request.
{
- const char *p = parameter; - bool mode = true; - int c; + bool tagFileResized; + pushNarrowedInputStream (language, + startLine, startCharOffset, + endLine, endCharOffset, + sourceLineOffset); +#ifndef CTAGS_LIB + tagFileResized = createTagsWithFallback1 (language); +#endif + popNarrowedInputStream (); + return tagFileResized;
You are right, sub-parsers wouldn't work. I tried to fix this in the submitted patch (even though I haven't tried any parser using sub-parsers yet so no guarantees everything related to sub-parsers actually works).
techee commented on this pull request.
+#include <stdio.h> +#include <string.h> +#include <errno.h> + +static bool nofatalErrorPrinter (const errorSelection selection, + const char *const format, + va_list ap, void *data CTAGS_ATTR_UNUSED) +{ + fprintf (stderr, "%s: ", (selection & WARNING) ? "Warning: " : "Error"); + vfprintf (stderr, format, ap); + if (selection & PERROR) +#ifdef HAVE_STRERROR + fprintf (stderr, " : %s", strerror (errno)); +#else + perror (" ");
Yeah, we'd need some proper API for that which we don't have. The problem I was running into was that in tag definitions of individual parsers some tags were disabled by default and "make check" failed because of that. So it was either disabling the enabled check here or changing all the affected parsers tag definitions and this was just easier.
Finally^2 (If we continue with this speed, we might have it merged around the time we are retired ;-)
I resolved the merge conflicts by merging master into this branch. It might not be clear what exactly I did, so:
1. The lcpp problem
https://github.com/geany/geany/commit/6f692112e39fa0714d882332b8f935419fd69a...
seemed to be fixed upstream so I kept the version from this pull request and discarded your patch.
2. I kept the fix from
https://github.com/geany/geany/commit/63850b3eb7302b6ce1c4223581be7687dc231a...
in parse.c - I didn't find this function in current ctags master so this code will probably be removed in the future ctags sync.
3. I added the unsigned from
https://github.com/geany/geany/commit/dbd0573dd6a5175ec9b536c4053bb66caac0f6...
to the corresponding function where this code is now.
If we continue with this speed, we might have it merged around the time we are retired ;-)
Yep, we have a chance of finishing it in this lifetime! :)
- The lcpp problem
https://github.com/geany/geany/commit/6f692112e39fa0714d882332b8f935419fd69a...
seemed to be fixed upstream so I kept the version from this pull request and discarded your patch.
Yeah, apparently it was fixed separately: https://github.com/universal-ctags/ctags/commit/4ceab02734d91bcbaf765be0f4e4...
OK for the other two. I'll check the rest of the changes in the coming days (yes yes I will -- or at least try hard)
As this has gone nowhere in two years, available resources are less than back then, and is holding up other ctags related stuff, I suggest it be rejected in its current form.
Instead a progressive approach should be developed to re-arrange the directories to match ctags (using git-mv so as not to lose history), then the ctags parsers can be updated one at a time.
@elextr I don't think so. We're almost there, and updating parsers basically relies on a lot of this PR (it's the point of it) because UCTags added quite a few facilities and changed things and parsers are using those.
And regarding the status of this, yes it took a long time, but I'm pretty sure we're almost there (last review was close to merge state), I'm committed to this and work a little on it, although these past few weeks I focused on what would make 1.34. It's one of the PRs I'd like to merge early in 1.35 if we don't encounter no new unexpected issue.
@elextr for example all the Python parser issues would be fixed by using the UCTags parser, but it relies on quite a few features we don't currently have. And maintaining a version that don't use those would create a difference gap again, with all the maintenance effort that go with it.
@b4n Well, I didn't get a sense that you were making progress, will hassle you immediately after 1.34 release so it can be committed ASAP to allow for some time to mature before 1.35.
[PS am only accepting it so I can have the new C++ added after the infrastructure is fixed :grin: ]
@elextr I agree with @b4n - parsers cannot be updated one at a time unless you have these changes in the ctags core. And I agree with you it would be nice if this could be merged at the beginning of the release cycle so it can be tested.
I haven't checked what has changed in ctags in the last two years and if there's something more we need to import in order to be able to grab the ctags parsers but once these patches are merged, I'll try to prepare a new patch set so we can sync the parsers easily. And I'll do my best to do it in a timely manner and not in several months ;-).
b4n requested changes on this pull request.
OK, here's my latest with the help of GCC. See also https://github.com/techee/geany/pull/1 which contains some small stuff that I think are relevant, but please review them, don't take them as gospel.
Apart from that, I think the next thing is merging an see how it goes, as as far as I can tell it works very well.
pushNarrowedInputStream (language,
startLine, startCharOffset, endLine, endCharOffset, sourceLineOffset); #ifndef CTAGS_LIB tagFileResized = createTagsWithFallback1 (language); +#else + /* Simple parsing without rescans - not used by any sub-parsers anyway */
I don't really care right now because as you pointed out there's no use case yet, but you mentioning in the commit message that "the C and Fortran parsers" being the only parsers using retry mode and that they do not support sub-parsers makes the retry case irrelevant is not correct: nothing prevents a parser from calling the C or Fortran parsers as sub-parsers. IIUC, the upstream Flex or so parser actually does this.
But as said, don't worry too much about that right now; we can fix this stuff when we actually have use for it.
else if (c == SINGLE_QUOTE)
break; else if (c == NEWLINE) { - ungetcToInputFile (c); - break; - } - else if (count == 1 && strchr ("DHOB", toupper (c)) != NULL) - veraBase = c; - else if (veraBase != '\0' && ! isalnum (c)) - {
I can't find these lines being suppressed upstream, so I assume it's a local change. However, you should then remove the variable declaration for `veraBase` as well.
vString *const parent, int is_class_parent, const char *arglist)
{ tagEntryInfo tag; + int corkIndex; + int fqCorkIndex = CORK_NIL; + const struct corkInfo nilInfo = {CORK_NIL};
unused variables `fqCorkIndex` and `nilInfo`, any reason to have them?
vString *const parent, int is_class_parent, const char *arglist)
{ tagEntryInfo tag; + int corkIndex; + int fqCorkIndex = CORK_NIL; + const struct corkInfo nilInfo = {CORK_NIL};
I see they were upstream in the old Python parser, but they were used at the time.
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
This variable is unused, but upstream (*rst.c*) uses it to mess with the scope (which seems stupid, but well). Unused variables have that issue that people have a tendency to remove them… which generally seems legitimate. BTW, if we remove this, then `nl` gets unused, but we still gotta call `getNestingLevel()` because it does the `nestingLevelsPop()` stuff needed. What do you think?
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
same here than in *rest.c*, `parent` is unused, same question applies.
@techee pushed 9 commits.
33a102ee918b0f12b2bda0abba7a4b86bce60efd Add dummy definitions to Assert* macros for suppressing compiler warnings 749ab56058f22b0048d1627b8e930fbcd070833a main: remove duplicated declarations f93aa1478809fb5c5842ac9dd3bad35beab29067 Remove an unused function that was removed upstream f61bd540b760005a6bb78bde8c813d48e759d5a9 Remove two unused variables e792d4fdf959516a193d2c2e4ac3f737a6e1954d Don't export some local functions 89a934edeec10d4d584896dd29d02fa15f2c2fca Guard some more code for non-library builds only 1c3634ebb9d5b179cbddcbbb567fe81446dd069f Update functions and declarations checks for current ctags 66a08f6570c189e66b31863d60ec68052e1d2d5f Remove an unused variable and fix a related comment e47612b7079b5b8888fdb0de718129a6ea835688 Remove some unused code from asciidoc and rest parsers
techee commented on this pull request.
pushNarrowedInputStream (language,
startLine, startCharOffset, endLine, endCharOffset, sourceLineOffset); #ifndef CTAGS_LIB tagFileResized = createTagsWithFallback1 (language); +#else + /* Simple parsing without rescans - not used by any sub-parsers anyway */
OK, good point. I wanted to avoid re-parsing within subparsers because this would complicate things in the tag manager - if re-parsing happens in the main parser, we clear the tag list and start over. However, if such a thing happened within a subparser, we'd have to have some way to clear just the part of the tags generated by the subparser while leaving the tags generated by the main parser and we currently don't have anything like that in place.
Personally I'd just sacrifice re-parsing in this case (i.e. when using flex running cpp parser as the subparser) to simplify things. If I remember correctly, re-parsing happens quite rarely, at least when I tried the linux kernel sources.
techee commented on this pull request.
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
I've added a patch to remove those for now. If we need the functions in the future after we start syncing parsers, we can grab them from upstream.
@b4n Thanks for the patches, I've added them to this pull request and also removed the currently unused code from asciidoc and rest parsers.
I guess it's kind of safe to assume now that there won't be too many changes in this pull request, right? If so (and if I have some time left this week), I can start checking what changes have been made in the current ctags master in the last two years which we might need to have all the necessary stuff from main so we can just take over parsers from ctags.
I had a brief look at what needs to be done so we can grab ctags parsers - the history of the Python parser summarizes it quite nicely:
https://github.com/universal-ctags/ctags/commits/master/parsers/python.c
There seem to have been some refactorings/renames in main we'll need to get but there doesn't seem to be anything major fortunately. I'll make a pull request to make these changes first so parser syncing can start.
Question: there seem to be quite massive changes in the lregex.c/h upstream and this one isn't so trivial to merge because we use GRegex and ctags uses GNU Regex. Syncing this would mean taking care of the different semantics of the two regex libraries which might be quite a lot of work. What if we just removed support of regex-based parsers from Geany and #if0'd the bodies of all functions from lregex.c? Currently we use regex parsers for HTML, Cobol and ActionScript - for HTML there's a token-based parser upstream we can use which would mean we'd just kill parsing support for Cobol and ActionScript. I may be biased but I don't think these particular languages are so important to justify all the work related to the regex-based parsers.
Just for info, I've synced ctags to the state where the majority of parsers can just be copied from upstream ctags without any modification. The code is here
https://github.com/techee/geany/commits/ctags_sync3
and I'll convert it to a proper pull request once this pull request is merged. As part of this branch I used the upstream versions of html/jscript/css parsers to be able to debug subparser support and it seems to be working pretty well now after some patches.
BTW: We should consider using git-subtree for future updates. This allows us to track an upstream git repository while having local changes. Then updating becomes a simple merge.
b4n commented on this pull request.
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
It's incorrect, because I stated in my commend above, we still have to call `getNestingLevel()` because it's that function that calls `nestingLevelsPop()` when appropriate. Just removing the call removes this side effect, resulting in never removing nest levels.
This is pretty much exactly what I tried to warn about in the comment above when sating that *"have a tendency to remove [unused variables]"*, suggesting it's not so simple here. What we can do is call the function without storing its value, and possible rename it to be clearer.
techee commented on this pull request.
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
Ah, OK, I misunderstood (and sadly didn't read the code enough and fell to the trap). I'll fix that and re-push this patch.
techee commented on this pull request.
if (vStringLength (name) > 0) { + tagEntryInfo *parent = getEntryOfNestingLevel (nl);
Done.
I guess it's kind of safe to assume now that there won't be too many changes in this pull request, right?
yeah, should be pretty close to merge.
Question: there seem to be quite massive changes in the lregex.c/h upstream and this one isn't so trivial to merge because we use GRegex and ctags uses GNU Regex. Syncing this would mean taking care of the different semantics of the two regex libraries which might be quite a lot of work. What if we just removed support of regex-based parsers from Geany and #if0'd the bodies of all functions from lregex.c? Currently we use regex parsers for HTML, Cobol and ActionScript - for HTML there's a token-based parser upstream we can use which would mean we'd just kill parsing support for Cobol and ActionScript. I may be biased but I don't think these particular languages are so important to justify all the work related to the regex-based parsers.
Hum… on one hand it's kind of sad not to allow regex parsers because some u-ctags parser use it (12 built-in ones apparently), and it makes creating a basic parser easier; on the other I already said that I'd rather not have an additional copy of the GNU regex lib and I understand that having a GRegex module is trickier. Ideally we'd submit an alternative GRegex-based engine upstream and keep it up-to-date over there.
Anyway, for the case at hand: I don't really like dropping support for languages we have, but me being myself I'll give a shot at translating those parsers as "real" ones, which should make them faster as well, and unless they have highly complicated regexes, it should be fairly easy. So I'd say you can for now assume we can at least temporarily drop lregex, and we'll see what we can do after.
b4n approved this pull request.
Hum… on one hand it's kind of sad not to allow regex parsers because some u-ctags parser use it (12 built-in ones apparently), and it makes creating a basic parser easier; on the other I already said that I'd rather not have an additional copy of the GNU regex lib and I understand that having a GRegex module is trickier. Ideally we'd submit an alternative GRegex-based engine upstream and keep it up-to-date over there.
Doesn't the regex syntax differ between them? If so, it would mean also to support different regexes for all the regex-based parsers which is probably not going to happen.
Anyway, for the case at hand: I don't really like dropping support for languages we have, but me being myself I'll give a shot at translating those parsers as "real" ones, which should make them faster as well, and unless they have highly complicated regexes, it should be fairly easy.
The diff against ctags looked really scary but maybe if I just took the uctags version and GRegex-ified it, it might not be so bad. I just wasn't particularly motivated by the languages we use - ActionScript (which is used by Flash and is about to die) and Cobol which might be used by some legacy mainframe software but that's about it. I also don't think it's worth writing proper parsers for these. It might be different if we want to use more regex-based parsers from uctags but my current thinking is something like: "If a language is popular enough, it probably has a proper parser in uctags and then we should support it in Geany. If the language is not popular enough to have an uctags parser, there's no need to support it in Geany".
So I'd say you can for now assume we can at least temporarily drop lregex, and we'll see what we can do after.
This is what I did in the ctags_sync3 branch.
Merged #1263 into master.
Doesn't the regex syntax differ between them?
Hum, good point, it might. Although, it probably is "mostly" the same if using the same extension mode. IIRC GRegex is using gnu regex, but maybe modified.
The diff against ctags looked really scary but maybe if I just took the uctags version and GRegex-ified it, it might not be so bad.
That'd be a good surprise ^^
I just wasn't particularly motivated by the languages we use - ActionScript (which is used by Flash and is about to die) and Cobol which might be used by some legacy mainframe software but that's about it.
I'd be more interested in Cobol than ActionScript, but yeah both are probably niches in 2018.
I also don't think it's worth writing proper parsers for these.
Meh, were would the fun be then? :(
It might be different if we want to use more regex-based parsers from uctags but my current thinking is something like: "If a language is popular enough, it probably has a proper parser in uctags and then we should support it in Geany. If the language is not popular enough to have an uctags parser, there's no need to support it in Geany".
Makes some sense. And actually, the largest use case for the regex engine in ctags is custom language definitions on the CLI/options file, and we don't support that, so it's no so useful.
So… more than 2 years after, this is merged! Thanks, and we're ready for more :)
I'd be more interested in Cobol than ActionScript, but yeah both are probably niches in 2018.
@b4n, Cobol at least would seem to be still alive, see #2021
And anyway whats wrong with ctags using a different regex library, its something we link to, and its a dynamic dll, not as if its anything inside Geany itself. Hell, its probably already in memory for most Linuxen.
Hi everyone !
If I may, and in my humble opinion, it would be a mistake to drop support for COBOL :
- There's a huge heritage of code lines (220 out of 310 billion code lines developed worldwide, estimate at the beginning of this year) - There are still many companies on the mainframe (including banks and insurance companies) - Geany is one of the few to take it into account and it is an argument - Companies running on AIX mainframe solutions (Linux) need a local IDE - On my last three missions, I managed to replace Notepad+++ by Geany (integration of the compilation exe on F5), especially with the filedef I provide you today
So yes I know that COBOL isn't modern, sexy but it is widely used. And in this case, why not also abandon Assembler, Fortran or Pascal ? (and I spare you the REXX filedef I developed) ;)
@DansLeRuSH its great that you have contributed filetype improvements. But if Cobol is so useful then Cobol users should step up and provide support for the parser.
Geany is a totally volunteer project, its is not supported by any companies. Those banks and companies you mention should be able to have their people contribute if they want support to continue (sadly I suspect they all use Eclipse).
The suggestion to abandon Cobol is because something needs to be done to continue support, and in a volunteer project, if no one contributes that, then support can not continue.
I'm not an expert on whats needed, but IIUC the problem is that the Cobol parser uses a different Regex library from the rest of Geany, and, as I asked above I'm not sure why thats an issue, and what is needed to make it work. Perhaps open a specific issue to discuss it if its an issue [pun intended].
@DansLeRuSH its great that you have contributed filetype improvements.
You're welcome, I love Geany and if I could help :blush:
But if Cobol is so useful then Cobol users should step up and provide support for the parser.
You're absolutely right ! I told so to my past and actual colleagues. Perhaps improving actual possibilities could help :thinking:
Geany is a totally volunteer project, its is not supported by any companies. Those banks and companies you mention should be able to have their people contribute if they want support to continue
The main problem here is that those companies love to give a lot of money to historical software suppliers, have no trust in opensource projects and there's a gap between CIO/supplier trade agreements and developpers needs. But yes I'm agree with you.
(sadly I suspect they all use Eclipse).
You're partly right if it's a classical mainframe on z/OS. The problem here is that IBM has built a solution on top of Eclipse (IDz) but it's not open source and really expensive, small companies can't afford that.
The suggestion to abandon Cobol is because something needs to be done to continue support, and in a volunteer project, if no one contributes that, then support can not continue.
I'm aware of that and abover all, thank you for the hard work !
I'm not an expert on whats needed, but IIUC the problem is that the Cobol parser uses a different Regex library from the rest of Geany, and, as I asked above I'm not sure why thats an issue, and what is needed to make it work. Perhaps open a specific issue to discuss it if its an issue [pun intended].
I'm a proud daddy and I haven't so much time but I could help sometimes If you need :vulcan_salute: :nerd_face:
@DansLeRuSH just a final comment on this closed PR, when we say Cobol is removed, what we mean is that support for symbols is removed, but editing highlighting etc is still supported. The filetype is still available.
The filetype is still available.
Which means the changes in #2021 to the Cobol filetype are still useful since they apply to the highlighting and other non-parser functionality.
github-comments@lists.geany.org