This is a rebased #2132. I once again updated ctags main to the latest version so it corresponds to current uctags master, then modified some parsers because in the last 1.5 years there have been some minor changes upstream which had to be addressed in the parsers. I also added the `geany_` prefix to all the parsers as I suggested somewhere so we can distinguish parsers which are not completely synced with the upstream version from parsers which can just be copied over from uctags. You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/2666
-- Commit Summary --
* Update to latest ctags main * Add patch to modify anonymous tag name * Move EXTERNAL_PARSER_LIST to the beginning of BuiltInParsers * Various parser updates to make them compatible with latest ctags main * Rename vStringItem to vStringChar * attachParserField now takes an extra parameter * useCork is now a bit field * Rename nestingLevelsGetNth() to nestingLevelsGetNthFromRoot() * Update cobol and flex parsers to use latest ctags main definitions * Update TM to use latest universal ctags * Update tests * Update HACKING * Add the geany_ prefix to all parser files
-- File Changes --
M HACKING (14) M configure.ac (3) M ctags/Makefile.am (167) A ctags/ctags_changes.patch (24) M ctags/main/args.c (6) R ctags/main/args_p.h (6) A ctags/main/cmd.c (22) A ctags/main/colprint.c (295) A ctags/main/colprint_p.h (37) D ctags/main/ctags-api.c (144) D ctags/main/ctags-api.h (57) M ctags/main/ctags.h (10) M ctags/main/debug.c (103) M ctags/main/debug.h (32) M ctags/main/dependency.c (353) M ctags/main/dependency.h (26) A ctags/main/dependency_p.h (58) M ctags/main/e_msoft.h (23) M ctags/main/entry.c (1190) M ctags/main/entry.h (207) A ctags/main/entry_p.h (77) A ctags/main/entry_private.c (38) M ctags/main/error.c (43) R ctags/main/error_p.h (0) M ctags/main/field.c (929) M ctags/main/field.h (82) A ctags/main/field_p.h (80) M ctags/main/flags.c (107) R ctags/main/flags_p.h (16) M ctags/main/fmt.c (76) R ctags/main/fmt_p.h (0) M ctags/main/gcc-attr.h (8) M ctags/main/general.h (24) A ctags/main/gvars.h (29) M ctags/main/htable.c (61) M ctags/main/htable.h (46) A ctags/main/interactive_p.h (29) M ctags/main/keyword.c (36) M ctags/main/keyword.h (20) A ctags/main/keyword_p.h (26) M ctags/main/kind.c (750) M ctags/main/kind.h (89) A ctags/main/kind_p.h (82) M ctags/main/lregex.c (2767) A ctags/main/lregex.h (47) A ctags/main/lregex_p.h (77) D ctags/main/lxcmd.c (1227) M ctags/main/lxpath.c (87) A ctags/main/lxpath.h (110) A ctags/main/lxpath_p.h (28) M ctags/main/main.c (321) D ctags/main/main.h (26) A ctags/main/main_p.h (22) A ctags/main/mbcs.c (113) M ctags/main/mbcs.h (29) A ctags/main/mbcs_p.h (32) A ctags/main/mini-geany.c (346) M ctags/main/mio.c (58) M ctags/main/mio.h (4) M ctags/main/nestlevel.c (38) M ctags/main/nestlevel.h (8) M ctags/main/numarray.c (17) M ctags/main/numarray.h (14) M ctags/main/objpool.c (8) M ctags/main/options.c (2330) M ctags/main/options.h (168) A ctags/main/options_p.h (198) D ctags/main/output-ctags.c (59) D ctags/main/output.h (50) A ctags/main/param.c (58) A ctags/main/param.h (38) A ctags/main/param_p.h (36) M ctags/main/parse.c (4695) M ctags/main/parse.h (260) A ctags/main/parse_p.h (177) A ctags/main/parsers_p.h (155) D ctags/main/pcoproc.c (296) D ctags/main/pcoproc.h (29) A ctags/main/portable-dirent_p.h (944) A ctags/main/portable-scandir.c (240) M ctags/main/promise.c (224) M ctags/main/promise.h (12) A ctags/main/promise_p.h (26) M ctags/main/ptag.c (204) R ctags/main/ptag_p.h (39) M ctags/main/ptrarray.c (9) M ctags/main/ptrarray.h (7) A ctags/main/rbtree.c (468) A ctags/main/rbtree.h (230) M ctags/main/read.c (691) M ctags/main/read.h (52) A ctags/main/read_p.h (81) M ctags/main/repoinfo.h (2) M ctags/main/routines.c (192) M ctags/main/routines.h (76) A ctags/main/routines_p.h (98) A ctags/main/seccomp.c (80) M ctags/main/selectors.c (228) M ctags/main/selectors.h (16) M ctags/main/sort.c (104) R ctags/main/sort_p.h (0) A ctags/main/stats.c (85) A ctags/main/stats_p.h (28) M ctags/main/strlist.c (46) M ctags/main/strlist.h (1) A ctags/main/subparser.h (68) A ctags/main/subparser_p.h (48) A ctags/main/tokeninfo.c (207) A ctags/main/tokeninfo.h (97) A ctags/main/trace.c (120) M ctags/main/trace.h (101) M ctags/main/trashbox.h (47) A ctags/main/trashbox_p.h (30) M ctags/main/types.h (28) A ctags/main/unwindi.c (358) A ctags/main/unwindi.h (84) M ctags/main/vstring.c (14) M ctags/main/vstring.h (23) A ctags/main/writer-ctags.c (434) A ctags/main/writer-etags.c (202) A ctags/main/writer-json.c (294) A ctags/main/writer-xref.c (106) A ctags/main/writer.c (0) A ctags/main/writer_p.h (0) M ctags/main/xtag.c (0) M ctags/main/xtag.h (0) A ctags/main/xtag_p.h (0) R ctags/parsers/geany_abaqus.c (0) R ctags/parsers/geany_abc.c (0) R ctags/parsers/geany_asciidoc.c (0) R ctags/parsers/geany_asm.c (0) R ctags/parsers/geany_basic.c (0) R ctags/parsers/geany_bibtex.c (0) R ctags/parsers/geany_c.c (0) R ctags/parsers/geany_cobol.c (0) R ctags/parsers/geany_css.c (0) R ctags/parsers/geany_diff.c (0) R ctags/parsers/geany_docbook.c (0) R ctags/parsers/geany_erlang.c (0) R ctags/parsers/geany_flex.c (0) R ctags/parsers/geany_fortran.c (0) R ctags/parsers/geany_go.c (0) R ctags/parsers/geany_haskell.c (0) R ctags/parsers/geany_haxe.c (0) R ctags/parsers/geany_html.c (0) R ctags/parsers/geany_iniconf.c (0) R ctags/parsers/geany_jscript.c (0) R ctags/parsers/geany_json.c (0) R ctags/parsers/geany_lcpp.c (0) R ctags/parsers/geany_lcpp.h (0) R ctags/parsers/geany_lua.c (0) R ctags/parsers/geany_make.c (0) R ctags/parsers/geany_markdown.c (0) R ctags/parsers/geany_matlab.c (0) R ctags/parsers/geany_nsis.c (0) R ctags/parsers/geany_objc.c (0) R ctags/parsers/geany_pascal.c (0) R ctags/parsers/geany_perl.c (0) R ctags/parsers/geany_php.c (0) R ctags/parsers/geany_powershell.c (0) R ctags/parsers/geany_python.c (0) R ctags/parsers/geany_r.c (0) R ctags/parsers/geany_rst.c (0) R ctags/parsers/geany_ruby.c (0) R ctags/parsers/geany_rust.c (0) R ctags/parsers/geany_sh.c (0) R ctags/parsers/geany_sql.c (0) R ctags/parsers/geany_tcl.c (0) R ctags/parsers/geany_tex.c (0) R ctags/parsers/geany_txt2tags.c (0) R ctags/parsers/geany_verilog.c (0) R ctags/parsers/geany_vhdl.c (0) M src/tagmanager/Makefile.am (0) A src/tagmanager/tm_ctags.c (0) A src/tagmanager/tm_ctags.h (0) M src/tagmanager/tm_parser.c (0) M src/tagmanager/tm_parser.h (0) R src/tagmanager/tm_parsers.h (0) M src/tagmanager/tm_source_file.c (0) M src/tagmanager/tm_source_file.h (0) M src/tagmanager/tm_tag.c (0) M src/tagmanager/tm_workspace.c (0) M tests/ctags/Package.pm.tags (0) M tests/ctags/bug1938565.sql.tags (0) M tests/ctags/random.sql.tags (0) M tests/ctags/refcurs.sql.tags (0)
-- Patch Links --
https://github.com/geany/geany/pull/2666.patch https://github.com/geany/geany/pull/2666.diff
@techee pushed 1 commit.
68f23972fae45a51ba7a24965cbc70030f59b400 Don't use tag identifier F
@b4n approved this pull request.
LGTM! :tada: :partying_face:
04eec181710a1e8537b7f4f83e98658e3cf18907, 2927ec4dbb139528a07209d551897098425f3ebe, 99a993fbfed30075610a30432ffe5ae1c3ae2fc3, 3847b6c2a9e090a3362d35dac51c3859f628cacd and ccd5330a21318a0f39bcaf330dac1188e25ab6aa could be squashed in 353ceff30518ea0291a1f8cae279d66dd0a1c469 if you like, but it's fine with me as is -- and possibly even better, I'm not completely sure which way I like best.
Apart from that, there's basically no change in the non-main part diff, so there's not much to add to v4 :)
I'll also try and take a look at importing upstream bundled regex library to at least get Jenkins happy, and we can always use another dep later -- unless @codebrainz or @eht16 have another solution at hand?
I'll also try and take a look at importing upstream bundled regex library
It only on windows. Could it just need a newer msys? After all regex is a standard system library.
I'd like to propose a import strategy again, that allows future updates via `git merge` or `git subtree pull`, basically what I did for scintilla4 (see #2600). What do you think?
I proposed it in the previous PR already, here are the advantages again:
- Files where we don't make changes always merge fine (for dirs we don't use). - Our changes and upstream changes are merged by git's methods - deletions/renamings can cause tree conflicts which are messy to deal with
See #2600 for a proof-of-concept, it really works well if you ask me. Upstream releases are imported in a side branch (without any changes, plain, unmodified upstream) and we merge that into geany's master. Our modifications to the project are done directly on geany's master and will be preserved due to git's merging.
@techee pushed 1 commit.
3f0bb8ed4c2073387eba6686774497f974da7424 Add a script performing update of Geany ctags from universal ctags
Independently of @kugel- 's proposal I created a script which updates Geany's ctags from universal ctags. I'm not completely sure if we are ready to do it the way @kugel- proposes - our parsers are still very different, upstream ctags contains many files we are not interested in and personally I don't completely see the advantage of this approach (but I may be missing something).
@techee So for the moment I suggest to bundle GNU regex and fnmatch just the same as upstream ctags does, so we at least have something that build, and should hopefully behave the same as upstream. Here is what I got (that build happily, yet I can't test as we don't run the cross builds): https://github.com/b4n/geany/tree/techee/ctags_sync5%2Bgnu_regex. It's just a copy of upstream ctags *gnu_regex* and *fnmatch* directories, and build system integration.
@kugel- I'm not familiar with subtrees, yet they look tempting. However, here we only use a subset of upstream, so how would we only get what we want? If deleting/renaming is messy, I would expect this to be a huge pain if we have the whole of upstream. And if we only copy a subset of upstream anyway, I start to wonder if it's any better to use a subtree. For the moment I see subtrees tempting *if we can use the upstream repository unmodified*, kind of like a submodule but easier for users. If we need to manually import anyway, or if we lose history, I'm really not sure how it's better than plain copying. We don't actually need any of the power of Git to import changes, we have a very minimal patch that would hopefully disappear one day, and that IMO should stay very visible -- not buried in a years old merge.
I might give this a look to get a feel for it as again I'm not familiar with using those so I might miss some of its advantages though.
I any case though, I don't think that's something that should be done in this PR, or before merging this PR. I'm totally OK with the current state, and if we decide to use a subtree we can do that later, and I really don't want an additional barrier to get that one merged.
@b4n I had a brief look at the regex patches and they look good to me (haven't tried them though) - should I add them to this pull request?
...we have a very minimal patch that would hopefully disappear one day...
I actually have a patch modifying our tag manager to achieve that - it's not completely trivial and maybe worth considering if we want to do it that way so I'll post it separately once this pull request is merged.
@b4n I had a brief look at the regex patches and they look good to me (haven't tried them though) - should I add them to this pull request?
If you're OK with them, yes, please.
...we have a very minimal patch that would hopefully disappear one day...
I actually have a patch modifying our tag manager to achieve that - it's not completely trivial and maybe worth considering if we want to do it that way so I'll post it separately once this pull request is merged.
Depending on what the patch you're talking about involves, would you think upstream would accept a mean of selecting which method is used? It actually could make sense to use their hash version when generating tag files so there's no (likely) duplicates, but use plain numbers when generating a TOC, so maybe there could be an option somehow. Just a thought.
@eht16 or @codebrainz could you try a Windows build to see if that works reasonably well? The Travis build is OK with my changes mentioned above, but I didn't actually test it, so although I'm fairly optimistic it'll work just as well as upstream CTags do, I can't be sure.
@b4n pushed 2 commits.
33fc269ea8d4d03453489ca55a9ca208ec15aae2 Add GNU regex bundle from upstream ctags and use it if missing b1c9096394c9819342747f1215274af469a19483 Add bundled fnmatch from upstream ctags
@b4n I had a brief look at the regex patches and they look good to me (haven't tried them though) - should I add them to this pull request?
If you're OK with them, yes, please.
Actually I can do it myself, and as you seemed to approve I just did.
Tested on Windows: https://download.geany.org/snapshots/geany-1.38_setup-pr2666.exe
I got: ``` /bin/sh ../libtool --silent --tag=CC --mode=compile ccache gcc -DHAVE_CONFIG_H -I. -I.. -I./main -I./parsers -DEXTERNAL_PARSER_LIST_FILE="../src/tagmanager/tm_parsers.h" -DG_LOG_DOMAIN="CTags" -I./gnu_regex -I./fnmatch -mms-bitfields -pthread -mms-bitfields -IC:/msys64/mingw32/include/gtk-2.0 -IC:/msys64/mingw32/lib/gtk-2.0/include -IC:/msys64/mingw32/include/pango-1.0 -IC:/msys64/mingw32/include/fribidi -IC:/msys64/mingw32/include -IC:/msys64/mingw32/include/cairo -IC:/msys64/mingw32/include/atk-1.0 -IC:/msys64/mingw32/include/cairo -IC:/msys64/mingw32/include/lzo -IC:/msys64/mingw32/include -IC:/msys64/mingw32/include/freetype2 -IC:/msys64/mingw32/include -IC:/msys64/mingw32/include/libpng16 -IC:/msys64/mingw32/include/harfbuzz -IC:/msys64/mingw32/include -IC:/msys64/mingw32/include/pixman-1 -IC:/msys64/mingw32/include/gdk-pixbuf-2.0 -IC:/msys64/mingw32/include -IC:/msys64/mingw32/include/glib-2.0 -IC:/msys64/mingw32/lib/glib-2.0/include -IC:/msys64/mingw32/include -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_32 -DG_DISABLE_DEPRECATED -DGTK_DISABLE_DEPRECATED -DGEANY_EXPORT_SYMBOL="__declspec(dllexport)" -DGEANY_API_SYMBOL=GEANY_EXPORT_SYMBOL -g -DGEANY_DEBUG -UGEANY_DEBUG -MT main/options.lo -MD -MP -MF $depbase.Tpo -c -o main/options.lo main/options.c main/options.c:616:5: error: redefinition of 'asprintf' 616 | int asprintf(char **strp, const char *fmt, ...) | ^~~~~~~~ In file included from main/args_p.h:17, from main/options_p.h:23, from main/options.c:16: C:/msys64/mingw32/i686-w64-mingw32/include/stdio.h:251:5: note: previous definition of 'asprintf' was here 251 | int asprintf(char **__ret, const char *__format, ...) | ^~~~~~~~ make[2]: *** [Makefile:997: main/options.lo] Error 1 make[2]: *** Waiting for unfinished jobs.... make[2]: Leaving directory '/home/enrico/geany/ctags' make[1]: *** [Makefile:598: all-recursive] Error 1 make[1]: Leaving directory '/home/enrico/geany' make: *** [Makefile:482: all] Error 2 ```
After adding a `asprintf` check: ``` diff --git a/configure.ac b/configure.ac index d835b86a..8ef4360e 100644 --- a/configure.ac +++ b/configure.ac @@ -52,7 +52,7 @@ AC_TYPE_SIZE_T AC_STRUCT_TM
# Checks for library functions. -AC_CHECK_FUNCS([fgetpos fnmatch mkstemp strerror strstr realpath]) +AC_CHECK_FUNCS([asprintf fgetpos fnmatch mkstemp strerror strstr realpath])
# Function checks for u-ctags AC_CHECK_FUNCS([strcasecmp stricmp], [break]) ``` it built fine and runs. Maybe the additional check fits better a few lines below in the u-ctags specific section. I just did it there for quickly testing.
After a very quick look it seems the old code had more specific checks: `#if defined(_WIN32) && !(defined(__USE_MINGW_ANSI_STDIO) && defined(__MINGW64_VERSION_MAJOR))` Now, the code checks for `HAVE_ASPRINTF`, hence the new Autotools check.
Just discovered, the old check is still there: https://github.com/geany/geany/pull/2666/files#diff-c5db35c46946eafa2045f735... whcih then defines `HAVE_ASPRINTF`. Obviously, `main/e_msoft.h` was not used on my system. I did not yet check why (short on time currently).
@b4n pushed 1 commit.
c5303c9499bb6f0d632e0ca5b42e15f84d64ac52 Check for asprintf and tempnam and cleanup function checks
Depending on what the patch you're talking about involves, would you think upstream would accept a mean of selecting which method is used? It actually could make sense to use their hash version when generating tag files so there's no (likely) duplicates, but use plain numbers when generating a TOC, so maybe there could be an option somehow. Just a thought.
The patch is not something too horrible, something like 80LOCs, but not some trivial one-liner so better to post it separaterly. Basically it just simply goes through the tags and renames the anonymous ones the way we need but in addition, we also have to update scopes where these tags are involved.
Updating upstream ctags is an option too, I was just worried about "polluting" ctags with things which are too specific for Geany use.
And even if we kept the current few-line patch, it isn't a too terrible thing either.
@eht16 yay, thanks! I just committed your fix slightly modified to improve overall ordering of the checks, but it should have the same result.
@eht16 however, did the bundled regex got used, or did your MSYS have `regcomp()`? Same question for `fnmathc()`. If MSYS has it, it's less relevant for us to bundle it, and maybe we should have a way of using that in the Travis builds instead?
The patch is not something too horrible, something like 80LOCs, but not some trivial one-liner so better to post it separaterly. Basically it just simply goes through the tags and renames the anonymous ones the way we need but in addition, we also have to update scopes where these tags are involved.
OK, I would have expected something like this indeed. The only possibly worrysome part would be not missing any location that has to be updated, in case tag names get used some other places in the future.
Updating upstream ctags is an option too, I was just worried about "polluting" ctags with things which are too specific for Geany use.
Sure, but maybe having more "readable" anonymous names is something that could benefit other consumers of ctags, even CLI ones. There are other tools that use ctags to construct a TOC, and in a TOC I would argue that a readable name is somewhat better. Though, maybe it's not so true because it's only useful when a human *looks* at the TOC, if the TOC is also used to jump to locations across files boundaries then they need to be unique again. I'm kind of worried about how annoying it'd be to do, but maybe that's even something we'd like to do: transform anonymous names when displaying them, yet keeping a fairly unique internal name behind the scenes to avoid any clashes. Well, that's just thoughts as they are coming :)
And even if we kept the current few-line patch, it isn't a too terrible thing either.
Yeah it's no biggie, and we'll roll with that at least for now.
OK, I would have expected something like this indeed. The only possibly worrysome part would be not missing any location that has to be updated, in case tag names get used some other places in the future.
Exactly, that was my worry too. The patch seems to work for our parsers but one never knows what new parser or update to an existing parser could break.
Sure, but maybe having more "readable" anonymous names is something that could benefit other consumers of ctags, even CLI ones. There are other tools that use ctags to construct a TOC, and in a TOC I would argue that a readable name is somewhat better. Though, maybe it's not so true because it's only useful when a human looks at the TOC, if the TOC is also used to jump to locations across files boundaries then they need to be unique again.
I'm kind of worried about how annoying it'd be to do, but maybe that's even something we'd like to do: transform anonymous names when displaying them, yet keeping a fairly unique internal name behind the scenes to avoid any clashes. Well, that's just thoughts as they are coming :)
I'm already scared only by reading about it :-).
@eht16 yay, thanks! I just committed your fix slightly modified to improve overall ordering of the checks, but it should have the same result.
Confirmed. Your changes work fine !
@eht16 however, did the bundled regex got used, or did your MSYS have `regcomp()`? Same question for `fnmathc()`. If MSYS has it, it's less relevant for us to bundle it, and maybe we should have a way of using that in the Travis builds instead?
``` ... checking fnmatch.h usability... no checking fnmatch.h presence... no checking for fnmatch.h... no ... checking for regcomp... no ... ```
Full build log with all the boring details at https://gist.github.com/eht16/895090eddc23c6d241eb2cbec25159c0#file-build-lo.... At the end of the build log is the result of `make check` with all tests failed :(. But luckily this was just because the sample files have LF line endings and the generated tag files have CRLF. As a quick hack I added `--strip-trailing-cr` to `runner.sh` (https://gist.github.com/eht16/895090eddc23c6d241eb2cbec25159c0#file-runner-s...) and the results looked way better. Still a few tests failed.
I didn't check why the tests fail but here is the log: https://gist.github.com/eht16/895090eddc23c6d241eb2cbec25159c0#file-test-sui...
The failed tests seem to be those which use CRLF for end of line. I suspect some macro isn't correctly defined during windows build and that the Unix implementation is taken.
Obviously, main/e_msoft.h was not used on my system. I did not yet check why (short on time currently).
Do we have the `WIN32` macro defined? See
https://github.com/universal-ctags/ctags/blob/e3505ad72572983190d99cd704e726...
Do we have the WIN32 macro defined? See https://github.com/universal-ctags/ctags/blob/e3505ad72572983190d99cd704e726...
Self-correction: it's in the elif branch and we have config.h so it doesn't matter.
Maybe it would be worth trying to build universal ctags using configure on Windows to see if it works and possibly what macros get defined in config.h. Universal ctags has also a different method of building using mk_mingw.mak makefile
https://github.com/universal-ctags/ctags/blob/master/mk_mingw.mak
so it might be possible something doesn't get defined when using configure.
Good idea. Here is Geany's and Ctags' `config.h` on my Windows system: https://gist.github.com/eht16/e15e74e5f7bffebdf5f4ada38077ea59
I used `./autogen.sh && ./configure && make` on MSYS2 instead of the `mk_mingw.mak`. https://github.com/universal-ctags/ctags/blob/master/docs/windows.rst maybe has two interesting pointers: - `--disable-external-sort is a recommended option for Windows builds.` I used this flag to create the `config.h` above - `If you want to build an iconv enabled version, you must specify WITH_ICONV=yes` this is from the section about `mk_mingw.mak`, the Autotools part automatically detects iconv and on my system CTags was built with iconv support; Geany's `config.h` is missing the `HAVE_ICONV` macro
Fun fact regarding your first question: Geany defines `WIN32` whereas Ctags does not :).
Unless you beat me in time by finding the cause, I'll give the two options above a try at the weekend.
At least for the tags file line ends, I'm starting to suspect it's us, not ctags. The newlines in tags files are written here:
https://github.com/geany/geany/blob/0a0b4b000cf18e837b921057a461ad05a08e41d3...
and it's opened here in text mode:
https://github.com/geany/geany/blob/0a0b4b000cf18e837b921057a461ad05a08e41d3...
At least from this:
https://stackoverflow.com/questions/58500480/why-does-printf-create-windows-...
on Windows it seems `\n` produces CRLF. Would opening the file in binary mode help? The curious thing is that this isn't a new code in this patch so I don't understand how it could have worked before.
I think the rest of the failed tests will be caused by something similar in ctags.
@eht Just curious, do you run `make check` when you make Windows releases? I'm wondering if these are new problems or if they existed before. Also:
1. Would you try opening the file in binary mode and check what tests fail when you do? 2. Would you try changing CRLF line ends in the SQL tests that fail to LF to see if this is the reason for the problem? I checked uctags implementation of file reading and CRLF should be handled there correctly so it could be something parser-specific.
@eht16 Just curious, do you run `make check` when you make Windows releases?
No, I don't. I just ran the checks now with this PR as I thought it might help. The tests fail in the same way for Geany master, so you are absolutely right: it's not new and not related to this PR. Sorry, I should have mentioned this.
1. Would you try opening the file in binary mode and check what tests fail when you do?
No change.
2. Would you try changing CRLF line ends in the SQL tests that fail to LF to see if this is the reason for the problem? I checked uctags implementation of file reading and CRLF should be handled there correctly so it could be something parser-specific.
This actually helps. I converted the test files of the six failures to LF and the tests passed fine.
So I think we should address this separately, maybe after this PR got merged.
No, I don't. I just ran the checks now with this PR as I thought it might help.
The tests fail in the same way for Geany master, so you are absolutely right: it's not new and not related to this PR.
Ctags uses appveyor for CI on Windows so it might be worth considering to use it as well so we can run tests for Windows builds too.
- Would you try opening the file in binary mode and check what tests fail when you do?
No change.
Weird, I don't get it then. This is the place where tags are written and newlines are written using `\n` - so it's a matter of figuring out why `\n` produces CRLF.
- Would you try changing CRLF line ends in the SQL tests that fail to LF to see if this is the reason for the problem? I checked uctags implementation of file reading and CRLF should be handled there correctly so it could be something parser-specific.
This actually helps. I converted the test files of the six failures to LF and the tests passed fine.
Good, at least we know it's the CRLFs. Now only to figure out why. I'm afraid I'm out of ideas so it's your turn with debugging :-(
So I think we should address this separately, maybe after this PR got merged.
Sounds good to me.
No, I don't. I just ran the checks now with this PR as I thought it might help. The tests fail in the same way for Geany master, so you are absolutely right: it's not new and not related to this PR.
Ctags uses appveyor for CI on Windows so it might be worth considering to use it as well so we can run tests for Windows builds too.
We discussed this in #1241. I still don't want to invest time in an appveyor setup. Some time ago I started (and not yet finished) a dockerized MSYS2 setup to cross-build Geany and Geany-Plugins from GIT to release ready installers. The script and Docker container could also be used in a Travis CI step even though my primary target are nightly build installers and especially release installers without booting a Windows machine.
- Would you try opening the file in binary mode and check what tests fail when you do?
No change.
Weird, I don't get it then. This is the place where tags are written and newlines are written using `\n` - so it's a matter of figuring out why `\n` produces CRLF.
I gave it another shot: the binary didn't work because of my other change in `runner.sh` where I added `--strip-trailing-cr` to the diff command. The good news is: without the `--strip-trailing-cr` change, the binary mode actually fixes all the tests except the six failures. The six failures can be fixed easily by converting the test samples to LF, however. So it should be pretty easy to get the tests running on Windows as well.
- Would you try changing CRLF line ends in the SQL tests that fail to LF to see if this is the reason for the problem? I checked uctags implementation of file reading and CRLF should be handled there correctly so it could be something parser-specific.
This actually helps. I converted the test files of the six failures to LF and the tests passed fine.
Good, at least we know it's the CRLFs. Now only to figure out why. I'm afraid I'm out of ideas so it's your turn with debugging :-(
See above :).
So I think we should address this separately, maybe after this PR got merged.
Sounds good to me.
I just opened #2677 for this.
The good news is: without the --strip-trailing-cr change, the binary mode actually fixes all the tests except the six failures.
Great! I think changing the mode to binary is such a trivial change that I could addd it as a commit to this pull request, what do you think?
The six failures can be fixed easily by converting the test samples to LF, however.
Yeah, but it should still be fixed. This could possibly mean that any file using CRLF, which is the default EOL on Windows, would get tags generated incorrectly so they may not appear in the symbol tree. Have you tried e.g. some C file with CRLFs, if the tags show up in the sidebar?
The good news is: without the --strip-trailing-cr change, the binary mode actually fixes all the tests except the six failures.
Great! I think changing the mode to binary is such a trivial change that I could addd it as a commit to this pull request, what do you think?
Sure, why not.
The six failures can be fixed easily by converting the test samples to LF, however.
Yeah, but it should still be fixed. This could possibly mean that any file using CRLF, which is the default EOL on Windows, would get tags generated incorrectly so they may not appear in the symbol tree. Have you tried e.g. some C file with CRLFs, if the tags show up in the sidebar?
No, the line endings of the input files don't matter. Tested with a C and SQL file (random choice) with LF and then again with CRLF line endings and the generated tags are the same. With the binary mode change we would also alte rthe behavior when writing tags files which then always contain only LF. I think this is fine even on Windows. And so we just need to update the six test samples which have CRLF line endings (in contrary to all other test samples).
No, the line endings of the input files don't matter. Tested with a C and SQL file (random choice) with LF and then again with CRLF line endings and the generated tags are the same.
I don't get it. Basically when you take the SQL tests that fail with CRLF line endings and open them in Geany, I would assume that there will be tags missing in the sidebar.
With the binary mode change we would also alte rthe behavior when writing tags files which then always contain only LF. I think this is fine even on Windows.
In fact, since we are considering 2.0 release and making "breaking" changes, I was thinking we could switch to the ctags tag file format for tag file generation, unit tests and the tag files we ship. We could still keep support for reading the "proprietary" binary format we use now but using a more standard text format would be nicer IMO.
@techee I couldn't find in uctags anything that said how it handles non-ascii names, the file format description seemed to still be Bram's 1998 one. If we are gonna switch to another format lets do it to one that is better than the proprietary one because we won't be able to control an external format.
I don't get it. Basically when you take the SQL tests that fail with CRLF line endings and open them in Geany, I would assume that there will be tags missing in the sidebar.
@techee IIUC the problem is with *tag files* that have CRLF line endings, not source files getting parsed. If that's so, it won't change nothing there, at worse we need special handling to support CRLF and LF not to break compatibility.
@elextr not sure how it handles non-ASCII, but it might be plain bytes, so the output encoding is whatever the input is was -- and doesn't handle non-ASCII-compatible support anyway. And yes, it's the "old" vi format, but augmented with a lot of extra fields in a compatible manner.
Your concerns about non-ASCII is a good one, but AFAICT we'd have the same problem, or even worse with the proprietary one, I don't think we do much conversion when generating tag files. OTOH, we should be happy with UTF-8 ones as we mostly (?) use that internally. I'm actually not sure how we deal with non-"real time" parsing, I'm afraid we still feed in the file itself, which means no encoding conversion.
So I'm all for @techee's idea of putting the binary format to soft retirement, but indeed now might be a good time to sit down and think on possible issues it should tackle.
@b4n all tags files probably need to be UTF-8 since the symbols we generate in real-time are from the "always UTF-8" Scintilla buffer, that way symbols from both will match.
The basic ctags tab separated format should be ok with UTF-8 IIUC. (so long as we are sure its UTF-8 :)
IIUC uctags is also developing a JSON format but its not finalised yet.
And I'm certainly not supporting tagmanager format, its only plus is that we can modify it.
@techee IIUC the problem is with tag files that have CRLF line endings, not source files getting parsed. If that's so, it won't change nothing there, at worse we need special handling to support CRLF and LF not to break compatibility.
Maybe I miss something but that's not my impression with the failed SQL tests - it's the tests themselves that have CRLFs in the code and the tag file diff is this based on Enrico's output. There are several tags missing and I think the missing tags are caused by the CRLFs in the test:
``` --- ./ingres_procedures.sql.tags 2016-06-12 17:16:40.683765400 +0200 +++ /tmp/tmp.2xGTegTO88/test.sql.tags 2020-11-25 23:33:27.945656500 +0100 @@ -1,7 +1,3 @@ # format=tagmanager db0001Ì256Ö0 -db0002Ì256Ö0 -db0003Ì256Ö0 -errÌ16384Ö0 -nÌ16384Ö0 xÌ16384Ö0 FAIL ingres_procedures.sql.tags (exit status: 2) ```
So I'm all for @techee's idea of putting the binary format to soft retirement, but indeed now might be a good time to sit down and think on possible issues it should tackle.
Agreed - two things that come in my mind: 1. We first need to synchronize kind letters with ctags so we interpret kinds from ctags correctly 2. We store our own information to the tags like TMTagType which are useful e.g. for unit tests and we should also be able to do this for the ctags tag format - I think we could introduce our own extension fields for this.
@techee actually I re-read @eht16 comments and I'm confused now. I based my remark on
No, the line endings of the input files don't matter. Tested with a C and SQL file (random choice) with LF and then again with CRLF line endings and the generated tags are the same.
But that is not coherent with what I understand of the filing tests a couple messages above. So… I'm now just confused on what works and what doesn't.
At any rate I agree that we cannot simply change the input files so the tests pass, if these files are valid input files, we must accept them -- and as you say, we'd have the same issue with the internal buffer that does have the actual file's line ends. I'll try to make a few tests with CRLF vs LF on my Linux box and see if the parser could choke on either, and possibly even try and craft "fake" Windows line end converter and see if it changes anything.
Anyhow, if there are no new failures, I agree we could merge this and figure them out later. I'd like a summary though, because I'm now feeling confused on the state of this on Windows :)
2. We store our own information to the tags like TMTagType which are useful e.g. for unit tests and we should also be able to do this for the ctags tag format - I think we could introduce our own extension fields for this.
Is there any use of that field apart for testing our mappings? Anyhow, we should not *depend* on this field, but we sure can add any one we like for our own purposes, we just cannot really depend on that field being present.
Is there any use of that field apart for testing our mappings? Anyhow, we should not depend on this field, but we sure can add any one we like for our own purposes, we just cannot really depend on that field being present.
No, there should be no need for it apart from the unit tests and we shouldn't add them e.g. when generating tag files (which I'm afraid we won't be able to drop completely until ctags has all the parsers we have - and there are several of them in the infamous c.c parser).
Anyhow, if there are no new failures, I agree we could merge this and figure them out later. I'd like a summary though, because I'm now feeling confused on the state of this on Windows :)
It's probably quite independent of this pull request as the problem existed before based on Enrico's comments. But I'm confused too.
s/a https://github.com/geany/geany/issues/2677#issuecomment-736096680 (I didn't know whether to post here or there, so…)
Guys, don't panic :). Maybe I wasn't clear above: I do *not* want to change the line endings of any input files! In my tests, it didn't matter if the input file had LF or CRLF line endings, the generated tags were correct.
I was just talking about converting the unittest tags files against we compare the generated tags files.
I agree that the diffs look confusing but as said earlier with the binary mode change for writing tags files, we ensure that the generated tags files always have LF line endings and so IMO it should be fine to also convert the pre-generated tags files for the unittests. To me, it doesn't seem as we have a major problem on Windows with the tags files.
Merged #2666 into master.
Wait, what? This has been merged in the next decade!
More seriously @techee, thanks so much for your work and patience on this. :bow:
Big Yey! delayed by several month of being too busy on the one hand and too scared on the other hand to open this message in my inbox fearing there would be another batch of review comments requiring another tens-of-patches-long pull requests :-)
Thanks for all your time reviewing this @b4n!
github-comments@lists.geany.org