This pull request isn't meant to be merged, it should just serve as a discussion place which parsers apart from those in #2990 to select for syncing. I'm not so familiar with the state of Geany parsers and if there are some patches that should be first applied to uctags before we can grab the uctags version, it would be nice if people who know more than me (@b4n @eht16 @elextr and others) could help with this.
Just to start somewhere, I included a few parsers to this commit: 1. html and go parsers - these are those I initially wrote and I think their development happened in uctags 2. diff, jscript, make, objc, rust - there don't seem to be too many changes in these parsers and the uctags version seems to be a superset of the geany version
I went through all the parsers and divided them into 2 categories:
Parsers where uctags implementation is similar to the Geany implementation: - asm - basic - diff (already included) - fortran - go (already included) - html (already included) - jscript (already included) - lua - make (already included) - objc (already included) - pascal - perl - php - ruby - rust (already included) - sql - nsis
Parsers which differ a lot: - python (but here I assume the uctags version is the Colomban's new python parser we could grab) - iniconf - matlab - r - sh - tcl - tex - verilog - vhdl - and of course all the languages from c.c
Apart from python, the languages from the first group seem to be the most obvious candidates for syncing. So does anyone have a better knowledge of the state of these parsers?
You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/2991
-- Commit Summary --
* <a href="https://github.com/geany/geany/pull/2991/commits/8ab2e0e68ce33fd0a883c8f4f778bf6233882dc9">Remove the geany_ prefix from parsers that seem to be good candidates for syncing</a> * <a href="https://github.com/geany/geany/pull/2991/commits/425d350014e065f79eb2af863e0516d0cb1d26a5">Copy over selected parsers from uctags so we can see the diffs</a>
-- File Changes --
R ctags/parsers/diff.c (93) D ctags/parsers/geany_go.c (831) A ctags/parsers/go.c (1416) R ctags/parsers/html.c (168) R ctags/parsers/jscript.c (462) R ctags/parsers/make.c (171) R ctags/parsers/objc.c (333) R ctags/parsers/rust.c (95)
-- Patch Links --
https://github.com/geany/geany/pull/2991.patch https://github.com/geany/geany/pull/2991.diff
Warning Brain dump:
Of the first list the only one I would suspect may have been modified in Geany independent of ctags is Lua.
Of the second list I'm pretty sure Verilog and VHDL were added to ctags then brought into Geany but probably have developed from that initial version so the ctags version should be good.
`c.c` -- runs screaming from the room. If languages in c.c have separate parsers in ctags we should import them instead (eg the new c/c++ parser) and only keep c.c for those languages that do not have ctags parsers (if any).
I presume that any parser not in the lists is the same in both Geany and ctags so won't need updating?
Of the first list the only one I would suspect may have been modified in Geany independent of ctags is Lua.
Of the second list I'm pretty sure Verilog and VHDL were added to ctags then brought into Geany but probably have > developed from that initial version so the ctags version should be good.
Thanks, let's see if others have some more information.
c.c -- runs screaming from the room. If languages in c.c have separate parsers in ctags we should import them instead (eg the new c/c++ parser) and only keep c.c for those languages that do not have ctags parsers (if any).
After #2984 is merged, I'd like to open a pull request switching us to the new c/c++ parser from uctags so these 2 most terrible languages (parsing-wise) should be eliminated from c.c. The rest of course remains. I remember the pre-babies version of Colomban talking about about separating those parsers from c.c but I suspect this task is delayed by about 15 years now ;-).
Anyway, the purpose of this PR is to switch to those parsers where we know we can just grab the uctag version - we'll have to deal with the rest later parser by parser.
I presume that any parser not in the lists is the same in both Geany and ctags so won't need updating?
Those with minimal or no changes are in PR #2990
I'm here because I searched python ctag in pull requests to see if it was being addressed anywhere after being told that the GDScript ctags file that I adapted from the python ctags file I found in the geany repository was actually based on a decrepit version of the python parser. I intend to drag in the uctags python parser right after I'm finished redoing my GDScript alterations to the newer version of the python parser in ctags, unless someone else is doing this already.
I just took the parsers from the first group plus python and tried them against our unit tests to see what happens. Fortran seems to produce different output in many situations so I dropped it for now. The asm parser uses the cpreprocessor.c/h that the new cxx parser uses and depends on some things from the new cxx parser so I dropped asm from now too. Finally, the new python parser generates different tags for imports and we should make some modifications in Geany to adjust for that so it's dropped for now too.
The rest of the parsers seem to work fine and the updates in unit tests seem to do the right thing. I pushed the updates to unit tests language by language including comments about the changes to simplify review.
I can of course drop some parser from this pull request if someone knows some updates were made to our parser that should go upstream first.
To summarize, this is the result of this pull request:
Parsers included here: - basic - diff - go - html - jscript - lua - make - objc - pascal - perl - php - ruby - rust - sql - nsis
Different parsers (longer-term stuff): - iniconf - matlab - r - sh - tcl - tex - verilog - vhdl - fortran - c.c
Parsers for which I will make separate PRs soon: - python (needs some more adjustments) - cxx - asm (depends on cxx)
@Davidy22 I'll prepare a separate PR for python soon - not sure if the version I use for all the parsers here from the end of October is new enough for you but when the python parser is in, it should be no problem to update it to the version you need.
I've already been muddling around with my modified for gdscript python parser so I think I have a handle on the tags in it. I can update my gdscript PR in a bit so you can yoink the not-gdscript file changes maybe
@Davidy22 Are all your changes submitted here?
https://github.com/universal-ctags/ctags
The point of all these pull requests is that we don't have any diffs of our own against universal ctags and all changes have to go there first and we just copy the corresponding parsers to Geany.
Yeah the one on the ctags repo is my latest work, haven't updated my PR here in a bit, probably will once I get merged in lexilla and ctags, my open PRs linked in the pull request here. Geany does have a little extra layer of mappings that I fiddled to make the symbol table on the left show the right things
Then I'd suggest to wait until the PR with python I'm planning to make lands in Geany so we don't have to deal with 2 different parsers at the same time.
Applied #2990 and #2991 to Geany at https://github.com/geany/geany/commit/eabc09a5591e8f2efd95ca136f5870c8dcba54..., works for me.
`make check` passes, just a query, are there any more test files we can copy from ctags?
Also checked any filetypes I happened to have a source file lying around for, and knew enough to check what should be generated, none seem to have any different failures from a version of Geany before they were applied.
If nobody beats me or objects I'd be happy to push #2990 and #2991 in a week (or so).
Oh my work is on a new parser for a language with similarities to python, no worries about colliding implementations
I had a quick look on the changes and this looks good, we'll get some more new tag types and probably the newer uctags code is less buggy than the current ones.
Though at least in the Pascal parser we will lose two features: - parsing "type" definitions (IIRC "type" in Pascal is somewhat similar to "structs" in C) - parsing argument list of functions
This two features are not in the uctags Pascal parser. I don't know where they came from, maybe we added them ourselves in the past.
Since these could be useful for others as well, it might be best to push these two upstream to uctags, so we'll get them back with the next uctags update.
Until then, it might be ok to have the two features removed (which will also solve #2987) by this PR and get them back with the next uctags sync.
An alternative could be to remove the Pascal parser from this PR and merge #2987 in the meantime.
I prefer the first option: use the uctags Pascal parser with this PR.
What do you think?
maybe we added them ourselves in the past.
Is it from fixing the crash on typing "type"?
But anyway it would be nice to push improvements upstream.
I would agree that using the uctags parser is the preferable option instead of drawing out the alignment to upstream even further. Out of interest, do Pascalians get any new features with the uctags parser in return for losing "type" for a release or so?
maybe we added them ourselves in the past.
Is it from fixing the crash on typing "type"?
Well, "fixing the crash on typing "type"" is the consequence of the feature that "type" is parsed at all which seems to be an exclusive Geany feature. though not sure if this is what you asked :D.
I would agree that using the uctags parser is the preferable option instead of drawing out the alignment to upstream even further. Out of interest, do Pascalians get any new features with the uctags parser in return for losing "type" for a release or so?
Nope, the parsers seems equal apart from the "type" tag type and the argument list parsing.
though not sure if this is what you asked :D.
It is what I asked, which just exposes my confusion :-)
Nope, the parsers seems equal apart from the "type" tag type and the argument list parsing.
Ok, probably should add to release news something like "Pascal parser has been upgraded to align with ctags, the Geany specific extensions of 'type' and argument list parsing have been pushed upstream and will return next update."
Unless the next Geany release takes so long that changes pushed upstream are accepted in a new parser beforehand and we can import it. Having synchronised with uctags at that stage importing should be simple if the symbols.c handling of those features is left in place.
Though at least in the Pascal parser we will lose two features:
I just made a diff of the 2 implementations and our pascal parser is a clear superset of the uctags version (nothing dramatic though) so I'd rather keep our parser and then introduce our changes upstream and drop pascal from this PR. This means you can just merge #2987 because this parser would stay for now.
make check passes, just a query, are there any more test files we can copy from ctags?
I think they have many more in uctags. In Geany though I think we don't even have to care about the parsers themselves (those are tested in uctags already) but rather the mapping to our internal representation (which is revealed in our tags files). So it's only important that all the tag types we support are generated by the tests.
Though at least in the Pascal parser we will lose two features:
I just made a diff of the 2 implementations and our pascal parser is a clear superset of the uctags version (nothing dramatic though) so I'd rather keep our parser and then introduce our changes upstream and drop pascal from this PR. This means you can just merge #2987 because this parser would stay for now.
Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.
@techee pushed 2 commits.
4589ce6b217c38ebbefd33d8a23333c7e401cf81 Remove pascal parser 0de1b5d5aa398264b63b9a2a451bbf7f0f85b97a Revert "Update pascal unit tests"
Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.
Cool, I was going to prepare the PR myself - you were too fast :-)
I just reverted the parser back to the original one (if desired, I can squash it with the previous commits).
@techee pushed 2 commits.
0ae4e82883de33e23d9fa5ffadb0b2568afce062 Add uctags iniconf parser 2a0152162a9ba4a785de67351f9a01883eaf79e2 Update tag mappings for the iniconf parser
Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.
Cool, I was going to prepare the PR myself - you were too fast :-)
:)
However, I only PR'ed the argument list parsing feature. While preparing tests for the "type" parsing, I noticed the current code is quite buggy and incomplete. It only parses very easy type declarations but it probably fails for real world uses. For example: ```pascal type TTest=class procedure Test1; procedure Test2; procedure Test3; end; ``` Will create a tag for "TTest" but also for the procedures "Test2" and "Test3" which are wrong. And "Test1" is missing for some reason (fun fact: even our test case expects that broken behavior in https://github.com/geany/geany/blob/master/tests/ctags/bug612019.pas.tags).
I guess the parsing of "type" declarations needs to implemented in a more sophisticated way. At least I don't want to submit it upstream in the current state. Since I don't want and can't fix the implementation, we maybe should just remove that particular feature instead of keeping an incomplete implementation.
Since I don't want and can't fix the implementation, we maybe should just remove that particular feature instead of keeping an incomplete implementation.
Yep, agreed.
I just added the iniconf parser to this pull request too - after looking at it in more detail, there are just extra things added and there are no changes in our unit tests.
@techee pushed 1 commit.
4fb3cddcf156268f14538c35b6d0dc8ab68ce569 Add a way to modify scope information if needed and use it to unify php separators
@techee pushed 1 commit.
8a2908835416aa307dcb022a38b6e3e1dbc9be5b Remove comment regarding ruby scope separator
@techee can #2990 and #2991 be merged, or have you more fiddling to do?
@techee can #2990 and #2991 be merged, or have you more fiddling to do?
I think #2990 is OK to be merged (just added a comment there regarding what I did).
I would still wait with this one. First, I guess merging #2990 will cause some merge conflicts here so these will have to be resolved first.
Second, there is the problem with the go parser I mentioned in https://github.com/geany/geany/pull/3032#issuecomment-983476305 and subsequent comment and I'd like to know the opinion of @masatake.
Also, someone might want to have a look at the patch I added yesterday here to rewrite PHP scope information if I didn't do something stupid there.
@techee pushed 1 commit.
7810a5b6f2b77268b38c7d188f416e793257a6d8 Add comment explaining why we modify PHP scope separators
@techee pushed 1 commit.
d49f3e9024166c82671bbcc12d009f497789915d Merge branch 'master' into nontrivial_parser_sync
@techee pushed 2 commits.
ceaef0dcf995e67ebb7c937bbc2ddf1411132aea Add a mechanism to enable/disable roles for certain kinds 75466c46b5371a887cdc491a84189a2f5cd88441 Move Go parser to the list of languages returning full scope information
@elextr I think I'm more or less done here. Merge conflicts are resolved, the scope separator problem of PHP should be fixed now and the Go problem from https://github.com/universal-ctags/ctags/issues/3211 too. All the bugs introduced by me are your responsibility now :-P.
All the bugs introduced by me are your responsibility now :-P.
Ok, so long as you fix #3032 to add all the new capabilities to Geany :-)
Will try this on the weekend and merge next week if it WFM, giving others time to try it too.
@techee pushed 1 commit.
940012d1466979a610600932c3262eac2dcc9bd3 Add a mechanism to enable/disable some ctags kinds
Ok, so long as you fix #3032 to add all the new capabilities to Geany :-)
Enabling some of the kinds may be much easier than making sure nothing breaks here ;-)
I added one more patch here to allow us to enable/disable certain kinds in ctags - will help to resolve the cython problem in #3031 until it's fixed in ctags and will allow us to easily handle such problems in the future.
@techee pushed 1 commit.
77723b6df7a466f2e4647988634251a6071271fe Add LUA to the list of parsers returning full scope
@techee pushed 1 commit.
70bad89aa7f63f47d2b0b39cc887083296733376 Enable/disable kinds in ctags based on whether we use them or not
Sorry for the last-minute push I just did, I just got an idea we could enable/disable kinds automatically based on whether we use them for the given language or not instead of enabling them all and this is just much better than hard-coding some kinds when there's some problem for them. This could also speed up parsing as the corresponding code paths can be skipped in individual parsers (and potentially reduces crashes because less code from the parser will be executed). This should fix the problem with cython parameters automatically as we don't use the 'z' kind.
Merged #2991 into master.
Still WFM, tests pass, merged before @techee can sneak more in :-)
github-comments@lists.geany.org