This is a followup of #2630 to fully port the `scripts/create_py_tags.py` script for generating tags for the Python standard library to Python 3.
While continuing @claudep's work, I noticed that a plain port would be harder than more or less rewriting the script. The script now works by fully importing the modules, where possible, and using Python's `inspect.Signature` API to extract symbols. Where this is not possible, the existing regular-expression-based parser is used as a fallback.
Deprecated modules are ignored completely, as are a couple of special modules like the bundled IDLE IDE and executable modules in general.
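As a rough illustration of the approach described above (not the actual code of `create_py_tags.py`), the import-then-inspect step with a regular expression fallback could look like the sketch below. The function names and the `DEF_PATTERN` expression are hypothetical; the script's real regular expressions are not shown in this thread.

```python
import importlib
import inspect
import re

# Hypothetical fallback pattern, only meant to illustrate the idea.
DEF_PATTERN = re.compile(r'^\s*def\s+(\w+)\s*(\(.*?\))\s*:', re.MULTILINE)


def collect_signatures(module_name):
    """Import a module and return {name: signature} for its public functions and classes."""
    module = importlib.import_module(module_name)
    result = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith('_'):
            continue  # skip private and special names
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        try:
            result[name] = str(inspect.signature(obj))
        except (ValueError, TypeError):
            # Some objects (often implemented in C) expose no signature.
            result[name] = ''
    return result


def collect_signatures_from_source(source):
    """Fallback: extract function signatures from source text without importing it."""
    return {match.group(1): match.group(2) for match in DEF_PATTERN.finditer(source)}
```

Calling `collect_signatures('textwrap')` really imports `textwrap`, which is why the script's header warns about side effects of importing arbitrary modules.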
I have been using the resulting tags file for a few weeks and it feels fine, much better than before, especially because of the better-extracted argument lists of functions and methods.
You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3039
-- Commit Summary --
* Refs #2615 - Convert create_py_tags.py to Python 3
* Rewrite Python standard library tags creation script for Python 3
-- File Changes --
M data/tags/std.py.tags (27660)
M scripts/create_py_tags.py (486)
-- Patch Links --
https://github.com/geany/geany/pull/3039.patch
https://github.com/geany/geany/pull/3039.diff
The question now is whether to switch to the ctags file format for the languages which have an identical implementation (and therefore kind letters) both in Geany and ctags. I don't know what the script does exactly, but if we switched to the ctags file format, wouldn't it be sufficient to run the ctags binary on the corresponding directory with sources?
The "proprietary" half-binary format is still useful for our unit tests since it already contains ctags kinds mapped to our internal representation, so we can verify this mapping is done correctly. But for the tag files shipped with Geany I think we are more or less ready to switch to the ctags file format and suggest that users generate them with ctags.
> ctags file format and suggest users to use ctags to generate it.
What about the other languages in `c.c`?
> What about the other languages in `c.c`?
Yes, Vala for instance is missing, and tags for those languages will still have to be generated by Geany if users want them. But all the parsers for the tags under `geany/data/tags` will be the upstream ones.
Maybe to clarify - as outlined here https://github.com/geany/geany/pull/3049#issuecomment-991886155 I would suggest switching to the ctags file format. That, however, doesn't necessarily mean that we have to use ctags to generate such files. We can still use Geany or whatever scripts to write the tag files, just in a different format (I think Geany processes includes and parses the included files automatically, which I think ctags doesn't do, and it might be a shame to lose that functionality). The ctags file format is pretty simple and generating it ourselves shouldn't be hard to do.
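To give an idea of how little is needed, here is a minimal sketch of writing tags in the extended ctags format (tag name, file name and address separated by tabs, then `;"` and optional `key:value` fields). The helper names, the `-` placeholder file name and the `1` placeholder address are assumptions made for this example, not what the script in this PR emits, and escaping of special characters is left out.

```python
def format_tag(name, kind, signature='', scope='', filename='-', address='1'):
    """Format one tag as a line in the extended ctags format (no escaping done)."""
    fields = [name, filename, address + ';"', 'kind:' + kind]
    if scope:
        fields.append('class:' + scope)  # the class a member belongs to
    if signature:
        fields.append('signature:' + signature)
    return '\t'.join(fields)


def format_tags_file(tags):
    """Return the content of a sorted tags file for a list of tag dicts."""
    header = [
        '!_TAG_FILE_FORMAT\t2\t/extended format/',
        '!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/',
    ]
    lines = sorted(format_tag(**tag) for tag in tags)
    return '\n'.join(header + lines) + '\n'
```

For example, `format_tag('wrap', 'member', '(self, text)', scope='TextWrapper')` gives one tab-separated line that ends in `class:TextWrapper` and `signature:(self, text)`.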
@eht16 pushed 2 commits.
916cefdc04684568bab968085551e337b2afdea1 Remove unused code
c1387f831dc24eb58fa228839cd1de19dfacaeff Change the generated tags file to ctags format
I followed your suggestion and changed the output format of the tags file to ctags. In my tests the tags worked, but I don't know the format that well, so it would be cool if you could have a look at it, @techee.
> I followed your suggestion and changed the output format of the tags file to ctags.
Well, it isn't something I'm one hundred percent sure we should do, but rather something I wanted to discuss. Also, I had something different in mind - to use `ctags` directly to generate the tag files instead of doing it by ourselves in the script (so there wouldn't be the need for messing with the ctags file format on our side). I haven't checked what exactly the script does and whether something like this would be possible though - what do you think?
Also, if we want to use the ctags format, we should merge https://github.com/geany/geany/pull/3049, otherwise not all the fields are parsed correctly.
To the topic of pros/cons of using the ctags file format, these are the advantages I can think of:

* we could use `ctags` directly to generate tag files as mentioned above
* currently the tagmanager format doesn't escape characters 200-215, which could break tag file parsing (it is fixable though)
* the `ctags` file format is "standard" while the tagmanager format is "proprietary" to Geany (and also binary, which isn't very nice)

On the other hand, the cons of the ctags format are:

* the tag files are bigger
* they are slower to parse
* command line `ctags` may be less flexible in generating tag files than some specific-purpose script
* if Geany ctags is out of sync with the `ctags` command-line that produces tags, we may not be able to read all of the tags
@eht16 pushed 3 commits.
a4d49ce4d82040109168b995f76638c3e81a39eb Do not set the base class as parent erroneously in tags
6e326ffc904e42ac4fff21794f2225701e1e8da0 Use kind "member" for methods
0546fef7d1eb4fe35a6fe0e4e3ce979f681f9522 Update docs, ignored modules and classes to Python 3.11
> > I followed your suggestion and changed the output format of the tags file to ctags.
>
> Well, it isn't something I'm one hundred percent sure we should do, but rather something I wanted to discuss. Also, I had something different in mind - to use `ctags` directly to generate the tag files instead of doing it by ourselves in the script (so there wouldn't be the need for messing with the ctags file format on our side). I haven't checked what exactly the script does and whether something like this would be possible though - what do you think?
Regarding whether to create the Python tags with ctags instead of this script: I gave it a try and there are a couple of differences and problems with ctags:

- ctags will find way more tags, many of which we are not interested in for a global tags file, like private methods and special methods (`_*` and `__*`) and variables. Those could be filtered out afterwards though.
- ctags will add the path and search pattern or line numbers of the source file, which doesn't make sense for global tags. Those could be filtered out afterwards though.
- Classes found by ctags have no signature (the one of the corresponding `__init__` method) while the ones of my script have.
- ctags will include deprecated tags as well, while my script filters them out (even more than the manually defined ones).
Overall, for me, the generated tags of the script look cleaner and more sane than the ctags ones.
For reference, the ctags command I tried:

```
ctags \
    --exclude=encodings \
    --exclude=dist-packages \
    --exclude=distutils \
    --exclude=idlelib \
    --exclude=ensurepip/_bundled \
    --exclude=test \
    --exclude=Tools \
    --exclude=turtledemo \
    --exclude=site-packages \
    --exclude=turtle.py \
    --exclude=asyncio/windows_utils.py \
    --exclude=asyncio/windows_events.py \
    --exclude=antigravity.py \
    --exclude=ctypes/wintypes.py \
    --recurse \
    --languages=Python \
    --excmd=number \
    --totals=extra \
    --fields=+tS /home/enrico/.pyenv/versions/3.10.8/lib/python3.10
```
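The "filtered out afterwards" step mentioned above could be sketched roughly as follows. This is only an illustration, assuming tab-separated ctags output with numeric addresses (as produced with `--excmd=number`); the function name and the `-` and `0;"` placeholders for file name and address are made up for this example.

```python
def filter_tag_lines(lines):
    """Yield ctags lines without underscore-prefixed names and without source locations."""
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('!_TAG_'):
            yield line  # keep pseudo-tag header lines unchanged
            continue
        parts = line.split('\t')
        if len(parts) < 3:
            continue  # not a valid tag line
        name, _filename, _address, *extension = parts
        if name.startswith('_'):
            continue  # drops both _private and __special__ names
        # Replace path and line number with placeholders, keep the extension fields.
        yield '\t'.join([name, '-', '0;"'] + extension)
```

This does not address the missing class signatures or the deprecated tags, so it covers only the first two points of the list.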
While playing with the ctags command, I noticed that I erroneously set a class' base as its parent, which is wrong in this context, and that methods used the wrong kind. Both are fixed.
> Also, if we want to use the ctags format, we should merge #3049, otherwise not all the fields are parsed correctly.
+1
> To the topic of pros/cons of using the ctags file format, these are the advantages I can think of:
>
> * we could use `ctags` directly to generate tag files as mentioned above
> * currently the tagmanager format doesn't escape characters 200-215 which could break tag file parsing (it is fixable though)
> * `ctags` file format is "standard" while the tagmanager format is "proprietary" to geany (and also binary which isn't very nice)
>
> On the other the cons of the ctags format are:
>
> * the tag files are bigger
> * they are slower to parse
> * command line `ctags` may be less flexible in generating tag files than some specific-purpose script
> * if Geany ctags is out of sync with the `ctags` command-line that produces tags, we may not be able to read all of the tags
I'd prefer the ctags format because, as you say, it's the standard format and probably less error prone than the custom tagmanager format.
> I'd prefer the ctags format because, as you say, it's the standard format and probably less error prone than the custom tagmanager format.
OK, it probably makes sense to use the Python script, also because of all the additional problems you mentioned.
> Classes found by ctags have no signature (the one of the corresponding `__init__` method) while the ones of my script have
Curious about this one - how does it behave when there are multiple corresponding `__init__` functions with a different signature? Will it pick just one of them for calltip? I'm asking because we now have this code
https://github.com/geany/geany/blob/8f35d3342df724145ee9a6873e4ed3a18446211d...
which can look up all `__init__` functions for a class and display a multi-calltip (with arrows on the side to scroll among the found calltips) containing all the constructors.
One more thing - wouldn't it make sense to factor-out the tag writing code to a separate file so it can be reused by other tag writing scripts? For instance, there's also `create_php_tags.py` which I think could reuse this code too. And maybe this tag-writing code could be configurable to either output the ctags format or the tag manager format - I can imagine that having the tagmanager format could be useful for debugging. What do you think?
> > Classes found by ctags have no signature (the one of the corresponding `__init__` method) while the ones of my script have
>
> Curious about this one - how does it behave when there are multiple corresponding `__init__` functions with a different signature? Will it pick just one of them for calltip? I'm asking because we now have this code
>
> https://github.com/geany/geany/blob/8f35d3342df724145ee9a6873e4ed3a18446211d...
>
> which can look up all `__init__` functions for a class and display a multi-calltip (with arrows on the side to scroll among the found calltips) containing all the constructors.
In Python, there is no point in having multiple `__init__` methods. While technically possible, it makes no sense because the later method overrides the earlier one. The script here would probably pick one of them; I don't know which one, as it is decided by the `inspect` library.
> One more thing - wouldn't it make sense to factor-out the tag writing code to a separate file so it can be reused by other tag writing scripts? For instance, there's also `create_php_tags.py` which I think could reuse this code too. And maybe this tag-writing code could be configurable to either output the ctags format or the tag manager format - I can imagine that having the tagmanager format could be useful for debugging. What do you think?
Sure, we can do that. But IMO both ideas would be better handled in separate PRs so as not to blow this one up even more.
> In Python, there is no point in having multiple `__init__` methods. While technically possible, it makes no sense because the latter method overrides the previous one. The script here would probably pick one of them, I don't know which one, it is decided by the `inspect` library.
Ah, OK, I thought you could have `__init__(self, a)` and `__init__(self, a, b)`, but after checking now, there can only be one `__init__()` in Python.
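A small self-contained example of this behaviour (the class `Example` is made up for illustration): the second definition rebinds the name in the class body, so only one `__init__` exists afterwards and `inspect.signature()` reports that one.

```python
import inspect


class Example:
    def __init__(self, a):
        self.args = (a,)

    def __init__(self, a, b):  # rebinds the name, the definition above is discarded
        self.args = (a, b)


print(inspect.signature(Example))  # (a, b)
```

Calling `Example(1)` raises a `TypeError` because the one-argument variant no longer exists.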
> Sure we can do that. But IMO both ideas would be better handled in separate PRs to not blow this one even more.
Yeah, sure.
> In my tests the tags worked but I don't know the format that well, so it would be cool if you could spend a look at it, @techee.
I just had a look and it looks good to me.
I just cleaned the commit history and would like to merge this in a few days if there are no objections.
@b4n commented on this pull request.
Not tested or properly reviewed, but I trust you :)
>  # If called without command line arguments, a preset of common Python libs is used.
>  #
>  # WARNING
> -# Be aware that running this script will actually *import* modules in the specified directory
> +# Be aware that running this script will actually *import* all modules given on the command line
>  # or in the standard library path of your Python installation. Dependent on what Python modules
>  # you have installed, this might not be want you want and can have weird side effects.
[…] what* you want […]
not that it changed in this PR though
> @@ -8,296 +7,368 @@
>  #
>  # This script should be run in the top source directory.
>  #
> -# Parses all files given on command line for Python classes or functions and write
> -# them into data/tags/std.py.tags (internal tagmanager format).
> +# Parses all files in the directories given on command line for Python classes or functions and
> +# write them into data/tags/std.py.tags (internal tagmanager format).
looks like it's not in tagmanager format anymore, is it?
@eht16 pushed 1 commit.
706ee56f0f0b2c09744c380b033b2ff44682a95e Fix doc typos
@eht16 commented on this pull request.
>  # If called without command line arguments, a preset of common Python libs is used.
>  #
>  # WARNING
> -# Be aware that running this script will actually *import* modules in the specified directory
> +# Be aware that running this script will actually *import* all modules given on the command line
>  # or in the standard library path of your Python installation. Dependent on what Python modules
>  # you have installed, this might not be want you want and can have weird side effects.
Amazing, someone actually read the docs :). Thank you for spotting, fixed.
@eht16 commented on this pull request.
> @@ -8,296 +7,368 @@
>  #
>  # This script should be run in the top source directory.
>  #
> -# Parses all files given on command line for Python classes or functions and write
> -# them into data/tags/std.py.tags (internal tagmanager format).
> +# Parses all files in the directories given on command line for Python classes or functions and
> +# write them into data/tags/std.py.tags (internal tagmanager format).
Thank you for spotting, fixed.
Merged #3039 into master.
github-comments@lists.geany.org