Related to #2615.
I don't know how good the newly generated tags are; my PHP knowledge is basically non-existent by now :). At least the diff shows that some tags were added, removed and changed, and some even remain unchanged.
You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3488
-- Commit Summary --
* Port create_php_tags to Python3 and generate new PHP tags file
-- File Changes --
M data/tags/std.php.tags (10332)
M scripts/create_php_tags.py (27)
-- Patch Links --
https://github.com/geany/geany/pull/3488.patch
https://github.com/geany/geany/pull/3488.diff
When #3039 is merged, I would adjust the script to generate the ctags file format and share the code with the Python tags script.
@cwendling commented on this pull request.
I don't know much PHP anymore either, but at least the script still seems to work; the differences seem to be in the JSON itself (I only checked a couple of things, but they match). Sadly the JSON seems to have a few flaws with the types in signatures (no alternative types, and it seems to use the last one, which is sometimes `null` -- that used to be `mixed` in the PHP docs, but now is `foo|bar|baz`), but then again there is nothing the script can do about that data, so I'd think it's fine.
                 (arg_list, TA_ARGLIST),
                 (return_type, TA_VARTYPE),
                 (scope, TA_SCOPE)]:
         if attr is not None:
-            tag_line += '{type:c}{attr}'.format(type=type, attr=attr)
+            tag_line += f'{type_:c}{attr}'
+            print(tag_line)
new debugging, is that wanted?
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
why ISO 8859-1? This seems fully ASCII currently, is this encoding documented somewhere? I can't seem to find out anything in a response header or something.
@elextr commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
Isn't it the default encoding of PHP? So it's what the PHP symbols will be. But tagmanager is a binary format, so maybe it should be a binary file?
Is there a reason that Python tags are in ctags format and php tags in tagmanager?
@eht16 pushed 1 commit.
721550ca76caa155dc3ea2c7e0edb4710ef6c7e9 Port create_php_tags to Python3 and generate new PHP tags file
@eht16 commented on this pull request.
                 (arg_list, TA_ARGLIST),
                 (return_type, TA_VARTYPE),
                 (scope, TA_SCOPE)]:
         if attr is not None:
-            tag_line += '{type:c}{attr}'.format(type=type, attr=attr)
+            tag_line += f'{type_:c}{attr}'
+            print(tag_line)
Oops, thanks for spotting. Just removed it.
@eht16 commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
The content parsed from the JSON is (probably) just ASCII. But the `TA_*` markers are not ASCII and are encoded in `iso-8859-1`, also for other files using this tagmanager format.
I just added it on writing the file to explicitly set it, as Python3 `open()` wants the `encoding` argument.
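To illustrate the point about the markers, here is a minimal sketch. The numeric values used for `TA_ARGLIST` and `TA_VARTYPE` are assumptions for illustration, not copied from the script, and `build_tag_line` is a hypothetical helper. It only shows that a marker with a code point between 128 and 255 becomes a single byte in `iso-8859-1` but two bytes in UTF-8:

```python
# Assumed marker values, for illustration only; the real script defines its own.
TA_ARGLIST = 205
TA_VARTYPE = 207


def build_tag_line(name, arg_list=None, return_type=None):
    """Hypothetical helper: append each present attribute after its marker."""
    tag_line = name
    for attr, type_ in [(arg_list, TA_ARGLIST), (return_type, TA_VARTYPE)]:
        if attr is not None:
            # ':c' formats the integer as the character with that code point
            tag_line += f'{type_:c}{attr}'
    return tag_line


line = build_tag_line('strlen', '(string $string)', 'int')
latin1 = line.encode('iso-8859-1')  # each marker becomes exactly one byte
utf8 = line.encode('utf-8')         # each marker becomes two bytes
print(len(line), len(latin1), len(utf8))
```

So if the file were written as UTF-8 instead, the marker bytes in it would change even though the PHP symbols themselves are ASCII.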
> Is there a reason that Python tags are in ctags format and php tags in tagmanager?
Before #3039, Python tags were also in tagmanager format and so are the PHP tags.
As said in https://github.com/geany/geany/pull/3488#issuecomment-1537369732, I would switch the format to ctags for the PHP tags as well. But I'd like to do this iteratively to avoid one huge "I do it all in one" PR :).
@b4n commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
Aah, it's the output file, right. Well, OK then -- although I guess I'd open it in binary mode (if it makes a difference).
@eht16 commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
It does not. I tried it and the resulting file is identical. In the generating script we would have to use byte strings for writing to the file opened in binary mode, so we would need to encode the generated tag line anyway. So, I guess the remaining difference could be performance; maybe one is 0.5 ms faster than the other. For me, the difference cannot be big enough to care about if the script is executed every few years :).
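A minimal sketch of that comparison, using a made-up tag line and temporary files rather than the real `std.php.tags`. The marker value 205 is an assumption, and `newline='\n'` is added here (it is not in the PR) only so that the comparison also holds on platforms that translate newlines in text mode:

```python
import os
import tempfile

tag_line = f'strlen{205:c}(string $string)\n'  # 205 is an assumed marker value

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, 'text.tags')
    binary_path = os.path.join(tmp_dir, 'binary.tags')

    # Text mode: Python encodes each string on write.
    with open(text_path, 'w', encoding='iso-8859-1', newline='\n') as tags_file:
        tags_file.write(tag_line)

    # Binary mode: the caller has to encode each line explicitly.
    with open(binary_path, 'wb') as tags_file:
        tags_file.write(tag_line.encode('iso-8859-1'))

    # Read both files back as raw bytes for comparison.
    with open(text_path, 'rb') as f_text, open(binary_path, 'rb') as f_binary:
        text_bytes = f_text.read()
        binary_bytes = f_binary.read()

print(text_bytes == binary_bytes)
```

Either way the same encoding step happens; the only choice is whether `open()` or the script performs it.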
@elextr commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
All the symbols are ASCII and tagmanager only uses a few low byte values (which are valid ASCII) IIRC so I guess the encoding won't matter for this purpose.
When PHP adds emoji to its symbols it might be different :stuck_out_tongue:
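As a rough sketch of the symbol side of this (the emoji symbol is of course hypothetical): pure ASCII text encodes to the same bytes under ASCII, `iso-8859-1` and UTF-8, while a character outside Latin-1 cannot be encoded as `iso-8859-1` at all:

```python
symbol = 'array_key_exists'

# Pure ASCII encodes to identical bytes under all three encodings.
same = (symbol.encode('ascii')
        == symbol.encode('iso-8859-1')
        == symbol.encode('utf-8'))

emoji_symbol = 'array_\U0001F61B'  # hypothetical symbol containing an emoji
try:
    emoji_symbol.encode('iso-8859-1')
    emoji_ok = True
except UnicodeEncodeError:
    # Code points above 255 have no representation in iso-8859-1.
    emoji_ok = False

print(same, emoji_ok)
```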
@b4n commented on this pull request.
     # write tags
     script_dir = dirname(__file__)
     tags_file_path = join(script_dir, '..', 'data', 'tags', 'std.php.tags')
-    with open(tags_file_path, 'w') as tags_file:
+    with open(tags_file_path, 'w', encoding='iso-8859-1') as tags_file:
> So, I guess the remaining difference could be performance; maybe one is 0.5 ms faster than the other. For me, the difference cannot be big enough to care about if the script is executed every few years :).
Nah, clearly that doesn't matter at all :)
Merged #3488 into master.
github-comments@lists.geany.org