This is a third incarnation of PR to add support for [Kotlin language](https://kotlinlang.org/). Previous discussions can be found in https://github.com/geany/geany/pull/2778 and https://github.com/dolik-rce/geany/pull/1.
Included changes: - Small modification of update-ctags.sh, so it can import peg based parsers from ctags. - Pull Kotlin parser from ctags and integrate it into tagmanager. - Rename filetypes.Kotlin.conf to filetypes.kotlin, to make it work correctly with tagmanager. - Added test for kotlin tags.
Disclaimer: The parser was written (and is maintained) by me. You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3034
-- Commit Summary --
* Add support for Kotlin tags
-- File Changes --
M ctags/Makefile.am (11) A ctags/parsers/peg/kotlin.c (24629) A ctags/parsers/peg/kotlin.h (22) A ctags/parsers/peg/kotlin_post.h (123) A ctags/parsers/peg/kotlin_pre.h (69) A ctags/parsers/peg/peg_common.h (149) M data/Makefile.am (2) R data/filedefs/filetypes.kotlin (2) M scripts/update-ctags.py (12) M src/filetypes.c (1) M src/filetypes.h (1) M src/symbols.c (12) M src/tagmanager/tm_parser.c (12) M src/tagmanager/tm_parser.h (1) M src/tagmanager/tm_parsers.h (5) M tests/ctags/Makefile.am (1) A tests/ctags/kotlin.kt (22) A tests/ctags/kotlin.kt.tags (12)
-- Patch Links --
https://github.com/geany/geany/pull/3034.patch https://github.com/geany/geany/pull/3034.diff
@dolik-rce pushed 1 commit.
84b02954c49b891c2bba86e87d189b30fbeab8f2 Add support for Kotlin tags
@techee commented on this pull request.
@@ -13,12 +13,18 @@
dstdir = os.path.abspath(sys.argv[2])
os.chdir(dstdir + '/parsers') -parser_dst_files = glob.glob('*.c') + glob.glob('*.h') +parser_dst_files = glob.glob('**/*.c') + glob.glob('**/*.h')
Maybe I would slightly prefer mirroring the ctags directory structure - i.e. having peg outside the parsers directory. That would mean explicitly mentioning `peg` and `parsers` directories in the globs and in the code below eliminating the ``` if os.path.exists('parsers/' + f): shutil.copy('parsers/' + f, dstdir + '/parsers/' + f) ``` part.
@techee commented on this pull request.
@@ -0,0 +1,12 @@
+# format=tagmanager +AbstractWorker�32�com.example.helloworld�0 +Global�1�com.example.helloworld�0 +StringWorker�4096�com.example.helloworld�0 +Worker�1�com.example.helloworld�0 +com.example.helloworld�256�0 +greeting�16384�com.example.helloworld.Global�0 +main�16�com.example.helloworld�0 +name�16384�com.example.helloworld.Worker�0 +process�16�com.example.helloworld.AbstractWorker�0 +process�16�com.example.helloworld.Worker�0 +result�16384�com.example.helloworld.main�0
Haven't checked but does the unit test cover all kinds from `TMParserMapEntry map_KOTLIN`?
Looks good to me in general. I'm not familiar with the peg parsers - I assume these are generated and one has to always run `make` before so the sources get created, right?
@dolik-rce commented on this pull request.
@@ -13,12 +13,18 @@
dstdir = os.path.abspath(sys.argv[2])
os.chdir(dstdir + '/parsers') -parser_dst_files = glob.glob('*.c') + glob.glob('*.h') +parser_dst_files = glob.glob('**/*.c') + glob.glob('**/*.h')
Ok, I can add `ctags/peg/` directory.
@@ -0,0 +1,12 @@
+# format=tagmanager +AbstractWorker�32�com.example.helloworld�0 +Global�1�com.example.helloworld�0 +StringWorker�4096�com.example.helloworld�0 +Worker�1�com.example.helloworld�0 +com.example.helloworld�256�0 +greeting�16384�com.example.helloworld.Global�0 +main�16�com.example.helloworld�0 +name�16384�com.example.helloworld.Worker�0 +process�16�com.example.helloworld.AbstractWorker�0 +process�16�com.example.helloworld.Worker�0 +result�16384�com.example.helloworld.main�0
Yes, it covers all 8 tag kinds.
I'm not familiar with the peg parsers - I assume these are generated and one has to always run `make` before so the sources get created, right?
Yes, they are not committed in the git, but they are generated by `packcc` as needed. At the very least `make peg/kotllin.c` should be called, otherwise the update script fails. Should I add the call in the script?
Yes, they are not committed in the git, but they are generated by packcc as needed. At the very least make peg/kotllin.c should be called, otherwise the update script fails. Should I add the call in the script?
Maybe a good idea, yes. It could be easily forgotten otherwise.
@dolik-rce pushed 1 commit.
a696977aee3267154bb42cc7d84e44b677849835 follow ctags directory structure, call make before copying peg parsers
@techee commented on this pull request.
shutil.copy('parsers/' + f, dstdir + '/parsers/' + f)
- elif os.path.exists(f): - shutil.copy(f, dstdir + '/parsers/' + f) - else: - print('Error: Could not find file ' + f + '!') - sys.exit(1) + shutil.copy(f, dstdir + '/parsers') + +os.chdir(dstdir + '/peg') +peg_dst_files = glob.glob('*.c') + glob.glob('*.h') +peg_dst_files = list(filter(lambda x: not x.startswith('geany_'), peg_dst_files)) +os.chdir(srcdir + '/peg') +os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) +print('Copying peg parsers... ({} files)'.format(len(peg_dst_files))) +for f in peg_dst_files: + shutil.copy(f, dstdir + '/peg')
Since the code does more or less the same thing twice, it could be changed to something like the following:
``` os.chdir(srcdir + '/peg') os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) for dir in ['parsers', 'peg']: ... ```
@dolik-rce commented on this pull request.
shutil.copy('parsers/' + f, dstdir + '/parsers/' + f)
- elif os.path.exists(f): - shutil.copy(f, dstdir + '/parsers/' + f) - else: - print('Error: Could not find file ' + f + '!') - sys.exit(1) + shutil.copy(f, dstdir + '/parsers') + +os.chdir(dstdir + '/peg') +peg_dst_files = glob.glob('*.c') + glob.glob('*.h') +peg_dst_files = list(filter(lambda x: not x.startswith('geany_'), peg_dst_files)) +os.chdir(srcdir + '/peg') +os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) +print('Copying peg parsers... ({} files)'.format(len(peg_dst_files))) +for f in peg_dst_files: + shutil.copy(f, dstdir + '/peg')
I was actually thinking that the script could use some "readability improvements". Like using functions and giving it some structure... But it seemed a bit out of scope for this PR, so I stayed with the current style.
Would you mind if I implemented it in follow-up PR?
@techee commented on this pull request.
shutil.copy('parsers/' + f, dstdir + '/parsers/' + f)
- elif os.path.exists(f): - shutil.copy(f, dstdir + '/parsers/' + f) - else: - print('Error: Could not find file ' + f + '!') - sys.exit(1) + shutil.copy(f, dstdir + '/parsers') + +os.chdir(dstdir + '/peg') +peg_dst_files = glob.glob('*.c') + glob.glob('*.h') +peg_dst_files = list(filter(lambda x: not x.startswith('geany_'), peg_dst_files)) +os.chdir(srcdir + '/peg') +os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) +print('Copying peg parsers... ({} files)'.format(len(peg_dst_files))) +for f in peg_dst_files: + shutil.copy(f, dstdir + '/peg')
Would you mind if I implemented it in follow-up PR?
Yep, agreed. #3032 will do some more changes to the script so maybe better wait until both of these pull requests are merged.
I just noticed the size of the parser - 25klocs and 1MB of sources. Have you checked how much the size of stripped libgeany.so increases by this patch?
@techee maybe its time for "somebody" to add the capability for uctags to load parsers from dll :-)
Although the latest Ubuntu uctags package has an installed size of just 2Mb, and we don't use it all (yet :-)
Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?
I just noticed the size of the parser - 25klocs and 1MB of sources. Have you checked how much the size of stripped libgeany.so increases by this patch?
Yes, the generated code is humongous. I guess it is price for implementing full Kotlin grammar. I have tried to check the library size with `make && strip --strip-unneeded`. Current master is about 3.57 MB, this branch yields 3.72 MB, so roughly 150kB. I agree it is quite a lot, but still manageable I hope.
Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?
Frankly, the speed could be better. This has already have been discussed when I was adding the parser to ctags and it even resulted in some improvements in PackCC (which is used to generate the code), but it is still slower than handwritten parsers. You can have a look at https://github.com/universal-ctags/ctags/pull/2866 and https://github.com/universal-ctags/ctags/issues/2878 if you're interested.
Anyway, the pauses during editing are not noticeable for reasonably sized files. The only situation where the speed might be annoying is when user opens big kotlin project. But I'm not sure how this actually works, does Geany always process all open files on start? Or is there some kind of caching? I think some of the projects-related plugin even allowed to set option to index all files, even those that are not open. This could then take quite some time.
does Geany always process all open files on start? Or is there some kind of caching? I think some of the projects-related plugin even allowed to set option to index all files, even those that are not open. This could then take quite some time.
IIUC yes, no, yes, probably
Implementing the whole grammar has problems for this use-case, not only speed, but also robustness. It is being used on files that are being edited and may be eronneous or incomplet.
Unless the parser can recover from an error it is going to lose all or many symbols in more cases if it parses things that are not essential to detecting the tags since it will then find more errors. It is a problem with many languages, not just kotlin (see for example #2916), but the fact many tag parsers skip lots of code reduces the impact of partly edited code (outside of declarations).
This is also why full fat IDEs to try to insist that at least the the structural syntax is complete (eg {}s match) so their parsers have some handles to resync and recover after errors.
The kotlin parser can recover from errors. If it fails to parse something, it continues until it finds next parseable top-level object (e.g. function or class). The backtracking parser is actually quite good for this.
I did try at first to write simple, handwritten parser that only gets the important tags. However the problem with Kotlin is that the grammar is quite complex and it lead to way too many inaccuracies. That is why I had to implement it all in the end. Also, it was very hard to track scope, when only some parts of the code were parsed.
Yes, the generated code is humongous. I guess it is price for implementing full Kotlin grammar. I have tried to check the library size with make && strip --strip-unneeded. Current master is about 3.57 MB, this branch yields 3.72 MB, so roughly 150kB. I agree it is quite a lot, but still manageable I hope.
It's not that bad, I guess I'm still stuck with the "it must fit a floppy" mentality.
<!DOCTYPE html>
Implementation | Time | Avg. per file | Tags added -- | -- | -- | -- C | 4 s | 66 us | 567695 RegExp | 155 s | 2.5 ms | 527908 Peg | 523s | 8.6 ms | 782500
<br class="Apple-interchange-newline">
This is quite surprising, I would have expected the performance to be somewhere between a hand-written parser and a regex parser but it probably depends on the grammar and the lookahead it has to perform. Have you tried profiling the parser to see if there's not some unexpected bottleneck? With hand-written parsers I used the following:
https://wiki.geany.org/howtos/profiling/gperftools
Unless the parser can recover from an error it is going to lose all or many symbols in more cases if it parses things that are not essential to detecting the tags since it will then find more errors. It is a problem with many languages, not just kotlin (see for example #2916), but the fact many tag parsers skip lots of code reduces the impact of partly edited code (outside of declarations).
To be fair, hand-written ctags parsers aren't that great at error recovery either - when you type an extra `{` at the beginning of a file, it won't get parsed.
Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?
If the table above is valid and from what I remember the C parser speed is about 50MB of sources per second and the kotlin parser is about 100x slower than the C parser, it gives us 500KB/s for kotlin. And the pauses should be just something like 0.2s not to be noticeable so about 100KB kotlin source files would be roughly the maximum for Geany.
This is quite surprising, I would have expected the performance to be somewhere between a hand-written parser and a regex parser but it probably depends on the grammar and the lookahead it has to perform. Have you tried profiling the parser to see if there's not some unexpected bottleneck?
I did a lot of profiling on PackCC and it turns out it is not very efficient with memory allocations (see https://github.com/universal-ctags/ctags/pull/2866#issuecomment-780332841), the backtracking is constantly allocating and deallocating small chunks of memory. So, more rules leads to more backtracking, which leads to more allocations, which slows it down. I have even [proposed couple possible optimizations](https://github.com/arithy/packcc/pull/48), but I did not have from enough manpower to actually implement it (sadly, I am not very proficient in C).
I might look into the grammar optimization in future, so the parser might become somewhat faster (especially, if it slows Geany down, that would annoy me a lot). Probably could beat the RegExp based ones (for comparable grammars). According to creator of ctags, the peg parser is about 15-30 times slower than handwritten parser (citation from https://github.com/universal-ctags/ctags/pull/2866#issuecomment-778786809). Even with grammar optimizations, it would still probably be about an order of magnitude slower than let's say current C parser.
Hmm, I'm afraid I don't have very good news regarding the performance. I tried it on Raspberry Pi 400 (which should be slightly faster than regular Raspberry Pi 4) with the following file which IMO isn't that big
https://github.com/JetBrains/Exposed/blob/master/exposed-core/src/main/kotli...
and the lag was already very annoying (to the point that I'd much rather have parsing disabled than editing this way). IMO Raspberry Pi 4 is a reasonable low-end machine Geany should support without big speed problems (apart from that Geany is the semi-official Raspberry Pi editor as it's installed by default) and the 500 LOC file isn't anything extreme either. So I'm not sure whether this parser should be used in Geany. It's shame that PEG parsers are so slow currently - if they were e.g. 10x slower than hand-written parsers, it would be fine I think but 100x is a little too much.
By the way I also tried this file
https://github.com/JetBrains/Exposed/blob/master/exposed-core/src/main/kotli...
which seems to contain many anonymous functions and it seems to be a common pattern in Kotlin files and seeing the symbol tree filled with `<lambda>` doesn't seem that useful to me so maybe these shouldn't be reported to Geany. In any case, the performance issue seems to be a bigger problem though.
That is really surprising :frowning_face: I'd expect RPi 4 to handle this much better. I don't have any slower hardware readily available right now, but I have tested it with `cpulimit -l 50 -i geany SQLExpressionBuilder.kt`, which on my i5 should mean it should get about 1.2GHz and was still pretty smooth. There might of course also be some difference due to memory speed.
Well, I started this as a "Christmas project" last year, so I guess I'll just have a another one for this year: to make it faster.
... seems to contain many anonymous functions and it seems to be a common pattern in Kotlin files and seeing the symbol tree filled with <lambda> doesn't seem that useful to me so maybe these shouldn't be reported to Geany.
Yes, lambdas can be quite common in Kotlin code and you're right, that they are probably useless in the symbol tree most of the time. I think the only valid reason to keep them is that there might be something else declared in their scope. Perhaps we could filter out all lambdas without any child nodes?
There might of course also be some difference due to memory speed.
That's quite probable.
Well, I started this as a "Christmas project" last year, so I guess I'll just have a another one for this year: to make it faster.
I know those! #3032 is Christmas 2018 :-).
I think the only valid reason to keep them is that there might be something else declared in their scope.
That's a good point. Filtering tags should in theory be possible, the question is how many such things we should do in Geany. But maybe anonymous functions are quite a common pattern to deserve this special handling. I'm just preparing a pull request that improves handling of anonymous tags so I might add something like that.
@dolik-rce pushed 2 commits.
9fe7ef7e8d26324c1a549f152796b0c1f7db066e Add support for Kotlin tags 8b695708aeafb641e17fc13e8f58d503aa547c0d follow ctags directory structure, call make before copying peg parsers
@dolik-rce pushed 2 commits.
f81448b83994bc3ff34b1a0d68a6ac8add4eca1e Add support for Kotlin tags fe2acf5ea159ced4eb32f49e257f1d519aa303a3 follow ctags directory structure, call make before copying peg parsers
@dolik-rce pushed 1 commit.
7c66a56ce749b6e735d2a352549a31531f3a3315 fix meson build
@dolik-rce pushed 3 commits.
11e36783f3350066559329d17a82a68a871cfda3 Add support for Kotlin tags f456bd3ab31eede727d92010a66fb2f65d57ee18 follow ctags directory structure, call make before copying peg parsers 784b5ba9f46dfcf139a7878d548e880488ebca9c fix meson build
@dolik-rce pushed 3 commits.
cb4be436d471b88be2e63f3f6032076dd8a0a608 Add support for Kotlin tags b052e116fdabc3996f18f8fd10c54d8fc3e18167 follow ctags directory structure, call make before copying peg parsers da3726e9d10b10df80fdc1994ac9ecabea7e5d3c fix meson build
@dolik-rce pushed 3 commits.
ab8cc5e3db768e6c7220132a98ad50bbe776ab1a Add support for Kotlin tags 25e73f85cab02db9d7106f8c407d1b72957ff714 follow ctags directory structure, call make before copying peg parsers fa83ce47215d7e52a9bbae88f4b3dc1b7a7409ad fix meson build
@dolik-rce pushed 1 commit.
05d2472fcb399535648a556cdec8b4b5f9f8ba8c fix meson build
@dolik-rce pushed 4 commits.
cad66a37655f7f4e3459d8611999e241f9fb4f5a Add support for Kotlin tags 39d1c2d8f6e7f5139dbf05908346da5169c8b94e follow ctags directory structure, call make before copying peg parsers 7a8a685025b715af6c3193a51ddd5c81fe9cd930 fix meson build de76df1cf08ce45656a247733b387024b3f28eb4 update ctags
@dolik-rce pushed 0 commits.
github-comments@lists.geany.org