The regex engine used by Geany does not match the one used by upstream ctags, so incorporating the regex parsers would need two regex engines.
Either this should be accepted (IMHO preferred, as it allows us to stay in sync with upstream) or an alternative implementation for those languages should be provided.
You know what? I actually changed my mind thinking about this lately, and now agree the best way to go is probably to just use whatever upstream does. The goal is to use upstream as if it were a library, and actually target making it one that we could simply use. And one of the prominent CTags features is custom parser definitions (even more so with recent uctags additions), and even though we don't support this in Geany yet, it probably shouldn't be stripped from a libuctags.
So, I'll withdraw my "please, no duplicated regex libraries" argument. I'm still not in love with it, but now think it's probably the lesser of the evils (esp. if we can use the OS's regex library in most setups, with the odd Windows kid as the exception). I have no idea how easy it'll be to get the regex parsing code in, though.
What's involved in "getting the regex parsing code in"? Isn't it all inside cobol.c?
I'm not sure what's involved exactly, but no, it's not all in cobol.c otherwise it wouldn't have been a problem to just keep it :) The thing is that it needs the whole regex infrastructure from uctags, which we currently don't have. So that's another bunch of stuff to fetch and adapt from there.
Ahh, I just saw regexen inside uctags/cobol.c, but I see that they are passed in the parser definition, not used inside cobol.c itself, so yeah, there has to be something else.
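To illustrate what "regexes passed in the parser definition" means, here is a minimal C sketch of a table-driven regex parser built on the POSIX `<regex.h>` API (`regcomp`/`regexec`), i.e. the kind of OS-provided engine discussed above. The `RegexRule` structure, the two simplified COBOL-like patterns and `matchLine()` are hypothetical and are not uctags' real regex-parser structures; the only point is that the patterns live in a data table while a generic engine applies them. Note that the MSVC C runtime does not provide `<regex.h>`, which is why a bundled engine matters on Windows.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule format: illustrates "regexes in the parser definition",
 * NOT the real uctags structure. Group 1 of each pattern captures the name. */
typedef struct {
    const char *pattern; /* POSIX extended regular expression */
    const char *kind;    /* kind reported when the pattern matches */
} RegexRule;

/* Simplified COBOL-ish rules, for illustration only. */
static const RegexRule cobolRules[] = {
    { "^[ \t]*PROGRAM-ID\\.[ \t]*([A-Za-z0-9-]+)", "program" },
    { "^[ \t]*([A-Za-z0-9-]+)[ \t]+SECTION\\.", "section" },
};
#define N_RULES (sizeof cobolRules / sizeof cobolRules[0])

/* Generic, language-agnostic engine: tries each rule against one line.
 * On the first match, copies the captured name into `name` (truncated to
 * `size` - 1 bytes) and returns the rule's kind; returns NULL otherwise. */
static const char *matchLine(const char *line, char *name, size_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < N_RULES; i++) {
        regex_t re;
        regmatch_t m[2];
        const char *kind = NULL;

        /* Compiled per call for brevity; a real engine would compile once. */
        if (regcomp(&re, cobolRules[i].pattern, REG_EXTENDED) != 0)
            continue;
        if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so != -1) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len >= size)
                len = size - 1;
            memcpy(name, line + m[1].rm_so, len);
            name[len] = '\0';
            kind = cobolRules[i].kind;
        }
        regfree(&re);
        if (kind != NULL)
            return kind;
    }
    return NULL;
}
```

For example, `matchLine("       PROGRAM-ID. HELLO-WORLD.", buf, sizeof buf)` returns `"program"` and leaves `HELLO-WORLD` in `buf`. Adding a language then means adding a table, not C code, which is the appeal of regex parsers.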
On ActionScript: I built uctags from git this afternoon, and `ctags --list-languages` doesn't list it, so it would appear it has been dropped?
Ahh, it's now the `flex.c` parser, I think, which seems to be a normal parser, not a regex one. So it can probably be incorporated more easily?
@techee any comments?
@b4n @techee ping, any comment on using upstream flex as the actionscript parser?
> You know what? I actually changed my mind thinking about this lately, and now agree the best way to go is probably just use whatever upstream does. The goal is to use upstream as if it was a library, and actually target making it one that we could simply use.
Yeah, this is my thinking too, and using GNU regex would simplify things. I can try to prepare a patch to get it working on Linux; I don't think it will be that hard (unless I run into some unpredictable issues). For Windows, I guess it means adding
https://github.com/universal-ctags/ctags/tree/master/gnu_regex
compiling it and using it for regex parsing, right? I'll leave that to someone with better knowledge of building for Windows and of autotools.
Umm, hello, is nobody going to answer the questions above about actionscript parser?
@elextr I have no clue from the top of my head and hadn't time to check yet, but I'll look. Also see https://github.com/universal-ctags/ctags/pull/2076 for COBOL :)
@b4n Ahh ha, so if that happens, and if my suggestion that flex.c is an ActionScript parser is right, then we won't need a regex infrastructure anyway :)
Umm, I should say that the basis for my suggestion of flex.c is this in the header:
```
 * This module contains functions for generating tags for Adobe languages.
 * There are a number of different ones, but this will begin with:
 *     Flex
 *         MXML files (*.mMacromedia XML)
 *         ActionScript files (*.as)
 *
 * Flex 3 language reference
 *     http://livedocs.adobe.com/flex/3/langref/index.html
```
I have no knowledge or time to evaluate it beyond that :)
@elextr If only I could try Geany master behavior and compare results… :grin:
@b4n `git checkout 1.34.1` FTW :smile:
> @b4n @techee ping, any comment on using upstream flex as the actionscript parser?
A quick test (not thorough at all) seems to show that yes, upstream has `flex` for ActionScript, but also that it is very limited and worse than what we had. I'll look a little deeper into this, but it kinda sounds like I'll have to learn another language…
It [appears](http://flex.apache.org/) that Flex is a framework in which ActionScript is only one part, so yeah, it might not be as important to the parser.
@b4n, can you improve the parser that's part of flex.c rather than starting a new one from scratch?
> @b4n, can you improve the parser that's part of flex.c rather than starting a new one from scratch?
That was of course the plan.
But I'm kind of depressed now looking at it. It's a variation of JavaScript (apparently it's ECMA-262 as well, not sure what that exactly means as it's not exactly the same language as JavaScript), and the upstream Flex parser is a copy of an older version of jscript, full of the tedious bugs that I fixed in the JS parser. This is a nightmare I didn't expect.
The best solution would probably be to make a generic ECMA-262 module, and have the jscript and ActionScript parsers use it. Also, `flex` is actually a parser for MXML *and* ActionScript (MXML is an XML dialect that can contain ActionScript data), and it predates the introduction of the sub-parser concept. Nowadays this would be better handled with 2 separate parsers, with the MXML one calling on the ActionScript one for the appropriate ranges. Meh, anyway, that ain't gonna happen for 1.35.
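The sub-parser arrangement described above (an MXML parser handing ActionScript ranges to a separate ActionScript parser) could look roughly like the following C sketch. Everything here is hypothetical: `SubParserFn`, `dispatchScriptBlocks()` and the byte-counting stand-in for an ActionScript parser are illustrative names, not uctags' sub-parser API, and the scan is deliberately naive (it only recognizes literal `<mx:Script>` tags and does not strip `<![CDATA[ ... ]]>` markers).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sub-parser hook: receives one ActionScript range of the host file. */
typedef void (*SubParserFn)(const char *start, size_t len, void *userData);

/* Hypothetical host-side scan: finds each <mx:Script>...</mx:Script> block in
 * an MXML buffer and hands the inner text to the ActionScript sub-parser.
 * Simplified: ignores CDATA markers, comments, attributes and namespaces
 * other than "mx". Returns the number of ranges dispatched. */
static int dispatchScriptBlocks(const char *mxml, SubParserFn sub, void *userData)
{
    static const char openTag[] = "<mx:Script>";
    static const char closeTag[] = "</mx:Script>";
    int count = 0;
    const char *p = mxml;

    assert(mxml != NULL && sub != NULL);
    while ((p = strstr(p, openTag)) != NULL) {
        const char *body = p + (sizeof openTag - 1);
        const char *end = strstr(body, closeTag);
        if (end == NULL)
            break; /* unterminated block: stop rather than guess */
        sub(body, (size_t)(end - body), userData);
        count++;
        p = end + (sizeof closeTag - 1);
    }
    return count;
}

/* Stand-in for the real ActionScript parser: just totals the bytes it is given. */
static void countScriptBytes(const char *start, size_t len, void *userData)
{
    (void)start;
    *(size_t *)userData += len;
}
```

The design point is that the MXML parser only needs to know where ActionScript starts and ends; all ActionScript knowledge stays in one parser that could also be used for plain `*.as` files.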
Some pain later: https://github.com/universal-ctags/ctags/pull/2084 (actually, once I gave up importing separate commits, it became very tedious but less painful).
Allow me to join the discussion.
> The regex engine used by Geany does not match the one used by upstream ctags
How about replacing the regex engine of u-ctags? https://github.com/universal-ctags/ctags/issues/1861
I'm very surprised that @b4n has rewritten two parsers of u-ctags. Improvements to the Cobol and Flex parsers are welcome, but... @b4n, what will you do if someone adds more regex-based parsers to u-ctags? Replacing the regex engine of u-ctags takes less effort. Sorry, I don't have time now, so I haven't read the whole discussion.
@masatake See https://github.com/geany/geany/pull/2132 - I've added back support for regex parsers.
After merging #2132, I think I should stop introducing new code on the ctags side. How about moving the libctags code to u-ctags? If it is on the u-ctags side, I can take care not to break the interface. You can add test cases that represent Geany's expectations. I don't want Geany people to have to spend time synchronizing code.
Should I open an issue at Geany issue tracker or U-ctags issue tracker?
I guess libctags in Geany may disable some unwanted features. That is understandable to me; some features I have added to u-ctags are still immature. u-ctags should provide a way to disable such features at build time or run time.
Could you write a short document about libctags internals? Based on that document, I will try to move the libctags code from Geany to u-ctags incrementally.
> How about moving libctags code to u-ctags?
Yeah, that's the plan. Actually, very few changes are needed now (try diffing the uctags commit mentioned in #2132 against the Geany version from the pull request); we should just try to think about the right way to do it in uctags. The way we handle re-parsing might not be the best way from uctags' point of view, and it would probably be nicer to handle it using some new callback functions in writers.
I definitely want to work on it, but only once we get Geany reasonably up-to-date. Even #2132 is against a uctags version that is several months old. But if it gets merged soon, I'd create one more commit to sync it with the most recent uctags version, and then we can start thinking about bringing the needed functionality to uctags.
> Could you write a short document about libctags internal?
There's not much to write. Basically we just need to:
* bypass some things from ctags main() which are irrelevant for us (command-line options etc.); this is done in ctagsInit() inside ctags-api.c
* register our custom writer (done inside ctagsInit()) which, unlike other writers, doesn't write to the MIO (which is NULL in our case); instead, its writeEntry() passes the tag information to Geany
* be informed when re-parsing happens so we can invalidate the tags we passed to Geany; this is currently done in a hacky way, but it could be nicer if there were a function for it in a writer
* be able to initiate parsing ourselves, either from a file or from a char buffer; see ctagsParse() in ctags-api.c
* get some basic information about the registered parsers, like the kinds they provide etc.; see the rest of ctags-api.c
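The re-parse notification in the third point is the part that is currently hacky. Here is a minimal C sketch of what a dedicated writer callback might look like, assuming a hypothetical `TagSink` with a `rescanStarted` hook; these names are illustrative and are not uctags' writer API. The client keeps tags for a single file and simply discards what it received from an abandoned pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer interface (NOT uctags' actual API). */
typedef struct {
    void (*writeEntry)(const char *name, unsigned long line, void *data);
    /* Called when the parser restarts on the same file (a rescan), so the
     * client can drop the tags it received during the abandoned pass. */
    void (*rescanStarted)(void *data);
    void *data;
} TagSink;

#define MAX_TAGS 16
#define MAX_NAME 32

/* Client-side store for one file, playing the role of Geany's tag list. */
typedef struct {
    char names[MAX_TAGS][MAX_NAME];
    unsigned long lines[MAX_TAGS];
    size_t count;
} TagStore;

static void storeEntry(const char *name, unsigned long line, void *data)
{
    TagStore *s = data;
    if (s->count == MAX_TAGS)
        return; /* full: silently drop in this sketch */
    strncpy(s->names[s->count], name, MAX_NAME - 1);
    s->names[s->count][MAX_NAME - 1] = '\0';
    s->lines[s->count] = line;
    s->count++;
}

static void storeRescan(void *data)
{
    assert(data != NULL);
    /* Single-file store: everything received so far came from the failed pass. */
    ((TagStore *)data)->count = 0;
}

/* Stand-in for the ctags side: the first pass fails and requests a rescan,
 * then the second pass succeeds. */
static void simulateTwoPassParse(const TagSink *sink)
{
    sink->writeEntry("partial", 3, sink->data); /* from the failed pass */
    sink->rescanStarted(sink->data);            /* parser asks for a rescan */
    sink->writeEntry("main", 1, sink->data);    /* from the good pass */
    sink->writeEntry("helper", 7, sink->data);
}
```

With a hook like this, the client side only has to reset its per-file tag list, and ctags does not need to know how the client stores tags.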
This is basically the main functionality, there are some other diffs where we for instance don't have fnmatch.h or where the parser list order has to correspond to the order specified in Geany which we could maybe change in Geany itself. But that's basically it.
@techee @masatake what about Geany's c.c having a lot of extra stuff (even extra languages IIUC)? I don't watch uctags much, but last I looked any attempt to import that was abandoned.
@techee @masatake @b4n
I assume you are talking about having u-ctags provide a library that other apps can use? That's a good idea. Why not refactor u-ctags to separate the parsers and parsing infrastructure from the command-line and file-writing stuff? That way such a library only includes parsing, which is the part most apps want, and the library can of course be used by the u-ctags app too :).
As I think @techee proposed, apps (including u-ctags itself) should register functions to provide things like reading files and accepting tags: u-ctags would write the tags to disk (I presume) and Geany would add them to its symbols.
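Registering input functions could be as simple as a reader callback that the parser core calls for each line, with one implementation backed by a `FILE *` (for the command-line tool) and one backed by a memory buffer (for an editor with unsaved text); uctags' MIO layer already plays a similar role internally. A minimal C sketch with hypothetical names (`InputSource`, `memReadLine()`, `countFunctionLines()`); the "parser" only counts lines starting with `function ` as a stand-in for real tag extraction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical input hook: the application decides where source text comes
 * from. Returns 1 and fills `buf` with the next line, or 0 at end of input. */
typedef struct {
    int (*readLine)(void *src, char *buf, size_t size);
    void *src;
} InputSource;

/* File-backed reader, as a command-line tool would use. */
static int fileReadLine(void *src, char *buf, size_t size)
{
    return fgets(buf, (int)size, (FILE *)src) != NULL;
}

/* Memory-backed reader, as an editor working on a buffer would use. */
typedef struct {
    const char *text;
    size_t pos;
} MemSource;

static int memReadLine(void *src, char *buf, size_t size)
{
    MemSource *m = src;
    size_t n = 0;

    assert(size > 1);
    if (m->text[m->pos] == '\0')
        return 0;
    /* Copy up to and including the next '\n' (or until the buffer is full). */
    while (m->text[m->pos] != '\0' && n < size - 1) {
        char c = m->text[m->pos++];
        buf[n++] = c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return 1;
}

/* Parser core: knows nothing about files or buffers. Lines longer than the
 * local buffer get split, which is acceptable for a sketch. */
static int countFunctionLines(const InputSource *in)
{
    char line[256];
    int count = 0;

    while (in->readLine(in->src, line, sizeof line))
        if (strncmp(line, "function ", 9) == 0)
            count++;
    return count;
}
```

The same `countFunctionLines()` runs unchanged whether `InputSource` wraps `fileReadLine` with a `FILE *` or `memReadLine` with a `MemSource`.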
Also apps need to be able to not build in parsers for languages they don't want, there are twice as many files in u-ctags/parsers as in Geany/parsers directory.
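Leaving parsers out at build time could be done with a preprocessor-filtered parser table. A minimal C sketch, assuming hypothetical `ENABLE_PARSER_*` macros set by the embedding application's build system; these are not existing uctags options, and the defaults below are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build switches (e.g. -DENABLE_PARSER_COBOL=1 on the compiler
 * command line); NOT real uctags options. */
#ifndef ENABLE_PARSER_C
#define ENABLE_PARSER_C 1
#endif
#ifndef ENABLE_PARSER_PYTHON
#define ENABLE_PARSER_PYTHON 1
#endif
#ifndef ENABLE_PARSER_COBOL
#define ENABLE_PARSER_COBOL 0
#endif

typedef struct {
    const char *name; /* language name; NULL marks the end of the table */
} ParserEntry;

/* Only parsers switched on at build time end up in the table (a real build
 * would also skip compiling the disabled parsers' source files). */
static const ParserEntry builtinParsers[] = {
#if ENABLE_PARSER_C
    { "C" },
#endif
#if ENABLE_PARSER_PYTHON
    { "Python" },
#endif
#if ENABLE_PARSER_COBOL
    { "Cobol" },
#endif
    { NULL } /* sentinel: keeps the array valid even if every parser is off */
};

/* Looks a parser up by name; returns NULL if it was not built in. */
static const ParserEntry *findParser(const char *name)
{
    assert(name != NULL);
    for (const ParserEntry *p = builtinParsers; p->name != NULL; p++)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

Compared with patching parsers out of an imported copy, switches like these would let each application choose its set of languages without diverging from upstream sources.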
Related: https://github.com/universal-ctags/ctags/issues/63
> I'm very surprised that @b4n has rewritten two parsers of u-ctags. Improving Cobol and Flex parsers are welcome but... @b4n, what will you do if someone adds more regex based parsers to u-ctags. Replacing the regex engine of u-ctags is more effortless.
Adding support for u-ctags regex parsers would be better and is the goal, but it's not easy, and certainly not effortless; see @techee's work :slightly_smiling_face: The reason I worked on the parsers is that we'd need something working for COBOL and ActionScript in the next release, which is supposed to happen next weekend. That's awfully close, and I didn't expect getting the regex parsers to work to be a trivial task that would take a couple of hours and be robust. So I figured it was safer to work on 2 parsers: if there really was a problem with them, it would only affect users of those languages, not potentially everyone. And well, while at it, I figured I could probably improve parsing for the relevant languages, given those regex parsers seemed quite basic. Also, to be fair, you failed to take into account my tendency to write parsers for languages I don't know :grinning:
--- I'll answer the rest later, hopefully today.
> @techee @masatake what about Geany's c.c having a lot of extra stuff (even extra languages IIUC)? I don't watch uctags much, but last I looked any attempt to import that was abandoned.
Well, that will definitely not be done by me :-). However, I think the first step for Geany could be to switch to the new cxx parser for C/C++ and stop using the parser for these languages from c.c. Then the C/C++ portion of c.c could be removed completely, and I think C++ in particular accounts for the largest amount of code. The same could be done with the upstream c.c, and it might be much easier to merge what remains.
> Why not refactor u-ctags to separate the parsers and parsing infrastructure from the command line and file writing stuff. That way such a library only includes parsing, that's the part most apps want and the library can be used by the u-ctags app of course :).
Yes, that would be the ideal long-term solution. Actually, the writing stuff should probably be part of such a library too, because some applications may want to write tags files; it's just the command-line interface that should be separate. But doing this right means a lot of thinking about what the official API should look like, and in the end it may need a lot of refactoring. So first I'd just try to bypass what we don't want to happen for Geany (like the command-line interface), add some missing functionality we need in Geany (there's very little new code, something like 100 lines), and use the internal ctags API for now. I could create some dummy client application for uctags which uses the same functionality we need in Geany, to make sure some internal change in uctags doesn't remove or break what we need.
> Also apps need to be able to not build in parsers for languages they don't want, there are twice as many files in u-ctags/parsers as in Geany/parsers directory.
Yeah, but in this case it's our business to remove the parsers we don't want to get compiled like we do with Scintilla:
https://github.com/geany/geany/blob/master/scintilla/scintilla_changes.patch
> Yeah, but in this case it's our business to remove the parsers we don't want to get compiled like we do with Scintilla:

Actually, thinking about it some more, that's the wrong way to look at it.
If it was a proper library then both Geany and ctags and others should be able to use it _as a library_. There should be no need to import ctags code into Geany at all if it was a proper library, just link to the API.
And that helps u-ctags because everyone is using and testing and fixing and improving the same code instead of differing versions between upstream and other users (which is still the case despite the valiant efforts of @techee).
@elextr,
> If it was a proper library then both Geany and ctags and others should be able to use it as a library. There should be no need to import ctags code into Geany at all if it was a proper library, just link to the API.

This is ideal, but I would like to focus on the interface with Geany. I don't have enough skill to design and maintain a new library API; 3 ~ 5 years ago, I declined to make ctags a library for that reason. However, libctags is already there, so I don't have to design an API. I think I can maintain the existing interface. The benefit of moving the library to the u-ctags project is that I, as an active ctags developer, can take care of keeping the interface working.
The following figure shows where libctags sits in our plan.
![libctags-layer](https://user-images.githubusercontent.com/77077/56729424-3bef4980-6790-11e9-...)
For the time being, libctags would be only for Geany. I assume Geany may incorporate the ctags code via git subtree.
@techee
> I could create some dummy client application for uctags which uses the same functionality we need in Geany to make sure some internal change in uctags doesn't remove or break what we need.

YES! That is what we need. Let's give the dummy client an impressive name; this client will drive us. My idea is 'minigeany' or 'microgeany'. The client will live on the uctags side, so we can add test cases for it.
> After merging #2132, I think I should stop introducing new code to ctags side.

I can only guess this is not exactly what you meant, because we certainly don't want to freeze upstream u-ctags or stop innovation. If you mean trying to keep things stable, yes, it's best to maintain a stable interface whenever possible, but that shouldn't prevent innovation.
> > @techee @masatake what about Geany's c.c having a lot of extra stuff (even extra languages IIUC)? I don't watch uctags much, but last I looked any attempt to import that was abandoned.
>
> Well, that will definitely not be done by me :-). However, I think that the first step for Geany could be to switch to the new cxx parser for C/C++ and not to use the parser of these languages from c.c. Then the C/C++ portion of c.c could be removed completely and I think especially C++ contributes by the largest amount of code. The same could be done with the upstream c.c and it might be much easier to merge what remains.

It's probably something I'll look at at some point. I'll check, but I guess it'll end up splitting the different languages into proper separate parsers. I even have a standalone Vala parser lying somewhere that just awaits a revival [1], so that part could be fairly easy and give Vala support a boost. Anyway, it's not my priority right now, but it might become one once we have progressed far enough towards a u-ctags sync.
Apart from that, I don't have much to add: yes, the ultimate goal should probably be to simply link against a libuctags, and in the meantime we should try to make it as easy as possible to keep Geany's copy of uctags in sync with upstream. I trust @techee to lay out the required interface, as he's acquired a lot of experience here :)
> However, libctags is already there.
@masatake I assume you mean the libctags that Geany builds?
@b4n @techee, does that have an API? My understanding is that it is just a collection of pieces of u-ctags for use by Geany? And why do we build it as a library anyway, why not just link it?
> > However, libctags is already there.
>
> @masatake I assume you mean the libctags that Geany builds?
Yes, I mean "libctags in Geany source tree". It already works well, as you know.
> This is ideal but I would like to focus on the interface between Geany.
>
> I don't have enough skill to design and maintain a new library API. 3 ~ 5 years ago, I declined to make ctags a library in the above reason. However, libctags is already there. I don't have to design an API. I think I can do keeping the interface. The benefit of moving the library to u-ctags project is that I, an active ctags developer, can take care of keeping the interface working.
Totally agree, I think uctags shouldn't go crazy about making it a library - it's primarily a command-line program and with Geany being the only application (mis)using it as a library, there's no need to spend much effort on it. For us it's not really a problem if some internal interface changes - we can fix the code in Geany easily as long as the core required functionality is accessible to us in some way. The main purpose of the ctags-api.c/h file (which I moved back to tag manager in the latest version) is to isolate Geany from the rest of ctags so this is the only file we'd have to update if something changes.
> YES! It is what we need. Let's give an impressive name the dummy client. This client drives us.
> My idea is 'minigeany' or 'microgeany'. The client is at uctags side so we can add test cases on it.
Good, I guess I'll do it now. The weather is beautiful outside and it would be really stupid not to sit inside and code ;-). I've made some more changes in #2132 to improve how we interface with ctags and creating such a client should be really easy now. I'll open a pull request in uctags when I'm done.
For anyone interested, I've created the simple ctags client app here:
https://github.com/universal-ctags/ctags/pull/2085
See #2134 for ActionScript.
Closing this one, we use upstream regex parsers already.
Closed #2119.