New, flexible interfaces to query tags from a workspace.
The existing functions are rather tied to their use case (either showing tag names in the autocomplete list or in the goto definition popup). E.g. there is currently no function to search for all tags starting with a prefix that does NOT do deduplication.
The goal is to provide these interfaces to plugins. In particular, python plugins (via my peasy plugin) can benefit. Mangling the global tag arrays manually is super slow (pygobject converts each tag in each array to a PyGObject on first use, even if no tag referenced). Another goal is to eventually use this in the existing tag search/query functions.
@techee @b4n Please comment and review You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/1187
-- Commit Summary --
* tagmanager: new query module * gi: add support for array annoations * plugin api: export new tagmanager query interfaces
-- File Changes --
M doc/Doxyfile.in (5) M scripts/gen-api-gtkdoc.py (3) M src/tagmanager/Makefile.am (6) A src/tagmanager/tm_query.c (312) A src/tagmanager/tm_query.h (37) M src/tagmanager/tm_tag.c (35) M src/tagmanager/tm_tag.h (12) M src/tagmanager/tm_workspace.c (4)
-- Patch Links --
https://github.com/geany/geany/pull/1187.patch https://github.com/geany/geany/pull/1187.diff
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
looks like a better job for a boolean
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
why insert sorted, esp. as you optionally sort at the end??
{
tags = tm_tags_find(q->workspace->tags_array, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
}
- }
- /* Filter tags according to scope, type and lang. */
- for (node = res.head; node; node = next)
- {
gboolean match = TRUE;
next = node->next;
tag = node->data;
foreach_ptr_array(scope, i, q->scopes)
match = match && (g_strcmp0(tag->scope, scope) == 0);
as you seem worried about speed, better break out early when match becomes false, as it will just keep being false on and on
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
- }
- g_list_free(res.head);
why use a list to then convert it? looks like a better job to direct `GPtrArray` + `tm_tags_prune()`
if (match && q->type >= 0)
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
[`g_ptr_array_insert()`](https://developer.gnome.org/glib/unstable/glib-Pointer-Arrays.html#g-ptr-arr...) is awfully recent (2.40), plus I really don't see the advantage over [`g_ptr_array_add()`](https://developer.gnome.org/glib/unstable/glib-Pointer-Arrays.html#g-ptr-arr...) which IIUC your usage would do exactly the same -- minus the theoretical possibility for inserting not at the end, which would kill the performances (and be wrong).
Well… I don't quite love the proposed query API, but well, I don't know. But it's not too crazy -- well, not more than a DB API would be crazy at least.
But maybe you could show real use cases of why this kind of API is good and fits everyone?
- */
+GEANY_API_SYMBOL +TMQuery *tm_query_new(const TMWorkspace *workspace, gint data_sources) +{
- TMQuery *q = g_slice_new(TMQuery);
- q->workspace = workspace;
- q->data_sources = data_sources;
- q->names = g_ptr_array_new();
- q->langs = g_array_new(FALSE, FALSE, sizeof(TMParserType));
- q->scopes = g_ptr_array_new();
- q->type = -1;
- q->refcount = 1;
- g_ptr_array_set_free_func(q->names, free_g_string);
- g_ptr_array_set_free_func(q->scopes, g_free);
https://developer.gnome.org/glib/stable/glib-Pointer-Arrays.html#g-ptr-array...
+gint tm_query_add_lang(TMQuery *q, TMParserType lang) +{
- g_array_append_val(q->langs, lang);
- return 0;
+}
+/** Set the tag type filter of a query.
- The query results will be restricted to tags match any of the given types.
- You can only set this filter because the @a type is a bitmask containing
- multiple values.
- @param q The query to operate on.
- @param type Bitmasp of the tag time to filter, see @a TMTagType.
`s/Bitmasp/Bitmask/` `s/tag time/tag type/` ?
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
It never returns an error either, for example if `q` or `name` is `NULL` (the latter causing an empty string in the array as opposed to a crash).
-1 to exposing more TM, we should be getting rid of public TM stuff and focusing on a better (non-TM) API for plugins to be able to provide this kind of features (ex. using libclang, libpython, [libuctags-future](https://github.com/universal-ctags/ctags/issues/63), etc), IMO. Also if the only use-case is to work around deficiencies in PyGI, it seems like it would be better suited as a helper function/API provided by the libpeas plugin.
well, not more than a DB API would be crazy at least.
@b4n, great idea!! replace TM with sqlite
Its even got a precedent :)
- */
+GEANY_API_SYMBOL +TMQuery *tm_query_new(const TMWorkspace *workspace, gint data_sources) +{
- TMQuery *q = g_slice_new(TMQuery);
- q->workspace = workspace;
- q->data_sources = data_sources;
- q->names = g_ptr_array_new();
- q->langs = g_array_new(FALSE, FALSE, sizeof(TMParserType));
- q->scopes = g_ptr_array_new();
- q->type = -1;
- q->refcount = 1;
- g_ptr_array_set_free_func(q->names, free_g_string);
- g_ptr_array_set_free_func(q->scopes, g_free);
Will use that.
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
The ability to return error codes is more to make it extensible. Since I want them in the plugin API the return type should be determined now (even if it always returns the same value *now*).
While gboolean would work, I've almost always regretted using in the past. gint is so much more flexible. I can change to boolean if you truly want it, though.
+gint tm_query_add_lang(TMQuery *q, TMParserType lang) +{
- g_array_append_val(q->langs, lang);
- return 0;
+}
+/** Set the tag type filter of a query.
- The query results will be restricted to tags match any of the given types.
- You can only set this filter because the @a type is a bitmask containing
- multiple values.
- @param q The query to operate on.
- @param type Bitmasp of the tag time to filter, see @a TMTagType.
Thanks, will correct that.
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
I want to have them sorted when no explicit sorting is given, see the doc comment. This is to preserve the semantics of the existing, global tag arrays. E.g. some tag manager functions expect a sorted TMTag array so the result can be used OOTB.
Sorting by tag name is almost always wanted anyway.
{
tags = tm_tags_find(q->workspace->tags_array, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
}
- }
- /* Filter tags according to scope, type and lang. */
- for (node = res.head; node; node = next)
- {
gboolean match = TRUE;
next = node->next;
tag = node->data;
foreach_ptr_array(scope, i, q->scopes)
match = match && (g_strcmp0(tag->scope, scope) == 0);
I'm not worried about best possible speed, just about not being super slow. The performance problem with the global tag array I mentioned is half a second UI freeze (for each global_tags and tag_array) on the first access. This is unbearable.
For this code I care a bit more about readability. I tried to add some breaks there but it would really be ugly IMO.
if (match && q->type >= 0)
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
I thought of g_ptr_array_insert() as a more beatiful way of saying `ret->pdata[i++] = tag`. If that's too recent (sorry, I wasn't aware) I'll use the plan C code. g_ptr_array_add() would just traverse the whole array again and again so it's not the best tool for the job either.
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
- }
- g_list_free(res.head);
A doubly-linked list is better when you insert in the middle or even beginning which this function does (arrays require the tail to be `memmove`'d. What makes you think direct `GPtrArray` + `tm_tags_prune()` is better?
One more vote for seeing uses of this. I'm a little worried about adding more API before anyone actually uses it (especially API like this which is just a facade for what is already accessible). I understand the performance problem but the question is how many people will want to create python plugins working with TM; in geany-plugins there are just geanygendoc, projectorganizer and geanyprj dealing with TM and it doesn't seem someone would want to create TM-using plugins very often.
Another goal is to eventually use this in the existing tag search/query functions.
For C this new API is more or less useless because the filtering can easily be done by directly looking at the tag values and IMO it's more explicit and clearer than using some layer in between.
Well… I don't quite love the proposed query API, but well, I don't know. But it's not too crazy -- well, not more than a DB API would be crazy at least.
But maybe you could show real use cases of why this kind of API is good and fits everyone?
Below is the python code of that I can improve with the new API. As I mentioned, the initial code causes a half-second UI lag which is unbearable. And my session doesn't even have too much tags, although I use ProjectManager to load tags for the project so it's just not open files (this is when I work on Geany, not the Linux kernel). It was under 10000 tags when I profiled the python code.
Let me describe my use case and what I found when I searched for an existing good fit. - My use case is: I'm developing a [quickswitch](https://github.com/kugel-/peasy/blob/master/plugins/quickswitch.py) plugin (inside Peasy). The plugin adds a dialog and a keybinding to allow to quickly type the prefix of a tag name to jump to its definition or location. If the prefix is ambigious, it gives a list of candidates so I can chose one. So I press Alt+3, type tm_q, then it opens a list of tm_query_* functions to jump to. If I type tm_query_ex, then it immediately jumps to the definition of tm_query_exec. - I found `tm_workspace_find()` which does exact match (I need prefix matching). `tm_workspace_find()` is tied to the use case of finding the destination of go-to-definition/declaraiton - I found `tm_workspace_find_prefix()` which dedups. I don't want deduplication. `tm_workspace_find_prefix()` is tied to the use case of showing candidates in the autocomplete list - Both have a bug that "lang == -1" doesn't actually work (despite their doc comments) because `tm_tag_langs_compatible()` doesn't actually look for -1. I cannot specifiy a lang in the plugin. - Both unconditionally search in the global tags as well which is isn't needed in my case so just a waste of resources.
So I thought it would be more useful to implement a generic, query interface that can be used by Geany itself and plugins, instead of adding one more special-purpose `tm_workspace_find_prefix_no_dedup_no_global` function. One could also add tons of parameters to `tm_workspace_find()` to make it fit, but I really liked the idea of having a dedicated C file with a generic query API more. I always care about extensibility if stuff is in the plugin API. The query API can be easily extended with more filters or other stuff without breaking existing users, which is not possible with a single, one-size-fits-all function (new stuff would require new argments, therefore an API break).
The idea is also to use query API inside geany so that `tm_workspace_find*` can be removed or refactored. The API is generic enough that `tm_workspace_find`, `tm_workspace_find_prefix` and `tm_workspace_find_scope_members` can be replaced or implemented with the query interface. Yes, it kind of resembles a DB query. This is quite on purpose because that makes it extensible. Plus, we could probably even replace tagmanager with sqlite and without having to change this interface :-)
``` def search_tags2(self, prefix): print("search_tags2") ws = self.geany_plugin.geany_data.app.tm_workspace q = Geany.tm_query_create(ws, Geany.TMQuerySource.SESSION_TAGS); q.add_name(prefix, len(prefix)) return q.exec(None, None)
def search_tags(self, prefix): if (hasattr(Geany, "tm_query_create")): return self.search_tags2(prefix) ret = [] # This is super slow with lots of tags. pygobject seems to call getattr() # for all elements on the first use, regardless of how it used array = self.geany_plugin.geany_data.app.tm_workspace.tags_array n = tag_bisect_left(array, prefix) for tag in array[n:]: if (tag.name.startswith(prefix)): ret.append(tag) else: break return ret ```
Am 23.08.2016 um 09:27 schrieb Jiří Techet:
Another goal is to eventually use this in the existing tag search/query functions.
For C this new API is more or less useless because the filtering can easily be done by directly looking at the tag values and IMO it's more explicit and clearer than using some layer in between.
I disagree. It's a very nice improvement if `tm_workspace_find_prefix()` along with its helper `fill_find_tags_array_prefix()` can be replaced with a simple implementation like this:
``` GPtrArray *tm_workspace_find_prefix(const char *prefix, TMParserType lang, guint max_num) { TMTagAttrType dedup_attrs[] = { tm_tag_attr_name_t, 0 }; GPtrArray *tags; TMQuery *q = tm_query_new(theWorkspace, TM_QUERY_SOURCE_GLOBAL_TAGS | TM_QUERY_SOURCE_SESSION_TAGS); tm_query_add_name(q, prefix, strlen(prefix)); tm_query_add_lang(q, lang); tags = tm_query_exec(q, NULL, dedup_attr);
if (tags->len > max_num) tags->len = max_num; return tags; } ```
Same goes for `tm_workspace_find()` and `tm_workspace_find_scope_members()`
if (match && q->type >= 0)
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
`g_ptr_array_add()` would just traverse the whole array again and again so it's not the best tool for the job either.
What makes you think that? `GPtrArray` maintains a size, so no need for traversal -- which wouldn't even work with NULL items anyway). And [`g_ptr_array_add()` is actually basically implemented as `array->pdata[array->len++] = data;`](https://git.gnome.org/browse/glib/tree/glib/garray.c#n1375).
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
Sorting on insertion is bound to be slower than sorting at the end when building a complete set. Sorting on insertion is good when wanting to expand an existing sorted set and maintain it sorted (well, so long as the number of insertions is sufficiently small compared to the size of the data set). So you should just add, and sort like you want at the end (using `tm_tags_sort()` even, it's there!). IIUC, there is no need for the list to be sorted before returning it, so maintaining it sorted is just a waste of resources -- plus extra code for `tag_compare_ptr`.
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
- }
- g_list_free(res.head);
There is no need to add in the middle. As mentioned above, sorting before the end looks like a bad idea here, so you end up trading prunning the array with building a list. And prunning the array is basically implemented as a set of pointer moves, not consecutive `memmove()`s.
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
Okay. I was thinking sorting on insertion with a doubly linked list would be quicker (since the insertion point can be found with bisection, and inserting is then O(1)) but I don't feel strong enough to make serious measurements, so I'll just go back and post-sort.
The array still has the sligt disadvantage that it must be resized when adding items exeeding the lnght, but his is ot a ptoblem here.
if (match && q->type >= 0)
match = tag->type & q->type;
/* Remove tag from the results. tags are unowned so removing is easy */
if (!match)
g_queue_unlink(&res, node);
- }
- /* Convert GQueue to GPtrArray for sort, dedup and return */
- i = 0;
- ret = g_ptr_array_sized_new(g_queue_get_length(&res));
- foreach_list(node, res.head)
- {
tag = (TMTag *) node->data;
g_ptr_array_insert(ret, i++, tag);
Thanks, you are right. Somewhow I was thinking in the lines if GSList where the insertion poinst must be searched first. Yes, `g_ptr_array_add()` is fine, I'll use that.
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
since the insertion point can be found with bisection
It couldn't as lists don't allow random access. The only way is basically to traverse the whole list up to the insertion point. If you knew the data set you could imagine making a guess at whether it's "close to" the end/start and start there, but that's as good as it gets AFAIK.
Sorting is a different story, because you can sort several items at once if you find interesting properties on them, and can take clever paths in some cases, but that doesn't work with a single insertion.
The array still has the slight disadvantage that it must be resized when adding items exceeding the lenght, but this is not a problem here.
Yeah well, it's not so much worse than allocating the whole array at the end as `GPtrArray` is not so stupid and grows in chunks.
@kugel- pushed 2 commits.
75ef096 tagmanager: refactoring tm_query 2805e1f tagmanager: tm_query: allow prefix matching for scope as well
- GPtrArray *ret;
- sort_options.sort_attrs = NULL;
- /* tags_array isn not needed by tm_tag_compare(), but for tm_search_cmp() */
- sort_options.tags_array = NULL;
- sort_options.first = TRUE;
- foreach_ptr_array(s, i, q->names)
- {
TMTag **ptag;
sort_options.cmp_len = s->len;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
foreach_c_array(ptag, tags, ntags)
g_queue_insert_sorted(&res, *ptag, tag_compare_ptr, &sort_options);
It couldn't as lists don't allow random access
Err. Of course, you are right, on all points. I removed all of the linked list code, hope it's okay now.
I don't know but I still find it a bit too heavy-weight for what you use it for. Wouldn't it be sufficient to just extend some tag array utility functions and make them accessible for plugins? These two
``` GPtrArray *tm_tags_extract(); void tm_tags_sort(); ```
could be changed to e.g.
``` GPtrArray *tm_tags_extract(GPtrArray *tags_array, gchar *name, gchar *scope, gboolean prefix, TMParserType lang, TMTagType tag_types); void tm_tags_sort(GPtrArray *tags_array, TMTagAttrType *sort_attributes, gboolean dedup); ```
plus maybe add
``` GPtrArray *tm_tags_concat(GPtrArray *array1, GPtrArray *array2); ```
to concatenate two tag arrays. You will be able to apply them on arbitrary tag arrays (global or non-global) or if you wanted logical `or`, you could run several tm_tags_extract(), concat the results and run tm_tags_sort() on the result.
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
If you want it to be a nice API for non-C languages, go all the way and use a `GError`, that will be translated to a proper exception, contains a code and even a message. Not saying it's really worth it, but it seems more like the canonical way of doing it in GLib-using stuff.
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
…and it still allows the boolean-based calls of `if (call())` in C
g_slice_free(TMQuery, q);
- }
+}
+/** Add name filter to a query
- The query results will be restricted to tags matching the name. The
- matching is a simple prefix matching. The length to match can be
- limited to allow multiple tags to match a prefix.
- @param q The query to operate on.
- @param name The name to filter.
- @param name_len The number of characters to use (from the beginning).
- @return 0 on succcess, < 0 on error.
Is this a serious suggestion?
If you dislike gint return so much I will change it. Just saying I regularly regret this later on (but not always).
GPtrArray *tm_tags_extract(GPtrArray *tags_array, gchar *name, gchar *scope, gboolean prefix, TMParserType lang, TMTagType tag_types);
This is specifically the type of function I want to avoid. This is not extensible at all, so not a good fit for a plugin API. Also, parameter-heavy functions are generally hard to use correctly (it's sometimes necessary but in this case my proposal gives a better solution IMO).
This is not extensible at all, so not a good fit for a plugin API
Then stick a `...` as the last parameter, infinitely extensible :)
An actual sensible good trick used by Linux system calls (among others) is to always have a `flags` parameter as the last thing, then you can add options for stuff like `prefix` or whatever else could be represented as a bitmask.
This is specifically the type of function I want to avoid. This is not extensible at all, so not a good fit for a plugin API. Also, parameter-heavy functions are generally hard to use correctly (it's sometimes necessary but in this case my proposal gives a better solution IMO).
But yours isn't very consistent in this respect either - sorting parameters are provided as arguments of tm_query_exec() while the rest is provided by the query setters.
If you want to keep the setters instead of function parameters it's fine - this wasn't the important thing in my comment. But I think it would be much more flexible if it could be applied to an arbitrary tag array instead of the hard-coded global or session tags. There's for instance no way to get tags for the current document only. I would prefer
``` GPtrArray *tm_query_filter(GPtrArray *tag_array, TMQuery *q, TMTagAttrType *sort_attr, TMTagAttrType *dedup_attr) ```
and it would filter the source tag_array whatever it is. In fact, I would even prefer splitting this one into two:
``` GPtrArray *tm_query_filter(GPtrArray *tag_array, TMQuery *q) ```
and
``` GPtrArray *tm_sort(GPtrArray *tag_array, TMTagAttrType *sort_attr, TMTagAttrType *dedup_attr) ```
- foreach_ptr_array(s, i, q->names)
- {
TMTag **tags;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
if (ntags)
{
g_ptr_array_set_size(ret, ret->len + ntags);
memcpy(&ret->pdata[ret->len], *tags, ntags * sizeof(gpointer));
}
}
if (q->data_sources & TM_QUERY_SOURCE_SESSION_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
Is `q->workspace->global_tags` correct here? You do the same as in the TM_QUERY_SOURCE_GLOBAL_TAGS case above.
- foreach_ptr_array(s, i, q->names)
- {
TMTag **tags;
if (q->data_sources & TM_QUERY_SOURCE_GLOBAL_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
if (ntags)
{
g_ptr_array_set_size(ret, ret->len + ntags);
memcpy(&ret->pdata[ret->len], *tags, ntags * sizeof(gpointer));
}
}
if (q->data_sources & TM_QUERY_SOURCE_SESSION_TAGS)
{
tags = tm_tags_find(q->workspace->global_tags, s->str, s->len, &ntags);
Indeed, copy&paste error. Was correct in the initial commit.
Am 25.08.2016 um 09:25 schrieb Jiří Techet:
This is specifically the type of function I want to avoid. This is not extensible at all, so not a good fit for a plugin API. Also, parameter-heavy functions are generally hard to use correctly (it's sometimes necessary but in this case my proposal gives a better solution IMO).
But yours isn't very consistent in this respect either - sorting parameters are provided as arguments of tm_query_exec() while the rest is provided by the query setters.
It's a bit like with DB queries. Query filters (WHERE clause) apply to the query itself, post-processing (ORDER BY, GROUP BY) applies to the result set.
Or did you mean to suggest that sorting/dedup should be set with a separate function? Well, I designed it such that these two don't affect the result itself, so that a query can re-run with different sorting/dedup. I guess you could also make _exec() parameter-less and export tm_tags_sort() and tm_tags_dedup() separately but this seems less convinient and makes tm_query interface look incomplete (because the two are in the tm_tags* namespace). But I guess I could live with it if necessary.
If you want to keep the setters instead of function parameters it's fine
- this wasn't the important thing in my comment. But I think it would be
much more flexible if it could be applied to an arbitrary tag array instead of the hard-coded global or session tags. There's for instance no way to get tags for the current document only. I would prefer
|GPtrArray *tm_query_filter(GPtrArray *tag_array, TMQuery *q, TMTagAttrType *sort_attr, TMTagAttrType *dedup_attr) |
and it would filter the source tag_array whatever it is. In fact, I would even prefer splitting this one into two:
|GPtrArray *tm_query_filter(GPtrArray *tag_array, TMQuery *q) |
and
|GPtrArray *tm_sort(GPtrArray *tag_array, TMTagAttrType *sort_attr, TMTagAttrType *dedup_attr) |
For my GI use case I must avoid referencing the `workspace->global_tags` and `workspace->tags_array`, because that's what's slowing down the python plugin (it seems to do stuff with getattr() for each element even if no python code actually accesses the elements). This is the main reason for the abstraction via `TM_QUERY_SOURCE_*`
However, I am open to allow the query API to work also on local tag arrays. I could imagine a second constructor like this: |TMQuery *tm_query_new_from_tags(const TMWorkspace *workspace, TMTag **tags);|
Then the remaining TMQuery interface could also be used on that kind of instance. The `workspace` parameter truly really needed but might become handy later on. `tags` would have to be suitably sorted by tag name though.
For your idea to get tags for a specific document, I'd suggest a new filter. |gint tm_query_add_source_file(TMQuery *q, TMSourceFile *source_file);|
Here shows the strength of the TMQuery interface. Constructors and filters can be added painlessly, without breaking existing users.
`WHERE`, `ORDER BY` `GROUP BY` why don't we just replace tagmanager by sqlite like Anjuta did :)
@codebrainz
Then stick a ... as the last parameter, infinitely extensible :)
An actual sensible good trick used by Linux system calls (among others) is to always have a flags parameter as the last thing, then you can add options for stuff like prefix or whatever else could be represented as a bitmask.
Both suggestions aren't type-safe and not compatible to GI. I don't see why we should start playing tricks with type-unsafe functions now?
For my GI use case I must avoid referencing the `workspace->global_tags`
and `workspace->tags_array`, because that's what's slowing down the python plugin (it seems to do stuff with getattr() for each element even if no python code actually accesses the elements). This is the main reason for the abstraction via `TM_QUERY_SOURCE_*`
Isn't it access to individual TMTag elements of the array which is slow rather than just referencing the whole array? The TMTag elements would be accessed inside the C code which should still be fast.
For your idea to get tags for a specific document, I'd suggest a new filter.
`gint tm_query_add_source_file(TMQuery *q, TMSourceFile *source_file);`
I'd say don't add much more now, it can be added if someone needs it.
Here shows the strength of the TMQuery interface. Constructors and
filters can be added painlessly, without breaking existing users.
We probably see different things. I see 350 lines of new code which "does nothing" and what I like about Geany is there's very little code like this and it avoids various adaptors, interfaces and similar design patterns adding lots of "nothing doing" code with very well hidden "something".
Take this as my personal rant only; I have a very bad experience maintaining and evolving an enterprise system consisting of such a code. Geany's code is fortunately very direct and easy read and maintain and adding TMQuery won't cause any big disaster.
Isn't it access to individual TMTag elements of the array which is slow rather than just referencing the whole array? The TMTag elements would be accessed inside the C code which should still be fast.
Unfortunately, it's referencing the whole array which causes the slow down. I profiled the python code, and it's the following assignment (or any other first use) of the array that's problematic: `array = self.geany_plugin.geany_data.app.tm_workspace.tags_array`
self.geany_plugin.geany_data.app.tm_workspace.tags_array
5 dict lookups, no wonder :)
Both suggestions aren't type-safe and not compatible to GI. I don't see why we should start playing tricks with type-unsafe functions now?
Well the `...` was a joke as suggested by the :) and the next sentence starting with "An actually sensible...". Putting a flags parameter isn't any more type unsafe than anything else in C, avoids having lots of boolean parameters, and allows you to customize the function further in the future by adding options that can be represented as flags.
Unfortunately, it's referencing the whole array which causes the slow down.
The usual solution here is to write those parts of the Python code in an extension module. IMO adding a bunch of code to Geany to work around problems with Python/PyGI isn't a great solution.
Am 25.08.2016 um 14:38 schrieb Matthew Brush:
Both suggestions aren't type-safe and not compatible to GI. I don't see why we should start playing tricks with type-unsafe functions now?
Well the |...| was a joke as suggested by the :) and the next sentence starting with "An actually sensible...". Putting a flags parameter isn't any more type unsafe than anything else in C, avoids having lots of boolean parameters, and allows you to customize the function further in the future by adding options that can be represented as flags.
What do you want to achieve with the flags alone? You still need params to say to *which* name/scope/lang/type/whatever to filter for.
Unfortunately, it's referencing the whole array which causes the slow down.
The usual solution here is to write those parts of the Python code in an extension module. IMO adding a bunch of code to Geany to work around problems with Python/PyGI isn't a great solution.
This interface isn't just a workaround for my python problem. It's a new interface for both Geany and plugins because the existing methods to query tags are poor (inflexible and inconsistent), such that I wouldn't want to even export in the first place. For plugins there isn't any method provided by Geany ATM, just the global workspace (with bare tag arrays) is exported.
This interface isn't just a workaround for my python problem. It's a new
interface for both Geany and plugins because the existing methods to query tags are poor (inflexible and inconsistent), such that I wouldn't want to even export in the first place. For plugins there isn't any method provided by Geany ATM, just the global workspace (with bare tag arrays) is exported.
To me it looks more like a solution for your problem (which is fine) than something that would be needed in Geany. The current search functions in TM are indeed single-purpose for specific uses in Geany but there's no problem to grab one of the tag arrays, go through the elements and pick those you are interested in (one for loop, one if inside and this is more flexible than some extra API). The new API seems to be useful just to make the filtering fast in Python (and again, sounds like a fair reason).
My use case is: I'm developing a quickswitch plugin (inside Peasy). The plugin adds a dialog and a keybinding to allow to quickly type the prefix of a tag name to jump to its definition or location. If the prefix is ambigious, it gives a list of candidates so I can chose one. So I press Alt+3, type tm_q, then it opens a list of tm_query_* functions to jump to. If I type tm_query_ex, then it immediately jumps to the definition of tm_query_exec.
I think this is really useful - actually so much it would be nice to have it directly in Geany.
What do you want to achieve with the flags alone? You still need params to say to *which* name/scope/lang/type/whatever to filter for.
It was more a general comment on API design, since you mentioned about making functions extensible. Like if you wanted to add more boolean options or different types of attributes to filter by, etc, you can add those in without breaking the API/ABI.
This interface isn't just a workaround for my python problem.
You said you made it because you couldn't accomplish what you wanted in Python efficiently enough. That's where you usually would write a Python extension module to work around the inefficiencies.
It's a new interface for both Geany and plugins because the existing methods to query tags are poor (inflexible and inconsistent), such that I wouldn't want to even export in the first place.For plugins there isn't any method provided by Geany ATM, just the global workspace (with bare tag arrays) is exported.
I'm not too keen on exporting any TM stuff FWIW. I'd like to see TM/Ctags as some kind of fallback tag handling that can be disabled by plugins who provide better filetype-specific support in the future. At least the one nice thing about the current TM for plugins is that it's a nice thin API that allows plugins to still accomplish whatever they want (KISS).
I'd like to see TM/Ctags as some kind of fallback tag handling that can be disabled by plugins who provide better filetype-specific support in the future.
Wouldn't it be possible to feed the symbol information (e.g. from llvm) to TM (and extend TM to do what's necessary)? Or have symbol info both in TM and LLVM so you can reuse e.g. the symbol tree from Geany and have only autocompletion handled by LLVM?
I'm not familiar with LLVM/clang so I don't know what would be necessary for this and if something like this is doable.
To me it looks more like a solution for your problem (which is fine) than something that would be needed in Geany. The current search functions in TM are indeed single-purpose for specific uses in Geany but there's no problem to grab one of the tag arrays, go through the elements and pick those you are interested in (one for loop, one if inside and this is more flexible than some extra API). The new API seems to be useful just to make the filtering fast in Python (and again, sounds like a fair reason).
Of course it's one solution for my problem. But I made it so that it's useful outside my problem space too.
I simply found that there are no plugin APIs for introspecting Geany's tags, other than giving access to the raw tag arrays. And the existing, un-exported functions are not suitable to become exported. So I felt the need to create a useful method to achieve that, which also solves my specific problem.
I need a solution for my problem, and a generic query interface is a lot better than adding even more special-purpose function and even adding those to the plugin API. My proposed solution can potentially cover so much more use cases that we should be well prepared for the future.
I'm not too keen on exporting any TM stuff FWIW. I'd like to see TM/Ctags as some kind of fallback tag handling that can be disabled by plugins who provide better filetype-specific support in the future. At least the one nice thing about the current TM for plugins is that it's a nice thin API that allows plugins to still accomplish whatever they want (KISS).
TM has *no* API for quering tags, except for the exposure of the raw arrays. Currently exported are methods to add/remove source files from the workspace. IIRC those where specifically added for ProjectOrganizer, in order to add tags for unopened documents.
Also, TM is no fallback. It's simply what we've got to manage our tags. In my opinion we should open it up more, in order to allow plugins to easily get involved in tag management. This includes methods to add tags via non-ctags mechanisms, but also for quering existing tags (from whatever source they may come from).
In fact, plugins shouldn't disable TM to do their own stuff. This will only cause pain since neither Geany itself or other active plugins can function properly in such a situation. Instead, all plugins should cooperatively work on the same tag management system (using proper APIs). I.e. a llvm/clang plugin should add tags to the tagmanager, so that my quickswitch plugin still does what it should do and Geany can show proper autocompletion and symbol sidebar.
You said you made it because you couldn't accomplish what you wanted in Python efficiently enough. That's where you usually would write a Python extension module to work around the inefficiencies.
I could implement a solution specifically for my plugin, somwhere yes. But I prefer a solution where everyone else can benefit as well. Besides, peasy is *not* python specific, it doesn't even use any pygobject APIs or has the necessary build system support for python-specific extension modules. I definitely won't introduce that if there are better alternatives.
@kugel- pushed 1 commit.
576e6a3 tagmanager: Enhance tm_query_add_scope()
Wouldn't it be possible to feed the symbol information (e.g. from llvm) to TM (and extend TM to do what's necessary)?
It surely would be possible, but I'm not sure if TM is up to the job (I don't know enough about it to evaluate that). The main goal is to get proper support for C and C++, which as @elextr always mentions, is quite poor at present (in C it doesn't support pointer dereferences, local types, local variables, parameters, and additionally in C++ it doesn't support template arguments, non-global symbols, lambda parameters or lambda locals, in most languages it switches modes randomly between useful completions and everything under the sun, etc).
I'm not familiar with LLVM/clang so I don't know what would be necessary for this and if something like this is doable.
The API basically that you create a TranslationUnit by parsing source files, and from that you get access to the complete AST (manually walkable or via a visitor API), and then it has special functions made for IDE's that works on the translation units, like it has one "[codeCompleteAt](http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html#ga50fedfa85d...)" which provides a list of every possible valid thing that could be completed there and still make valid code, or "[tokenize](http://clang.llvm.org/doxygen/group__CINDEX__LEX.html#ga6b315a71102d4f6c95eb...)" which quickly lexes/parses the code providing a list of "tokens" which can be used for to do stuff like semantic-highlighting (eg. unlike currently how Scintilla will highlight all same named tokens alike, you can have the class "string" in namespace std highlighted as a class when used as such or if you name a variable "string", have it coloured as an identifier). It also has all kinds of other useful stuff like an [API for diagnostics](http://clang.llvm.org/doxygen/group__CINDEX__DIAG.html) (eg. to replace the weird regex-based makefile output parser currently used in Geany), and a lot more.
Also, TM is no fallback. It's simply what we've got to manage our tags.
I meant after filetype-plugins providing proper per-language support are implemented, TM would be the fallback.
In my opinion we should open it up more, in order to allow plugins to easily get involved in tag management.
Then when it gets supplanted/replaced/upgraded, any plugin using it breaks, not good.
In fact, plugins shouldn't disable TM to do their own stuff. This will only cause pain since neither Geany itself or other active plugins can function properly in such a situation. Instead, all plugins should cooperatively work on the same tag management system (using proper APIs).
I agree, but I don't think TM is the right API at present for this purpose. I'm just worried we're boxing ourselves in a corner if we expose a lot of the API and then need to make massive sweeping changes to it in order to accommodate useful APIs for plugins to provide these kinds of features.
Besides, peasy is not python specific, it doesn't even use any pygobject APIs or has the necessary build system support for python-specific extension modules.
I meant your Python plugin should provide the extension module, not the proxy plugin loader.
I agree, but I don't think TM is the right API at present for this purpose. I'm just worried we're boxing ourselves in a corner if we expose a lot of the API and then need to make massive sweeping changes to it in order to accommodate useful APIs for plugins to provide these kinds of features.
IMO it's OK to break API if it doesn't happen too often and if there's a good reason for it so this wouldn't worry me much.
I'm just afraid that we won't be able to find a common way to represent both ctags symbols (simple list), clang symbols (tree I guess) or whatever else may appear for any other language. Moreover I guess clang already provides some TM-like API you can use to perform queries so any functionality TM provides would be mostly redundant.
I think it will be necessary to keep them separate and provide e.g. some callbacks plugins can use to override autocompletion or calltips, provide symbols for the sidebar (some common tree-like representation would be necessary here but it shouldn't be too hard to find something) etc. It wouldn't actually be necessary to disable TM and ctags completely - they could still parse the files, you just wouldn't see their tags in the UI.
If it's like this you don't have to worry about what happens in TM - it would be just a parallel system that parses symbols independent of your own.
I agree, but I don't think TM is the right API at present for this purpose. I'm just worried we're boxing ourselves in a corner if we expose a lot of the API and then need to make massive sweeping changes to it in order to accommodate useful APIs for plugins to provide these kinds of features.
Yes, therefore this PR which aims to improve TM's interface so that it can be up to the task. We can freely change and improve TM, and do so in backward compatible ways, just as with the rest of Geany.
But, IMO, we definitely need something in Geany that can marshall tags, whether they come from ctags, some advanced llvm based system or some static file (GNU GLOBAL, anyone?). This way we can ensure that competing plugins pay nice and provide a consistent user interface. We can also change just Geany for new tag-related features and not touch every plugin.
Yes, therefore this PR which aims to improve TM's interface so that it can be up to the task.
We can freely change and improve TM, and do so in backward compatible ways, just as with the rest of Geany.
I'd be surprised if this interface worked well for the libclang backend - I expect its output is something in principle similar to XML DOM tree and you can't capture the tree with a simple array of tags unless you want to give up some information (which I'm afraid will be necessary for e.g. better autocompletion).
I'd be surprised if this interface worked well for the libclang backend - I expect its output is something in principle similar to XML DOM tree and you can't capture the tree with a simple array of tags unless you want to give up some information (which I'm afraid will be necessary for e.g. better autocompletion).
The way of libclang is the same for almost every (library that provides a) compiler front-end. I've never seen one that doesn't represent the parsed code as an abstract syntax tree. If TagManager can't support that properly, then it won't support most languages properly.
So what does this mean for this specific PR? Should the query API take the possibility of an AST into account?
AST represents the entire code not just tags. A tag listing seems to be probably to remain a list. I could imagine that tags become tree-like (via scope relationship) but I'm not sure the query API is concerned. In a tag tree world it would return a list of subtrees, no?
- else
s = g_string_new_len(name, name_len);
- g_ptr_array_add(q->names, s);
- return 0;
+}
+/* Add scope filter to a query
- The query results will be restricted to tags that are subordenates of
- a container scope matching @a scope, for example a C++ class.
- @param q The query to operate on.
- @param scope The name of the container or parent scope.
Is scope just a single name or is it some kind of delimited string (ex. `mynamespace/myclass/myinnerclass/myfield`) ?
- else
s = g_string_new_len(name, name_len);
- g_ptr_array_add(q->names, s);
- return 0;
+}
+/* Add scope filter to a query
- The query results will be restricted to tags that are subordenates of
- a container scope matching @a scope, for example a C++ class.
- @param q The query to operate on.
- @param scope The name of the container or parent scope.
Full, delimited string:
``` namespace Foo { namespace Bar { class Baz { public: static int getA() { return 42; } }; }; }; ``` tag->name -> getA tag->scope -> Foo::Bar::Baz
- else
s = g_string_new_len(name, name_len);
- g_ptr_array_add(q->names, s);
- return 0;
+}
+/* Add scope filter to a query
- The query results will be restricted to tags that are subordenates of
- a container scope matching @a scope, for example a C++ class.
- @param q The query to operate on.
- @param scope The name of the container or parent scope.
Cool, that's a good thing. It might be useful to document that here even if it's not unique to this function. Also `s/subordenates/subordinates/` or could reword it like "...tags which are scoped within the given 'scope'".
So what does this mean for this specific PR? Should the query API take the possibility of an AST into account?
I don't think so.
IMO ctags tags and any other AST-based parser are fundamentally incompatible. There's no way to convert ctags tags to AST - they are more or less flat:
- you only know on which line a tag was defined but if there are two tags on the same line, you don't know which tag was first. This is the only ordering info and it's insufficient to build AST.
- not all parsers generate scope information so you won't have info about what's defined inside another tag.
If you try to do the opposite and convert AST into linear lists of tags, you probably lose the info you need for autocompletion.
This means I don't find it very realistic that TM could ever be used both for ctags and libclang. I believe the main goal of using libclang is to get better autocompletion and calltips (I find symbol tree, tag definition/declaration goto pretty good even when using TM because these don't require much context information). This means we could:
- keep TM on all the time and use it for symbol tree and tag definition goto - plugins will be able to use it all the time (they won't have access to the "better" tags from libclang but I think they are only needed for autocompletion which the plugin that adds the libclang functionality will do anyway) - have some callback mechanism that libclang plugin can register itself for doing autocompletion and calltip popup
I think we should be a bit realistic in what we can do and what we should do - Geany is a simple multi-language IDE and we should concentrate on keeping it that way and improving it. Dedicated language-specific IDEs can always be fine-tuned for the specific language's features but if I wanted this, I'd be using other editor than Geany (which I do when I use Android Studio and Xcode for mobile development). Having some super-generic symbol management system that will be half of the Geany's source code seems to be outside Geany's scope. I have no statistics but I suspect most Geany users use it mostly for scripting languages so we now spend like one year implementing this and only a small percentage of users will use it.
And for me one thing is clear - I won't be working on that myself.
I suspect most Geany users use it mostly for scripting languages so we now spend like one year implementing this and only a small percentage of users will use it.
And for me one thing is clear - I won't be working on that myself.
If only one person uses a plugin feature, but that person implements it, its fine. And nobody is forcing anyone to work on anything. Nobody ever knows where capabilities can lead.
The only request is that people don't resist others making things general enough to allow their different uses.
keep TM on all the time and use it for symbol tree and tag definition goto
These two things are trivial to do, and quite a bit better from plugins if using an actual language support library (libclang, libpython, libvala, etc).
I think we should be a bit realistic in what we can do and what we should do - Geany is a simple multi-language IDE and we should concentrate on keeping it that way and improving it.
Most of things wrong with it could be solved by using proper language support libraries, and if they're provided by plugins, why would anyone not using them care?
Having some super-generic symbol management system that will be half of the Geany's source code seems to be outside Geany's scope.
Uh...that's _exactly_ what TagManager/CTags is right now, and in some cases it doesn't work well at all. Moving such stuff to plugins (maybe a core plugin) would only have the effect of greatly simplifying Geany's core code, leaving it with just a few hooks and farming out language-specific support to plugins, which can do a much better job without bloating or making the core so complex.
I suspect most Geany users use it mostly for scripting languages so we now spend like one year implementing this and only a small percentage of users will use it
That doesn't make much sense, if people are using it for scripting languages, and developers can add proper support for those languages, then the largest percentage of users stand to benefit.
If only one person uses a plugin feature, but that person implements it, its fine. And nobody is forcing anyone to work on anything. Nobody ever knows where capabilities can lead.
No, it isn't. If it adds several thousands of lines to Geany that have to be maintained (and making TM work for both AST and ctags looks to me like a multi-thousand LOC patch which will add lots of complications) then we should really think twice about it. It's not only about writing the patch, it's also about maintaining the code later.
The only request is that people don't resist others making things general enough to allow their different uses.
I'm definitely not trying to prevent anyone doing anything but want to prevent adding too much code that won't be used by many.
These two things are trivial to do, and quite a bit better from plugins if using an actual language support library (libclang, libpython, libvala, etc).
Well, it depends if you want to implement the symbol tree all by yourself or not. If not, you'll either have to map libclang symbols to TMTags or come up with some substitute used by both.
Also I'm not sure how you plan to handle cases where you open a single C file which by itself doesn't compile and which isn't part of any project. I guess libclang won't be able to parse it right? Then TM should be used instead.
Most of things wrong with it could be solved by using proper language support libraries, and if they're provided by plugins, why would anyone not using them care?
Again, I don't know which way you plan to implement it - as I said, I think ASTs are incompatible with the linear tag lists generated by ctags. My feeling is it will require lots of new code on Geany's side, not the plugins side which I indeed don't care about, to make something generic. So which way do you plan to solve it?
Uh...that's exactly what TagManager/CTags is right now, and in some cases it doesn't work well at all. Moving such stuff to plugins (maybe a core plugin) would only have the effect of greatly simplifying Geany's core code, leaving it with just a few hooks and farming out language-specific support to plugins, which can do a much better job without bloating or making the core so complex.
The purpose of https://github.com/geany/geany/pull/1160 and following pull requests is to make ctags identical to upstream ctags at which point the ctags directory will be just external dependency we don't have to maintain in Geany and just grab the upstrem version. Indeed, there's some code in tagmanager which has to be maintained but if we want to keep support of all the languages we have, it will have to stay this way. And no, I don't think it would be a good idea splitting ctags into 30 different plugins and maintain them separately and require users to install all of them if they want multi-language support.
That doesn't make much sense, if people are using it for scripting languages, and developers can add proper support for those languages, then the largest percentage of users stand to benefit.
Again, it depends what way you want to implement it. My impression was you wanted to make some super-generic TM managing all the symbols regardless in what format they are. I don't think it's doable without making the code crazy. If plugins just install hooks for specific features like autocompletion, symbol tree, etc. that should be pretty trivial on Geany side and it shouldn't be any problem.
Well, it depends if you want to implement the symbol tree all by yourself or not. If not, you'll either have to map libclang symbols to TMTags or come up with some substitute used by both.
It's just a GtkTreeView. It could be as simple as Geany setting up the cell renderers and the model and then calling into a plugin to fill in the model.
Also I'm not sure how you plan to handle cases where you open a single C file which by itself doesn't compile and which isn't part of any project. I guess libclang won't be able to parse it right? Then TM should be used instead.
Indeed, there needs a way to inherit or fallback to basic (current) functionality.
Again, I don't know which way you plan to implement it - as I said, I think ASTs are incompatible with the linear tag lists generated by ctags.
I don't really see why. It wouldn't be hard to traverse the AST and generate a tags list, filling in the scope with the correct path/namespaces. IIUC Ctags and not TM is responsible for the broken scope support in Geany.
My feeling is it will require lots of new code on Geany's side, not the plugins side which I indeed don't care about, to make something generic.
The only thing that would need to be added to Geany is some hooks/callbacks/interfaces/whichever to get the plugin to provide the info. All of the complexity would be in the plugins (and not even since most language libraries make this stuff quite trivial to implement).
Indeed, there's some code in tagmanager which has to be maintained but if we want to keep support of all the languages we have, it will have to stay this way.
It doesn't have to stay in the core, it could become a core plugin, but it does have to be on by default and/or provide the baseline functionality. I only have an interest in augmenting the current functionality by plugins, I have no interest in making Geany not work as well out-of-the-box.
And no, I don't think it would be a good idea splitting ctags into 30 different plugins and maintain them separately and require users to install all of them if they want multi-language support.
Just one plugin, part of the core, and on by default.
If plugins just install hooks for specific features like autocompletion, symbol tree, etc. that should be pretty trivial on Geany side and it shouldn't be any problem.
That's the idea. The language support libraries already provide their own specific TM-like APIs, no point in re-inventing that generically when most of the needed features (filling in a treeview, filling in a listview, providing a location of declaration, etc) can be accomplished more easily on the plugin side if they have access to the actual language implementation libraries.
@kugel- sorry for hijacking the comments here, we should probably move unrelated discussion to the mailing list.
It's just a GtkTreeView. It could be as simple as Geany setting up the cell renderers and the model and then calling into a plugin to fill in the model.
Did you have a look at the code? It's pretty non-trivial to update it with changes only so the tree doesn't jump on update...
I don't really see why. It wouldn't be hard to traverse the AST and generate a tags list, filling in the scope with the correct path/namespaces.
...but if you are fine with feeding it with some TMTag-like array, it should be fine.
I indeed believe you should be able to feed the tags into TM. But my impression was you wanted to use only this info from TM to perform autocompletion which I think isn't sufficient.
IIUC Ctags and not TM is responsible for the broken scope support in Geany.
It's partly scope information (using namespace not taken into account) and local variables aren't parsed which limits the autocompletion functionality. In addition, there's no visibility information for symbols so it isn't clear if the symbol is available at the given place (e.g. if corresponding header containing its declaration was imported). And this is something you won't be able to store into the tags and for which you'll probably need the AST for autocompletion.
That's the idea. The language support libraries already provide their own specific TM-like APIs, no point in re-inventing that generically when most of the needed features (filling in a treeview, filling in a listview, providing a location of declaration, etc) can be accomplished more easily on the plugin side if they have access to the actual language implementation libraries.
Sounds good then.
@kugel- sorry for hijacking the comments here, we should probably move unrelated discussion to the mailing list.
Yep, sorry here too - ending now.
I don't see why TM couldn't be improved to support other, AST-based parsers. Sure TM is lacking now but we cna fix that. I don't think it's too much work.
But coming back to this PR. I don't think the proposed query API is affected by the above ideas. It'll always be used to return a list of tags. If the TMTag structure changes for new features or are subtrees instead of plain tags is a different story.
So, what needs to be done to get this merged? @b4n @techee @codebrainz ?
@kugel- makes a good point, no matter what the internals look like, the results of a query are likely to be a list.
I don't see why TM couldn't be improved to support other, AST-based parsers. Sure TM is lacking now but we cna fix that. I don't think it's too much work.
Good, modify tag array merging
https://github.com/geany/geany/blob/master/src/tagmanager/tm_tag.c#L375
to work on trees that is similarly fast and let's talk then ;-).
But coming back to this PR. I don't think the proposed query API is affected by the above ideas. It'll always be used to return a list of tags. If the TMTag structure changes for new features or are subtrees instead of plain tags is a different story.
As I think TM should be used for ctags-like tags, lists should be fine. (They would be insufficient if you needed some AST information, e.g. for code completion.)
Am 29.08.2016 um 17:52 schrieb Jiří Techet:
I don't see why TM couldn't be improved to support other, AST-based parsers. Sure TM is lacking now but we cna fix that. I don't think it's too much work.
Good, modify tag array merging
https://github.com/geany/geany/blob/master/src/tagmanager/tm_tag.c#L375
to work on trees that is similarly fast and let's talk then ;-).
The plugin probably would provide a list, which it generated internally from whatever representation.
But coming back to this PR. I don't think the proposed query API is affected by the above ideas. It'll always be used to return a list of tags. If the TMTag structure changes for new features or are subtrees instead of plain tags is a different story.
As I think TM should be used for ctags-like tags, lists should be fine. (They would be insufficient if you needed some AST information, e.g. for code completion.)
Then TM should be improved to hold sufficient information. auto completion leaves a lot to be desired even for ctags based tags so there's definitely work to do on the TM side.
I wonder what makes you guys think this list vs tree(AST) makes things fundamentally unworkable? It's a complete non-issue to me. It's just a representation that can be converted as needed.
Anyway, as @elextr also said, I think that even if TM would use a completely different internal representation (such as a tree), the query results would most likely still be a list. Similarly to DB queries, which also return flat tables. So the list vs tree discussion becomes orthogonal to this PR.
I wonder what makes you guys think this list vs tree(AST) makes things
fundamentally unworkable? It's a complete non-issue to me. It's just a representation that can be converted as needed.
No, it's not just a representation - there's completely different information stored in AST. AST contains the complete source code in the form of a tree - see e.g. the picture here:
https://en.wikipedia.org/wiki/Abstract_syntax_tree
You could dump the tree and get back the original source code (except things like whitespace). AST contains everything - operators, keywords, control flow structures etc. The shape of the tree also captures things like operator precedence, block nesting etc.
Moreover, the tree will be very language-specific as it really captures the whole syntax of the given language.
ctags doesn't generate AST and it never will because it only serves for symbol indexing. The only output of it is a list of named symbols, nothing else is stored during parsing. ctags always creates just a flat list of tags - there's no need to store them into tree - you won't gain any extra information by this. This won't "improve" TM.
I believe for a really smart autocompletion you need the info from AST and this is so language-specific that it cannot be done in a generic way.
Can we have the list/tree/AST discussion elsewhere and concentrate on the PR?
Can we have the list/tree/AST discussion elsewhere and concentrate on the PR?
Sure (well, there's not much to discuss IMO but you can open a new issue for it if you wish).
Regarding the pull request I don't have much more to say than I said before. I'm not completely thrilled by it but I understand the performance problem you are facing so I think it's fine.
Can we have the list/tree/AST discussion elsewhere and concentrate on the PR?
Well #1195 is about allowing plugins to do that sort of stuff.
Yeah, one of the mailing list threads listed in the "Related" comment in #1195 would be more appropriate place than here.
My main objection to this PR is exposing more of TM to the plugin API, and doing so because of a single plugin language, for a single implementation, which has an already extremely common workaround mechanism builtin to the main implementation for just this case (C-extensions).
If I needed this in a GeanyPy plugin, I'd probably propose to add a helper function/API into it directly to smooth over the language barrier.
I'm developing a quickswitch plugin (inside Peasy). The plugin adds a dialog and a keybinding to allow to quickly type the prefix of a tag name to jump to its definition or location. If the prefix is ambigious, it gives a list of candidates so I can chose one.
This sounds like a generalized case of auto-complete and jump-to-def/decl. I see no reason why it's incompatible with an extended ft-plugin API (as initiated in #1195) and necessarily has to expose TagManager guts to plugins to perform its function.
I don't understand. It uses tags to find the location so, so it makes use of TM. Currently only the plain tag arrays are exported which I find a poor interface for any plugin (not just mine). It's not a ft-plugin if you're suggesting that?
@b4n @codebrainz Please merge this or is there anything left to do?
github-comments@lists.geany.org