This is a reworked and extended implementation of sorting based on tag presence in header files discussed (and dropped) from #3185. This implementation uses the header tags from ctags so we know precisely which files are included so we can use not only the tags from the source's header file but also tags from other included files. After this change the ordering sequence looks this way:
- sort local vars first (with highest line number first), - followed by tags from current file, - NEW: followed by tags from header, - NEW: followed by tags from other included files, - followed by workspace tags, - followed by global tags
To me, the result seems to improve autocompletion results (it's a bit hard to tell for sure, one can always prepare artificial examples to test where it works better, with the real world it's harder to spot a difference between different orderings unless something is visibly broken).
For reference, I also tried the option to use the includes from the included headers recursively to some depth but the results didn't seem visibly better and were actually slightly worse (by including `<gtk.h>` one got all the GTK symbols which were then fighting with symbols from explicitly included headers that started to appear lower in the list which was often worse). You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3269
-- Commit Summary --
* Maintain a hash table of TMSourceFiles inside TMWorkspace * Process header tags * Sort autocompletion tags also based on their presence in included files
-- File Changes --
M src/symbols.c (2) M src/tagmanager/tm_parser.c (4) M src/tagmanager/tm_parser.h (3) M src/tagmanager/tm_workspace.c (155) M src/tagmanager/tm_workspace.h (1)
-- Patch Links --
https://github.com/geany/geany/pull/3269.patch https://github.com/geany/geany/pull/3269.diff
@elextr commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
Please add this list to the manual.
When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?
Before I look too deep: So I understand this only affects the sorting but does not add tags*? I previously understood that it adds tags for (private?) members that are declared in class declarations so that they can be autocompleted in the corresponding source file but not in other source files.
*: the new tm_tag_header_t only says "foo.h is included" so that we can later lookup foo.h and import tags from that file, right? So not a real tag but more of a pointer to a TMSourceFile?
Adding to @kugel- question, my understanding is that it only looks up include files that are open or loaded via tagfiles, it won't open any extra files, is that right?
@techee commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
IMO this is not something that should be documented. For users this whole thing is "geany does its best to place more relevant symbols to the top of the list" and how Geany does this is an implementation detail. You won't really notice this logic in practice, just that the symbol you want to type (hopefully) appears at or near the top of the list. If we happen to add support for LSP, each server will have its own logic which will be totally different from what we use in Geany now.
Before I look too deep: So I understand this only affects the sorting but does not add tags*?
Yes (except for the header tags themselves for every include, see below).
*: the new tm_tag_header_t only says "foo.h is included" so that we can later lookup foo.h and import tags from that file, right? So not a real tag but more of a pointer to a TMSourceFile?
Adding to @kugel- question, my understanding is that it only looks up include files that are open or loaded via tagfiles, it won't open any extra files, is that right?
For every include in the parsed file such as ``` #include <some_path/some_header.h> ``` the header tag generates tag whose name is ``` some_path/some_header.h ``` This is completely independent of what files we have open, it can be whatever file name. It's up to us to match this file name to the open `TMSourceFile` files. This is what the first patch serves for - it maintains a hash table ``` file_name -> List<TMSourceFile> ``` so we can take the header tag and quickly look up the corresponding `TMSourceFile`. The `List` is there because there may be multiple identically named files opened from different paths - it's hard to make any guess about the paths since we don't know which directories are searched for the includes by the build system. When we don't find any corresponding `TMSourceFile`, the file isn't open in Geany and we don't do anything for that include. The `header_tag -> TMSourceFile` lookup happens dynamically when performing autocompletion and it isn't stored anywhere.
I previously understood that it adds tags for (private?) members that are declared in class declarations so that they can be autocompleted in the corresponding source file but not in other source files.
No, it doesn't (and didn't) add any tags, it serves for sorting purposes only but in two different situations where in (2) it affects what results the user gets:
1. Sorting the tags in non-scope autocompletion popup - this just affects the order in which the user sees the tags where I think it makes sense to see tags from included files before other project files. 2. Sorting the candidate tags which are searched for members in scope autocompletion. Let's say the user has just written ``` name. ``` and expects we show a list of `name` members. Some tag names are very common and for instance in this case, I can imagine many classes have the `name` member possibly each of a different type (and different members) so it's important we pick the right `name` tag. When we sort all the `name` tags so we first check the tags from the header file, we have a good chance of showing the right members than some `name` from a totally different class unrelated to the current file.
Before I forget, thinking about it now, we should probably store the header tags to the generated tag files too as they can be useful for global tags as well. I won't do it now (so we don't have to fight with modified unit tests if some rebases are needed), we should just keep it in mind.
@techee commented on this pull request.
@@ -69,7 +69,7 @@ static GHashTable *subparser_map = NULL;
{'u', tm_tag_union_t}, /* union */ \ {'v', tm_tag_variable_t}, /* variable */ \ {'x', tm_tag_externvar_t}, /* externvar */ \ - {'h', tm_tag_undef_t}, /* header */ \ + {'h', tm_tag_header_t}, /* header */ \
It's called `header` in ctags but maybe we should call it `tm_tag_include_t` to make it clear this is something that's generated for every `#include` directive and not for every header file in project.
@techee "Yes" would have done ;-) but thanks for being explicit.
As for the documentation, yeah, "sorted by likely relevance" should do, just something to point out its not simply alphabetic.
Also I assume the header tags are not followed recursively, so symbols in includes included are not sorted higher? (not asking for it, keep this PR simple he says quickly :-)
@techee commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?
Currently it's those before the cursor.
@elextr commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
Currently it's those before the cursor.
Ok, its fine for C, if locals only and sorting only its ok for C++ so long as no other declarations (eg members) that happen after the cursor are not excluded from the autocomplete (even if they get sorted later).
Also I assume the header tags are not followed recursively, so symbols in includes included are not sorted higher? (not asking for it, keep this PR simple he says quickly :-)
See the last paragraph in my first post ;-). It wasn't much more complicated to implement, it just didn't seem to produce better results. Maybe if it were implemented as a separate sort step ``` ... - followed by tags from header, - followed by tags from other directly included files, - followed by tags from indirectly included files, ... ``` so it wouldn't compete with symbols from directly included files, it would work better but I'm not sure if it wouldn't complicate things too much and if the benefits are worth it.
See the last paragraph in my first post ;-).
Ahh.
I was thinking of where there was a "single include" situation where the direct include file does nothing much more than include others, but yeah it could be less good in other situations. Lets keep it simple for now.
I was thinking of where there was a "single include" situation where the direct include file does nothing much more than include others, but yeah it could be less good in other situations. Lets keep it simple for now.
Yeah, I was thinking that too but then the big fat headers like `gtk.h` "killed" humble project header files by the sheer amount of symbols they declare. But haven't tried the extra sort step which should mitigate this problem. But yeah, I agree this is something for a possible subsequent PR.
I'm wondering it it's really appropriate for non-scope auto-completion. If I hack sidebar.c, why would sidebar.h symbols be more relevant than gtk_* symbols?
For scope auto-completion (this->) it maybe a different story.
I'm wondering if it it's really appropriate for non-scope auto-completion. If I hack sidebar.c, why would sidebar.h symbols be more relevant than gtk_* symbols?
Both are equally relevant but the question is when the autocompletion starts returning useful results. For GTK it will be only after you type a sufficiently long prefix like `gtk_something_` when the symbol you want starts appearing. You definitely won't get anything useful when you just type `g`. On the other hand, if you want something from `sidebar.h` that starts with `g`, just typing the single `g` might already give you what you want. But when this is mixed with all the symbols from GTK, this symbol from `sidebar.h` gets lost in all the other GTK symbols and you won't get useful results after typing the `g`. In short: 1. This doesn't improve (but also doesn't worsen) anything when you want a GTK symbol 2. But improves autocompletion when you want a symbol from your own sources
@techee commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
@elextr We have already been there :-)
https://github.com/geany/geany/pull/3175#issuecomment-1107821608
I haven't changed anything about this in this PR, it works as before.
@elextr commented on this pull request.
@@ -746,9 +847,12 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
const TMTag *t1 = *((TMTag **) a); const TMTag *t2 = *((TMTag **) b);
- /* sort local vars first (with highest line number first), followed - * by tags from current file, followed by workspace tags, followed by - * global tags */ + /* sort local vars first (with highest line number first),
Never rely on my umm, you know thingy, umm, oh yeah ... memory!!
@kugel- your comment is appropriate for C, but not for C++ (have I mentioned C++ is NOT C).
To repeat my comment from #3175, C may only have minimal declarations in headers, but C++ typically has far more, especially when using generic programming (ie templates), because the body needs to be available when the template is used.
For example Geany `.h` files are 20% of the `.c` files, Scintilla (which does not use templates) `src/*.h` is 30% of `src/*.cxx` but some of the headers in `includes` are also relevant so it would really be more, or one of my projects that uses templates the `*.hpp` files are nearly 50% of the `*.cpp` files.
So for C++ more names come from the associated headers (rather than random general headers) than is the case with C, and as @techee explained above the heuristic moves these names up the list, making it more useful.
I understand that for C++ scope-completion is more relevant, and for that I can understand the idea as you often do `ClassName::` or `this->`
Anyway I won't argue further if you both agree it's a good idea. My worry is that all of Geany's headers (except sidebar.h) are penalized when editing sidebar.c and I have no reason to assume that I want sidebar.h symbols more than other Geany symbols (might even be that the opposite is true, not sure).
I understand that for C++ scope-completion is more relevant, and for that I can understand the idea as you often do `ClassName::` or `this->`
I have to correct you there, as my edit above said, C++ programmers almost __never__ do `this->` (grep Scintilla source if you don't believe me), scope completion is _not_ more relevant to C++, member names are most often _not_ qualified. Please forget the Python idea of `self.xxx`, it does not apply to C++.
Edit: also just to point out, nobody using C++ writes this-> so member names are not scope autocompletes.
Apart from that, `this` isn't recognized or handled by scope autocompletion in any way right now.
My worry is that all of Geany's headers (except sidebar.h) are penalized when editing sidebar.c and I have no reason to assume that I want sidebar.h symbols more than other Geany symbols (might even be that the opposite is true, not sure).
Since tags from the current file are already placed above other symbols, you won't see much difference because `symbols.h` (assuming you are editing `symbols.c`) just contains declarations of the functions from `symbols.c` which are already placed above. However in C++ headers also contain members which are not declared in the source files and members are accessed frequently in C++ code and you want non-scope autocompletion for these more often than some random members from other header files.
Also note that the "penalization" already happens, it's just in the form of alphabetic ordering and more or less random in terms of what would be useful for the user. For instance, when you have symbols `ga`, `gb`, and `gz` in your project declared in some header files and for simplification we assume that all gtk symbols start with the `gtk_` prefix, after typing `g` without this PR you get an autocompletion popup containing ``` ga gb gtk_... gtk_... ``` Here, `ga` and `gb` "won" and gtk symbols were penailized, and `gz` was just killed by the amount of gtk symbols and has no chance to even appear in the list. After this PR, assuming the current file includes the headers containing these symbols, you get ``` ga gb gz gtk_... gtk_... ``` GTK gets penalized but to get the symbol you want, you'll have to type much longer prefix anyway so it isn't a problem.
@techee pushed 3 commits.
e441e514d4810aa0d3bf70487c7b10f64745608f Maintain a hash table of TMSourceFiles inside TMWorkspace 8289fc4b81cb7fc046f45c96c03ffe53158225a5 Process header tags dd985447f7c92777f746e473d06c4dfa72b4e868 Sort autocompletion tags also based on their presence in included files
Just repushed to fix a small conflict in `tm_workspace_is_autocomplete_tag()` with the recently merged #3268.
@techee pushed 2 commits.
c17c7fb34c9e4fe8a05609dc8a4b0af2759e2cd5 Process header tags 7d56b6fccd44a4bbf4c753cdd25593dd33fdcc78 Sort autocompletion tags also based on their presence in included files
@kugel- requested changes on this pull request.
I might have overlooked, but is there something that prevents this logic to be applied to header files (if such avoidance even wanted)?
I can imagine a header file foo.h that include a identically-named file, either using `#include_next <foo.h>` or with include path trickery.
Actually thinking about it, it might even make sense to sort tags of the corresponding *source* file earlier, when editing a header file. For example when I want to create a declaration in a header file it's likely to be a function that might be already implemented in the corresponding source file.
@@ -742,9 +797,54 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const char *name,
}
+/* fill 'includes' with the included files except the header corresponding + * to the current file which is returned in 'header' */ +static void fill_includes(TMSourceFile *source, TMSourceFile **header, GHashTable *includes)
I would find it idiomatic to make this `get_includes()` that returns the included files list.
Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because `sort_found_tags()` is special-casing the corresponding header anyway and might make this function be useful in other situations as well.
+ for (i = 0; i < headers->len; i++) + { + TMTag *hdr_tag = headers->pdata[i]; + gchar *hdr_name = g_path_get_basename(hdr_tag->name); + GPtrArray *tm_files = g_hash_table_lookup(theWorkspace->source_file_map, hdr_name); + + if (tm_files && tm_files->len > 0) + { + TMSourceFile *hdr = tm_files->pdata[0]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) + *header = hdr; + else + g_hash_table_add(includes, hdr);
I'm missing hash collision detection. The ptr array `source_file_map` might contain `TMSourceFile`s with different basenames that have the same hash value. So here you'd need to check if the TMSourceFile matches with the header tag.
@@ -768,6 +871,16 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
return -1; else if (t2->file == info->file && t1->file != info->file) return 1; + else if (info->header_file && + t1->file == info->header_file && t2->file != info->header_file)
Err, if t1 is the corresponding header, t2 can't be (can it?). The `&& t2->file ...` part confuses me and could be left out, right?
Same below.
@techee commented on this pull request.
@@ -742,9 +797,54 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const char *name,
}
+/* fill 'includes' with the included files except the header corresponding + * to the current file which is returned in 'header' */ +static void fill_includes(TMSourceFile *source, TMSourceFile **header, GHashTable *includes)
I would find it idiomatic to make this get_includes() that returns the included files list.
This is a relict of when I experimented with running this function recursively. Since it didn't work very well, it indeed looks better as you describe.
Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because sort_found_tags() is special-casing the corresponding header anyway and might make this function be useful in other situations as well.
Can do.
@techee commented on this pull request.
@@ -768,6 +871,16 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
return -1; else if (t2->file == info->file && t1->file != info->file) return 1; + else if (info->header_file && + t1->file == info->header_file && t2->file != info->header_file)
I'm not sure if I understand the question correctly but ``` t1->file == info->header_file && t2->file != info->header_file ``` says that when we have two tags, `t1` and `t2`, and `t1` comes from `header_file` and `t2` does not come from this header file, then `t1` should be sorted in front of `t2`. The initial check for `info->header_file` non-NULL value is there because for global tags `t1->file` will also be NULL and then this check would do a wrong thing.
I might have overlooked, but is there something that prevents this logic to be applied to header files (if such avoidance even wanted)?
There is nothing like that. I wanted to avoid hard-coding header file extensions and without this we can't say if the current file is a header or a source.
I can imagine a header file foo.h that include a identically-named file, either using #include_next <foo.h> or with include path trickery.
This case isn't currently handled (not sure if it's common enough and also the consequences won't be too bad). Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".
Actually thinking about it, it might even make sense to sort tags of the corresponding source file earlier, when editing a header file. For example when I want to create a declaration in a header file it's likely to be a function that might be already implemented in the corresponding source file.
Makes sense, we'd just have to do this reverse-include thing to lookup sources. To find headers, we can use the header tags for includes from ctags but we don't have anything in the opposite direction.
@kugel- commented on this pull request.
@@ -69,7 +69,7 @@ static GHashTable *subparser_map = NULL;
{'u', tm_tag_union_t}, /* union */ \ {'v', tm_tag_variable_t}, /* variable */ \ {'x', tm_tag_externvar_t}, /* externvar */ \ - {'h', tm_tag_undef_t}, /* header */ \ + {'h', tm_tag_header_t}, /* header */ \
sounds good
@kugel- commented on this pull request.
@@ -768,6 +871,16 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
return -1; else if (t2->file == info->file && t1->file != info->file) return 1; + else if (info->header_file && + t1->file == info->header_file && t2->file != info->header_file)
Right. Tags are compared, not files. Nevermind.
Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".
I think this assumption easily breaks down on larger (maybe proprietary) code bases. Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.
Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".
I think this assumption easily breaks down on larger (maybe proprietary) code bases. Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.
Agree that include path clashes is not uncommon, especially for portable software that has multiple backends, and therefore multiple versions of the same include. Can we use the path instead of just the name, presumably that is handled since Geany can open two files of the same name and symbols don't get mixed up (hopefully can't try ATM). That should be unique unless a global tags file overlaps local files.
Also agree that this PR should be kept simple, just make an issue for remembering it and do later.
I think this assumption easily breaks down on larger (maybe proprietary) code bases.
Yes, though the probability you are editing a file affected by this is probably low.
Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.
That would be one possibility (though maybe a bit an overkill). I was thinking we could use a path-based heuristic assuming that related headers are stored closer to the corresponding sources on the file system. For instance in ``` /A/B/C1/D/foo.h /A/B/C/D/foo.h /A/B/C/D2/foo.c ``` the header for `foo.c` would be `/A/B/C/D/foo.h` because the identical path prefix (3 directories) is longer than for `/A/B/C1/D/foo.h` (2 directories).
Agree that include path clashes is not uncommon, especially for portable software that has multiple backends, and therefore multiple versions of the same include. Can we use the path instead of just the name, presumably that is handled since Geany can open two files of the same name and symbols don't get mixed up (hopefully can't try ATM).
If you mean what I said above, I agree. But note that we can't know exactly what header at what path to use. We only see includes like ``` #include "header.h" ``` but we don't know their exact path (which may be determined by e.g. `gcc -I include_path`)
but we don't know their exact path
We must have known a path to get symbols from the file?
We must have known a path to get symbols from the file?
We are talking about the situation where you have ``` #include "header.h" ``` in your source file and 2 identically named headers at paths ``` /A/B/C1/header.h /A/B/C2/header.h ``` Symbols from which of these headers should be placed to the top of the autocompletion list when editing your source file?
Oh, I thought you were saying we didn't know the paths. Ok, clearly the best is:
``` /A/B/C1/include/header.hpp ``` :smile:
In other words we can't know how the project is organised, thats a known limitation of Geany's version of "lightweight" IDE, so I doubt there is a simple heuristic that will improve some organisations without making others worse, let alone improve all cases.
project/windows_backend/header.hpp: ``` class Something { windows_something_handle h; public: Something(); do_something(); ~Something(); }; ```
project/linux_backend/header.hpp: ``` class Something { linux_something_file_descriptor h; public: Something(); do_something(); ~Something(); }; ```
What type do we use for `h` when writing `Something::do_something()`?
I guess an associated question would be what would happen if both the declarations of `Something` are in the same file with an `#ifdef` deciding between them?
Ok, clearly the best is:
/A/B/C1/include/header.hpp
Ok, let's hard-code that path and the problem's solved forever ;-).
With your `Something` backend example the problem isn't so big because at least the member names are identical so non-scope autocompletion is going to work (of course scope autocompletion for `h` could be a problem).
What might be a bigger problem is some `utils.h` created at various places with different utility functions for say different submodules. Here the path-based heuristic I suggested or the @kugel- 's tag-similarity-based heuristic might work.
Anyway, the worst case that happens is that tags won't be sorted in the most optimal way which can however happen for many other reasons so right now I'd suggest to just keep the implementation simple and possibly improve it in the future.
I guess an associated question would be what would happen if both the declarations of Something are in the same file with an #ifdef deciding between them?
ctags extracts tags from both branches.
so right now I'd suggest to just keep the implementation simple and possibly improve it in the future.
:+1:
ctags extracts tags from both branches.
Ok, so the `Something` problem already exists, so its a separate problem not something (pun intended :-) changed by this PR.
@techee pushed 2 commits.
c00ff48162a942dfd22553497e75c9b8368debc9 Rename tm_tag_header_t to m_tag_include_t 18ba0438080affcb04e7dafa55004bd12a11ac4d Clean up fill_includes()
Before I forget, thinking about it now, we should probably store the header tags to the generated tag files too as they can be useful for global tags as well. I won't do it now (so we don't have to fight with modified unit tests if some rebases are needed), we should just keep it in mind.
Thinking about it again, this won't work - for global tags we don't have file names stored for individual tags so having names of included headers won't help us. So leaving this as it is.
@kugel- I made the changes you requested in the latest commit. Is this all you requested or did I forget about something?
Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier? At least this is easy to explain (e.g. in the manual) and we won't get the wrong header (unless it's named completely different, of course).
@kugel- approved this pull request.
LGBI, not tested myself!
Might squash the last two commits into earlier ones but it's your call.
Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier?
Good idea, can do, shouldn't be too complicated.
At least this is easy to explain (e.g. in the manual) and we won't get the wrong header (unless it's named completely different, of course).
I personally wouldn't explain anything to users :-). Basically, from user's perspective, the "AI" in Geany does its best to sort tags so the most relevant ones come at the top and that's it.
Might squash the last two commits into earlier ones but it's your call.
Yeah, that's the plan.
I personally wouldn't explain anything to users :-). Basically, from user's perspective, the "AI" in Geany does its best to sort tags so the most relevant ones come at the top and that's it.
Despite @kugel-'s documentation allergy ;-) we do need to note that this happens to avoid regular issues being raised for the list being out of alphabetical order, but I agree we can skip explaining the algorithm.
Despite @kugel-'s documentation allergy ;-)
Not @kugel-'s, mine :-). To be specific, I really think this is implementation detail that users shouldn't worry about.
we do need to note that this happens to avoid regular issues being raised for the list being out of alphabetical order, but I agree we can skip explaining the algorithm.
We already did it in the previous PR: https://github.com/geany/geany/commit/a91a9c6ca17e5a7a163b095d5a9065afa7e8af...
We already did it in the previous PR:
I have mentioned my poor, erm, thingy, umm, oh yeah, memory!!
So fine, nothing to do.
@techee pushed 1 commit.
f845aad49016f473102a35797797d0956548b13d Use all header candidates when sorting tags in autocompletion
Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier?
@kugel- Done in the last commit.
@kugel- commented on this pull request.
- if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) - *header = hdr; - g_hash_table_add(includes, hdr); - g_strfreev(hdr_comps); + for (j = 0; j < tm_files->len; j++) + { + TMSourceFile *hdr = tm_files->pdata[j]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
I think this can be improved.
We're called for a foo.c and loop over all includes here. We have the file name already, and then lookup the `TMSourceFile` (and then for every `TMSourceFile` we check if the file name is foo.h).
Instead, we can already check for foo.h earlier (when looking up the `TMSourceFile`). And then we can return a copy (or ref?) of the `GPtrArray` as `header_candidates` instead of creating a new list. Something like this:
``` GPtrArray *tm_files = g_hash_table_lookup(theWorkspace->source_file_map, hdr_name); if (g_strcmp0(hdr_name, src_comps[0])) ```
Perhaps it would make sense to separate this part out of `get_includes()` and make a distinct `g_hash_table_lookup()` for the expected header name? Hm but then it wouldn't depend on foo.c actually including foo.h (could also be seen as a good thing, not sure)
@techee pushed 1 commit.
427c5476f244555eb78f6d043b8c11f71e61d900 Use GPtrArray for candidate headers to avoid extra list allocation
@techee commented on this pull request.
- if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) - *header = hdr; - g_hash_table_add(includes, hdr); - g_strfreev(hdr_comps); + for (j = 0; j < tm_files->len; j++) + { + TMSourceFile *hdr = tm_files->pdata[j]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
I'm not sure if I understand completely what you suggest but it's true we can return the GPtrArray directly and avoid creation of the list - I just made a new commit doing this. Is this what you had in mind?
Note that the inner loop should still be there as we should also use all the includes of the same name similarly to the header file.
Separating extraction of header candidates to a separate function would be possible but it will probably lead to a lot of duplicated code.
@techee commented on this pull request.
- if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) - *header = hdr; - g_hash_table_add(includes, hdr); - g_strfreev(hdr_comps); + for (j = 0; j < tm_files->len; j++) + { + TMSourceFile *hdr = tm_files->pdata[j]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
Ah, `g_ptr_array_find()` was introduced in glib 2.54 which we don't have. Wouldn't be hard to implemented, it's jut an extra complication and maybe using the list is a simpler solution then (despite some extra allocation which I think won't matter much in this case).
@kugel- commented on this pull request.
- if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) - *header = hdr; - g_hash_table_add(includes, hdr); - g_strfreev(hdr_comps); + for (j = 0; j < tm_files->len; j++) + { + TMSourceFile *hdr = tm_files->pdata[j]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
I mean you can return the GPtrArray right after `g_hash_table_lookup(theWorkspace->source_file_map, hdr_name)`, before the inner loop. The GPtrArray has all headers with the same basename already and the inner loop only compares the basename once more.
The inner loop still adds to the `includes` hash table but it doesn't need to deal with the header candidates.
Ah, g_ptr_array_find() was introduced in glib 2.54 which we don't have
All files in the GPtrArray have the same base name, so you just need to compare the basename against `->pdata[0]`
FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).
@techee pushed 1 commit.
d6e849955c651128ac63ff8921c41aed38c207ab Move header_candidates assignment outside the inner loop
@techee commented on this pull request.
- if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) - *header = hdr; - g_hash_table_add(includes, hdr); - g_strfreev(hdr_comps); + for (j = 0; j < tm_files->len; j++) + { + TMSourceFile *hdr = tm_files->pdata[j]; + gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); + + if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
The inner loop still adds to the includes hash table but it doesn't need to deal with the header candidates.
Yeah, got it now, fixed in the last commit.
FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).
That would fix the problem. 2.54 is from 2017.
@techee pushed 1 commit.
a7bc7fcb5e68569b559a02b9edb2365460ee087f Add implementation of g_ptr_array_find() for glib < 2.54
@techee pushed 1 commit.
13f254cecf270bafbde39848d0fd25f5ac8ca22e Add implementation of g_ptr_array_find() for glib < 2.54
@kugel- commented on this pull request.
{
guint j;
- for (j = 0; j < tm_files->len; j++) + if (!*header_candidates)
There isn't any caller that passes non-NULL, right? I would define `get_includes()` such that it always writes to `*header_candidates` (unless there are none).
@kugel- commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
I see this only now. Doesn't that break on files with multiple dots in name? I thought we want to compare the base name without extension, i.e. up until the last dot.
@techee commented on this pull request.
{
guint j;
- for (j = 0; j < tm_files->len; j++) + if (!*header_candidates)
The check ``` if (!*header_candidates) ``` is to avoid doing the comparisons below if we already set `header_candidates` in the previous iteration. Also notice `*header_candidates = NULL;` at the beginning of the function.
What you describe would be ``` if (!header_candidates) ``` (i.e. without the dereference)
@techee commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
Yes, you are right. Well, I could `g_strdup()` the string, use strrchr() to find the `.` and replace it with `\0` which would give us the base name. Or is there some simpler solution?
@kugel- commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
I think I would use `g_path_get_basename()` instead of `g_strdup()`. But it doesn't matter much so yea I don't know a simpler solution.
@kugel- commented on this pull request.
{
guint j;
- for (j = 0; j < tm_files->len; j++) + if (!*header_candidates)
You are right, I didn't notice the outer loop. So I guess the idea is to avoid multiple assignments to `*header_candidates` in case a file is included multiple times (maybe legitimately, via different directories)?
So I think this is alright now.
@elextr commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
IIUC the intention here is to get the filename without the last extension, `g_path_get_basename()` does not do that. But `utils_remove_ext_from_filename()` does.
Note that there are other `strsplit()`s above that may need changing.
@kugel- commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
I know that. `g_path_get_basename()` (or `g_strdup()`) must be combined with `strrchr()` and setting the last dot to `NUL` as per @techee suggestion. `utils_remove_ext_from_filename()` does that and is a good candidate (though I'm not sure if @techee wants to use Geany utility function in tagmanager code).
@elextr commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
I'm not sure if @techee wants to use Geany utility function in tagmanager code
AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.
@techee pushed 1 commit.
7cc5d522bb4159f8faa7f592359f7254245c56d1 Use strrchr() to strip file extension instead of g_strsplit()
@techee commented on this pull request.
{
guint j;
- for (j = 0; j < tm_files->len; j++) + if (!*header_candidates)
So I guess the idea is to avoid multiple assignments to *header_candidates in case a file is included multiple times (maybe legitimately, via different directories)?
This is just an optimization so the code inside the `if` block doesn't have to be executed if we already found `header_candidates`.
@techee commented on this pull request.
{
- TMSourceFile *hdr = tm_files->pdata[j]; + TMSourceFile *hdr = tm_files->pdata[0]; gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
I'm not sure if @techee wants to use Geany utility function in tagmanager code
AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.
No problem with using Geany utility functions in TM, I just think that we should do some bigger Geany-fication of TM code separately (once the TM code stabilizes and there are no bigger PRs pending - which will hopefully be soon). So for now I just used the `strrchr()` in the fix. `g_path_get_basename()` gets called previously already so we don't have to do that here.
which will hopefully be soon
@elextr gazes at sky and whistles ... :grin:
@kugel- approved this pull request.
LGTM
Should I leave the PR as it is or rebase/squash some of the commits first?
Squashing makes sense. You could probably even reduce to the 3 initial commits as the other ones are fix-ups that popped up during review.
@kugel- Yes, that's what I thought, done.
So unless there are any objections, I'll merge this PR in about a week.
Great
Merged #3269 into master.
github-comments@lists.geany.org