After some more thinking about it, we can fix #3454 very easily by ourselves by checking whether the tag originates from a source file with a known/common C/C++ extension - if not, always set "local" to FALSE.
The first patch adds `is_c_source` flag to TMSourceFile to indicate whether the file has one of the known C/C++ extensions, the second patch uses this flag and for C/C++ sources sets "local" to TRUE only when this flag is set (in addition to ctag's isFileScope flag).
@b4n Does this fix look OK to you?
Fixes #3454. You can view, comment on, or merge this pull request online at:
https://github.com/geany/geany/pull/3490
-- Commit Summary --
* Add a flag to TMSourceFile indicating whether it's a C/C++ source file * For C/C++ only mark tag as local if it originates from a source file
-- File Changes --
M src/tagmanager/tm_ctags.c (7) M src/tagmanager/tm_source_file.c (18) M src/tagmanager/tm_source_file.h (2)
-- Patch Links --
https://github.com/geany/geany/pull/3490.patch https://github.com/geany/geany/pull/3490.diff
I'll try to test it properly tomorrow, but the approach, although a bit of a workaround (as it's basically overriding a similar check in uctags), it looks pretty good indeed (esp. with the discussion on the uctags PR).
@b4n commented on this pull request.
- if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
+ { + const gchar **ext; + const gchar *common_src_exts[] = + {".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL}; + + for (ext = common_src_exts; *ext; ext++) + { + if (g_str_has_suffix(source_file->short_name, *ext)) + { + source_file->is_c_source = TRUE; + break; + } + } + } +
wouldn't it be simpler to have `source_file->is_source`, initialize it to `TRUE` and use it for all languages, and make it `FALSE` for C and C++ if the extension doesn't match any of those?
```C source_file->is_source = TRUE;
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP) { const gchar *ext = strrchr(source_file->short_name, '.');
if (! ext) source_file->is_source = FALSE; else { int i; const gchar *common_src_exts[] = {"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++) { if (strcmp(ext + 1, common_src_exts[i]) == 0) break }
source_file->is_source = (i < G_N_ELEMENTS(common_src_exts)); } } ``` note: code above is entirely untested.
What about header files, why are they not C/C++ source?
Sometimes we include C files in other C files. Then `static` is not file-local. So we should probably also consider includes (not just files with header extension) or just revert the original change.
Since (IIUC) the cause of #3454 is the intention to limit the symbols visible at any point by using heuristics (originally dodgy uctags ones) to exclude symbols that definitely can't be visible. The "its not the same language so it can't be visible" heuristic is the cause of the OP of #3454 exacerbated by simply using extensions, not the filetype.
As @kugel- pointed out for C/C++ there are situations where actual combinations of files affect visibility, and in reverse just because a `.h` file is open in Geany does not mean all its symbols are available in another C/C++ file but they are considered for autocompletes for example.
This is the effect of the inability of the uctags/Geany to know about the build, without actual dependencies and following includes its simply not possible to do it correctly.
So its necessary to resort to heuristics, and they will never handle all situations. The decision needs to be made to find the most useful tradeoff between how many omitted visible symbols there are vs how many incorrectly visible symbols there are.
This was one of the reasons I moved to Vscode for C++ and I can sort of confirm the problem using Vscode because I notice that when I add another file to a project but before I add it to the build, Vscode (even using an LSP) will give a very similar excessive set of rubbish symbols to uctags/Geany.
Sometimes we include C files in other C files. Then static is not file-local. So we should probably also consider includes (not just files with header extension) or just revert the original change.
That's true, I'm just wondering how such files typically look like and for what purpose they include the sources. One such scenario I can imagine is that you want to split a huge source file into multiple files without some refactoring and create `master_file.c` containing ``` #include "file1.c" #include "file2.c" #include "file3.c" ... ``` In this case, though, this PR isn't a problem as you don't have code in this `master_file.c` so you don't need those static declarations from the individual source files visible there.
Do you have some concrete examples of when including source files is used? In the worst case we can always do https://github.com/geany/geany/pull/3457, it would just be shame to do it because of some really obscure and almost never used scenario.
@techee commented on this pull request.
- if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
+ { + const gchar **ext; + const gchar *common_src_exts[] = + {".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL}; + + for (ext = common_src_exts; *ext; ext++) + { + if (g_str_has_suffix(source_file->short_name, *ext)) + { + source_file->is_c_source = TRUE; + break; + } + } + } +
``` source_file->is_source = TRUE; ```
Yes, we could do that - makes probably more sense as the `is_source` flag is valid for all languages and not specific to C/C++.
```C if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP) { const gchar *ext = strrchr(source_file->short_name, '.');
if (! ext) source_file->is_source = FALSE; else { int i; const gchar *common_src_exts[] = {"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++) { if (strcmp(ext + 1, common_src_exts[i]) == 0) break }
source_file->is_source = (i < G_N_ELEMENTS(common_src_exts)); } } ``` Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where `.` is not found and one also has to think a bit when looking at ``` source_file->is_source = (i < G_N_ELEMENTS(common_src_exts)); ``` what exactly it means.
@b4n commented on this pull request.
- if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
+ { + const gchar **ext; + const gchar *common_src_exts[] = + {".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL}; + + for (ext = common_src_exts; *ext; ext++) + { + if (g_str_has_suffix(source_file->short_name, *ext)) + { + source_file->is_c_source = TRUE; + break; + } + } + } +
Yes, we could do that - makes probably more sense as the `is_source` flag is valid for all languages and not specific to C/C++.
Yeah, the main reason I suggest this would be to make the `init_tag()` check simply `tag->local = tag_entry->isFileScope && file->is_source` rather than having language-specific tests there as well.
Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where `.` is not found
Merely premature optimization (comparison only compares useful things, and traverse the basename only once), so no, it doesn't really matter.
If it's only a readability issue, you could easily rewrite this ever so slightly like so: ```C source_file->is_source = TRUE;
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP) { const gchar *ext = strrchr(source_file->short_name, '.');
source_file->is_source = FALSE; if (ext) { const gchar *common_src_exts[] = {"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (int i = 0; i < G_N_ELEMENTS(common_src_exts); i++) { if (strcmp(ext + 1, common_src_exts[i]) == 0) { source_file->is_source = TRUE; break; } } } } ```
anyway, either way is fine.
Do you have some concrete examples of when including source files is used?
An example in one of the projects I'm currently doing has platform specific files included like:
```cpp #if defined(LIN) #include "plat_lin.cpp" #elif defined(WIN) #include "plat_win.cpp" #elif defined(MAC) #include "plat_mac.cpp" #endif ```
then the code that uses those in the including file. Other selective includes are the network code (using either Boost++ or cURL) and other libraries where there is more than one solution that can be wrapped to a consistent interface.
Its much neater than lots of `ifdef`s throughout the code.
What about header files, why are they not C/C++ source?
You could come up with a better name for the flag, maybe `most_probably_not_a_file_whose_local_symbols_will_be_visible_to_others`? :grin:
@elextr interesting… I usually see this still using separate source compiled conditionally with a common header defining the interface. But OK, so you have a case where there is no way to know but having complete knowledge of the whole project tree -- which we won't have (well, a plugin like projectorganizer *could* look at all the files, figure out the includes and make a slightly more educated filtering guess, but it'd still be a heuristic without knowing the real build settings and all).
But I agree with @techee, the filtering is useful to lower the amount of irrelevant suggestions in most situations, so we have to find where to draw the line. Or we go the Geany way and add a setting? :smile:
You could come up with a better name for the flag
Yeah, I only realised that the problem was the name of the flag after I posted that.
Based on my following post `semi_random_guess_if_symbols_should_leak_everywhere` :grin:
Hmmm, maybe just simply `globaly_visible`?
But I agree with @techee, the filtering is useful to lower the amount of irrelevant suggestions in most situations, so we have to find where to draw the line.
Agree, see my [long post](https://github.com/geany/geany/pull/3490#issuecomment-1539952982)
Or we go the Geany way and add a setting? :smile:
Oh, how could we resist ...
Based on my following post `semi_random_guess_if_symbols_should_leak_everywhere` grin
Hmmm, maybe just simply `globally_visible`?
[Edit: oops, wrong way round, maybe `local_visibility`]
Actually if we want that to work for all languages, we could make it technical: `trust_file_scope` (which is really what this flags is about: whether we want to trust uctags' `isFileScope`)
Actually if we want that to work for all languages, we could make it technical: trust_file_scope (which is really what this flags is about: whether we want to trust uctags' isFileScope)
Isn't the real question if the Geany flag is a copy of isFileScope or our own heuristic, or some combination of both? For example where does it leave our own heuristic "inter-language symbols not allowed".
Isn't that what causes #3454 where symbols from `*.foo` are not visible in C code?
@elextr no, that's what I though initially, but it's actually because of the `isFileScope` filtering. uctags sets it when it does *not* match a known header extension, which my file didn't. Here we're just taking the more conservative approach of setting it only if it *matches* a known *non-header* extension.
only if it matches a known non-header extension.
Won't that break a whole lot of languages where there is no separation of header and body, where modules from one body file are imported into another body file?
@elextr well, we do that only for C and C++ ATM (see code), but that's basically the reason why I suggest a generic name merely stating how the flag is used rather than what we try to achieve with it (which does not make sense for languages with no header/source separation)
only for C and C++ ATM
And in a heartbeat he drops the obligatory "C++ isn't C" comment.
Although not used much yet, modules have been a thing since C++20, three years ... and the interfaces are in the `.cpp` files not in "header" files.
Not to mention private modules.
And declarations inside an anonymous namespace are internal linkage (implicit static).
@elextr walks away whistling and ducks a brick :grin:
Just to flesh out how C++ modules impact this.
Modules break the assumption that there must be a declaration in a header file for a symbol to be used in another file, the module is exported from a body file and the declarations from the modules are imported by an `import` statement, no header needed as the compiler does the work.
So if we set the filescope/local/whatever flag for `.cpp` files that will make the symbols exported with the module invisible elsewhere, but they should be visible.
Its not just an academic C++ exercise, since this is basically how modules work in most languages if we can solve it for C++ we can extend the heuristic to other languages too.
I've been running this (well, almost… see below :grin:) for a couple days and it seems to work nicely
My "small" patch on top just because: ```diff diff --git a/src/tagmanager/tm_ctags.c b/src/tagmanager/tm_ctags.c index a5cb9531e..16e2ebf04 100644 --- a/src/tagmanager/tm_ctags.c +++ b/src/tagmanager/tm_ctags.c @@ -119,12 +119,7 @@ static gboolean init_tag(TMTag *tag, TMSourceFile *file, const tagEntryInfo *tag
tag->name = g_strdup(tag_entry->name); tag->type = type; - /* ctags sets "isFileScope" also for files with an unknown extension - - * make sure we set "local" only for files with a known C/C++ extension */ - if (file->lang == TM_PARSER_C || file->lang == TM_PARSER_CPP) - tag->local = tag_entry->isFileScope && file->is_c_source; - else - tag->local = tag_entry->isFileScope; + tag->local = tag_entry->isFileScope && file->trust_file_scope; tag->flags = tm_tag_flag_none_t; if (isTagExtraBitMarked(tag_entry, XTAG_ANONYMOUS)) tag->flags |= tm_tag_flag_anon_t; diff --git a/src/tagmanager/tm_source_file.c b/src/tagmanager/tm_source_file.c index 599ae4fa7..e5a27419b 100644 --- a/src/tagmanager/tm_source_file.c +++ b/src/tagmanager/tm_source_file.c @@ -609,20 +609,28 @@ static gboolean tm_source_file_init(TMSourceFile *source_file, const char *file_ else source_file->lang = tm_ctags_get_named_lang(name);
- source_file->is_c_source = FALSE; + source_file->trust_file_scope = TRUE;
+ /* ctags sets "isFileScope" for all C/C++ files without a known header + * extension, but we don't want to use it for files with a less conventional + * extension, not to exclude symbols we shouldn't */ if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP) { - const gchar **ext; - const gchar *common_src_exts[] = - {".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL}; + const gchar *ext = strrchr(source_file->short_name, '.');
- for (ext = common_src_exts; *ext; ext++) + source_file->trust_file_scope = FALSE; + if (ext) { - if (g_str_has_suffix(source_file->short_name, *ext)) + const gchar *common_src_exts[] = + {"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"}; + + for (guint i = 0; i < G_N_ELEMENTS(common_src_exts); i++) { - source_file->is_c_source = TRUE; - break; + if (strcmp(ext + 1, common_src_exts[i]) == 0) + { + source_file->trust_file_scope = TRUE; + break; + } } } } diff --git a/src/tagmanager/tm_source_file.h b/src/tagmanager/tm_source_file.h index f3ca0cbc8..342d1259c 100644 --- a/src/tagmanager/tm_source_file.h +++ b/src/tagmanager/tm_source_file.h @@ -36,7 +36,7 @@ typedef struct TMSourceFile char *short_name; /**< Just the name of the file (without the path) */ GPtrArray *tags_array; /**< Sorted tag array obtained by parsing the object. @elementtype{TMTag} */ /* Flag indicating whether the file is a C/C++ source (i.e. not a header) based on its extension */ - gboolean is_c_source; + gboolean trust_file_scope; } TMSourceFile;
GType tm_source_file_get_type(void); ```
Just to flesh out how C++ modules impact this. […]
I'll look a tad into modules out of curiosity one day, but does that mean that a kind of header equivalent is generated by the compiler, so that it can be installed and imported?
So if we set the filescope/local/whatever flag for `.cpp` files that will make the symbols exported with the module invisible elsewhere, but they should be visible.
Just looking quickly at https://en.cppreference.com/w/cpp/language/modules suggests that there will be an `export` keyword, making things fairly easy for the C++ parser, doesn't it?
I'll look a tad into modules out of curiosity one day, but does that mean that a kind of header equivalent is generated by the compiler, so that it can be installed and imported?
Presumably somewhere internal to compiler data files somewhere I guess ... oooh, that google is good https://gcc.gnu.org/wiki/cxx-modules#CMI_Location #3245
Just looking quickly at https://en.cppreference.com/w/cpp/language/modules suggests that there will be an export keyword, making things fairly easy for the C++ parser, doesn't it?
Yes, if for that declaration only having an "export" keyword it overrules the file long "local visibility only" ruling made from having a `.cpp` or equivalent extension.
@techee pushed 2 commits.
db5d7e5cac00de7f9e2aee48b05bde5397370dd2 Add a flag indicating whether to trust isFileScope from ctags 2fd242b94c41f9254b2288fdb39a1c20d819b53d For C/C++ only mark tag as local if it originates from a source file
@b4n I've just repushed a hybrid version of the patch - using the `trust_file_scope` name but using `g_str_has_suffix()` as I used in the original patch which I like a bit better (but I don't insist on that one too much if the other version is preferred).
Now the hard stuff - someone has to decide whether to merge this PR or https://github.com/geany/geany/pull/3457. I would lean towards merging this PR and if we start receiving bug reports related to this issue, we can revert to #3457.
Merged #3490 into master.
I've been running the ever so slightly different implementation of this for a long while and it worked nicely for me. Just tested this PR again, and it seems to indeed actually behave the same, so I'm merging it.
github-comments@lists.geany.org