Le 17/04/2012 18:20, Nick Treleaven a écrit :
Hi, How's it going?
Hi,
Sorry for the long delays -- and also small activity -- recently. I have/had a lot of non-Geany stuff to do and stuff, the whole story, you know.
Lex mentioned in this mail: http://lists.uvena.de/pipermail/geany/2012-April/007991.html
that (according to him) you are working to 'replace it'. Maybe he's exaggerating, but this sounds interesting. If you have time could you maybe send me a link to the IRC log, or better, explain a little about the work on the devel list?
That's true, I have a WIP on writing a new ctags management code. It is far from being ready for production, and I'm not even sure the ideas in it are really appropriate. But yes, there is a something :)
I finally committed it and pushed it so you can see it, comment, blame, flame & more: see https://github.com/b4n/geany/tree/wip%2Fctagsmanager
A few points, as they comes to my mind:
* it is under the new ctagsmanager/ directory; * it uses the same tag parsers tagmanager used, in ctagsmanager/ctags; * it support asynchronous parsing (though not concurrent parsing); * all types have different names than the tagmanager ones, though currently the API is almost an 1:1 mapping -- and that's maybe a huge mistake?; * there is 2 backends as of today: - a "simple" one that is simple and that doesn't waste (too much) memory, but that searches in O(n); - a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
In practice I haven't yet tested anything big enough to see any difference in performances between these two backends, and that's something that probably isn't worth bothering about for now.
* this "backend" abstraction might be really overkill, and maybe we could do better without it?; * tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend); * tag matching/finding is done using ctm_data_backend_find() (or a convenience wrapper), which takes 2 functions for performing the search: - a sort function, used to, heh, sort the tags to search and/or the resulting list (the "simple" backend should use it to sort the result; and the "multi-cache" backend uses it to sort the caches) - a match function, use to check whether a tag should be included in the results. Like the sort function it returns a strcmp()-like value, with the only difference that it probably returns 0 for more tags.
It is somewhat similar to what tagmanager did, but it's more flexible -- and maybe complex, though many sort/match functions would already be provided.
* no pruning is done yet, so there is duplicate in the results; * there is an (almost) working scope completeion implementation; * ... I don't see anything to add, so I'll stop here :)
All this isn't of course written in stone: if we already redo a lot of things, let's get something nice -- though IMO we'll always be better than with tagmanager, since each time I wanted to touch it it took me hours, and sometimes I even haven't been able to fix the problem (thinking of e.g. scope completion...).
Also, later in the thread he says that performance problems with resorting global and workspace tags cannot be fixed with the design of tagmanager. I've been working yesterday on improving this significantly by merging the new tags each time instead of resorting *all* the tags. I hope to commit this in the next few days.
Cool! I haven't done any profiling on either tagmanager or may new attempt, so I can't tell what's actually slow, but any improvement is good to have :)
Regards, Colomban
[...]
Also, later in the thread he says that performance problems with resorting global and workspace tags cannot be fixed with the design of tagmanager. I've been working yesterday on improving this significantly by merging the new tags each time instead of resorting *all* the tags. I hope to commit this in the next few days.
Cool! I haven't done any profiling on either tagmanager or may new attempt, so I can't tell what's actually slow, but any improvement is good to have :)
Coo, have I accidentally ignited a competition between Nick fixing tagmanager and Colomban's potential replacement? :)
Didn't mean to, but if we get to pick the best, great :)
Cheers Lex
Regards, Colomban
Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
On 24/04/2012 22:31, Colomban Wendling wrote:
Le 17/04/2012 18:20, Nick Treleaven a écrit : Sorry for the long delays -- and also small activity -- recently. I have/had a lot of non-Geany stuff to do and stuff, the whole story, you know.
No problem.
I finally committed it and pushed it so you can see it, comment, blame, flame& more: see https://github.com/b4n/geany/tree/wip%2Fctagsmanager
A few points, as they comes to my mind:
- it support asynchronous parsing (though not concurrent parsing);
What's the difference? Also, what does it buy us?
- it is under the new ctagsmanager/ directory;
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
- all types have different names than the tagmanager ones, though currently the API is almost an 1:1 mapping -- and that's maybe a huge mistake?;
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
- there is 2 backends as of today:
- a "simple" one that is simple and that doesn't waste (too much) memory, but that searches in O(n);
Linear searching seems like a bad idea when we currently have O(log n). Removing x tags is O(n * x) by the look of it.
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Given my global tags merge change (see below), I think tagmanager needs some changes to avoid sorting the workspace tags array each time tags are updated, but this is certainly doable.
In practice I haven't yet tested anything big enough to see any difference in performances between these two backends, and that's something that probably isn't worth bothering about for now.
- this "backend" abstraction might be really overkill, and maybe we could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
- tag matching/finding is done using ctm_data_backend_find() (or a convenience wrapper), which takes 2 functions for performing the search:
- a sort function, used to, heh, sort the tags to search and/or the resulting list (the "simple" backend should use it to sort the result; and the "multi-cache" backend uses it to sort the caches)
It doesn't sound a good idea to have to sort the haystack on each lookup. Lookup should be very fast, like tagmanager. Adding/removing tags is always less important because real-time updates can be disabled anyway.
- a match function, use to check whether a tag should be included in the results. Like the sort function it returns a strcmp()-like value, with the only difference that it probably returns 0 for more tags.
It is somewhat similar to what tagmanager did, but it's more flexible -- and maybe complex, though many sort/match functions would already be provided.
- no pruning is done yet, so there is duplicate in the results;
- there is an (almost) working scope completeion implementation;
- ... I don't see anything to add, so I'll stop here :)
All this isn't of course written in stone: if we already redo a lot of things, let's get something nice -- though IMO we'll always be better than with tagmanager, since each time I wanted to touch it it took me hours, and sometimes I even haven't been able to fix the problem (thinking of e.g. scope completion...).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
Also, later in the thread he says that performance problems with resorting global and workspace tags cannot be fixed with the design of tagmanager. I've been working yesterday on improving this significantly by merging the new tags each time instead of resorting *all* the tags. I hope to commit this in the next few days.
Cool! I haven't done any profiling on either tagmanager or may new attempt, so I can't tell what's actually slow, but any improvement is good to have :)
I finished the global tags merging change: https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd2906786462...
IMO this should make loading global tags fast enough, even for heavy use, as the whole array is now only traversed once.
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags & merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
The other changes I've made were in response to a user's problem with generating C tag files, and some changes to the CTags files.
Regards, Nick
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote:
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags & merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
Another option is to remove the workspace array altogether and just have Geany collate tags from each (sorted) source object when needed. Not sure the implications of this yet but it would simplify tagmanager.
Le 26/04/2012 18:53, Nick Treleaven a écrit :
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote:
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
It should be quite easy -- though it won't still be a vanilla CTags of course, my own isn't either (yet?).
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags & merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
Another option is to remove the workspace array altogether and just have Geany collate tags from each (sorted) source object when needed. Not sure the implications of this yet but it would simplify tagmanager.
Well, tagmanager needs to know all tags to perform e.g. scope completion beyond file's boundaries. And for search too, it would need us to pass it everything, or to perform a search on each document's tags and then merge the result. Doesn't sound sensible at a first glance, but maybe I'm missing something.
On 29/04/2012 15:47, Colomban Wendling wrote:
Le 26/04/2012 18:53, Nick Treleaven a écrit :
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote:
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
It should be quite easy -- though it won't still be a vanilla CTags of course, my own isn't either (yet?).
I just thought it was a separate change from the TM rewrite.
We have a lot of changes from CTags e.g. new languages and improvements to existing parsers (and even CTags itself) which IMO we can't throw away.
It is possible to merge some changes from CTags but for e.g. c.c this has become pretty difficult. I tried this once but it ended up breaking the parser and I didn't work out why.
IMO the biggest issue with CTags is that it isn't really maintained and they rarely accept patches even when it fixes an important bug (e.g. a fix to regex callback parsing with ignored matches).
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags& merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
Another option is to remove the workspace array altogether and just have Geany collate tags from each (sorted) source object when needed. Not sure the implications of this yet but it would simplify tagmanager.
Well, tagmanager needs to know all tags to perform e.g. scope completion beyond file's boundaries. And for search too, it would need us to pass it everything, or to perform a search on each document's tags and then merge the result. Doesn't sound sensible at a first glance, but maybe I'm missing something.
It comes to a trade off between merging/sorting results from multiple tag arrays and merging/sorting the whole workspace array even when only one document is reparsed.
Actually I suppose the first one can cause lags which the user could notice if there are a lot of results, whereas the second one only causes problems on reparsing, which is less often than searching.
I'm undecided. The second one can be more serious if the calling code doesn't handle e.g. save all specially and the workspace array size is quite big. The special handling is to not update the workspace array until all documents have been parsed, then to resort it. If it's only save all, we can cope, but there may be other ways this can happen.
Hi All,
To summarise since the thread has several subthreads.
1. Tagmanager Understandability
a. I generated the doxygen documentation for tagmanager, it works if you set recursive, but didn't help much:
- if its not OOP why does it say things like "TMWorkspace is derived from TMWorkObject" and similar?
- its not clear how it all goes together, the workspace contains global tags and work_objects, or is that files and whats the difference between source_file and file_entry?
- similarly whats the difference between symbol and tag?
2. Ability to expand tagmanager to handle names declared in lexical scopes (not to be confused with struct/class scopes). Here is the example again with some numbers so I can refer to them
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members <1> */ { struct c o; o./* struct c members */ p./* struct a members <2> */ } o./* struct a members */ p./* struct a members */ }
a. yep, tries use more memory than an array, the usual speed/space, pick one, tradeoff :)
b. @Nick, when you say sort by scope then name, are you wanting to have an entry in the table for each declaration of the name?
- If so this makes the array much bigger to search and your search speed depends on size, and it doesn't get you anything, you can't search by scope since you don't know if the name is declared in the scope you are in or an outer scope compare p at <1> and <2>
- having a single name array which then points to scope info for the name is a viable approach (disclosure, thats how I'm doing the symbol table for a language I'm developing) but the table being searched is usually larger than if you have nested arrays. Being smaller these are faster to search if the search isn't O(1), hence the suggestion of trie instead of bsearch.
3. Background/asynchronous whatever you want to call it parsing
a. @Colomban, you say that caches are created on first lookup, doesn't that throw work back into the UI thread which could have been done in the parsing thread?
4. Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
5. Overloaded symbols
Since Colombans patch, overloaded symbols are now stable for all practical code (I think theoretically it could get confused if the overloads are on the same line but thats unlikely enough to ignore for human generated code)
6. Moving functionality from symbols.c to tagmanager
a. Since its the 100th anniversary of the Titanic sinking, I think that "shuffling the deckchairs" is an apt analogy, the functionality has to be somewhere, its only useful to move it if the destination significantly reduces the effort required.
Cheers Lex
On 02/05/2012 05:46, Lex Trotman wrote:
Hi All,
To summarise since the thread has several subthreads.
- Tagmanager Understandability
a. I generated the doxygen documentation for tagmanager, it works if you set recursive, but didn't help much:
- if its not OOP why does it say things like "TMWorkspace is derived
from TMWorkObject" and similar?
documentation bug IMO
- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the
workspace work objects are document tags. global tags explained in geany's manual.
difference between source_file and file_entry?
It doesn't look like tm_file_entry_ is really used.
- similarly whats the difference between symbol and tag?
tm_symbol_ doesn't seem to be used.
- Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes). Here is the example again with some numbers so I can refer to them
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members<1> */ { struct c o; o./* struct c members */ p./* struct a members<2> */ } o./* struct a members */ p./* struct a members */ }
a. yep, tries use more memory than an array, the usual speed/space, pick one, tradeoff :)
b. @Nick, when you say sort by scope then name, are you wanting to have an entry in the table for each declaration of the name?
no
- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't search by scope since you don't know if the name is declared in the scope you are in or an outer scope compare p at<1> and<2>
- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol table for a language I'm developing) but the table being searched is usually larger than if you have nested arrays. Being smaller these are faster to search if the search isn't O(1), hence the suggestion of trie instead of bsearch.
the gain in simplicity makes a bigger array to search worth it. Remember, global tags aren't included in the workspace array of tagmanager, so we're not talking a big number of tags, and we have o(log n) searching.
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
- Overloaded symbols
Since Colombans patch, overloaded symbols are now stable for all practical code (I think theoretically it could get confused if the overloads are on the same line but thats unlikely enough to ignore for human generated code)
If you're talking about master, I think I still experienced wrong parenting on reparsing when removing lines.
- Moving functionality from symbols.c to tagmanager
a. Since its the 100th anniversary of the Titanic sinking, I think that "shuffling the deckchairs" is an apt analogy, the functionality has to be somewhere, its only useful to move it if the destination significantly reduces the effort required.
I don't think I suggested moving functionality. I wondered whether TM could help make symbols.c less complex. I would need to understand the complexity to know whether this is appropriate or not.
Titanic comment is bizarre. My suggestion could equally apply to ctagsmanager.
On 07/05/2012 17:04, Nick Treleaven wrote:
On 02/05/2012 05:46, Lex Trotman wrote:
- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the
workspace work objects are document tags. global tags explained in geany's manual.
difference between source_file and file_entry?
It doesn't look like tm_file_entry_ is really used.
- similarly whats the difference between symbol and tag?
tm_symbol_ doesn't seem to be used.
Also, tm_project_ is not used either.
- Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes). Here is the example again with some numbers so I can refer to them
b. @Nick, when you say sort by scope then name, are you wanting to have an entry in the table for each declaration of the name?
no
- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't search by scope since you don't know if the name is declared in the scope you are in or an outer scope compare p at<1> and<2>
- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol table for a language I'm developing) but the table being searched is usually larger than if you have nested arrays. Being smaller these are faster to search if the search isn't O(1), hence the suggestion of trie instead of bsearch.
the gain in simplicity makes a bigger array to search worth it. Remember, global tags aren't included in the workspace array of tagmanager, so we're not talking a big number of tags, and we have o(log n) searching.
Oops, forget the global/workspace division, we still need to search global scopes. But I don't know why you think this is too slow.
On 8 May 2012 02:04, Nick Treleaven nick.treleaven@btinternet.com wrote:
On 02/05/2012 05:46, Lex Trotman wrote:
Hi All,
To summarise since the thread has several subthreads.
- Tagmanager Understandability
a. I generated the doxygen documentation for tagmanager, it works if you set recursive, but didn't help much:
- if its not OOP why does it say things like "TMWorkspace is derived
from TMWorkObject" and similar?
documentation bug IMO
O-K, clearly the documenter obviously had a meaning for the term in mind since it occurs in several places, what do you think they meant?
- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the
workspace work objects are document tags. global tags explained in geany's manual.
And the work objects/document tags are in a different array to global tags if I read your comment on the next post right?
difference between source_file and file_entry?
It doesn't look like tm_file_entry_ is really used.
Along with your comment below and about project on the next post, sounds like tm code could be reduced significantly. Might help :)
- similarly whats the difference between symbol and tag?
tm_symbol_ doesn't seem to be used.
- Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes). Here is the example again with some numbers so I can refer to them
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members<1> */ { struct c o; o./* struct c members */ p./* struct a members<2> */ } o./* struct a members */ p./* struct a members */ }
a. yep, tries use more memory than an array, the usual speed/space, pick one, tradeoff :)
b. @Nick, when you say sort by scope then name, are you wanting to have an entry in the table for each declaration of the name?
no
So what did you mean?
- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't search by scope since you don't know if the name is declared in the scope you are in or an outer scope compare p at<1> and<2>
- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol table for a language I'm developing) but the table being searched is usually larger than if you have nested arrays. Being smaller these are faster to search if the search isn't O(1), hence the suggestion of trie instead of bsearch.
the gain in simplicity makes a bigger array to search worth it. Remember, global tags aren't included in the workspace array of tagmanager, so we're not talking a big number of tags, and we have o(log n) searching.
Ok, but to date document tags don't include local variables (see below) so they will add many more names, with lots of replication.
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
No it doesn't AFAICT:
on brace open createTags() calls tagCheck() and nest(),
on brace open checkTag() only makes a tag for a preceding function or contextual entity (== class, enum, namespace struct or union) it does not look at the contents of the {},
and nest() only recurses if the { opens a class, enum, namespace, struct, union otherwise it skips everything in the {},
so no local declarations are parsed unless I've missed something?
- Overloaded symbols
Since Colombans patch, overloaded symbols are now stable for all practical code (I think theoretically it could get confused if the overloads are on the same line but thats unlikely enough to ignore for human generated code)
If you're talking about master, I think I still experienced wrong parenting on reparsing when removing lines.
HEAD of master gives me no stability problems with C++, can't cause any problems with C either, can you reproduce?
- Moving functionality from symbols.c to tagmanager
a. Since its the 100th anniversary of the Titanic sinking, I think that "shuffling the deckchairs" is an apt analogy, the functionality has to be somewhere, its only useful to move it if the destination significantly reduces the effort required.
I don't think I suggested moving functionality.
OK, I read it that you were suggesting that.
I wondered whether TM could
help make symbols.c less complex. I would need to understand the complexity to know whether this is appropriate or not.
Titanic comment is bizarre. My suggestion could equally apply to ctagsmanager.
Oh dear, misdirected attempt at black humor failed :(
Though I thought my UK friends used the term "shuffling the deckchairs on the titanic" to mean irrelevant changes. Maybe they learned it from me. But if you were not meaning to move stuff from symbol.c to tagmanager then I guess the implication that just moving stuff doesn't reduce complexity wouldn't make sense either.
Cheers Lex
Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
On 12-05-07 05:03 PM, Lex Trotman wrote:
On 8 May 2012 02:04, Nick Treleavennick.treleaven@btinternet.com wrote:
On 02/05/2012 05:46, Lex Trotman wrote:
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
No it doesn't AFAICT:
I'm guessing he's referring to the "upstream" CTags c.c, which does have a "l" kind for "local variables" (off by default). See `ctags --list-kinds=C`. I'm not sure if the Geany fork has this, was forked before it was added, or if the guy that wrote TM took it out.
Cheers, Matthew Brush
On 8 May 2012 13:27, Matthew Brush mbrush@codebrainz.ca wrote:
On 12-05-07 05:03 PM, Lex Trotman wrote:
On 8 May 2012 02:04, Nick Treleavennick.treleaven@btinternet.com wrote:
On 02/05/2012 05:46, Lex Trotman wrote:
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
No it doesn't AFAICT:
I'm guessing he's referring to the "upstream" CTags c.c, which does have a "l" kind for "local variables" (off by default). See `ctags --list-kinds=C`. I'm not sure if the Geany fork has this, was forked before it was added, or if the guy that wrote TM took it out.
Ok, I havn't looked at Ctags c.c because IIUC from other comments it isn't really mergable with our c.c.
Does upstream c.c use tagmanager, and if so how does it structure scopes? (A good exercise for your compiler :)
Cheers Lex
Cheers, Matthew Brush
Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
On 12-05-07 08:32 PM, Lex Trotman wrote:
On 8 May 2012 13:27, Matthew Brushmbrush@codebrainz.ca wrote:
On 12-05-07 05:03 PM, Lex Trotman wrote:
On 8 May 2012 02:04, Nick Treleavennick.treleaven@btinternet.com wrote:
On 02/05/2012 05:46, Lex Trotman wrote:
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
No it doesn't AFAICT:
I'm guessing he's referring to the "upstream" CTags c.c, which does have a "l" kind for "local variables" (off by default). See `ctags --list-kinds=C`. I'm not sure if the Geany fork has this, was forked before it was added, or if the guy that wrote TM took it out.
Ok, I havn't looked at Ctags c.c because IIUC from other comments it isn't really mergable with our c.c.
Does upstream c.c use tagmanager, and if so how does it structure scopes? (A good exercise for your compiler :)
1) no 1a) I never could understand CTags scopes, something like a 2-byte value, maybe an index into some array?
Cheers, Matthew Brush
Le 08/05/2012 02:03, Lex Trotman a écrit :
On 8 May 2012 02:04, Nick Treleaven nick.treleaven@btinternet.com wrote:
[...]
It doesn't look like tm_file_entry_ is really used.
Along with your comment below and about project on the next post, sounds like tm code could be reduced significantly. Might help :)
Agreed at 100%! If we could cut down TM to remove the code that's actually not used (or not useful for us) would certainly help a lot to towards making it easier to understand. (BTW I think I remember something about Jiří having done something like it a long time ago)
On 12-05-08 05:44 AM, Colomban Wendling wrote:
Le 08/05/2012 02:03, Lex Trotman a écrit :
On 8 May 2012 02:04, Nick Treleavennick.treleaven@btinternet.com wrote:
[...]
It doesn't look like tm_file_entry_ is really used.
Along with your comment below and about project on the next post, sounds like tm code could be reduced significantly. Might help :)
Agreed at 100%! If we could cut down TM to remove the code that's actually not used (or not useful for us) would certainly help a lot to towards making it easier to understand. (BTW I think I remember something about Jiří having done something like it a long time ago)
+10000000
Also, it wouldn't hurt to make the file system structure and coding standard/style as other Geany files. For example:
tagmanager/tm_*.[ch] <- delete include/ dir, maybe remove tm_ prefix tagmanager/mio/* tagmanager/ctags/* <- all non-tm files here
And then we could run the files through GNU Indent or similar program to match Geany's coding style.
Cheers, Matthew Brush
Le 07/05/2012 18:04, Nick Treleaven a écrit :
On 02/05/2012 05:46, Lex Trotman wrote:
Hi All,
To summarise since the thread has several subthreads.
- Tagmanager Understandability
a. I generated the doxygen documentation for tagmanager, it works if you set recursive, but didn't help much:
- if its not OOP why does it say things like "TMWorkspace is derived
from TMWorkObject" and similar?
documentation bug IMO
I don't think so. TM uses a more or less OOP-like approach. See for example TMWorkspace:
typedef struct { TMWorkObject work_object; /*!< The parent work object */ GPtrArray *global_tags; /*!< Global tags loaded at startup */ GPtrArray *work_objects; /*!< An array of TMWorkObject pointers */ } TMWorkspace;
The first field (work_object) is the inherited "class", here TMWorkObject. And you'll see numerous places where the code uses such a derived structure as a TMWorkObject -- since it is one actually --, which looks quite like OOP.
Or see tm_workspace.c:44:tm_create_workspace(): it uses tm_work_object_register() to register itself as a new type of work object with a few methods (or vfuncs), and the initializes iself with tm_work_object_init(), etc.
I very well understand Lex's questionings about how it does actually work, since it brings a second OOP-style programming in C, less well known than GObject -- though of course less complex also, but still (BTW maybe porting to GObject could help?)
- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the
workspace work objects are document tags. global tags explained in geany's manual.
difference between source_file and file_entry?
It doesn't look like tm_file_entry_ is really used.
- similarly whats the difference between symbol and tag?
tm_symbol_ doesn't seem to be used.
- Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes). Here is the example again with some numbers so I can refer to them
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members<1> */ { struct c o; o./* struct c members */ p./* struct a members<2> */ } o./* struct a members */ p./* struct a members */ }
a. yep, tries use more memory than an array, the usual speed/space, pick one, tradeoff :)
b. @Nick, when you say sort by scope then name, are you wanting to have an entry in the table for each declaration of the name?
no
- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't search by scope since you don't know if the name is declared in the scope you are in or an outer scope compare p at<1> and<2>
- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol table for a language I'm developing) but the table being searched is usually larger than if you have nested arrays. Being smaller these are faster to search if the search isn't O(1), hence the suggestion of trie instead of bsearch.
the gain in simplicity makes a bigger array to search worth it. Remember, global tags aren't included in the workspace array of tagmanager, so we're not talking a big number of tags, and we have o(log n) searching.
- Ctags parsers
Agree with Nick that the parsers are usable, but if we start modifying them to handle local declarations then they will be totally incompatible with the Ctags project so I guess it doesn't matter other than for getting languages we don't currently parse.
ctags c.c already parses local tags
- Overloaded symbols
Since Colombans patch, overloaded symbols are now stable for all practical code (I think theoretically it could get confused if the overloads are on the same line but thats unlikely enough to ignore for human generated code)
If you're talking about master, I think I still experienced wrong parenting on reparsing when removing lines.
- Moving functionality from symbols.c to tagmanager
a. Since its the 100th anniversary of the Titanic sinking, I think that "shuffling the deckchairs" is an apt analogy, the functionality has to be somewhere, its only useful to move it if the destination significantly reduces the effort required.
I don't think I suggested moving functionality. I wondered whether TM could help make symbols.c less complex. I would need to understand the complexity to know whether this is appropriate or not.
Well, what symbols.c tries to do when updating the symbols tree is (as documented above update_tree_tags() BTW):
1) update tags that still exist and remove obsolescent ones 2) add the remaining tags (new ones) to the tree
The implementation is a (tiny) little (bit) more complex than that for performances reasons, like the two hash tables and now the possibility for a hash table entry to hold more than one tag, which deals with exact duplicates.
Not sure how tm could help here, unless maybe it provides a tree rather than a flat list.
On 8 May 2012 22:31, Colomban Wendling lists.ban@herbesfolles.org wrote:
Le 07/05/2012 18:04, Nick Treleaven a écrit :
On 02/05/2012 05:46, Lex Trotman wrote:
Hi All,
To summarise since the thread has several subthreads.
- Tagmanager Understandability
a. I generated the doxygen documentation for tagmanager, it works if you set recursive, but didn't help much:
- if its not OOP why does it say things like "TMWorkspace is derived
from TMWorkObject" and similar?
documentation bug IMO
I don't think so. TM uses a more or less OOP-like approach. See for example TMWorkspace:
typedef struct { TMWorkObject work_object; /*!< The parent work object */ GPtrArray *global_tags; /*!< Global tags loaded at startup */ GPtrArray *work_objects; /*!< An array of TMWorkObject pointers */ } TMWorkspace;
The first field (work_object) is the inherited "class", here TMWorkObject. And you'll see numerous places where the code uses such a derived structure as a TMWorkObject -- since it is one actually --, which looks quite like OOP.
Or see tm_workspace.c:44:tm_create_workspace(): it uses tm_work_object_register() to register itself as a new type of work object with a few methods (or vfuncs), and the initializes iself with tm_work_object_init(), etc.
I very well understand Lex's questionings about how it does actually work, since it brings a second OOP-style programming in C, less well known than GObject -- though of course less complex also, but still (BTW maybe porting to GObject could help?)
Thanks Colomban, that helps :)
[...]
Cheers Lex
Le 02/05/2012 06:46, Lex Trotman a écrit :
[...]
- Background/asynchronous whatever you want to call it parsing
a. @Colomban, you say that caches are created on first lookup, doesn't that throw work back into the UI thread which could have been done in the parsing thread?
Well, yes, the UI thread gets the load (unless there is background/async lookups too :)). However in my approach there can be any number of caches [1], and those caches can't really be guessed, so... but maybe it's simply a bad approach. Or the used caches could be hard-coded in CTM so they never get created implicitly -- quite ugly but works).
[1] cache means: a array of tags sorted by a given sort function. Each cache is then used for a particular type of lookups: * simple completions (sort by name); * scope completion (uses more than one cache, sorted by scope and name, and sorted by type); * etc. (I think there is a third one we use somewhere, can't remenber what though)
Le 30/04/2012 19:07, Nick Treleaven a écrit :
On 29/04/2012 15:47, Colomban Wendling wrote:
Le 26/04/2012 18:53, Nick Treleaven a écrit :
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote:
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
It should be quite easy -- though it won't still be a vanilla CTags of course, my own isn't either (yet?).
I just thought it was a separate change from the TM rewrite.
It could very well be I think, basically it only changes the directory structure a little. I'll try to replicate this on TM.
We have a lot of changes from CTags e.g. new languages and improvements to existing parsers (and even CTags itself) which IMO we can't throw away.
It is possible to merge some changes from CTags but for e.g. c.c this has become pretty difficult. I tried this once but it ended up breaking the parser and I didn't work out why.
IMO the biggest issue with CTags is that it isn't really maintained and they rarely accept patches even when it fixes an important bug (e.g. a fix to regex callback parsing with ignored matches).
Yes, we can't use stock CTags everywhere. Because we have some changes that aren't applied upstream, and then merging would be a mess (sadly). And a bit also because TM changes a few things in a few CTags files to "plug" into it -- necessary unless the parsing is done by calling the ctags executable; which isn't necessarily better.
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags& merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
Another option is to remove the workspace array altogether and just have Geany collate tags from each (sorted) source object when needed. Not sure the implications of this yet but it would simplify tagmanager.
Well, tagmanager needs to know all tags to perform e.g. scope completion beyond file's boundaries. And for search too, it would need us to pass it everything, or to perform a search on each document's tags and then merge the result. Doesn't sound sensible at a first glance, but maybe I'm missing something.
It comes to a trade off between merging/sorting results from multiple tag arrays and merging/sorting the whole workspace array even when only one document is reparsed.
Actually I suppose the first one can cause lags which the user could notice if there are a lot of results, whereas the second one only causes problems on reparsing, which is less often than searching.
(unless real-time tag parsing is enabled)
I'm undecided. The second one can be more serious if the calling code doesn't handle e.g. save all specially and the workspace array size is quite big. The special handling is to not update the workspace array until all documents have been parsed, then to resort it. If it's only save all, we can cope, but there may be other ways this can happen.
If TM (could) have an API for that it'd be fine, but I don't think it's a really good idea if it becomes a hack.
Le 08/05/2012 14:12, Colomban Wendling a écrit :
Le 30/04/2012 19:07, Nick Treleaven a écrit :
On 29/04/2012 15:47, Colomban Wendling wrote:
Le 26/04/2012 18:53, Nick Treleaven a écrit :
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote:
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
It should be quite easy -- though it won't still be a vanilla CTags of course, my own isn't either (yet?).
I just thought it was a separate change from the TM rewrite.
It could very well be I think, basically it only changes the directory structure a little. I'll try to replicate this on TM.
Here we go: https://github.com/b4n/geany/commits/tm/tree-refactoring Looks reasonable?
The Autotools and Waf build systems should be OK (tested & running), but I haven't tested the makefile.win32s; so if somebody could check them it'd be awesome.
Cheers, Colomban
Le 08/05/2012 23:04, Colomban Wendling a écrit :
Le 08/05/2012 14:12, Colomban Wendling a écrit :
Le 30/04/2012 19:07, Nick Treleaven a écrit :
On 29/04/2012 15:47, Colomban Wendling wrote:
Le 26/04/2012 18:53, Nick Treleaven a écrit :
On 26/04/2012 16:02, Nick Treleaven wrote:
On 24/04/2012 22:31, Colomban Wendling wrote: > * it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
BTW this is a good idea to clearly separate CTags from tagmanager. If this change can be applied separately, perhaps it could be merged into master.
It should be quite easy -- though it won't still be a vanilla CTags of course, my own isn't either (yet?).
I just thought it was a separate change from the TM rewrite.
It could very well be I think, basically it only changes the directory structure a little. I'll try to replicate this on TM.
Here we go: https://github.com/b4n/geany/commits/tm/tree-refactoring Looks reasonable?
The Autotools and Waf build systems should be OK (tested & running), but I haven't tested the makefile.win32s; so if somebody could check them it'd be awesome.
Nick, could you test this? This change should be harmless apart maybe the makefiles.win32 update, so before pushing it it'd be good if somebody could test and see if it builds OK.
Thanks :) Colomban
[...]
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
Hi Nick,
It is good to see someone supporting tagmanager rather than it just being trashed (by the likes of me :)
One of the problems with tagmanager is its complexity, leading to considerable wariness on the part of many of us about changing it since we don't understand what we might break.
Actually documenting the overall structure of tagmanager and how it is supposed to work would be a good thing, whats a workspace? what is it meant to represent, how are scopes represented? etc.
[...]
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
How does tagmanager handle nested scopes, or how would it need to be modified to do so, considering the example (in C)
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ { struct c o; o./* struct c members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ }
How much needs to be changed in tagmanager so that the right autocompletes can be provided at each comment? (assuming c.c is taught to parse local variables of course)
And of course the same question for Colomban.
Given my global tags merge change (see below), I think tagmanager needs some changes to avoid sorting the workspace tags array each time tags are updated, but this is certainly doable.
In practice I haven't yet tested anything big enough to see any difference in performances between these two backends, and that's something that probably isn't worth bothering about for now.
- this "backend" abstraction might be really overkill, and maybe we
could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
It is but is it flexible enough to be expanded to nested scopes.
- tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
[...]
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
I think the fact that this isn't clear is the problem :)
Also, later in the thread he says that performance problems with resorting global and workspace tags cannot be fixed with the design of tagmanager. I've been working yesterday on improving this significantly by merging the new tags each time instead of resorting *all* the tags. I hope to commit this in the next few days.
Cool! I haven't done any profiling on either tagmanager or may new attempt, so I can't tell what's actually slow, but any improvement is good to have :)
I finished the global tags merging change: https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd2906786462...
IMO this should make loading global tags fast enough, even for heavy use, as the whole array is now only traversed once.
Excellent, good one Nick.
Cheers Lex
On 27/04/2012 06:30, Lex Trotman wrote:
[...]
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
One of the problems with tagmanager is its complexity, leading to considerable wariness on the part of many of us about changing it since we don't understand what we might break.
I don't see this myself, I see some complicating issues which could be resolved (and I'm willing to work on them), but generally a sound design for what it provides and for extra things we may want to add.
Actually documenting the overall structure of tagmanager and how it is supposed to work would be a good thing, whats a workspace? what is it meant to represent, how are scopes represented? etc.
Isn't it clear from the data structures? Look at TMWorkspace. Scopes and other tag metadata is the same as CTags. Obviously if we had at-a-glance overall documentation that would be good.
One confusing thing is that a TMTag can be used for an actual tag or for a file. Probably that could be cleaned up.
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
I expect the design is better in some respects (and to be fair I didn't look for better things), but finding a tag based on its name is something we are always going to want to be fast. Even for scope completion, you still need to lookup a tag structure from a name string. So I think we will always want a sorted array of tags per document that we can bsearch (or something equally fast).
Also, I've probably sounded quite harsh on Colomban's design, but I'm commenting on what I think is important. I am genuinely interested in why his design decisions are better. It's a lot to take in all at once, so probably needs some explanation. Sorry if I didn't make that clear.
How does tagmanager handle nested scopes, or how would it need to be modified to do so, considering the example (in C)
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ { struct c o; o./* struct c members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ }
How much needs to be changed in tagmanager so that the right autocompletes can be provided at each comment? (assuming c.c is taught to parse local variables of course)
I don't know, but we still need fast tag lookup based on name. If O(n) scope lookup is too slow, we will need additional data structures arranged differently, but whatever we have should have something like O(log n) lookup for names as this is by far the most common operation.
- this "backend" abstraction might be really overkill, and maybe we could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
It is but is it flexible enough to be expanded to nested scopes.
see above.
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
I looked at the thread about that, and from what I could tell, the problem was for reparsing unsaved files. Wasn't the order OK for files that have just been saved? (Also I don't follow what that has to do with reference counting).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
I think the fact that this isn't clear is the problem :)
If I'm following you correctly, I think you're saying the design needs to change, which I accept may be true. What I was saying was that understanding the existing code for scope completion is not really that complex.
Regards, Nick
On 29 April 2012 22:07, Nick Treleaven nick.treleaven@btinternet.com wrote:
On 27/04/2012 06:30, Lex Trotman wrote:
[...]
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
One of the problems with tagmanager is its complexity, leading to considerable wariness on the part of many of us about changing it since we don't understand what we might break.
I don't see this myself, I see some complicating issues which could be resolved (and I'm willing to work on them), but generally a sound design for what it provides and for extra things we may want to add.
Maybe my problem is simply the attempt to simulate object orientation and inheritance in C makes it look complicated for a spoilt C++ programmer :)
Actually documenting the overall structure of tagmanager and how it is supposed to work would be a good thing, whats a workspace? what is it meant to represent, how are scopes represented? etc.
Isn't it clear from the data structures? Look at TMWorkspace. Scopes and other tag metadata is the same as CTags. Obviously if we had at-a-glance overall documentation that would be good.
Actually I meant to ask, it does have documentation comments, is it a style that doxygen can read. If so maybe that will help, I'll try tomorrow.
As for scopes, all I can see is nested structures, not nested code scopes. I guess thats not surprising given that they are not parsed by the parsers. So AFAICT solving the example below is still an open question.
One confusing thing is that a TMTag can be used for an actual tag or for a file. Probably that could be cleaned up.
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
I expect the design is better in some respects (and to be fair I didn't look for better things), but finding a tag based on its name is something we are always going to want to be fast. Even for scope completion, you still need to lookup a tag structure from a name string. So I think we will always want a sorted array of tags per document that we can bsearch (or something equally fast).
Yes, we could have one name table and then prune the results to the scope required, or a name table per scope (which makes the tables smaller and simple searches faster).
Also, I've probably sounded quite harsh on Colomban's design, but I'm commenting on what I think is important. I am genuinely interested in why his design decisions are better. It's a lot to take in all at once, so probably needs some explanation. Sorry if I didn't make that clear.
How does tagmanager handle nested scopes, or how would it need to be modified to do so, considering the example (in C)
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ { struct c o; o./* struct c members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ }
How much needs to be changed in tagmanager so that the right autocompletes can be provided at each comment? (assuming c.c is taught to parse local variables of course)
I don't know, but we still need fast tag lookup based on name. If O(n) scope lookup is too slow, we will need additional data structures arranged differently, but whatever we have should have something like O(log n) lookup for names as this is by far the most common operation.
Agree, name lookup should be as fast as possible, O(log n) what about O(1) like compilers :)
Well since we can't use hashes due to the need to identify prefixes, a radix trie seems the best since it can be as fast as a hash ie two passes through the key, and that gives a sub-trie that begins with the prefix. But as you say thats orthogonal to the question of scope layout.
- this "backend" abstraction might be really overkill, and maybe we
could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
It is but is it flexible enough to be expanded to nested scopes.
see above.
Still open (at least to my mind).
- tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
I looked at the thread about that, and from what I could tell, the problem was for reparsing unsaved files. Wasn't the order OK for files that have just been saved? (Also I don't follow what that has to do with reference counting).
Hmm, you're right, looks like Colomban is duplicating tags in nested caches, so all searches are just one cache, but in fact they are not duplicated, but reference the same tag using a reference count to tell when to destroy it. I am not sure where the reference to real time parsing comes from?
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
I think the fact that this isn't clear is the problem :)
If I'm following you correctly, I think you're saying the design needs to change, which I accept may be true. What I was saying was that understanding the existing code for scope completion is not really that complex.
I think we are talking about different things, IIUC you are talking about nested structures, I am talking about the {} in the example above. At the moment it doesn't exist in the tagmanager and is different to the nested structure problem.
Cheers Lex
Regards, Nick _______________________________________________ Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
On 29/04/2012 14:07, Lex Trotman wrote:
On 29 April 2012 22:07, Nick Treleavennick.treleaven@btinternet.com wrote:
On 27/04/2012 06:30, Lex Trotman wrote: I don't see this myself, I see some complicating issues which could be resolved (and I'm willing to work on them), but generally a sound design for what it provides and for extra things we may want to add.
Maybe my problem is simply the attempt to simulate object orientation and inheritance in C makes it look complicated for a spoilt C++ programmer :)
Eh? Tagmanager isn't OOP.
I expect the design is better in some respects (and to be fair I didn't look for better things), but finding a tag based on its name is something we are always going to want to be fast. Even for scope completion, you still need to lookup a tag structure from a name string. So I think we will always want a sorted array of tags per document that we can bsearch (or something equally fast).
Yes, we could have one name table and then prune the results to the scope required, or a name table per scope (which makes the tables smaller and simple searches faster).
It seems to me that if we supported proper scope completion then we could still have one array per document, sorted first by scope, then by name.
I don't know, but we still need fast tag lookup based on name. If O(n) scope lookup is too slow, we will need additional data structures arranged differently, but whatever we have should have something like O(log n) lookup for names as this is by far the most common operation.
Agree, name lookup should be as fast as possible, O(log n) what about O(1) like compilers :)
Well since we can't use hashes due to the need to identify prefixes, a radix trie seems the best since it can be as fast as a hash ie two passes through the key, and that gives a sub-trie that begins with the prefix. But as you say thats orthogonal to the question of scope layout.
I only know about a simple trie, but wouldn't that use a lot more memory?
I don't think we want name lookup 'as fast as possible', just no slower than we have already.
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
I looked at the thread about that, and from what I could tell, the problem was for reparsing unsaved files. Wasn't the order OK for files that have just been saved? (Also I don't follow what that has to do with reference counting).
Hmm, you're right, looks like Colomban is duplicating tags in nested caches, so all searches are just one cache, but in fact they are not duplicated, but reference the same tag using a reference count to tell when to destroy it. I am not sure where the reference to real time parsing comes from?
Sorry, I meant reparsing not real time updates. But is your problem with overloaded symbols OK when the file is parsed after saving, i.e. the order is only wrong on reparsing?
Hi Nick,
I think maybe you just didn't realize how much everyone doesn't understand TagManager because we always bitch about it on IRC in passing. Actually, you might be the only person who *really* understands it :)
I'll just rant a little bit about some problems with TM, as I see them (and as bitched about on IRC), and maybe that can spin off some discussions on ways we could improve it:
- Not invented here; none of us wrote it and not in Geany's coding style, file system layout and naming convention, etc. I personally see it as an upstream project like Scintilla, even though the upstream project is dead (at least the TM part). - Seems to be overly complex for what it needs to do (this might not be true, but it's how it seems at a glance). - Contains a *whole other fork* of CTags; for me this is the worst part. It's far too difficult to take upstream improvements on files like c.c, for example. - Isn't threaded, blocks the UI for several seconds while parsing many tags files before Geany can start, and even worse for the project plugins that parse all the project files on opening. This makes Geany appear really slow and in some cases *too* slow (ie. several minutes or more, if there's enough files to parse). - Isn't re-entrant or thread-safe, uses lots of global state, I guess this is mostly due to CTags but also I think TM itself has some same issues. This means it's really hard to get tag parsing out of the main thread. - Upstream project doesn't use or support TM anymore, just us. AFAIK they are using a simpler scheme[1] involving forking out to a CTags binary and using a (seemingly) more logical database (sqlite) for storing and accessing tags. - Doesn't complete local variables, scope completion doesn't seem to work properly either. - Doesn't support CTags format files for some reason (though I added this previously in my fork, so it's certainly do-able).
Of course I don't mean to make it sound like TM is garbage, looking at the code shows it's quite well engineered/optimized, and I'm confident that it has lots of good qualities, even if I don't understand them.
Anyways, I'll end ranting here and hope it might give some ideas about the problems some of us see with TM, and we could work towards fixing, if we aren't to replace TM altogether.
Cheers, Matthew Brush
[1] http://git.gnome.org/browse/anjuta/tree/plugins/symbol-db
On 12-04-29 05:07 AM, Nick Treleaven wrote:
On 27/04/2012 06:30, Lex Trotman wrote:
[...]
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
One of the problems with tagmanager is its complexity, leading to considerable wariness on the part of many of us about changing it since we don't understand what we might break.
I don't see this myself, I see some complicating issues which could be resolved (and I'm willing to work on them), but generally a sound design for what it provides and for extra things we may want to add.
Actually documenting the overall structure of tagmanager and how it is supposed to work would be a good thing, whats a workspace? what is it meant to represent, how are scopes represented? etc.
Isn't it clear from the data structures? Look at TMWorkspace. Scopes and other tag metadata is the same as CTags. Obviously if we had at-a-glance overall documentation that would be good.
One confusing thing is that a TMTag can be used for an actual tag or for a file. Probably that could be cleaned up.
- a "multi-cache" one that, as its name suggests, maintains multiple
caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
I expect the design is better in some respects (and to be fair I didn't look for better things), but finding a tag based on its name is something we are always going to want to be fast. Even for scope completion, you still need to lookup a tag structure from a name string. So I think we will always want a sorted array of tags per document that we can bsearch (or something equally fast).
Also, I've probably sounded quite harsh on Colomban's design, but I'm commenting on what I think is important. I am genuinely interested in why his design decisions are better. It's a lot to take in all at once, so probably needs some explanation. Sorry if I didn't make that clear.
How does tagmanager handle nested scopes, or how would it need to be modified to do so, considering the example (in C)
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ { struct c o; o./* struct c members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ }
How much needs to be changed in tagmanager so that the right autocompletes can be provided at each comment? (assuming c.c is taught to parse local variables of course)
I don't know, but we still need fast tag lookup based on name. If O(n) scope lookup is too slow, we will need additional data structures arranged differently, but whatever we have should have something like O(log n) lookup for names as this is by far the most common operation.
- this "backend" abstraction might be really overkill, and maybe we
could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
It is but is it flexible enough to be expanded to nested scopes.
see above.
- tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
I looked at the thread about that, and from what I could tell, the problem was for reparsing unsaved files. Wasn't the order OK for files that have just been saved? (Also I don't follow what that has to do with reference counting).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
I think the fact that this isn't clear is the problem :)
If I'm following you correctly, I think you're saying the design needs to change, which I accept may be true. What I was saying was that understanding the existing code for scope completion is not really that complex.
Regards, Nick _______________________________________________ Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
Le 29/04/2012 14:07, Nick Treleaven a écrit :
On 27/04/2012 06:30, Lex Trotman wrote:
[...]
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
One of the problems with tagmanager is its complexity, leading to considerable wariness on the part of many of us about changing it since we don't understand what we might break.
I don't see this myself, I see some complicating issues which could be resolved (and I'm willing to work on them), but generally a sound design for what it provides and for extra things we may want to add.
Actually documenting the overall structure of tagmanager and how it is supposed to work would be a good thing, whats a workspace? what is it meant to represent, how are scopes represented? etc.
Isn't it clear from the data structures? Look at TMWorkspace. Scopes and other tag metadata is the same as CTags. Obviously if we had at-a-glance overall documentation that would be good.
What I personally blame TM for is not the data structure or overall design, but the code. I can't get my head around the implementation. Every time I try to get it, I get a headache and generally no fix :(
But maybe I'm just plain stupid, or maybe I just miss one or two key things of how it's written to get it.
One confusing thing is that a TMTag can be used for an actual tag or for a file. Probably that could be cleaned up.
Agreed, that's something I think that should be quite easy to "fix" and that would make the thing easier to understand at first (though it's not one of the things that makes me not understand TM ;)). BTW, do we actually use any file tags?
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
I expect the design is better in some respects (and to be fair I didn't look for better things), but finding a tag based on its name is something we are always going to want to be fast. Even for scope completion, you still need to lookup a tag structure from a name string. So I think we will always want a sorted array of tags per document that we can bsearch (or something equally fast).
Also, I've probably sounded quite harsh on Colomban's design, but I'm commenting on what I think is important. I am genuinely interested in why his design decisions are better. It's a lot to take in all at once, so probably needs some explanation. Sorry if I didn't make that clear.
Don't worry about sounding harsh or something, I totally agree that changing for changing is not a good idea. To be honest, there were really two reasons why I tried to write a new tagmanager on the first place:
1) because I wasn't able to fix the current one; 2) to support asynchronous parsing.
And actually I haven't added much great design, it actually works quite the same as does TM currently -- per-file tags, a workspace holding a merge of all tags.
[...]
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
Blame C++ and overloaded names I think.
I looked at the thread about that, and from what I could tell, the problem was for reparsing unsaved files. Wasn't the order OK for files that have just been saved? (Also I don't follow what that has to do with reference counting).
We don't parse unsaved files at all because TM can't do that. And that's once another thing that should be fixed.
However I don't get the point you're talking about, maybe my answer is OT then.
Cheers, Colomban
Le 27/04/2012 07:30, Lex Trotman a écrit :
[...]
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Perhaps Colomban could confirm, but my first thought is that this is for nested scopes.
Not at all, sorry :) It's only for performance, so search on different criteria can remain fast (just a bsearch()).
How does tagmanager handle nested scopes, or how would it need to be modified to do so, considering the example (in C)
{ struct a o; struct a p; o./* struct a members */ { struct b o; o./* struct b members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ { struct c o; o./* struct c members */ p./* struct a members */ } o./* struct a members */ p./* struct a members */ }
How much needs to be changed in tagmanager so that the right autocompletes can be provided at each comment? (assuming c.c is taught to parse local variables of course)
And of course the same question for Colomban.
To really support such thing would require the parsers to provide a true tree, not a flat thing. However, my scope completion "algorithm" takes into account a few things to try to find The Right Thing™:
1) the file in which the tag appears (e.g. tries to resolve things locally first); 2) the "distance" in lines between the tags (e.g. the tag that comes just before the one to complete has precedence over another one).
but this wouldn't be enough to get correct results with your example; it would really need a tree.
Cheers, Colomban
Hi,
Le 26/04/2012 17:02, Nick Treleaven a écrit :
On 24/04/2012 22:31, Colomban Wendling wrote:
Le 17/04/2012 18:20, Nick Treleaven a écrit : Sorry for the long delays -- and also small activity -- recently. I have/had a lot of non-Geany stuff to do and stuff, the whole story, you know.
No problem.
I finally committed it and pushed it so you can see it, comment, blame, flame& more: see https://github.com/b4n/geany/tree/wip%2Fctagsmanager
A few points, as they comes to my mind:
- it support asynchronous parsing (though not concurrent parsing);
What's the difference? Also, what does it buy us?
What I meant when saying it's asynchronous but not concurrent is that it supports launching the parsers in a separate thread, but it cannot launch several parsers at once. This is because CTags parsers aren't thread-safe (have a lot of global states and no locks).
What asynchronous parsing gives us is quite simple: no blocking. This means that a slow parser (like e.g. the HTML one on Windows before you changed the regex library) wouldn't lock Geany. Same for project plugins that want to parse thousands of files: this would still take a long time, but Geany would be still usable in the meantime.
- it is under the new ctagsmanager/ directory;
- it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
- all types have different names than the tagmanager ones, though currently the API is almost an 1:1 mapping -- and that's maybe a huge mistake?;
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
I think Lex and Matthew probably summarize it quite correctly: it's not that TM is bad or has to be replaced; it is that (I'm) unable to understand it enough to fix anything in it, like scope completion. Maybe it's only me, but I tried hard :)
And the only reason why maybe TagManager would need replacement/large changes is asynchronous parsing. I'm not sure how hard it would be to get it with TM -- but again, maybe it's only that I don't understand it enough.
- there is 2 backends as of today:
- a "simple" one that is simple and that doesn't waste (too much) memory, but that searches in O(n);
Linear searching seems like a bad idea when we currently have O(log n). Removing x tags is O(n * x) by the look of it.
I agree it's not interesting regarding performances, and this backend isn't meant for it. I needed an implementation for early testing, so I wrote it simple. And it showed useful for testing later on too, since because of it's simplicity it isn't bugged -- just slow :)
BTW, how does TagManager do fast searches? I always though it did a sorting with new attributes each time they changed, so in such situations it's even worse than O(n), isn't it?
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Well, adding a tag would require a bsearch() on each cache, yes. However, adding tags is mostly done in a separate thread (if async parsing it used), so it can be less of a problem.
And a search is simply a bsearch() (O(n log n), right?) given the cache already exists. If the cache doesn't exist, it has to be created so yeah on the first search it's slow.
Given my global tags merge change (see below), I think tagmanager needs some changes to avoid sorting the workspace tags array each time tags are updated, but this is certainly doable.
In practice I haven't yet tested anything big enough to see any difference in performances between these two backends, and that's something that probably isn't worth bothering about for now.
- this "backend" abstraction might be really overkill, and maybe we could do better without it?;
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
Multiple backends isn't really useful probably. As said above, the first was mostly a testing one, and I wanted to be able to keep both during development for better testing. At one point I also though maybe there could be some backends optimized for particular situations, like fast search, fast insertion, low memory consumption, etc., but I don't see much use for this anymore.
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
It's not strictly needed, but it makes some memory management easier, and fits well with GTK memory management. And this last point helps a lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated and not removed. But that's not new, I already added this in TM.
BTW, what don't you understand in symbols.c? Maybe I could improve the docs.
- tag matching/finding is done using ctm_data_backend_find() (or a convenience wrapper), which takes 2 functions for performing the search:
- a sort function, used to, heh, sort the tags to search and/or the resulting list (the "simple" backend should use it to sort the result; and the "multi-cache" backend uses it to sort the caches)
It doesn't sound a good idea to have to sort the haystack on each lookup. Lookup should be very fast, like tagmanager. Adding/removing tags is always less important because real-time updates can be disabled anyway.
That's why the multicache backend (the one supposed to be somewhat fast) only creates new cache on lookup if there isn't compatible cache yet, which should happen roughly only once, on the first lookup.
However, I still don't understand how TM does address the issue, and I though it actually re-sorted the array on lookup if the attributes changed. And IIUC too, the attributes may change often if performing search A and the B, then A again, the A, B, A, B and so forth.
- a match function, use to check whether a tag should be included in the results. Like the sort function it returns a strcmp()-like value, with the only difference that it probably returns 0 for more tags.
It is somewhat similar to what tagmanager did, but it's more flexible -- and maybe complex, though many sort/match functions would already be provided.
- no pruning is done yet, so there is duplicate in the results;
- there is an (almost) working scope completeion implementation;
- ... I don't see anything to add, so I'll stop here :)
All this isn't of course written in stone: if we already redo a lot of things, let's get something nice -- though IMO we'll always be better than with tagmanager, since each time I wanted to touch it it took me hours, and sometimes I even haven't been able to fix the problem (thinking of e.g. scope completion...).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
It's probably not some TM-wide problem, but I tried to understand what was the problem, but never got to understand that thing. And certainly not how I would add some more precision in the scope search, like same file, close to the position, etc.
Also, later in the thread he says that performance problems with resorting global and workspace tags cannot be fixed with the design of tagmanager. I've been working yesterday on improving this significantly by merging the new tags each time instead of resorting *all* the tags. I hope to commit this in the next few days.
Cool! I haven't done any profiling on either tagmanager or may new attempt, so I can't tell what's actually slow, but any improvement is good to have :)
I finished the global tags merging change: https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd2906786462...
IMO this should make loading global tags fast enough, even for heavy use, as the whole array is now only traversed once.
That's a really good idea yeah, good job :)
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags & merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
I'm not sure how it is for now (actually i didn't check just now), but yes it looks sensible.
The other changes I've made were in response to a user's problem with generating C tag files, and some changes to the CTags files.
Good. However not stripping G_BEGIN_DECLS confuses the CTags parser and makes particularly makes it think the first identifier after it has "G_BEGIN_DECLS" type; which then may cause problems for e.g. scope completion.
Regards, Colomban
On 29/04/2012 15:42, Colomban Wendling wrote:
- it support asynchronous parsing (though not concurrent parsing);
What's the difference? Also, what does it buy us?
What I meant when saying it's asynchronous but not concurrent is that it supports launching the parsers in a separate thread, but it cannot launch several parsers at once. This is because CTags parsers aren't thread-safe (have a lot of global states and no locks).
What asynchronous parsing gives us is quite simple: no blocking. This means that a slow parser (like e.g. the HTML one on Windows before you changed the regex library) wouldn't lock Geany. Same for project plugins that want to parse thousands of files: this would still take a long time, but Geany would be still usable in the meantime.
Ok. It seems a good idea to support this, but for parsing tags in open documents it doesn't seem particularly useful, as this ought to be fast. The regex problem was unusual IMO and due to an old version of GNU regex.
I don't understand why tagmanager has to be replaced, why not just replace the parts you want to improve? Rewriting it is likely to lead to a new set of bugs and be hard to review and merge changes from master.
I think Lex and Matthew probably summarize it quite correctly: it's not that TM is bad or has to be replaced; it is that (I'm) unable to understand it enough to fix anything in it, like scope completion. Maybe it's only me, but I tried hard :)
And the only reason why maybe TagManager would need replacement/large changes is asynchronous parsing. I'm not sure how hard it would be to get it with TM -- but again, maybe it's only that I don't understand it enough.
Not sure about this, but I think the only messy part of TM is when it updates the workspace array, which I think could be removed (I'll explain this in another reply).
- there is 2 backends as of today:
- a "simple" one that is simple and that doesn't waste (too much) memory, but that searches in O(n);
Linear searching seems like a bad idea when we currently have O(log n). Removing x tags is O(n * x) by the look of it.
I agree it's not interesting regarding performances, and this backend isn't meant for it. I needed an implementation for early testing, so I wrote it simple. And it showed useful for testing later on too, since because of it's simplicity it isn't bugged -- just slow :)
Ok.
BTW, how does TagManager do fast searches? I always though it did a sorting with new attributes each time they changed, so in such situations it's even worse than O(n), isn't it?
For searching, it doesn't do any sorting ever. For adding tags the work object (i.e. document tags) have to be sorted, which I think is good, but also the workspace tags are currently resorted, which I think may be a bad design.
- a "multi-cache" one that, as its name suggests, maintains multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Well, adding a tag would require a bsearch() on each cache, yes. However, adding tags is mostly done in a separate thread (if async parsing it used), so it can be less of a problem.
I haven't studied your design, but I would prefer that any design is efficient on all threads, so the user's computer can use remaining performance for compiling & whatever else they want.
Also, what about global tags? Those can add a lot of tags all at once.
And a search is simply a bsearch() (O(n log n), right?) given the cache already exists. If the cache doesn't exist, it has to be created so yeah on the first search it's slow.
If it can be slow enough for the user to notice this is probably bad.
I don't see why having two is better. The memory overhead for a pointer array is not much vs. the size of the tag structures. Fast searching is important.
Multiple backends isn't really useful probably. As said above, the first was mostly a testing one, and I wanted to be able to keep both during development for better testing. At one point I also though maybe there could be some backends optimized for particular situations, like fast search, fast insertion, low memory consumption, etc., but I don't see much use for this anymore.
Ok.
- tags (and most types) are reference-counted (so they aren't necessarily duplicated, e.g. in the multicache backend);
I don't really understand src/symbols.c since the real-time parsing change, so don't really understand why this is needed.
BTW here I meant reparsing document tags without saving, sorry for any confusion.
It's not strictly needed, but it makes some memory management easier, and fits well with GTK memory management. And this last point helps a lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated and not removed. But that's not new, I already added this in TM.
Yes. Is the reason the tree should be updated and not recreated to preserve fold states and scrollbar position? In fact I'm a bit confused about this, how come old tags are still accessed after reparsing the document with new tags?
BTW, what don't you understand in symbols.c? Maybe I could improve the docs.
It just seems to have grown overcomplex - basically the reparsing change means much more code to manage the symbol tree. Isn't there a second hash table besides the parent tag hash table?
It's possible tagmanager needs better abilities so that the symbol list code doesn't need so much complexity.
- tag matching/finding is done using ctm_data_backend_find() (or a convenience wrapper), which takes 2 functions for performing the search:
- a sort function, used to, heh, sort the tags to search and/or the resulting list (the "simple" backend should use it to sort the result; and the "multi-cache" backend uses it to sort the caches)
It doesn't sound a good idea to have to sort the haystack on each lookup. Lookup should be very fast, like tagmanager. Adding/removing tags is always less important because real-time updates can be disabled anyway.
That's why the multicache backend (the one supposed to be somewhat fast) only creates new cache on lookup if there isn't compatible cache yet, which should happen roughly only once, on the first lookup.
Ok.
However, I still don't understand how TM does address the issue, and I though it actually re-sorted the array on lookup if the attributes changed. And IIUC too, the attributes may change often if performing search A and the B, then A again, the A, B, A, B and so forth.
The arrays are always sorted the same way, and only after tag creation. The search results are filtered by the lookup attributes.
All this isn't of course written in stone: if we already redo a lot of things, let's get something nice -- though IMO we'll always be better than with tagmanager, since each time I wanted to touch it it took me hours, and sometimes I even haven't been able to fix the problem (thinking of e.g. scope completion...).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
It's probably not some TM-wide problem, but I tried to understand what was the problem, but never got to understand that thing. And certainly not how I would add some more precision in the scope search, like same file, close to the position, etc.
I think if a tag array was sorted by scope first, then by name, then the global name lookup still works as before, but allows O(log n) lookup of all tags with matching scope. So you lookup those scope tags, then you sort the results according to how you like, e.g. line number.
To sort by file I think you would use tmtag->atts.entry.file->work_object.file_name
I finished the global tags merging change: https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd2906786462...
IMO this should make loading global tags fast enough, even for heavy use, as the whole array is now only traversed once.
That's a really good idea yeah, good job :)
For avoiding resorting of workspace tags when only reparsing one source object, we can remove the source object's old tags& merge the new tags. This is all O(n) for the workspace array. I haven't looked into implementing this yet because now it's clear you're working on a competing change.
I'm not sure how it is for now (actually i didn't check just now), but yes it looks sensible.
I think removing the workspace array is probably better, but haven't investigated completely. I'll reply in the other email.
The other changes I've made were in response to a user's problem with generating C tag files, and some changes to the CTags files.
Good. However not stripping G_BEGIN_DECLS confuses the CTags parser and makes particularly makes it think the first identifier after it has "G_BEGIN_DECLS" type; which then may cause problems for e.g. scope completion.
Are you sure? I tested this and it seemed Ok as my change adds those tags to the c_ignore_tags automatically.
Le 30/04/2012 18:54, Nick Treleaven a écrit :
On 29/04/2012 15:42, Colomban Wendling wrote:
- it support asynchronous parsing (though not concurrent parsing);
What's the difference? Also, what does it buy us?
What I meant when saying it's asynchronous but not concurrent is that it supports launching the parsers in a separate thread, but it cannot launch several parsers at once. This is because CTags parsers aren't thread-safe (have a lot of global states and no locks).
What asynchronous parsing gives us is quite simple: no blocking. This means that a slow parser (like e.g. the HTML one on Windows before you changed the regex library) wouldn't lock Geany. Same for project plugins that want to parse thousands of files: this would still take a long time, but Geany would be still usable in the meantime.
Ok. It seems a good idea to support this, but for parsing tags in open documents it doesn't seem particularly useful, as this ought to be fast. The regex problem was unusual IMO and due to an old version of GNU regex.
Right, though a really big file could also be "slow" to parse.
BTW, how does TagManager do fast searches? I always though it did a sorting with new attributes each time they changed, so in such situations it's even worse than O(n), isn't it?
For searching, it doesn't do any sorting ever. For adding tags the work object (i.e. document tags) have to be sorted, which I think is good, but also the workspace tags are currently resorted, which I think may be a bad design.
If it is never resorted, it means ALL lookups are done on the same criteria: the name. Right? If so, how could scope completion be fast? It requires a lot of different search:
1) name search for finding the type of the element to complete; 2) type search for resolving possible typedefs (recursively); 3) scope search for getting the actual results.
Or do I miss something again?
- a "multi-cache" one that, as its name suggests, maintains
multiple caches (sorted tags arrays). it uses a little more memory and is slower on insertion since it maintains several sorted lists, but a search on an already existing cache is very fast.
Won't this be slow for adding many tags at once? How is this design better than tagmanager?
Well, adding a tag would require a bsearch() on each cache, yes. However, adding tags is mostly done in a separate thread (if async parsing it used), so it can be less of a problem.
I haven't studied your design, but I would prefer that any design is efficient on all threads, so the user's computer can use remaining performance for compiling & whatever else they want.
Yeah of course. One possible simple improvement would be to merge only when all tags got parsed, getting something like you did earlier with the patch on TM global tags loading.
But yes, it would still require insertions in multiple caches; but I though it was worth it since it provided fast search for all caches (see above for why I though/thinks multiple cache are useful).
Also, what about global tags? Those can add a lot of tags all at once.
I didn't tried to deal with them yet, but I naively reproduced something like in TM, e.g. another array (cache(s)) for them, so they shouldn't make the whole think much slower. However I'm not certain that having a completely separate array is really good for searching, but again, I just replicated the TM design here.
And again, I must admin I didn't actually implemented this for real yet anyway, so I can't really tell.
And a search is simply a bsearch() (O(n log n), right?) given the cache already exists. If the cache doesn't exist, it has to be created so yeah on the first search it's slow.
If it can be slow enough for the user to notice this is probably bad.
As said in another mail, if it's a problem the required caches can be hard-coded so they are created anyway. I doesn't seem a clean thing to me, but that's very well doable.
It's not strictly needed, but it makes some memory management easier, and fits well with GTK memory management. And this last point helps a lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated and not removed. But that's not new, I already added this in TM.
Yes. Is the reason the tree should be updated and not recreated to preserve fold states and scrollbar position? In fact I'm a bit confused
Yes, that's the most prominent reason. It's to:
* keep fold state * keep selection * keep scroll position * avoid overall flickering
basically it tries to avoid any visible side effects of replacing the tree.
about this, how come old tags are still accessed after reparsing the document with new tags?
That's the magic of reference counting :) The GtkListStore actually holds a reference to the tag, so they can still be used after TM dropped them. But anyway we always update the symbol list to have a reference to the current TM tag/
BTW, what don't you understand in symbols.c? Maybe I could improve the docs.
It just seems to have grown overcomplex - basically the reparsing change means much more code to manage the symbol tree. Isn't there a second hash table besides the parent tag hash table?
Well yes, as said in the other mail there is now 2 has table for performance reasons, but the second one is simply another representation of the tags with almost O(1) access.
It's possible tagmanager needs better abilities so that the symbol list code doesn't need so much complexity.
Hum... unless tagmanager provides a tree already I don't really see what could help.
However, I still don't understand how TM does address the issue, and I though it actually re-sorted the array on lookup if the attributes changed. And IIUC too, the attributes may change often if performing search A and the B, then A again, the A, B, A, B and so forth.
The arrays are always sorted the same way, and only after tag creation. The search results are filtered by the lookup attributes.
Wee the other mail for the question about how it makes search fast when not searching by name.
All this isn't of course written in stone: if we already redo a lot of things, let's get something nice -- though IMO we'll always be better than with tagmanager, since each time I wanted to touch it it took me hours, and sometimes I even haven't been able to fix the problem (thinking of e.g. scope completion...).
I don't really see what the problem understanding it is. I thought scope completion was just tm_workspace_find_scoped and related functions, not some tagmanager-wide problem.
It's probably not some TM-wide problem, but I tried to understand what was the problem, but never got to understand that thing. And certainly not how I would add some more precision in the scope search, like same file, close to the position, etc.
I think if a tag array was sorted by scope first, then by name, then the global name lookup still works as before, but allows O(log n) lookup of all tags with matching scope. So you lookup those scope tags, then you sort the results according to how you like, e.g. line number.
What about matching partial scopes? I mean if one has the "foo::bar::baz" scope, a completion for "bar::" should match that, but that wouldn't work if simply sorted, would it?
The other changes I've made were in response to a user's problem with generating C tag files, and some changes to the CTags files.
Good. However not stripping G_BEGIN_DECLS confuses the CTags parser and makes particularly makes it think the first identifier after it has "G_BEGIN_DECLS" type; which then may cause problems for e.g. scope completion.
Are you sure? I tested this and it seemed Ok as my change adds those tags to the c_ignore_tags automatically.
No, I'm not completely sure, maybe it was another problem actually. I'll check this more carefully.
Hi All,
I did a step back and got some numbers (what! hard evidence in the discussion? :)
Using ctags, including locals in the tags generated from Geany source, slightly more than doubled the number of tags, and for some C++ I have around nearly four times the number.
For Geany bsearching a sorted array, four times the size is only two more iterations, doesn't matter :)
But as Colomban pointed out, with real-time tag generation on, *most* lookups will be from the parser, not Geany. This is needed to see if the name it found is a type so it can tell the statement is a declaration. (c.c does not do this at the moment since it only looks at top level statements and all top level statements are declarations in {} languages).
That workload *will* be proportional to the file size. And the number of insertions will also be proportional to the file size. This is a big change from the current situation where parsing occurs rarely.
For a simple array structure then, whilst parsing, either the search is linear, or the array is re-sorted on each insert, neither is an attractive prospect for larger files with 4* the number of tags.
Also on examining the tags file ctags produced I found that local variable entries did not actually get parsed, ie there was no way to go from a local variable to its type. All that happened was an occurrence of the name was recorded. Also no information about lexical scope is recorded. So even updating to ctags c.c wouldn't give the information on local variables we need to handle the example I posted earlier.
So although improving tagmanager/Colomban manager is still worthwhile, significantly more work is needed on the parsers as well.
And in terms of handling local variables properly and allowing contextual completion ctags looks like a dead end.
Cheers Lex
Am 09.05.2012 07:47, schrieb Lex Trotman:
Using ctags, including locals in the tags generated from Geany source, slightly more than doubled the number of tags, and for some C++ I have around nearly four times the number.
But you only need the tags for the current scope and can drop them if you enter another (non-nested) scope. This surely doesn't double or quadruple the tags.
If I'm editing func A I don't want the locals of func B through Z in my autocompletion list.
Best regards.
On 9 May 2012 16:54, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 09.05.2012 07:47, schrieb Lex Trotman:
Using ctags, including locals in the tags generated from Geany source, slightly more than doubled the number of tags, and for some C++ I have around nearly four times the number.
But you only need the tags for the current scope and can drop them if you enter another (non-nested) scope. This surely doesn't double or quadruple the tags.
You can't drop them from the tags structures because when you are parsing you don't know which scope the cursor is in. So you have to add them all, then decide which ones apply to the current scope.
If I'm editing func A I don't want the locals of func B through Z in my autocompletion list.
Yes, correct, but they have to be in the tags first, then tagmanager has to be taught to choose the in-scope declaration, thats my point the parsers do not generate the information and tagmanager/symbols.c doesn't know how to use it
Cheers Lex
Best regards.
Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
Am 09.05.2012 09:37, schrieb Lex Trotman:
On 9 May 2012 16:54, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 09.05.2012 07:47, schrieb Lex Trotman:
Using ctags, including locals in the tags generated from Geany source, slightly more than doubled the number of tags, and for some C++ I have around nearly four times the number.
But you only need the tags for the current scope and can drop them if you enter another (non-nested) scope. This surely doesn't double or quadruple the tags.
You can't drop them from the tags structures because when you are parsing you don't know which scope the cursor is in. So you have to add them all, then decide which ones apply to the current scope.
Okay, but still only for the current file and not an entire project.
Best regards.
On 9 May 2012 17:40, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 09.05.2012 09:37, schrieb Lex Trotman:
On 9 May 2012 16:54, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 09.05.2012 07:47, schrieb Lex Trotman:
Using ctags, including locals in the tags generated from Geany source, slightly more than doubled the number of tags, and for some C++ I have around nearly four times the number.
But you only need the tags for the current scope and can drop them if you enter another (non-nested) scope. This surely doesn't double or quadruple the tags.
You can't drop them from the tags structures because when you are parsing you don't know which scope the cursor is in. So you have to add them all, then decide which ones apply to the current scope.
Okay, but still only for the current file and not an entire project.
Yes, Geany would only have the open files parsed, I only parsed the entire project to see how much the number of symbols increased if you go parse locals, by using all of Geany I got an average increase, note I only said two times not an absolute number :)
Cheers Lex
Best regards. _______________________________________________ Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel