[Geany-devel] tagmanager changes

Sun Apr 29 13:07:52 UTC 2012

Hi Nick,

I think maybe you just didn't realize how little the rest of us 
understand TagManager, because we always bitch about it in passing on 
IRC. Actually, you might be the only person who *really* understands 
it :)

I'll just rant a little bit about some problems with TM, as I see them 
(and as bitched about on IRC), and maybe that can spin off some 
discussions on ways we could improve it:

- Not invented here: none of us wrote it, and it isn't in Geany's coding 
style, file system layout, naming convention, etc. I personally see 
it as an upstream project like Scintilla, even though the upstream 
project is dead (at least the TM part).
- Seems to be overly complex for what it needs to do (this might not be 
true, but it's how it seems at a glance).
- Contains a *whole other fork* of CTags; for me this is the worst part. 
It's far too difficult to take upstream improvements on files like c.c, 
for example.
- Isn't threaded: it blocks the UI for several seconds while parsing many 
tags files before Geany can start, and it's even worse with the project 
plugins that parse all the project files on opening. This makes Geany 
appear really slow and in some cases *too* slow (i.e. several minutes or 
more, if there are enough files to parse).
- Isn't re-entrant or thread-safe and uses lots of global state. I guess 
this is mostly due to CTags, but I think TM itself has some of the same 
issues. This makes it really hard to get tag parsing out of the main 
thread.
- Upstream project doesn't use or support TM anymore, just us. AFAIK 
they are using a simpler scheme[1] involving forking out to a CTags 
binary and using a (seemingly) more logical database (sqlite) for 
storing and accessing tags.
- Doesn't complete local variables, and scope completion doesn't seem to 
work properly either.
- Doesn't support CTags format files for some reason (though I added 
this previously in my fork, so it's certainly do-able).

Of course I don't mean to make it sound like TM is garbage; looking at 
the code shows it's quite well engineered/optimized, and I'm confident 
that it has lots of good qualities, even if I don't understand them.

Anyways, I'll end ranting here and hope it might give some ideas about 
the problems some of us see with TM, which we could work towards fixing 
if we aren't going to replace TM altogether.

Cheers,
Matthew Brush

[1] http://git.gnome.org/browse/anjuta/tree/plugins/symbol-db

On 12-04-29 05:07 AM, Nick Treleaven wrote:
> On 27/04/2012 06:30, Lex Trotman wrote:
>> [...]
>>>
>>> I don't understand why tagmanager has to be replaced, why not just
>>> replace the parts you want to improve? Rewriting it is likely to lead
>>> to a new set of bugs and be hard to review and merge changes from
>>> master.
>
>> One of the problems with tagmanager is its complexity, leading to
>> considerable wariness on the part of many of us about changing it
>> since we don't understand what we might break.
>
> I don't see this myself, I see some complicating issues which could be
> resolved (and I'm willing to work on them), but generally a sound design
> for what it provides and for extra things we may want to add.
>
>> Actually documenting the overall structure of tagmanager and how it is
>> supposed to work would be a good thing: what's a workspace? what is it
>> meant to represent? how are scopes represented? etc.
>
> Isn't it clear from the data structures? Look at TMWorkspace. Scopes and
> other tag metadata are the same as in CTags. Obviously if we had at-a-glance
> overall documentation that would be good.
>
> One confusing thing is that a TMTag can be used for an actual tag or for
> a file. Probably that could be cleaned up.
>
>>>> - a "multi-cache" one that, as its name suggests, maintains multiple
>>>> caches (sorted tags arrays). it uses a little more memory and is
>>>> slower on insertion since it maintains several sorted lists, but a
>>>> search on an already existing cache is very fast.
>>>
>>>
>>> Won't this be slow for adding many tags at once? How is this design
>>> better than tagmanager?
>>
>> Perhaps Colomban could confirm, but my first thought is that this is
>> for nested scopes.
>
> I expect the design is better in some respects (and to be fair I didn't
> look for better things), but finding a tag based on its name is
> something we are always going to want to be fast. Even for scope
> completion, you still need to look up a tag structure from a name string.
> So I think we will always want a sorted array of tags per document that
> we can bsearch (or something equally fast).
>
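For illustration, the per-document sorted array described above could
look roughly like the following C sketch. DemoTag and the demo_*
functions are hypothetical names made up for this example and are not
TagManager's actual structures; only qsort() and bsearch() from the C
standard library are real.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tag record; NOT TagManager's real TMTag. */
typedef struct {
    const char *name;
    unsigned long line;
} DemoTag;

/* Comparator for qsort/bsearch over an array of DemoTag pointers. */
static int demo_tag_cmp(const void *a, const void *b)
{
    const DemoTag *ta = *(const DemoTag *const *)a;
    const DemoTag *tb = *(const DemoTag *const *)b;
    return strcmp(ta->name, tb->name);
}

/* Sort the pointer array by name once, after parsing a document. */
static void demo_tags_sort(const DemoTag **tags, size_t n)
{
    assert(tags != NULL);
    qsort(tags, n, sizeof *tags, demo_tag_cmp);
}

/* Binary-search the sorted array by name; returns NULL if not found. */
static const DemoTag *demo_tags_find(const DemoTag **tags, size_t n,
                                     const char *name)
{
    DemoTag key = { name, 0 };
    const DemoTag *keyp = &key;
    const DemoTag **hit;

    assert(tags != NULL && name != NULL);
    hit = bsearch(&keyp, tags, n, sizeof *tags, demo_tag_cmp);
    return hit ? *hit : NULL;
}
```

Sorting costs O(n log n) once per parse; each lookup by name is then
O(log n), which is the property argued for above.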
> Also, I've probably sounded quite harsh on Colomban's design, but I'm
> commenting on what I think is important. I am genuinely interested in
> why his design decisions are better. It's a lot to take in all at once,
> so probably needs some explanation. Sorry if I didn't make that clear.
>
>> How does tagmanager handle nested scopes, or how would it need to be
>> modified to do so, considering the example (in C)
>>
>> { struct a o; struct a p;
>> o./* struct a members */
>> { struct b o;
>> o./* struct b members */
>> p./* struct a members */
>> }
>> o./* struct a members */
>> p./* struct a members */
>> { struct c o;
>> o./* struct c members */
>> p./* struct a members */
>> }
>> o./* struct a members */
>> p./* struct a members */
>> }
>>
>> How much needs to be changed in tagmanager so that the right
>> autocompletes can be provided at each comment? (assuming c.c is
>> taught to parse local variables of course)
>
> I don't know, but we still need fast tag lookup based on name. If O(n)
> scope lookup is too slow, we will need additional data structures
> arranged differently, but whatever we have should have something like
> O(log n) lookup for names as this is by far the most common operation.
>
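To make the nested-scope question concrete, here is a minimal sketch of
resolving a variable's type by walking outwards from the innermost
block, so that an inner "o" shadows the outer one. DemoVar, DemoScope
and demo_resolve_type are hypothetical names invented for this example.
Nothing here is existing TagManager code, and it ignores where within a
block a declaration appears.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structures for illustration only. */
typedef struct {
    const char *name;   /* variable name, e.g. "o" */
    const char *type;   /* declared type, e.g. "struct a" */
} DemoVar;

typedef struct DemoScope {
    const struct DemoScope *parent; /* enclosing block, NULL if outermost */
    const DemoVar *vars;            /* variables declared in this block */
    size_t n_vars;
} DemoScope;

/* Walk from the innermost scope outwards; the first match wins, so an
 * inner declaration shadows an outer one with the same name. */
static const char *demo_resolve_type(const DemoScope *scope,
                                     const char *name)
{
    assert(name != NULL);
    for (; scope != NULL; scope = scope->parent) {
        for (size_t i = 0; i < scope->n_vars; i++) {
            if (strcmp(scope->vars[i].name, name) == 0)
                return scope->vars[i].type;
        }
    }
    return NULL; /* not declared in any enclosing scope */
}
```

This is the linear scope walk; the type name it returns would still
need a fast by-name lookup to find the struct's members.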
>>>> * this "backend" abstraction might be really overkill, and maybe we
>>>> could do better without it?;
>>>
>>>
>>> I don't see why having two is better. The memory overhead for a pointer
>>> array is not much vs. the size of the tag structures. Fast searching is
>>> important.
>>>
>>
>> It is, but is it flexible enough to be expanded to nested scopes?
>
> see above.
>
>>>> * tags (and most types) are reference-counted (so they aren't
>>>> necessarily duplicated, e.g. in the multicache backend);
>>>
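One plausible reading of the point above is that reference counting
lets several sorted arrays share a single tag instead of each holding
its own copy. The sketch below shows that idea only. RcTag and the
rc_* functions are hypothetical names for this example and are not
taken from the proposed design or from TagManager.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted tag. */
typedef struct {
    int ref_count;
    char *name;
    unsigned long line;
} RcTag;

/* Create a tag owning a copy of name; the caller holds one reference. */
static RcTag *rc_tag_new(const char *name, unsigned long line)
{
    RcTag *tag = malloc(sizeof *tag);
    if (tag == NULL)
        return NULL;
    tag->name = malloc(strlen(name) + 1);
    if (tag->name == NULL) {
        free(tag);
        return NULL;
    }
    strcpy(tag->name, name);
    tag->line = line;
    tag->ref_count = 1;
    return tag;
}

/* Take another reference, e.g. when adding the tag to one more cache. */
static RcTag *rc_tag_ref(RcTag *tag)
{
    assert(tag != NULL && tag->ref_count > 0);
    tag->ref_count++;
    return tag;
}

/* Drop a reference; frees the tag at zero. Returns the remaining count. */
static int rc_tag_unref(RcTag *tag)
{
    int remaining;

    assert(tag != NULL && tag->ref_count > 0);
    remaining = --tag->ref_count;
    if (remaining == 0) {
        free(tag->name);
        free(tag);
    }
    return remaining;
}

/* Comparators for two caches over the same tags: by name and by line. */
static int rc_cmp_name(const void *a, const void *b)
{
    const RcTag *ta = *(RcTag *const *)a;
    const RcTag *tb = *(RcTag *const *)b;
    return strcmp(ta->name, tb->name);
}

static int rc_cmp_line(const void *a, const void *b)
{
    const RcTag *ta = *(RcTag *const *)a;
    const RcTag *tb = *(RcTag *const *)b;
    return (ta->line > tb->line) - (ta->line < tb->line);
}
```

Each cache stores pointers obtained with rc_tag_ref() and sorts them
with its own comparator, so the extra memory per cache is one pointer
per tag rather than a copy of the tag.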
>>>
>>> I don't really understand src/symbols.c since the real-time parsing
>>> change, so don't really understand why this is needed.
>>
>> Blame C++ and overloaded names I think.
>
> I looked at the thread about that, and from what I could tell, the
> problem was for reparsing unsaved files. Wasn't the order OK for files
> that have just been saved? (Also I don't follow what that has to do with
> reference counting).
>
>>> I don't really see what the problem understanding it is. I thought scope
>>> completion was just tm_workspace_find_scoped and related functions,
>>> not some tagmanager-wide problem.
>>
>> I think the fact that this isn't clear is the problem :)
>
> If I'm following you correctly, I think you're saying the design needs
> to change, which I accept may be true. What I was saying was that
> understanding the existing code for scope completion is not really that
> complex.
>
> Regards,
> Nick