[Geany-devel] tagmanager changes

Tue May 8 13:24:25 UTC 2012

Le 30/04/2012 18:54, Nick Treleaven a écrit :
> On 29/04/2012 15:42, Colomban Wendling wrote:
>>>> * it support asynchronous parsing (though not concurrent parsing);
>>>
>>> What's the difference? Also, what does it buy us?
>>
>> What I meant when saying it's asynchronous but not concurrent is that it
>> supports launching the parsers in a separate thread, but it cannot
>> launch several parsers at once.  This is because CTags parsers aren't
>> thread-safe (have a lot of global states and no locks).
>>
>> What asynchronous parsing gives us is quite simple: no blocking.  This
>> means that a slow parser (like e.g. the HTML one on Windows before you
>> changed the regex library) wouldn't lock Geany.  Same for project
>> plugins that want to parse thousands of files:  this would still take a
>> long time, but Geany would be still usable in the meantime.
> 
> Ok. It seems a good idea to support this, but for parsing tags in open
> documents it doesn't seem particularly useful, as this ought to be fast.
> The regex problem was unusual IMO and due to an old version of GNU regex.

Right, though a really big file could also be "slow" to parse.

>> BTW, how does TagManager do fast searches?  I always though it did a
>> sorting with new attributes each time they changed, so in such
>> situations it's even worse than O(n), isn't it?
> 
> For searching, it doesn't do any sorting ever. For adding tags the work
> object (i.e. document tags) have to be sorted, which I think is good,
> but also the workspace tags are currently resorted, which I think may be
> a bad design.

If it is never resorted, it means ALL lookups are done on the same
criteria: the name.  Right?  If so, how could scope completion be fast?
 It requires a lot of different search:

1) name search for finding the type of the element to complete;
2) type search for resolving possible typedefs (recursively);
3) scope search for getting the actual results.

Or do I miss something again?

>>>>     - a "multi-cache" one that, as its name suggests, maintains
>>>> multiple
>>>>       caches (sorted tags arrays).  it uses a little more memory and is
>>>>       slower on insertion since it maintains several sorted lists,
>>>> but a
>>>>       search on an already existing cache is very fast.
>>>
>>> Won't this be slow for adding many tags at once? How is this design
>>> better than tagmanager?
>>
>> Well, adding a tag would require a bsearch() on each cache, yes.
>> However, adding tags is mostly done in a separate thread (if async
>> parsing it used), so it can be less of a problem.
> 
> I haven't studied your design, but I would prefer that any design is
> efficient on all threads, so the user's computer can use remaining
> performance for compiling & whatever else they want.

Yeah of course.  One possible simple improvement would be to merge only
when all tags got parsed, getting something like you did earlier with
the patch on TM global tags loading.

But yes, it would still require insertions in multiple caches; but I
though it was worth it since it provided fast search for all caches (see
above for why I though/thinks multiple cache are useful).

> Also, what about global tags? Those can add a lot of tags all at once.

I didn't tried to deal with them yet, but I naively reproduced something
like in TM, e.g. another array (cache(s)) for them, so they shouldn't
make the whole think much slower.  However I'm not certain that having a
completely separate array is really good for searching, but again, I
just replicated the TM design here.

And again, I must admin I didn't actually implemented this for real yet
anyway, so I can't really tell.

>> And a search is simply a bsearch() (O(n log n), right?) given the cache
>> already exists.  If the cache doesn't exist, it has to be created so
>> yeah on the first search it's slow.
> 
> If it can be slow enough for the user to notice this is probably bad.

As said in another mail, if it's a problem the required caches can be
hard-coded so they are created anyway.  I doesn't seem a clean thing to
me, but that's very well doable.

>> It's not strictly needed, but it makes some memory management easier,
>> and fits well with GTK memory management.  And this last point helps a
>> lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated
>> and not removed.
>> But that's not new, I already added this in TM.
> 
> Yes. Is the reason the tree should be updated and not recreated to
> preserve fold states and scrollbar position? In fact I'm a bit confused

Yes, that's the most prominent reason.  It's to:

* keep fold state
* keep selection
* keep scroll position
* avoid overall flickering

basically it tries to avoid any visible side effects of replacing the tree.

> about this, how come old tags are still accessed after reparsing the
> document with new tags?

That's the magic of reference counting :)  The GtkListStore actually
holds a reference to the tag, so they can still be used after TM dropped
them.  But anyway we always update the symbol list to have a reference
to the current TM tag/

>> BTW, what don't you understand in symbols.c?  Maybe I could improve the
>> docs.
> 
> It just seems to have grown overcomplex - basically the reparsing change
> means much more code to manage the symbol tree. Isn't there a second
> hash table besides the parent tag hash table?

Well yes, as said in the other mail there is now 2 has table for
performance reasons, but the second one is simply another representation
of the tags with almost O(1) access.

> It's possible tagmanager needs better abilities so that the symbol list
> code doesn't need so much complexity.

Hum... unless tagmanager provides a tree already I don't really see what
could help.

>> However, I still don't understand how TM does address the issue, and I
>> though it actually re-sorted the array on lookup if the attributes
>> changed.  And IIUC too, the attributes may change often if performing
>> search A and the B, then A again, the A, B, A, B and so forth.
> 
> The arrays are always sorted the same way, and only after tag creation.
> The search results are filtered by the lookup attributes.

Wee the other mail for the question about how it makes search fast when
not searching by name.

>>>> All this isn't of course written in stone: if we already redo a lot of
>>>> things, let's get something nice -- though IMO we'll always be better
>>>> than with tagmanager, since each time I wanted to touch it it took me
>>>> hours, and sometimes I even haven't been able to fix the problem
>>>> (thinking of e.g. scope completion...).
>>>
>>> I don't really see what the problem understanding it is. I thought scope
>>> completion was just tm_workspace_find_scoped and related functions, not
>>> some tagmanager-wide problem.
>>
>> It's probably not some TM-wide problem, but I tried to understand what
>> was the problem, but never got to understand that thing.  And certainly
>> not how I would add some more precision in the scope search, like same
>> file, close to the position, etc.
> 
> I think if a tag array was sorted by scope first, then by name, then the
> global name lookup still works as before, but allows O(log n) lookup of
> all tags with matching scope. So you lookup those scope tags, then you
> sort the results according to how you like, e.g. line number.

What about matching partial scopes?  I mean if one has the
"foo::bar::baz" scope, a completion for "bar::" should match that, but
that wouldn't work if simply sorted, would it?

>>> The other changes I've made were in response to a user's problem with
>>> generating C tag files, and some changes to the CTags files.
>>
>> Good.  However not stripping G_BEGIN_DECLS confuses the CTags parser and
>> makes particularly makes it think the first identifier after it has
>> "G_BEGIN_DECLS" type; which then may cause problems for e.g. scope
>> completion.
> 
> Are you sure? I tested this and it seemed Ok as my change adds those
> tags to the c_ignore_tags automatically.

No, I'm not completely sure, maybe it was another problem actually.
I'll check this more carefully.