[Geany-devel] Request: multithreaded tag generation?

Jiří Techet techet at xxxxx
Fri Nov 11 21:13:25 UTC 2011


Hi Harold,

On Mon, Nov 7, 2011 at 16:35, Harold Aling <geany at sait.nl> wrote:
> Dear Geany Devs,
>
> I recently switched from GeanyPRJ to Gproject. Since Gproject doesn't
> support multiple open projects I have to switch between projects, but
> it takes up to 4 minutes to close one project and open another. A
> project consists of roughly 1000-2000 php-related files.

How much of this time is spent opening the project and how much
closing it? How long did the same take with geanyprj? I think I have
an explanation for the longer project close times: geanyprj didn't
properly remove the tags from the workspace, which is why closing was
faster. If you don't mind that the tags aren't freed from memory, you
can comment out lines 461 and 462 of gproject-project.c:

	if (g_prj->generate_tags)
		g_hash_table_foreach(g_prj->file_tag_table, (GHFunc)workspace_remove_tag, NULL);

This should give the same close times as geanyprj. If the open times
are also longer with gproject, I'll have to investigate what's wrong.

Now the interesting part about the tag manager is that beyond a
certain number of files, removing file tags from the workspace takes
longer than adding them. Without having properly studied the tag
manager sources, I think I know the reason: it looks as if the
hierarchy of tags is connected by pointers in one direction only. That
makes it easy to find a particular tag, but when you have a tag and
want to find which tags refer to it, you have to go through all the
tags. This is what happens when you remove a tag from the workspace,
because you need to delete all references to it.
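
To make the suspected cost concrete, here is a minimal sketch in plain C. It is not the tag manager's real data structure: the `Tag` struct, its `scope` field and all function names are hypothetical, chosen only to model "pointers in one direction". Because a tag does not know who points at it, removing one tag has to visit every tag, so removing n tags one by one costs on the order of n * n visits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tag; NOT the real tag manager structure. */
typedef struct Tag {
    const char *name;
    struct Tag *scope;  /* one-directional link: tag -> enclosing tag */
    int alive;          /* 0 once the tag has been removed */
} Tag;

/* Build a simple chain where each tag refers to the previous one. */
static void build_chain(Tag *tags, size_t n)
{
    assert(tags != NULL);
    for (size_t i = 0; i < n; i++) {
        tags[i].name = "tag";
        tags[i].scope = (i > 0) ? &tags[i - 1] : NULL;
        tags[i].alive = 1;
    }
}

/* Remove `victim`. It has no list of referrers, so every tag must be
 * inspected to clear links to it. Returns the number of tags inspected. */
static size_t remove_tag_by_scan(Tag *tags, size_t n, Tag *victim)
{
    size_t inspected = 0;
    for (size_t i = 0; i < n; i++) {
        inspected++;
        if (tags[i].alive && tags[i].scope == victim)
            tags[i].scope = NULL;
    }
    victim->alive = 0;
    return inspected;
}

/* Remove all tags one at a time: n scans over n tags each. */
static size_t remove_all_by_scan(Tag *tags, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += remove_tag_by_scan(tags, n, &tags[i]);
    return total;
}
```

If each tag also kept a list of its referrers (links in both directions), a removal would only have to touch those referrers instead of the whole workspace. That is a possible direction under the assumptions of this sketch, not a description of any existing code.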

I have also experienced something like quadratic time complexity when
generating tags for many files - the time needed to insert the tags
definitely doesn't grow linearly with the number of files. This means
that the problem isn't slow parsing of files by the ctags part of the
tag manager but rather the creation of the hierarchy by the tag
manager part.

This also means that parsing the files in parallel won't help: it's
the tag hierarchy creation which is slow for a high number of files.
And finding some fine-grained locking scheme so that several threads
can create the hierarchy in parallel would be complete overkill.
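
The argument against parallel parsing can be put in numbers with Amdahl's law: if only a fraction of the total time can be spread over several threads, the rest bounds the overall speedup. The sketch below is plain C; the function name is made up for illustration, and any parsing share plugged into it is an assumption, since no measurement is given here.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when only `parallel_fraction`
 * (0.0 .. 1.0) of the work can be spread over `threads` threads.
 * The remaining (1 - parallel_fraction) stays serial. */
static double amdahl_speedup(double parallel_fraction, unsigned threads)
{
    assert(threads > 0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

For example, if parsing were 10% of the total time (a purely hypothetical share), `amdahl_speedup(0.1, 4)` gives roughly 1.08, and no number of threads could push it past about 1.11.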

Having a good look at the tag manager, understanding the code and
finding a way to avoid this quadratic time behavior (possibly by
rewriting it completely) is something I'd really love to do (could be
a lot of fun). The only problem is time right now (I might have more
time in a few months but no promises).

Cheers,
Jiri


