Hi All,
I have been investigating bug report "suspected memory leak - ID: 3415254" and have gathered the following data (before my patience ran out). The tests reported here are with Git 55646df8 unless otherwise noted.
Geany, left all day with files open but no activity, did not increase in memory usage (and the same with 0.20, the reported version).
I then tried opening and closing geany/src/* repeatedly. The following table shows Geany memory usage (in MB) when all files are closed, when all are open, and the difference. The last column shows that the first two times, extra memory was used to tab through all the open files, but after that no more was used.
closed  open  delta  tabbing
   5.1  27.9   22.8    1.1
  28.8  31.2    0.4    1.1
  32.1  33.0    0.9    0.0
  32.8  34.8    2.0    0.0
  34.6  35.9    1.3    0.0
  35.7  36.4    0.7    0.0
  36.2  37.4    1.2    0.0
  37.2  38.1    0.9    0.0
  37.9  41.8    2.9    0.0
  41.6  42.3    0.7    0.0
  42.0  42.8    0.8    0.0
  42.6  43.3    0.7    0.0
  43.1  44.3    1.2    0.0
  44.1  44.7    0.4    0.0
  44.5  45.7    1.2    0.0
To me the key points are:
1. Each time all files are closed, only about 200 kB is returned to the OS; but since the first open uses 22.8 MB and later ones use less, Geany isn't leaking all of it: the memory is being held by the allocators.
2. Ignoring the first open, the amount of extra memory used to open all the files varied from 0.4 MB to 2.9 MB: an average of 1.1 MB (5% of the initial 22.8 MB) with a standard deviation of 0.665 MB (3% of the 22.8 MB).
3. Simple leaks are unlikely to cause the large deviation in the memory increase, since each open of the same set of files would tend to leak the same amount. It is therefore likely that fragmentation effects cause the large deviation, though they may not be the whole cause of the increase.
Since Geany uses three different allocators, each of which has differing policies for holding onto memory, it is going to be difficult to separate real leaks from allocator effects.
For the bug reporter to have accumulated 300Mb of memory over "a few" days would have needed about 500 file opens per day, but maybe somewhat less as editing increases the memory usage.
So I don't think we have to worry excessively that we have a major leak, but keep an eye open for any possible problems.
Note that all my testing was with no plugins and only a couple of filetypes. There may be leaks and other effects in plugins and/or other Scintilla lexers. If the OP posts more information we may know more.
Cheers Lex
On 14/10/2011 04:09, Lex Trotman wrote:
Hi All,
I have been investigating bug report "suspected memory leak - ID: 3415254"
http://sourceforge.net/tracker/?func=detail&atid=787791&aid=3415254&...
- each time all files are closed only 200k is returned to the OS, but
since the first open uses 22.8Mb but later ones less, Geany isn't leaking all of it, it is being held by the allocators.
- ignoring the first open, the amount of extra memory used to open
all the files varied from 0.4Mb to 2.9Mb. Average 1.1Mb or 5% of the initial 22.8Mb and standard deviation of 0.665Mb or 3% of the 22.8Mb.
- Simple leaks are unlikely to cause the large deviation in the
memory increase, each open of the same set of files would tend to leak the same amount. It is therefore likely that fragmentation effects cause the large deviation, but it may not be the whole cause of the increase.
Since Geany uses three different allocators, each of which has differing policies for holding onto memory, it is going to be difficult to separate real leaks from allocator effects.
Interesting.
I'd like to continue the discussion from the report about allocator usage:
Lex: "I think probably most of the fragmentation is due to having three allocators in use, libc, Glib and libc++, and they not sharing or returning memory they acquire. This is likely to outweigh the cost/benefit of switching to g_slice given the effort required since types or sizes are needed in all frees."
Firstly, we already are using g_slice quite a lot - probably the most useful place is for allocating TMTag structures. Obviously GLib uses it internally.
Secondly, see: http://developer.gnome.org/glib/2.28/glib-Memory-Slices.html#glib-Memory-Sli...
"This is accompanied by extra caching logic to keep freed memory around for some time before returning it to the system."
Thirdly, GLib recommends using g_slice for fixed size allocations, and IIUC g_slice completely prevents fragmentation, besides being faster (anecdotally about 2x).
I don't know if converting our other uses of g_new for fixed size structures is worth it, but I would say it is good practice to use g_slice in all new code.
Le 14/10/2011 16:24, Nick Treleaven a écrit :
[...]
I don't know if converting our other uses of g_new for fixed size structures is worth it, but I would say it is good practice to use g_slice in all new code.
I think if we allocate a struct a significant number of times it could be a good thing, yep. However, if it's a rare allocation I doubt (going by GLib's docs) that using GSlice really makes much sense, so maybe it isn't worth converting all uses. But I agree we should probably use GSlice where we g_new() a structure more than once.
Cheers, Colomban
[...]
Since Geany uses three different allocators, each of which has differing policies for holding onto memory, it is going to be difficult to separate real leaks from allocator effects.
Interesting.
I'd like to continue the discussion from the report about allocator usage:
Lex: "I think probably most of the fragmentation is due to having three allocators in use, libc, Glib and libc++, and they not sharing or returning memory they acquire. This is likely to outweigh the cost/benefit of switching to g_slice given the effort required since types or sizes are needed in all frees."
Firstly, we already are using g_slice quite a lot - probably the most useful place is for allocating TMTag structures. Obviously GLib uses it internally.
Sure, using g_slice will help, but as you say below it's unlikely to be worth the effort just for this purpose. Doing it for speed and other reasons might be useful, but that's another discussion.
1. g_slice is good when you allocate lots of things of the same size and you know the size, since the size must be supplied to g_slice_free(); but a large portion of our allocations are strings, so we don't remember the size (see e.g. tagmanager).
2. Even if all of Geany used g_slice, libc still uses system malloc, so we can't get rid of it (and we need to be careful not to mix memory between the two).
3. Scintilla still uses the C++ allocator, which isn't malloc/free (IIUC it's what is sometimes called a SLUB allocator mixed with a Lea allocator, i.e. a cross between g_slice and malloc).
Secondly, see: http://developer.gnome.org/glib/2.28/glib-Memory-Slices.html#glib-Memory-Sli...
"This is accompanied by extra caching logic to keep freed memory around for some time before returning it to the system."
Yes, it waits 15 seconds before freeing empty slabs back to the system; that's what I mean by "each of which has differing policies for holding onto memory". This is actually observable, but it only frees about 100 kB of the 28 MB. malloc and C++ work differently.
Thirdly, GLib recommends using g_slice for fixed size allocations, and IIUC g_slice completely prevents fragmentation, besides being faster (anecdotally about 2x).
No, even the GLib developers don't claim it completely prevents fragmentation :) It completely prevents *internal* fragmentation within a slab, since all chunks are the same size, but no slab can be returned to the system or re-used for another size while it has any chunks still in use. And this form of fragmentation is very likely, since Geany, GTK, tagmanager etc. all hold on to random bits of memory (because they need to; it's not a bug), thus blocking release of slabs.
I don't know if converting our other uses of g_new for fixed size structures is worth it, but I would say it is good practice to use g_slice in all new code.
Agreed; I don't think we are going to be able to totally cure the "problem", but we could easily waste lots of effort on it.
@Colomban, g_slice's benefits come from allocating objects of the same *size*; they don't need to be the same structure type. But it's much harder to analyse, since strings come in all sorts of sizes.
Cheers Lex
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
On 15/10/2011 00:08, Lex Trotman wrote:
- gslice is good when you allocate lots of things the same size, and
you know the size, since that must be supplied to slice_free, but a large portion of our allocations are strings, so we don't remember the size (see for eg tagmanager).
Yes, by fixed size alloc I meant fixed size types, not strings.
Secondly, see: http://developer.gnome.org/glib/2.28/glib-Memory-Slices.html#glib-Memory-Sli...
"This is accompanied by extra caching logic to keep freed memory around for some time before returning it to the system."
Yes, it waits 15 seconds before freeing empty slabs back to the system; that's what I mean by "each of which has differing policies for holding onto memory". This is actually observable, but it only frees about 100 kB of the 28 MB. malloc and C++ work differently.
Thirdly, GLib recommends using g_slice for fixed size allocations, and IIUC g_slice completely prevents fragmentation, besides being faster (anecdotally about 2x).
No, even the glib developers don't claim it completely prevents fragmentation :) It completely prevents *internal* fragmentation
Agreed.
within a slab, since all chunks are the same size, but no slab can be returned to the system or re-used for another size while it has any chunks still in use. And this form of fragmentation is very likely, since Geany, GTK, tagmanager etc. all hold on to random bits of memory (because they need to; it's not a bug), thus blocking release of slabs.
Yep, this is a problem.
I don't know if converting our other uses of g_new for fixed size structures is worth it, but I would say it is good practice to use g_slice in all new code.
Agreed; I don't think we are going to be able to totally cure the "problem", but we could easily waste lots of effort on it.
Actually it probably depends on the context as to whether g_slice is a good idea.
Le 14/10/2011 05:09, Lex Trotman a écrit :
Hi All,
Hey,
I ran Valgrind quite a lot on Geany lately -- though I never ran it in production because it is slooooow --, and thus will comment with what I know and have seen.
I have been investigating bug report "suspected memory leak - ID: 3415254" and have gathered the following data (before my patience ran out). The tests reported here are with Git 55646df8 unless otherwise noted.
Geany left all day without any activity (with files open) did not increase in memory usage (and the same with 0.20, the reported version).
I then tried opening and closing geany/src/* repeatedly. The following table shows Geany memory usage (in Mb) when all files are closed and when all open and the difference. The last column shows that the first two times extra memory was used to tab through all the open files, but after that no more was used.
closed  open  delta  tabbing
   5.1  27.9   22.8    1.1
  28.8  31.2    0.4    1.1
  32.1  33.0    0.9    0.0
  32.8  34.8    2.0    0.0
  34.6  35.9    1.3    0.0
  35.7  36.4    0.7    0.0
  36.2  37.4    1.2    0.0
  37.2  38.1    0.9    0.0
  37.9  41.8    2.9    0.0
  41.6  42.3    0.7    0.0
  42.0  42.8    0.8    0.0
  42.6  43.3    0.7    0.0
  43.1  44.3    1.2    0.0
  44.1  44.7    0.4    0.0
  44.5  45.7    1.2    0.0
To me the key points are:
- each time all files are closed only 200k is returned to the OS, but
since the first open uses 22.8Mb but later ones less, Geany isn't leaking all of it, it is being held by the allocators.
Geany at least keeps all the GeanyDocuments alive and tries to re-use them (document_create() at line 564). This avoids having to re-allocate the document struct every time, though it'll then never release this memory. However, either the user won't open so many files at once, or she's likely to keep having a large number of files open, so I don't think it's a (the?) problem -- and anyway I guess the allocation would be quite small. However, we'd probably better use GSlice here (for fragmentation and maybe caching), and maybe even release the memory to GSlice so it does all the management, and could even release it to the system at some point if it makes sense to it.
Also note that we never release the global tags, so it's likely the first run loads those and they are part of the 22 MB. However, on my installation the C global tags only use something like 400 kB in memory.
- ignoring the first open, the amount of extra memory used to open
all the files varied from 0.4Mb to 2.9Mb. Average 1.1Mb or 5% of the initial 22.8Mb and standard deviation of 0.665Mb or 3% of the 22.8Mb.
- Simple leaks are unlikely to cause the large deviation in the
memory increase, each open of the same set of files would tend to leak the same amount. It is therefore likely that fragmentation effects cause the large deviation, but it may not be the whole cause of the increase.
We are not the only possible source of leaks. At the least, Valgrind pointed out that GTK leaks quite a lot in the file chooser dialog.
Also, it seems FontConfig, Pango or Scintilla (any of them could be misusing the one below it) have a few leaks -- I think it's not (much of) Scintilla's fault, but I don't know it well enough to debug it. Some look like one-time leaks, so probably unfreed initializations; some others seem to happen more often.
Since I only speak of "definitely lost" leaks (as Valgrind calls them), this should not include too many false positives, though it still can -- we had a few "definitely lost" leaks in Geany that I have fixed/could fix, but that would not reduce the real memory usage, since they would only be freed at closing time anyway.
Since Geany uses three different allocators, each of which has differing policies for holding onto memory, it is going to be difficult to separate real leaks from allocator effects.
You could easily reduce this to 2 by using the G_SLICE=always-malloc environment variable when running the tests. This makes the GSlice API simply wrap plain malloc/free for easier debugging ;)
For the bug reporter to have accumulated 300Mb of memory over "a few" days would have needed about 500 file opens per day, but maybe somewhat less as editing increases the memory usage.
That's the problem with profiling and testing: we generally do it in unrealistic situations... either through lack of time or because of other constraints, but heh, still not real.
So I don't think we have to worry excessively that we have a major leak, but keep an eye open for any possible problems.
Agreed, but keeping an eye on memory usage is always a good thing :)
Note that all my testing was with no plugins and only a couple of filetypes. There may be leaks and other effects in plugins and/or other Scintilla lexers. If the OP posts more information we may know more.
I also mostly tested C and Python, not many other filetypes.
Just a few thoughts/points.
Cheers, Colomban
[...]
Geany at least keeps all the GeanyDocuments alive and tries to re-use them (document_create() at line 564). This avoids having to re-allocate the document struct every time, though it'll then never release this memory. However, either the user won't open so many files at once, or she's likely to keep having a large number of files open, so I don't think it's a (the?) problem -- and anyway I guess the allocation would be quite small.
But what that might be doing is preventing release of whole slabs back to the system.
However, we'd probably better use GSlice here (for fragmentation and maybe caching), and maybe even release the memory to GSlice so it does all the management, and could even release it to the system at some point if it makes sense to it.
We should probably just release anything we are finished with and let the allocator do its thing, not try to second-guess it; we'll always be wrong :)
Also note that we never release the global tags, so it's likely the first run loads those and be part of the 22Mb. However, on my installation the C global tags only use something like 400k in memory.
That probably comes under the category of "need" to keep.
- ignoring the first open, the amount of extra memory used to open
all the files varied from 0.4Mb to 2.9Mb. Average 1.1Mb or 5% of the initial 22.8Mb and standard deviation of 0.665Mb or 3% of the 22.8Mb.
- Simple leaks are unlikely to cause the large deviation in the
memory increase, each open of the same set of files would tend to leak the same amount. It is therefore likely that fragmentation effects cause the large deviation, but it may not be the whole cause of the increase.
We are not the only possible source of leaks. At least Valgrind pointed GTK leaks quite a lot in the file chooser dialog.
Well, Valgrind and GTK don't entirely get on, so false positives are common. Also, according to gdb, invoking the file dialog creates another thread somewhere, so Valgrind may be seeing the thread stack.
Also, it seems FontConfig, Pango or Scintilla (either can misuse the one down) have a few leaks -- I think it's not (much of) Scintilla's fault, but I don't know it very well to debug it. Some looks like one-time leaks, so probably unfreed initializations, some other seems to happen more often.
Scintilla certainly does once-only initialisations that are not returned; that's a fairly common C++ idiom. And I would expect FontConfig to cache stuff too, so it is going to be hard to debug. I think all we should do is report it if we are reasonably sure the problem exists, even if we can't find the mechanism.
Since I only speak of "definitely lost" leaks (as Valgrind calls them), this should not include too many false positives, though it still can -- we had a few "definitely lost" leaks in Geany that I have fixed/could fix, but that would not reduce the real memory usage, since they would only be freed at closing time anyway.
Well, in this bug nobody is complaining about the actual memory usage, just the growth over time.
Since Geany uses three different allocators, each of which has differing policies for holding onto memory, it is going to be difficult to separate real leaks from allocator effects.
You could easily reduce this to 2 by using the G_SLICE=always-malloc environment variable when running the tests. This makes the GSlice API simply wrap plain malloc/free for easier debugging ;)
I tried with this: the average growth was almost identical, but the deviation was about half that previously observed. Makes sense.
For the bug reporter to have accumulated 300Mb of memory over "a few" days would have needed about 500 file opens per day, but maybe somewhat less as editing increases the memory usage.
That's the problem with profiling and testing: we generally do it in unrealistic situations... either through lack of time or because of other constraints, but heh, still not real.
Or onset of boredom :)
So I don't think we have to worry excessively that we have a major leak, but keep an eye open for any possible problems.
Agreed, but keeping an eye on memory usage is always a good thing :)
BTW, I noticed that after opening all those files, often a few seconds after they were open there were a few seconds of 400% CPU usage (it's a quad-core machine). Since nothing we do uses this sort of parallel processing, I wonder if it's a memory manager busily trying to coalesce adjacent chunks. After all, 95% of the memory is re-used.
Cheers Lex
PS Colomban, did you see my question on mio use?
On 14/10/2011 19:26, Colomban Wendling wrote:
- each time all files are closed only 200k is returned to the OS, but
since the first open uses 22.8Mb but later ones less, Geany isn't leaking all of it, it is being held by the allocators.
Geany at least keeps all the GeanyDocuments alive and tries to re-use them (document_create() at line 564). This avoids having to re-allocate the document struct every time, though it'll then never release this memory. However, either the user won't open so many files at once, or she's likely to keep having a large number of files open, so I don't think it's a (the?) problem -- and anyway I guess the allocation would be quite small.
Exactly; 100 GeanyDocuments is still small. Also, we don't reuse the private document structures. The reason for caching GeanyDocument structs is not efficiency, but so we can inspect doc->is_valid even after a document is closed.
In hindsight I don't think this was really a good idea, but it came about mainly for backward compatibility with existing code (originally the document array was a fixed size).
At the time we could have fixed the code that relied on GeanyDocuments never being freed, but now I think it might be too late.
However, we'd probably better use GSlice here (for fragmentation and maybe caching), and maybe even release the memory to GSlice so it does all the management, and could even release it to the system at some point if it makes sense to it.
If we didn't need to keep GeanyDocuments around, I think g_slice would be OK for this. As it is, we could change the GeanyDocumentPrivate struct to use it, but it might not help much.