Enrico Tröger wrote:
On Thu, 23 Oct 2008 18:57:09 +0200, Enrico Tröger enrico.troeger@uvena.de wrote:
About the g_strcasecmp() problem: I don't think the right solution is to copy the old, broken code from GLib into Geany. As the docs mention, there is a reason why this function is deprecated and should not be used. IMO we should rewrite the code which uses this function with the alternative functions mentioned in the docs.
Update: Yesterday I did some work on this and wrote utils_str_casecmp(), which uses g_utf8_casefold() and g_utf8_collate() as suggested in the GLib API docs. It seems to work, and for simple examples it actually works better than g_strcasecmp(). But the new code is noticeably slower, by a factor of about 100 (!).
When starting Geany with a fresh configuration and opening about 30 files with different filetypes, we have almost 1000 calls to this function (counting only the calls which actually reach g_utf8_collate()). We could eliminate many of these calls by making utils_atob() case-sensitive, or something smarter. That function is used when reading the filetype definition files to evaluate false/true strings into booleans; maybe it's enough to check for true and TRUE, and treat anything else as false.
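The case-sensitive check suggested for utils_atob() might look roughly like the sketch below. This is not Geany's actual utils_atob(); the name atob_strict_sketch is made up for illustration, and treating NULL as false is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not Geany's real utils_atob(): only the exact
 * spellings "true" and "TRUE" evaluate to true. Anything else, including
 * NULL, "True" or "1", is treated as false. */
static bool atob_strict_sketch(const char *str)
{
    if (str == NULL)
        return false;

    /* Two plain byte-wise comparisons, no case folding needed. */
    return strcmp(str, "true") == 0 || strcmp(str, "TRUE") == 0;
}
```

The trade-off is that mixed-case spellings such as "True" in a filetype definition file would then silently evaluate to false.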
I've attached a testcase with some simple test strings to test the correctness of both implementations. These show that the new code works better, but please note that the testcase uses only simple example strings, without more complex Unicode characters.
The testcase additionally does some speed comparisons of both functions, so you can easily see the big difference in execution time. Simply change the value of the CALLS macro to change the number of calls to both functions. Note: the gap in execution time between the two implementations might be even bigger if the test strings are not encoded in UTF-8 but in some other encoding, and so have to be passed through g_locale_to_utf8() first.
Conclusion: I'm no longer sure it's worth using this code, because of the execution time. Assuming (read: hoping) that most users use ASCII-only characters in filenames (for which the code is mostly used), it may not be worth it. OTOH utils_str_casecmp() can surely still be improved to make it run faster.
There is another possibility:
See the attached testcase, an updated version of the first one. It has an additional variant of the code, utils_str_casecmp_easy(), which simply converts the strings to lowercase using g_utf8_strdown() and then compares them with strcmp(). This is to some extent also 'broken' because it doesn't take locale-specific case and sorting rules into account. In exchange, it is noticeably faster than the full variant: it's only about ten times slower than g_strcasecmp(), which is IMO not that bad, so I'm thinking this could go in.
Regards, Enrico
Hi,
I think this is a good compromise, and I think most people using Geany will not be bothered by the approximation, since most programmers just use ASCII-only filenames. And even with non-ASCII filenames, the approximation is IMO very acceptable.
Regards, Colomban