2011/10/25 Enrico Tröger enrico.troeger@uvena.de:
On Tue, 25 Oct 2011 01:12:23 +1100, Lex wrote:
Dear Encoding specialist (or Colomban by default :)
Lol, this is sweet. But please don't count my answer as "Enrico is an encoding specialist".
Of course Enricoding :)
The thread [1] has exposed a problem with the way Geany handles encodings that it gets from the extraction regexes and from the locale.
[...]
I think the only real problem is that we get an encoding from the locale which doesn't match one of our predefined strings (at the top of src/encodings.c). And this is the only point we should fix, so that the further code relying on the index of the mentioned mapping keeps working.
I think the problem is we have a set of pre-defined strings whose only real use should be to make the document->set encoding menu, but which have snuck into other areas.
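To make the mismatch concrete, here is a minimal sketch of a tolerant lookup, assuming the predefined strings can be treated as a plain array. The names `known_charsets`, `charset_names_equal` and `find_known_charset` are hypothetical and are not taken from src/encodings.c, and the table holds only a few illustrative entries. The comparison ignores ASCII case and punctuation, so a locale reporting "utf8" or "iso8859_1" would still resolve to an index.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Geany's actual API: compare two charset names
 * while ignoring ASCII case and every non-alphanumeric character, so that
 * "UTF-8", "utf8" and "Utf_8" all compare equal. */
static bool charset_names_equal(const char *a, const char *b)
{
    for (;;) {
        /* Skip separators such as '-', '_' or ' ' on both sides. */
        while (*a && !isalnum((unsigned char)*a)) a++;
        while (*b && !isalnum((unsigned char)*b)) b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;    /* equal only if both names ended */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
}

/* Illustrative subset only; the real list of predefined strings is longer. */
static const char *const known_charsets[] = {
    "UTF-8", "ISO-8859-1", "ISO-8859-15", "WINDOWS-1252"
};

/* Return the index of the matching predefined name, or -1 if none matches. */
static int find_known_charset(const char *name)
{
    size_t n = sizeof known_charsets / sizeof known_charsets[0];
    for (size_t i = 0; i < n; i++)
        if (charset_names_equal(name, known_charsets[i]))
            return (int)i;
    return -1;
}
```

With this sketch, `find_known_charset("utf8")` yields the index of "UTF-8", while a name that is not in the table yields -1, so the caller still has to decide what to do with an unknown encoding. The comparison is deliberately loose, which means names that differ only in where the separators sit are treated as equal.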
Some quick ideas to find a solution for the problem:
- try to determine whether this is a Windows-only problem or whether it
might happen on non-Windows systems as well
All my locales are UTF-8 so someone else will have to check that one.
AFAICT regex-extracted charsets used to open the file will default to leaving the text alone if it happens to validate as UTF-8; otherwise the same problem can happen. So it's not just Windows locales.
- we should review the way we retrieve the locale name from the system,
for Windows in particular
Regexes as well.
- try to create an additional mapping of possible other locale names
which can be directly mapped to the ones known by Geany*
That's possibly non-portable or a maintenance issue.
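For illustration, the additional mapping idea could look roughly like the sketch below. This assumes a small hand-maintained table; the struct, the function names and the entries are all hypothetical, and every entry would have to be kept up to date by hand, which is exactly the maintenance concern.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alias entry: a name the system might report, and the
 * predefined name it should be mapped to. */
struct charset_alias {
    const char *alias;
    const char *canonical;
};

/* Illustrative entries only, not a complete or authoritative list. */
static const struct charset_alias aliases[] = {
    { "CP1252", "WINDOWS-1252" },
    { "LATIN1", "ISO-8859-1" },
    { "UTF8",   "UTF-8" },
};

/* ASCII-only, case-insensitive equality; avoids relying on strcasecmp,
 * which is not available on every platform. */
static int ascii_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the canonical name for a known alias, or the input unchanged. */
static const char *resolve_charset_alias(const char *name)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (ascii_equal_nocase(name, aliases[i].alias))
            return aliases[i].canonical;
    return name;
}
```

Here `resolve_charset_alias("cp1252")` returns "WINDOWS-1252", and a name without an entry is passed through untouched so later code can still try it as-is.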
- there is a file charset.alias or something with a similar name used
by iconv, IIRC. This file holds a mapping of alias names for encodings or charsets. I don't remember the details right now, but on Windows it would be especially easy to distribute such an additional mapping. I still need to find some useful documentation on that and on how to do it properly, though.
IIUC this is a GNU iconv artifact, but g_convert uses the system iconv if it exists, so the question is how to do it portably.
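One way to sidestep the alias file might be to ask the converter itself whether it accepts a name. The following is only a sketch against the plain POSIX iconv interface, not against g_convert, and `iconv_knows_charset` is a hypothetical helper name. Which names are accepted depends entirely on the iconv implementation installed on the system.

```c
#include <iconv.h>
#include <stdbool.h>

/* Hypothetical probe: returns true if the system iconv can open a
 * conversion from `name` to UTF-8, i.e. if it recognises the name. */
static bool iconv_knows_charset(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return false;                         /* name not supported */
    iconv_close(cd);
    return true;
}
```

A name rejected here would also be of no use for a conversion through the same iconv, so this check could serve as a filter before trusting a charset taken from the locale or from a regex match.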
Having a UTF-8-only system means I can't do anything about developing any fixes, so someone else is going to have to do that, sorry.
Cheers Lex