Dear Encoding specialist (or Colomban by default :)
The thread [1] has exposed a problem with the way Geany handles encodings that it gets from the extraction regexes and from the locale.
The referenced thread shows that Geany gets the encoding CP1252 from the locale. This is fine for reading the file, since iconv, and therefore g_convert(), recognises the name and loads the file correctly.
Places where Geany just uses the string, such as the status bar, also work fine.
But Geany does not recognise CP1252 (Geany only knows WINDOWS-1252), so places that use the encoding index default to UTF-8.
So Geany saves the encoding in the session file as UTF-8, which is incorrect. The Document->Set Encoding menu also defaults to UTF-8 because it uses the index to select the option.
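The failure mode described above can be sketched in a few lines of C. This is only an illustration, not Geany's actual code: the three-entry table, `encoding_index()` and `name_via_index()` are hypothetical stand-ins for the much longer predefined list in src/encodings.c and the code that relies on its indexes.

```c
#include <string.h>

/* Hypothetical stand-in for the predefined encoding table;
 * the real list in src/encodings.c is much longer. */
static const char *const known_encodings[] = {
    "UTF-8",        /* index 0 doubles as the fallback */
    "ISO-8859-1",
    "WINDOWS-1252",
};
enum { N_KNOWN = sizeof known_encodings / sizeof known_encodings[0] };

/* Return the table index for an exact name match, or -1 if unknown. */
static int encoding_index(const char *name)
{
    for (int i = 0; i < N_KNOWN; i++)
        if (strcmp(known_encodings[i], name) == 0)
            return i;
    return -1;
}

/* Model of the lossy step: anything that needs an index silently
 * falls back to UTF-8 (index 0) when the name is not in the table. */
static const char *name_via_index(const char *name)
{
    int idx = encoding_index(name);
    return known_encodings[idx < 0 ? 0 : idx];
}
```

In this model `name_via_index("CP1252")` yields "UTF-8" even though the file was loaded correctly as CP1252, which is the mismatch that would end up in the session file and the menu.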
Geany could fix the sessions by just saving the encoding name instead of an index, but the menu is a problem and may need an "unknown (no change)" option.
Or Geany could refuse to use encodings that it doesn't recognise, but that is somewhat restrictive :(
And I don't know what other parts of Geany may also be affected.
Any other ideas?
Cheers Lex
[1] http://article.gmane.org/gmane.editors.geany.general/6285
On Tue, 25 Oct 2011 01:12:23 +1100, Lex wrote:
> Dear Encoding specialist (or Colomban by default :)
Lol, this is sweet. But please don't count my answer as "Enrico is an encoding specialist".
> The thread [1] has exposed a problem with the way Geany handles encodings that it gets from the extraction regexes and from the locale.
Lex, thanks for spending the time and effort to track this issue down, and for all the debugging involved. It is much appreciated, even if not always said explicitly. The same goes for the rest of the gang who have been taking care of the geany-users mailing list in the last weeks/months. Yeah.
> But Geany does not recognise CP1252 (Geany only knows WINDOWS-1252) so places that use the encoding index default to UTF-8.
I think the only real problem is that we get an encoding from the locale which doesn't match one of our predefined strings (at the top of src/encodings.c). And this is the only point we should fix, so that the further code relying on the index of the mentioned mapping keeps working.
Some quick ideas to find a solution for the problem:
- try to define whether this is a Windows-only problem or whether it might happen on non-Windows systems as well
- we should review the way we retrieve the locale name from the system, for Windows in particular
- try to create an additional mapping of possible other locale names which can be directly mapped to the ones known by Geany*
* there is a file charset.alias or something with a similar name used by iconv, IIRC. This file holds a mapping of alias names for encodings or charsets. I don't remember the details right now, but on Windows it would be especially easy to distribute such an additional mapping. Though I still need to find some useful documentation on that and how to do it properly.
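The additional-mapping idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `aliases` table and the `canonical_encoding()` helper are hypothetical, and only the CP1252/WINDOWS-1252 pair comes from this thread (the other rows are just well-known equivalents). Unknown names are passed through unchanged rather than replaced.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: maps names the locale or a regex might
 * report to the spelling used in the predefined list. */
static const struct { const char *alias, *canonical; } aliases[] = {
    { "CP1252", "WINDOWS-1252" },
    { "CP1251", "WINDOWS-1251" },
    { "UTF8",   "UTF-8" },
};

/* Compare two encoding names, ignoring ASCII case. */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the canonical spelling for a known alias, or the input
 * itself so that unknown names are never silently replaced. */
static const char *canonical_encoding(const char *name)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (equal_nocase(aliases[i].alias, name))
            return aliases[i].canonical;
    return name;
}
```

Running the locale or regex result through such a function before the index lookup would let "CP1252" resolve to the existing WINDOWS-1252 entry, at the cost of maintaining the table by hand.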
Regards, Enrico
2011/10/25 Enrico Tröger <enrico.troeger@uvena.de>:
> On Tue, 25 Oct 2011 01:12:23 +1100, Lex wrote:
>> Dear Encoding specialist (or Colomban by default :)
> Lol, this is sweet. But please don't count my answer as "Enrico is an encoding specialist".
Of course Enricoding :)
>> The thread [1] has exposed a problem with the way Geany handles encodings that it gets from the extraction regexes and from the locale.
[...]
> I think the only real problem is that we get an encoding from the locale which doesn't match one of our predefined strings (at the top of src/encodings.c). And this is the only point we should fix, so that the further code relying on the index of the mentioned mapping keeps working.
I think the problem is we have a set of pre-defined strings whose only real use should be to make the document->set encoding menu, but which have snuck into other areas.
> Some quick ideas to find a solution for the problem:
> - try to define whether this is a Windows-only problem or whether it
>   might happen on non-Windows systems as well
All my locales are UTF-8, so someone else will have to check that one.
AFAICT, regex-extracted charsets used to open the file default to leaving the text alone if it happens to validate as UTF-8; otherwise the same problem can happen. So it's not just Windows locales.
> - we should review the way we retrieve the locale name from the system,
>   for Windows in particular
Regexes as well.
> - try to create an additional mapping of possible other locale names
>   which can be directly mapped to the ones known by Geany*
That's possibly non-portable or a maintenance issue.
> * there is a file charset.alias or something with a similar name used
>   by iconv, IIRC. This file holds a mapping of alias names for encodings or charsets. I don't remember the details right now, but on Windows it would be especially easy to distribute such an additional mapping. Though I still need to find some useful documentation on that and how to do it properly.
IIUC this is a GNU iconv artifact, but g_convert() uses the system iconv if it exists, so the question is how to do it portably.
Having a UTF-8-only system means I can't do anything about developing any fixes, so someone else is going to have to do that, sorry.
Cheers Lex
On 10/24/2011 04:03 PM, Lex Trotman wrote:
> 2011/10/25 Enrico Tröger <enrico.troeger@uvena.de>:
>> On Tue, 25 Oct 2011 01:12:23 +1100, Lex wrote:
[snip all]
I came upon some stuff while trying to improve Mousepad's encoding support:
http://cihar.com/software/enca/doc/ch01.html
and also a little useful:
http://developer.gnome.org/glib/2.30/glib-I18N.html
Just some reading, I'm not sure they're entirely relevant.
Cheers, Matthew Brush
Hi All,
If no one has any better ideas, I'll just fix the immediate problem by saving the text of the encoding in the files list instead of the index. Anything more complex can wait.
Since Geany doesn't seem to care that the file was loaded with an encoding it doesn't know, and it passes the text of the encoding to g_convert() when saving the file, there should be no problems.
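As a rough sketch of what storing the name instead of the index means: the `path;encoding` layout and the `write_entry()` and `read_encoding()` helpers below are hypothetical and are not Geany's real session format. The point is only that the name survives a round trip whether or not it appears in the predefined list.

```c
#include <stdio.h>
#include <string.h>

/* Write a hypothetical "<path>;<encoding name>" entry into buf.
 * Returns 1 on success, 0 if the entry did not fit. */
static int write_entry(char *buf, size_t size,
                       const char *path, const char *encoding)
{
    int n = snprintf(buf, size, "%s;%s", path, encoding);
    return n >= 0 && (size_t)n < size;
}

/* Split an entry in place at its last ';'. On return the entry holds
 * just the path; the result points at the encoding name, or is NULL
 * if the entry has no separator. */
static const char *read_encoding(char *entry)
{
    char *sep = strrchr(entry, ';');
    if (sep == NULL)
        return NULL;
    *sep = '\0';
    return sep + 1;
}
```

An entry written for a file loaded as CP1252 reads back as "CP1252", and that string can be handed to g_convert() as before; no index lookup is involved, so no fallback to UTF-8 can occur at this step.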
Cheers Lex