How do I get Geany to recognize (Linux text) files as UTF-8 encoded?
The files in question are legacy Windows txt files, written in French (i.e. with lots of accents) which I have converted to mode: Unix (LF) encoding:UTF-8 by a Perl script that does
"iconv -f CP1252 -t UTF-8 --output=$tempfile $infile" and "dos2unix -n -f $tempfile $outfile"
It appears that if the infile has a final \x{0A} character, then this arrives in the outfile.
I can open these files with JEdit or Kate, no problem. But Geany's behaviour with such files is inconsistent.
Sometimes Geany refuses to do anything, saying "... does not look like a text file, or the file encoding is not supported",
Sometimes Geany renders the file using encoding UTF-16 LE, which makes it look as if written in Mandarin Chinese.
And sometimes Geany opens such 'problem' files correctly, as UTF-8. So far as I can see, this tends to be the case if there are already several txt files open.
I have tried putting the line /* geany_encoding=utf-8 */ as line 1 of a problem file, but that does not seem to have any consistent effect.
The other options for setting the encoding (the file open dialog, the "Reload as" menu item, and the "Set encoding" menu item) are not really ideal for browsing lots and lots of files.
Richard H
Hi,
How do I get Geany to recognize (Linux text) files as UTF-8 encoded?
The files in question are legacy Windows txt files, written in French (i.e. with lots of accents) which I have converted to mode: Unix (LF) encoding:UTF-8 by a Perl script that does
"iconv -f CP1252 -t UTF-8 --output=$tempfile $infile"
and "dos2unix -n -f $tempfile $outfile"
-f for 'force binary files'? Geany can't handle binary files.
It appears that if the infile has a final \x{0A} character, then this arrives in the outfile.
\x0A is \n, hard to imagine this really confuses Geany that much.
I can open these files with JEdit or Kate, no problem. But Geany's behaviour with such files is inconsistent.
Sometimes Geany refuses to do anything, saying "... does not look like a text file, or the file encoding is not supported",
Sometimes Geany renders the file using encoding UTF-16 LE, which makes it look as if written in Mandarin Chinese.
And sometimes Geany opens such 'problem' files correctly, as UTF-8. So far as I can see, this tends to be the case if there are already several txt files open.
I have tried putting the line /* geany_encoding=utf-8 */ as line 1 of a problem file, but that does not seem to have any consistent effect.
Without having a look at the code, I was sure in-file headers would take precedence over guessed encodings.
Anyway, it's quite hard to help here without knowing what files we are talking about. Could you share some of the problematic files? If not possible in public, at least via private mail?
Regards, Enrico
On 22 October 2013 05:07, Enrico Tröger enrico.troeger@uvena.de wrote:
Hi,
How do I get Geany to recognize (Linux text) files as UTF-8 encoded?
The files in question are legacy Windows txt files, written in French (i.e. with lots of accents) which I have converted to mode: Unix (LF) encoding:UTF-8 by a Perl script that does
"iconv -f CP1252 -t UTF-8 --output=$tempfile $infile"
and "dos2unix -n -f $tempfile $outfile"
-f for 'force binary files'? Geany can't handle binary files.
In the default convert mode --ascii I believe dos2unix expects only ASCII chars, so it needs a -f to make it accept UTF-8 encodings. Given that this is running on the output of iconv this *should* be ok, unless the original files contained NULs or were not CP1252.
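To make the conversion step concrete, here is a minimal Python sketch of what the iconv plus dos2unix pipeline is meant to achieve. It is not the Perl script from the original message; the function name `cp1252_to_utf8_unix` is hypothetical, and the NUL check is an added assumption based on the caveat above.

```python
def cp1252_to_utf8_unix(raw: bytes) -> bytes:
    """Convert CP1252 bytes with CRLF line endings to UTF-8 with LF endings."""
    # Assumption: a NUL byte means the input is not plain CP1252 text
    # (for example it may already be UTF-16), so refuse to convert it.
    if b"\x00" in raw:
        raise ValueError("input contains NUL bytes; probably not CP1252 text")
    # Strict decoding raises UnicodeDecodeError on bytes that Python's
    # cp1252 codec leaves undefined (such as 0x81).
    text = raw.decode("cp1252")
    # Like dos2unix, turn CRLF pairs into a single LF.
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")
```

Failing loudly on NULs or undefined bytes is a design choice for this sketch: it surfaces files that were never CP1252 in the first place instead of silently producing output that an editor may later misdetect.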
It appears that if the infile has a final \x{0A} character, then this arrives in the outfile.
\x0A is \n, hard to imagine this really confuses Geany that much.
Especially as we have an option to add this to files when they are saved :)
I can open these files with JEdit or Kate, no problem. But Geany's behaviour with such files is inconsistent.
Sometimes Geany refuses to do anything, saying "... does not look like a text file, or the file encoding is not supported",
Sometimes Geany renders the file using encoding UTF-16 LE, which makes it look as if written in Mandarin Chinese.
This sort of thing happens to me with Windows files that have *not* been converted to UTF-8. Are you *sure* the iconv was successful? Are the files CP1252 or maybe ISO-8859-1 or some other code page?
And sometimes Geany opens such 'problem' files correctly, as UTF-8. So far as I can see, this tends to be the case if there are already several txt files open.
Do you mean the behaviour changes for a particular file depending on if there are already several text files open?
I have tried putting the line /* geany_encoding=utf-8 */ as line 1 of a problem file, but that does not seem to have any consistent effect.
Without having a look at the code, I was sure in-file headers would take precedence over guessed encodings.
Your memory is fine Enrico :)
The order (in the absence of a user forced selection) is:
1) Use the encoding the regex found, *if it converts and validates*. For files with the line above it should be consistent, especially as there is a first try special case for utf-8 that validates. That is unless the file contains NULs or had a conversion error from the regex matched encoding or won't validate as UTF-8, in which case Geany assumes that the regex just matched some random text and so goes on to try the steps below.
2) Use the encoding in the locale, if it converts without error and validates. What locale do you have set?
3) Get desperate :) try each encoding in the list (in the order of the menu->document->set encodings->* list); the first conversion that succeeds and validates wins. This heuristic is probably where you are getting strange encodings selected.
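The three steps above can be sketched roughly as follows. This is only an illustration in Python, not Geany's actual code: the names `try_decode` and `guess_encoding` are hypothetical, and the short fallback list and its order are assumptions standing in for the real menu order.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if conversion or validation fails."""
    try:
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return None
    # Assumption: text that still contains NULs after conversion is rejected.
    return None if "\x00" in text else text


def guess_encoding(data: bytes, header_encoding=None, locale_encoding="utf-8",
                   fallbacks=("utf-8", "utf-16-le", "cp1252")):
    """Try the in-file header, then the locale, then a fixed fallback list."""
    candidates = []
    if header_encoding:
        candidates.append(header_encoding)   # step 1: encoding found by the regex
    candidates.append(locale_encoding)       # step 2: encoding from the locale
    candidates.extend(fallbacks)             # step 3: walk the list in order
    for encoding in candidates:
        if try_decode(data, encoding) is not None:
            return encoding
    return None
```

With this sketch, the six CP1252 bytes `b"caf\xe9!\n"` are not valid UTF-8 but do decode as UTF-16 LE, so `guess_encoding` returns `"utf-16-le"` even when `header_encoding="utf-8"` is passed. That is the kind of fall-through that can make an unconverted file show up as CJK characters.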
Something further to try: in the open dialog, Geany gives you the chance to select the encoding to use. Do your "problematic" files work if you select UTF-8 instead of "detect"?
As Enrico said above, Geany will not load a file containing NULs; that's one of the causes of the "binary file" error message, so check if the files contain NULs. Gedit does accept NULs IIUC.
Cheers Lex