[Geany-Users] How to recognize encoding?

Lex Trotman elextr at xxxxx
Tue Oct 22 00:40:19 UTC 2013


On 22 October 2013 05:07, Enrico Tröger <enrico.troeger at uvena.de> wrote:

> Hi,
>
> >How do I get Geany to recognize (Linux text) files
> >as UTF-8 encoded?
> >
> >The files in question are legacy Windows txt files,
> >written in French (i.e. with lots of accents)
> >which I have converted to   mode: Unix (LF)   encoding:UTF-8
> >by a Perl script that does
> >
> >     "iconv -f CP1252 -t UTF-8 --output=$tempfile $infile"
> >and
> >     "dos2unix -n -f $tempfile $outfile"
>
> -f for 'force binary files'? Geany can't handle binary files.
>

In the default conversion mode (--ascii) I believe dos2unix expects only
ASCII characters, so it needs -f to make it accept UTF-8 encoded input.
Given that it is running on the output of iconv this *should* be OK, unless
the original files contained NULs or were not CP1252.
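
For comparison, here is a minimal sketch in Python of the same two steps
(CP1252 to UTF-8, then CRLF to LF) done in one pass.  It is not the Perl
script from the original mail, and the function name is made up for
illustration.  Python's cp1252 codec is strict about the five byte values
that CP1252 leaves undefined, so a file that is not really CP1252 has a
chance of failing loudly instead of being converted silently.

```python
def cp1252_to_utf8_lf(raw: bytes) -> bytes:
    """Decode CP1252 strictly, normalise CRLF to LF, re-encode as UTF-8.

    Hypothetical helper for illustration; not part of iconv or dos2unix.
    """
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
    # undefined, so input containing them raises UnicodeDecodeError.
    text = raw.decode("cp1252")
    # Same effect as dos2unix on ordinary DOS line endings.
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")
```

Reading the input with open(path, "rb") and writing the result with
open(path, "wb") keeps Python from doing any newline translation of its own.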



>
>
> >It appears that if the infile has a final \x{0A} character,
> >then this arrives in the outfile.
>
> \x0A is \n; it is hard to imagine this really confuses Geany that much.
>

Especially as we have an option to add this to files when they are saved :)



>
> >
> >I can open these files with JEdit or Kate, no problem.
> >But Geany's behaviour with such files is inconsistent.
> >
> >Sometimes Geany refuses to do anything,
> >saying "... does not look like a text
> >file, or the file encoding is not supported",
> >
> >Sometimes Geany renders the file  using encoding
> >UTF-16 LE, which makes it look as if written in
> >Mandarin Chinese.
>

This sort of thing happens to me with Windows files that have *not* been
converted to UTF-8.  Are you *sure* the iconv was successful?  Are the files
CP1252, or maybe ISO-8859-1 or some other code page?
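
One quick way to check whether a converted file really is valid UTF-8 is to
try decoding it strictly.  The following is only a sketch; the function name
is invented for this example.

```python
def first_invalid_utf8_offset(raw: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start
    return None
```

A file that is still CP1252 or ISO-8859-1 will normally fail at its first
accented character, because a lone byte such as 0xE9 (é) followed by an
ASCII character is not a valid UTF-8 sequence.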



> >
> >And sometimes Geany opens such 'problem' files correctly,
> >as UTF-8. So far as I can see, this tends to be the
> >case if there are already several txt files open.
>

Do you mean the behaviour changes for a particular file depending on whether
there are already several text files open?



> >
> >I have tried putting the line /* geany_encoding=utf-8 */
> >as line 1 of a problem file, but that does not seem to
> >have any consistent effect.
>
> Without having a look at the code, I was sure in-file headers would
> take precedence over guessed encodings.
>

Your memory is fine Enrico :)

The order (in the absence of a user forced selection) is:

1) Use the encoding the regex found, *if it converts and validates*.  For
files with the line above it should be consistent, especially as there is a
first try special case for utf-8 that validates.  That is unless the file
contains NULs or had a conversion error from the regex matched encoding or
won't validate as UTF-8, in which case Geany assumes that the regex just
matched some random text and so goes on to try the steps below.

2) Use the encoding in the locale, if it converts without error and
validates.  What locale do you have set?

3) Get desperate :)  Try each encoding in the list (in the order of the
menu->document->set encodings->* list); the first conversion that succeeds
and validates wins.  This heuristic is probably where you are getting
strange encodings selected.
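
The three steps above can be sketched roughly as follows.  This is an
illustration in Python, not Geany's actual code: the marker regex, the
function names and the 1024-byte search window are all assumptions made for
the example, and "converts and validates" is approximated by a strict decode
plus a check for NULs.

```python
import re

# Hypothetical pattern for an in-file marker such as "geany_encoding=utf-8";
# the regex Geany really uses may differ.
MARKER = re.compile(rb"coding[ \t]*[:=][ \t]*([A-Za-z0-9_-]+)")


def try_decode(data, encoding):
    """Return True if data decodes cleanly and the result contains no NULs."""
    try:
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return "\x00" not in text


def guess_encoding(data, locale_encoding, fallbacks):
    # Step 1: the encoding named inside the file, if it really converts.
    match = MARKER.search(data[:1024])
    if match:
        named = match.group(1).decode("ascii")
        if try_decode(data, named):
            return named
    # Step 2: the encoding of the locale.
    if try_decode(data, locale_encoding):
        return locale_encoding
    # Step 3: the first fallback that converts wins.
    for encoding in fallbacks:
        if try_decode(data, encoding):
            return encoding
    return None
```

With this sketch, guess_encoding(b"caf\xe9", "utf-8", ["utf-16-le",
"iso-8859-1"]) returns "utf-16-le": the bytes are not valid UTF-8, but any
even number of bytes without surrogate code units decodes "successfully" as
UTF-16 LE, which is how unconverted Western text can end up displayed as
CJK characters.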

Some further things to try: in the open dialog, Geany gives you the chance
to select the encoding to use.  Do your "problematic" files work if you
select UTF-8 instead of "detect"?

As Enrico said above, Geany will not load a file containing NULs; that's one
of the causes of the "binary file" error message, so check whether the files
contain NULs.  Gedit does accept NULs IIUC.


Cheers
Lex


> Anyway, it's quite hard to help here without knowing which files
> we are talking about.
> Could you share some of the problematic files? If not possible in
> public, at least via private mail?
>
>
> Regards,
> Enrico
>
> --
> Not sent from my smartphone.
> _______________________________________________
> Users mailing list
> Users at lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/users
>