<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">>As Enrico said above, Geany will
not load a file containing NULs, that's one of the causes of
the "binary file" error message, so check if the files contain NULs.<br>
<br>
That indeed seems to be the problem.<br>
It appears that when Windows saves an email as text,<br>
it puts \x{00} at the end of the file,<br>
which persuades Geany to open it with encoding UTF-16LE.<br>
<br>
At all events, I have added the line<br>
s/\x{00}//;<br>
to my Perl script that does the DOS-to-Unix conversion,<br>
and, for all the files that I have tried so far,<br>
Geany is now happy.<br>
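<br>
For anyone who wants the same clean-up outside Perl, here is a minimal sketch in Python.<br>
It assumes the file has already been converted to UTF-8, where a NUL byte is never part of another character;<br>
the function name clean_text is made up for illustration.<br>
It removes every NUL, whereas s/\x{00}// without the /g modifier removes only the first match in the string it is applied to.<br>
<pre>
```python
def clean_text(data):
    """Return data with NUL bytes removed and CRLF line endings converted to LF.

    Assumes data is UTF-8 (or another 8-bit encoding); stripping NULs from
    UTF-16 data would corrupt it.
    """
    # Drop every NUL byte, wherever it occurs in the file.
    data = data.replace(b"\x00", b"")
    # Convert DOS (CRLF) line endings to Unix (LF).
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Hypothetical sample: UTF-8 French text with DOS line endings and a trailing NUL.
    sample = b"caf\xc3\xa9\r\nfin\r\n\x00"
    print(clean_text(sample))
```
</pre>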
<br>
Many thanks<br>
<br>
Richard H<br>
<br>
On 10/22/2013 02:40 AM, Lex Trotman wrote:<br>
</div>
<blockquote
cite="mid:CAKhWKDPUT35HMXXDHsOKwZ5=FPJof8cBoVMeFfoQUsfgGn3HZg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On 22 October 2013 05:07, Enrico
Tröger <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:enrico.troeger@uvena.de" target="_blank">enrico.troeger@uvena.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<div class="im"><br>
>How do I get Geany to recognize (Linux text) files<br>
>as UTF-8 encoded?<br>
><br>
>The files in question are legacy Windows txt files,<br>
>written in French (i.e. with lots of accents)<br>
>which I have converted to mode: Unix (LF)
encoding:UTF-8<br>
>by a Perl script that does<br>
><br>
> "iconv -f CP1252 -t UTF-8 --output=$tempfile
$infile"<br>
>and<br>
> "dos2unix -n -f $tempfile $outfile"<br>
<br>
</div>
-f for 'force binary files'? Geany can't handle binary
files.<br>
</blockquote>
<div><br>
</div>
<div>In default convert mode --ascii I believe dos2unix
expects only ASCII chars, so it needs a -f to make it
accept UTF-8 encodings. Given that this is running on the
output of iconv this *should* be ok, unless the original
files contained NULs or were not CP1252.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
<br>
>It appears that if the infile has a final \x{0A}
character,<br>
>then this arrives in the outfile.<br>
<br>
</div>
\x0A is \n, hard to imagine this really confuses Geany
that much.<br>
</blockquote>
<div><br>
</div>
<div>Especially as we have an option to add this to files
when they are saved :)</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
><br>
>I can open these files with JEdit or Kate, no
problem.<br>
>But Geany's behaviour with such files is
inconsistent.<br>
><br>
>Sometimes Geany refuses to do anything,<br>
>saying "... does not look like a text<br>
>file, or the file encoding is not supported",<br>
><br>
>Sometimes Geany renders the file using encoding<br>
>UTF-16 LE, which makes it look as if written in<br>
>Mandarin Chinese.<br>
</div>
</blockquote>
<div><br>
</div>
<div>This sort of thing happens to me with Windows files
that have *not* been converted to UTF-8. Are you *sure*
the iconv was successful? Are the files CP1252 or maybe
ISO-8859-1 or some other code page?</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
><br>
>And sometimes Geany opens such 'problem' files
correctly,<br>
>as UTF-8. So far as I can see, this tends to be the<br>
>case if there are already several txt files open.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Do you mean the behaviour changes for a particular file
depending on whether there are already several text files open?</div>
<div>
<br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
><br>
>I have tried putting the line /*
geany_encoding=utf-8 */<br>
>as line 1 of a problem file, but that does not seem
to<br>
>have any consistent effect.<br>
<br>
</div>
Without having a look at the code, I was sure in-file
headers would<br>
take precedence over guessed encodings.<br>
</blockquote>
<div><br>
</div>
<div>Your memory is fine Enrico :)</div>
<div><br>
</div>
<div>The order (in the absence of a user forced selection)
is:</div>
<div><br>
</div>
<div>1) Use the encoding the regex found, *if it converts
and validates*. For files with the line above it should
be consistent, especially as there is a special-case first
try for UTF-8 that validates. The exceptions are a file that
contains NULs, fails to convert from the regex-matched
encoding, or won't validate as UTF-8; in those cases
Geany assumes that the regex just matched some random text
and goes on to try the steps below.</div>
<div><br>
</div>
<div>2) Use the encoding in the locale, if it converts
without error and validates. What locale do you have set?</div>
<div><br>
</div>
<div>3) Get desperate :) Try each encoding in the list (in
the order of the menu->document->set encodings->*
list); the first conversion that succeeds and validates
wins. This heuristic is probably where you are getting
strange encodings selected.</div>
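<div><br>
</div>
<div>The order above can be sketched roughly as follows. This is a
Python illustration only, not Geany's actual code (Geany is written
in C); the function names, the default locale encoding and the short
fallback list are assumptions standing in for the real encodings
menu.</div>
<pre>
```python
def converts_and_validates(data, encoding):
    """Return True if data decodes cleanly in encoding and contains no NULs."""
    try:
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    # A NUL left in the decoded text is treated as "not a text file".
    return "\x00" not in text


def guess_encoding(data, header_encoding=None, locale_encoding="utf-8",
                   fallbacks=("utf-8", "utf-16-le", "cp1252")):
    """Try the in-file header, then the locale, then a fallback list.

    The fallback list is hypothetical; it only illustrates "first match wins".
    """
    candidates = []
    if header_encoding:
        candidates.append(header_encoding)   # step 1: encoding named in the file
    candidates.append(locale_encoding)       # step 2: encoding of the locale
    candidates.extend(fallbacks)             # step 3: walk the list in order
    for encoding in candidates:
        if converts_and_validates(data, encoding):
            return encoding
    return None                              # nothing fits: report a binary file


if __name__ == "__main__":
    print(guess_encoding("café\n".encode("utf-8")))
    print(guess_encoding(b"abc\x00"))
```
</pre>
<div>Under these assumptions, a four-byte ASCII file ending in a NUL
is rejected as UTF-8 but decodes without error as UTF-16LE, which is
the kind of surprising result step 3 can produce.</div>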
<div><br>
</div>
<div>Some further things to try: in the open dialog, Geany
gives you the chance to select the encoding to use. Do
your "problematic" files work if you select UTF-8 instead
of "detect"? </div>
<div><br>
</div>
<div>As Enrico said above, Geany will not load a file
containing NULs. That's one of the causes of the "binary
file" error message, so check if the files contain NULs.
Gedit does accept NULs IIUC.</div>
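<div><br>
</div>
<div>One way to check is to scan the raw bytes of the file. A minimal
sketch in Python; the helper name nul_offsets and the sample data are
invented for illustration:</div>
<pre>
```python
def nul_offsets(data, limit=10):
    """Return the byte offsets of the first `limit` NUL bytes in data."""
    offsets = []
    start = 0
    while len(offsets) != limit:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break                 # no further NULs in the data
        offsets.append(pos)
        start = pos + 1           # continue searching after this NUL
    return offsets


if __name__ == "__main__":
    # Hypothetical sample; for a real file, read it in binary mode first,
    # e.g. data = open(path, "rb").read()
    sample = b"bonjour\n\x00"
    print(nul_offsets(sample))
```
</pre>
<div>An empty list means the file contains no NULs, so the cause
would have to be something else.</div>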
<div> </div>
<div><br>
</div>
<div>Cheers</div>
<div>Lex</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Anyway, it's quite hard to help here without knowing
what files<br>
we are talking about.<br>
Could you share some of the problematic files? If not
possible in<br>
public, at least via private mail?<br>
<br>
<br>
Regards,<br>
Enrico<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Not sent from my smartphone.<br>
</font></span>
<div class="HOEnZb">
<div class="h5">_______________________________________________<br>
Users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Users@lists.geany.org">Users@lists.geany.org</a><br>
<a moz-do-not-send="true"
href="https://lists.geany.org/cgi-bin/mailman/listinfo/users"
target="_blank">https://lists.geany.org/cgi-bin/mailman/listinfo/users</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>