Re: [Geany] Geany UTF-16/32 bug and a possible "fix"

24 May 2007


On Wed, 23 May 2007 21:17:03 +0300, Harri Koskinen <geany_fi@fastmonkey.org> wrote:
> Hi,
> ...
> I noticed that if I disable the NULL-check from the document.c file
> Geany then loads UTF16 and UTF32 encoded files correctly.
> A small 'patch' is attached for quick & dirty testing :-)
Thanks.
With your patch, the code guarded by the test
    if (filedata->len != (gsize) st.st_size)
will never be executed, because filedata->len is always exactly st.st_size.
This works for UTF-32 encoded files, but it completely prevents opening
files which merely contain one or more NULL bytes. I know the current
state, where UTF-32 files can't be opened at all, isn't better. Two weeks
ago I spent two or three days trying to find a better algorithm, but
without an acceptable result. The real problem is in the code that detects
the character encoding: basically, we could open files containing
NULL bytes without problems, but then the encoding detection fails.
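
To illustrate the size comparison discussed above, here is a minimal sketch in plain C. It assumes the length being compared comes from a strlen()-style scan of the loaded buffer, which this message does not show; the function name has_embedded_nul is hypothetical and is not taken from document.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Geany's code.
 * Returns 1 if the first `size` bytes of `data` contain a NUL byte.
 * `data` must be NUL-terminated at data[size], as a loader typically
 * arranges after reading a whole file into memory. */
int has_embedded_nul(const char *data, size_t size)
{
    assert(data != NULL);
    /* strlen() stops at the first NUL byte, so its result is smaller
     * than the real size exactly when a NUL sits inside the contents. */
    return strlen(data) != size;
}
```

Any UTF-16 or UTF-32 text containing ASCII characters trips this check, which is why disabling it lets such files load but also lets through files that merely contain stray NULL bytes.
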
I won't apply the patch because it only helps with opening UTF-32 files;
UTF-16 still fails. But I just committed a fix which at least enables opening
UTF-16 and UTF-32 encoded files that have a valid BOM (Byte Order Mark).
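
As a rough illustration of BOM-based detection, the sketch below maps the standard Unicode BOM byte sequences to encoding names. It is not the committed fix, whose code is not shown in this message; the function name encoding_from_bom is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Geany's code.
 * Returns the encoding name implied by a leading BOM, or NULL if the
 * buffer does not start with one. */
const char *encoding_from_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char utf32le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf32be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf8[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16le[] = { 0xFF, 0xFE };
    static const unsigned char utf16be[] = { 0xFE, 0xFF };

    assert(buf != NULL || len == 0);

    /* UTF-32 is tested before UTF-16 because the UTF-32LE BOM begins
     * with the UTF-16LE BOM. A UTF-16LE file whose first character is
     * U+0000 would be misread here; that case is assumed to be rare. */
    if (len >= 4 && memcmp(buf, utf32le, 4) == 0) return "UTF-32LE";
    if (len >= 4 && memcmp(buf, utf32be, 4) == 0) return "UTF-32BE";
    if (len >= 3 && memcmp(buf, utf8, 3) == 0)    return "UTF-8";
    if (len >= 2 && memcmp(buf, utf16le, 2) == 0) return "UTF-16LE";
    if (len >= 2 && memcmp(buf, utf16be, 2) == 0) return "UTF-16BE";
    return NULL;
}
```

A loader could run a check like this before any NULL-byte test and skip that test whenever a BOM is found.
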
We still need a better way to differentiate between files which just
contain NULL bytes and files which are properly encoded in UTF-16/32
and therefore contain NULL bytes. Any pointers are welcome.
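
One possible direction, offered only as a sketch and not as anything Geany does: for BOM-less UTF-16 text that is mostly ASCII or Latin-1, NULL bytes fall consistently on one side of each 2-byte unit. The names below and the 3/4 threshold are assumptions chosen for illustration, and the heuristic will not recognise text in scripts whose code units rarely have a zero byte.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { GUESS_UNKNOWN, GUESS_UTF16LE, GUESS_UTF16BE } Utf16Guess;

/* Hypothetical heuristic: guess the byte order of BOM-less UTF-16 data
 * from where the NUL bytes sit inside each 2-byte unit. */
Utf16Guess guess_utf16_without_bom(const unsigned char *buf, size_t len)
{
    size_t even_nuls = 0, odd_nuls = 0, units = len / 2;
    size_t i;

    assert(buf != NULL || len == 0);
    if (len < 2 || len % 2 != 0)
        return GUESS_UNKNOWN;          /* UTF-16 data has an even length */

    for (i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0)     even_nuls++;   /* first byte of the unit */
        if (buf[i + 1] == 0) odd_nuls++;    /* second byte of the unit */
    }

    /* Assumed threshold: at least 3/4 of the units have a NUL on one
     * side and no unit has a NUL on the other side. */
    if (odd_nuls * 4 >= units * 3 && even_nuls == 0) return GUESS_UTF16LE;
    if (even_nuls * 4 >= units * 3 && odd_nuls == 0) return GUESS_UTF16BE;
    return GUESS_UNKNOWN;
}
```

A file with only a few scattered NULL bytes falls below the threshold and stays GUESS_UNKNOWN, so it could be handled separately from real UTF-16 text. UTF-32 would need a similar check over 4-byte units.
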
If anyone is interested in testing or improving the code, I attach a
tarball with some test files in different encodings (don't be surprised
by the contents of these files, they are just test files ;-)).
Regards,
Enrico
-- 
Get my GPG key from http://www.uvena.de/pub.key
