On Wed, 23 May 2007 21:17:03 +0300, Harri Koskinen <geany_fi@fastmonkey.org> wrote:
> Hi,
> I noticed that if I disable the NULL check in document.c, Geany loads UTF-16 and UTF-32 encoded files correctly.
> A small 'patch' is attached for quick & dirty testing :-)
Thanks. With your patch, the test if (filedata->len != (gsize) st.st_size) will never trigger, because filedata->len is always exactly st.st_size. This works for UTF-32 encoded files, but it completely prevents opening files which merely contain one or more NULL bytes. I know UTF-32 files can't be opened at the moment, but that isn't any better. Two weeks ago I spent about two or three days trying to find a better algorithm, but without an acceptable result. The real problem is the code that detects the character encoding: basically, we could open files containing NULL bytes without problems, but then the encoding detection fails.
I won't apply the patch because it only helps with opening UTF-32 files; UTF-16 still fails. However, I just committed a fix which at least enables opening UTF-16 and UTF-32 encoded files that have a valid BOM (Byte Order Mark).
We still need a better way to distinguish files which merely contain NULL bytes from files which are properly encoded in UTF-16/32 and therefore contain NULL bytes. Any pointers are welcome.
If anyone is interested in testing or improving the code, I've attached a tarball with some test files in different encodings (don't worry about the contents of these files, they are just test files ;-)).
Regards, Enrico