[Github-comments] [geany/geany] Geany encoding determination broken? (#2910)

drws notifications at xxxxx
Sat Oct 2 21:39:06 UTC 2021


> I explained there is no such thing

Of course there is backward compatibility in many 8-bit encodings. For example, an ASCII-encoded file can be opened and treated as ISO-8859-1 (well, maybe not so easily in Geany...), since the former is a subset of the latter.
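The subset relationship can be checked directly. This is a minimal Python sketch (not Geany code) showing that a pure-ASCII byte string decodes identically as ASCII and as ISO-8859-1, while the reverse fails once a byte of 0x80 or above appears:

```python
# ISO-8859-1 maps bytes 0x00-0x7F to the same characters as ASCII,
# so a pure-ASCII buffer decodes identically under both.
ascii_bytes = b"plain ASCII log line\n"
as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("iso-8859-1")
print(as_ascii == as_latin1)  # True

# The reverse does not hold: ISO-8859-1 also assigns characters to
# 0x80-0xFF, which a strict ASCII decoder rejects.
latin1_bytes = b"caf\xe9"
print(latin1_bytes.decode("iso-8859-1"))
try:
    latin1_bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)
```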

> "no specified encoding" might be a good label for the setting

Maybe in the `Set Encoding` submenu, but as you already mentioned, it is still misleading in the Preferences. Since you described it as _Don't try anything first, just search_, could it simply be named `Auto-detect`, `Auto-search` or something like that?

Also, doesn't the `Without encoding (None)` setting effectively nullify its parent, `Use fixed encoding when opening non-Unicode files`, and, if so, isn't it actually redundant?

> the file happened to be a valid encoding

Autodetection is particularly sensitive to the first few bytes, but not in every case. I still think there's something wrong with encoding detection in Geany, setting the 8-bit values aside. I hoped the examples in the OP would be convincing enough, but I'll provide more relevant examples when I encounter the problem again.
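To illustrate how a file can "happen to be" valid in an encoding it was never written in, here is a generic try-in-order heuristic in Python. It is an assumption for illustration only, not Geany's actual detection logic: the hypothetical `detect` helper returns the first candidate that decodes without errors.

```python
from typing import Optional, Sequence


def detect(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes data without errors.

    Hypothetical helper for illustration; not Geany's detection code.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


candidates = ["utf-8", "iso-8859-1"]

# Raw debug values 0xC3 0xA9 happen to form a valid UTF-8 sequence,
# so the heuristic picks UTF-8 and they display as one character.
print(detect(b"value=\xc3\xa9\n", candidates))  # utf-8

# 0xE9 followed by a newline is not valid UTF-8, so detection falls
# through to ISO-8859-1, which accepts any byte sequence.
print(detect(b"value=\xe9\n", candidates))  # iso-8859-1
```

Because ISO-8859-1 decodes every possible byte sequence, it only makes sense as the last fallback; any encoding listed after it would never be reached.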

> The encoding detection is not buggy, the files are

Oh, come on. As if Geany were a sticky-notes application rather than a versatile developer's tool. I agreed at the very beginning that a hex editor was the correct generic answer, but the question remains whether Geany is really going to refuse to open hybrid files that it potentially could open. It is apparent now, though, that some additional hackery would be needed for that.

> the solution would be to fix the file

Of course, and I'm doing that, while also writing an ordinary log file from the very same program. But this is a debugging log with raw 8-bit values, 99.9% of its content is ASCII, and I was interested in the actual values of 128 and above.

> the Geany buffer has to be valid UTF-8 with no embedded NULLs

I understand the NULL value limitation, and even then I think the user should be actively notified and also given a chance to load the file up to the first NULL occurrence, with a red data-loss warning (and possibly a `Save As...` shortcut) included, of course. But not having that option at all is not really a solution.
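The "load up to the first NULL and warn" idea could look roughly like this. It is a minimal Python sketch; `load_until_nul` is a hypothetical helper, not part of Geany, and the warning text is only a placeholder for whatever UI Geany would show:

```python
from typing import Optional, Tuple


def load_until_nul(data: bytes) -> Tuple[bytes, Optional[int]]:
    """Return the bytes before the first NUL and the NUL's offset (None if absent).

    Hypothetical helper illustrating the proposal; not Geany code.
    """
    nul = data.find(b"\x00")
    if nul == -1:
        return data, None
    return data[:nul], nul


data = b"header line\nvalue=\x01\x02\x00trailing bytes after NUL\n"
loaded, nul_offset = load_until_nul(data)
if nul_offset is not None:
    lost = len(data) - nul_offset
    # Placeholder for the proposed red data-loss warning.
    print(f"WARNING: NUL at offset {nul_offset}; {lost} byte(s) not loaded. "
          "Saving this buffer would discard them.")
```

The loaded prefix would still need to pass the usual encoding conversion before it could go into the buffer.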

> the input must be valid in UTF-8

Setting the NULL limitation aside, we are talking about values 1-255, which are valid UTF-8. So, theoretically, any nonzero uint8_t data can be represented in such a buffer, or am I mistaken somewhere?
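Strictly speaking, a lone byte in the range 0x80-0xFF is not valid UTF-8; the values become representable once each byte is mapped to a code point and re-encoded. A minimal Python sketch, assuming the file is reinterpreted as ISO-8859-1 (an illustrative choice, not what Geany currently does):

```python
raw = bytes(range(1, 256))  # every nonzero byte value, 0x01-0xFF

# Read directly as UTF-8, 0x80-0xBF are continuation bytes, 0xC2-0xF4 are
# lead bytes needing continuations, and 0xC0, 0xC1, 0xF5-0xFF never appear,
# so a strict decoder stops at the first byte >= 0x80.
try:
    raw.decode("utf-8")
    print("valid UTF-8")
except UnicodeDecodeError as err:
    print(f"invalid UTF-8 at byte offset {err.start}")  # offset 127

# Decoding as ISO-8859-1 maps byte b to code point U+00bb, so every
# nonzero byte becomes a nonzero character that UTF-8 can encode.
text = raw.decode("iso-8859-1")
utf8 = text.encode("utf-8")
# 127 one-byte sequences (0x01-0x7F) + 128 two-byte sequences (0x80-0xFF).
print(len(raw), len(utf8))  # 255 383

# No NUL is introduced, and the round trip recovers the original bytes.
assert b"\x00" not in utf8
assert utf8.decode("utf-8").encode("iso-8859-1") == raw
```

The buffer then holds valid UTF-8, but each value of 128 and above occupies two bytes in it, so saving would need the reverse conversion to restore the original file.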

-- 
View it on GitHub:
https://github.com/geany/geany/issues/2910#issuecomment-932823912