[Github-comments] [geany] Unable to watch patch file created by TortoiseSVN if path contains non ASCII characters (#873)

Colomban Wendling notifications at xxxxx
Mon Jan 18 13:52:47 UTC 2016


Well… what can we do?  No, we don't support files with multiple encodings in them (and I'm not aware of any tool handling that either).  How would you suggest Geany treat this file?

Magically recognizing which chunks use which encoding is not really viable, because detecting an encoding is virtually impossible except for a few select ones (like UTF-8, but then again a UTF-8 file can be opened just fine as e.g. CP1251: the result is valid, at worst a bit odd).
The rocket scientists of this area use statistics of the most likely character occurrences to try to make the best choice, but here again it's purely statistical and can easily be wrong (imagine a file in ISO-8859-15 with only `œ` in it: not statistically likely, yet totally valid).  The best solution remains letting the user choose.  Or using a mostly unambiguous encoding, like UTF-8 :)
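
To make the ambiguity concrete, here is a minimal Python sketch (not anything Geany does; `try_decodings` is a hypothetical helper name) that decodes the same bytes with several codecs. It only shows that most single-byte encodings accept almost any input, so a successful decode proves very little.

```python
from typing import Dict, Optional, Sequence


def try_decodings(data: bytes, encodings: Sequence[str]) -> Dict[str, Optional[str]]:
    """Decode `data` with each codec; None marks a codec that rejects the bytes."""
    results: Dict[str, Optional[str]] = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# UTF-8 bytes for "é" are also perfectly valid CP1251 and ISO-8859-1.
utf8_bytes = "é".encode("utf-8")  # b"\xc3\xa9"
print(try_decodings(utf8_bytes, ["utf-8", "cp1251", "iso-8859-1"]))

# A lone 0xBD byte is invalid UTF-8 but valid in every single-byte codec below,
# each giving a different character.
print(try_decodings(b"\xbd", ["utf-8", "iso-8859-1", "iso-8859-15", "cp1251"]))
```

Only UTF-8 rejects the second input; the three single-byte codecs each return a different character, and nothing in the byte itself says which one was meant.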

In the case of a diff file, we *could* probably either try to look at each file on disk and guess its encoding (if we can find it, which is unlikely as the path is not absolute, and there's no guarantee the user viewing the file has the repository on their machine), or maybe "simply" recognize chunks in diffs and guess each separately.
This, however, presents several problems:
+ requires special handling of diff files at the loading level
+ requires *very* special handling of diff files at the save level, to be able to do the proper conversion in the opposite direction.  This part would be especially tricky.
+ requires parsing the file even before converting it, which might or might not be a problem (assuming all encodings are ASCII-compatible, and diff only uses ASCII control characters, it should be doable)
+ doing all this leads to even more encoding guessing than what currently happens, leading to even more room for choosing the wrong one at some point; and it makes user override a lot harder.
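
For illustration only, the "guess each chunk separately" idea could be sketched in Python roughly as below. This is not Geany code. It assumes SVN-style `Index: ` header lines mark the start of each per-file chunk, that every encoding involved is ASCII-compatible, and a hypothetical two-entry candidate list (strict UTF-8, then ISO-8859-1 as a catch-all). All function names are made up for the example.

```python
from typing import List, Tuple

# Hypothetical candidate list: strict UTF-8 first, then a single-byte
# fallback that accepts any byte sequence.
CANDIDATES = ("utf-8", "iso-8859-1")


def guess_encoding(chunk: bytes) -> str:
    """Return the first candidate codec that decodes `chunk` without error."""
    for name in CANDIDATES:
        try:
            chunk.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return CANDIDATES[-1]


def split_per_file(diff: bytes) -> List[bytes]:
    """Split raw diff bytes at 'Index: ' header lines (assumed SVN-style)."""
    chunks: List[bytes] = []
    current: List[bytes] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith(b"Index: ") and current:
            chunks.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(b"".join(current))
    return chunks


def decode_diff(diff: bytes) -> List[Tuple[str, str]]:
    """Decode each per-file chunk on its own; return (encoding, text) pairs."""
    decoded: List[Tuple[str, str]] = []
    for chunk in split_per_file(diff):
        encoding = guess_encoding(chunk)
        decoded.append((encoding, chunk.decode(encoding)))
    return decoded


# First chunk is UTF-8, second is ISO-8859-1; both contain "café".
sample = b"Index: a.txt\n+caf\xc3\xa9\n" + b"Index: b.txt\n+caf\xe9\n"
for encoding, text in decode_diff(sample):
    print(encoding, repr(text))
```

Even this toy version shows the problems listed above: the diff has to be parsed as raw bytes before any conversion, the per-chunk encodings must be remembered to save the file back correctly, and each extra guess is another chance to pick the wrong codec.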



---
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/873#issuecomment-172532037