[Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)
Zenaan Harkness
notifications at xxxxx
Mon Sep 19 11:12:41 UTC 2016
On Mon, Sep 19, 2016 at 03:30:09AM -0700, Colomban Wendling wrote:
> It's pretty messy but fair enough. However, we probably won't do
> that, because being able to have a fixed encoding in the data we load
> means that we have to handle encoding conversion in a single place,
> instead of everywhere something touches the data -- and there is a
> lot of code that does that, it's an editor after all.
> Also, as UTF-8 can represent virtually any textual data (anything
> inside Unicode), it would only help with invalid input (like here) or
> binary data (which probably would better be handled with a hex
> filter). So I'm afraid it won't happen.
> If someone has a nice solution though, I'd love to be proven wrong.
Well, thinking about it, if this were a wanted feature, I would do it as
follows:
- have the raw valid text as a UTF-8 (of course) "linear array"
  (might be a window onto disk for large files, etc)
- indexing layers above this, to quickly identify graphemes, word
  boundaries, line boundaries and any other points of interest, such as:
  - "invalid bytes insertion points" along with the corresponding invalid
    byte sequences
- this way, those parts of the program (most of them) that don't want
  or need to handle invalid bytes, don't have to
- and you have an easy index to re-insert the invalid sequences on
  saving, or some display/view onto the file that can represent
  invalid bytes
- and you can offer easy options to the user such as "save without
  invalid bytes", "encode invalid bytes according to some format", etc
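
To make the idea above concrete, here is a rough sketch in Python (not
Geany's C, and not anything Geany actually does). The names SplitBuffer,
load and save are made up for illustration, and offsets are counted in
code points rather than bytes, which is just one possible choice. It
leans on Python's "surrogateescape" error handler to find the invalid
bytes, then moves them out of the text into a side index:

```python
from dataclasses import dataclass, field


@dataclass
class SplitBuffer:
    """Hypothetical container: valid text plus an index of invalid bytes."""
    text: str  # only the valid, decoded text
    # (code point offset into text, raw invalid byte run inserted there)
    invalid: list = field(default_factory=list)


def load(raw: bytes) -> SplitBuffer:
    # surrogateescape maps each undecodable byte 0x80..0xFF to the lone
    # surrogate U+DC80..U+DCFF instead of raising an error.
    decoded = raw.decode("utf-8", errors="surrogateescape")
    parts = []
    invalid = []
    run = bytearray()  # invalid bytes collected since the last valid char
    count = 0          # valid code points emitted so far
    for ch in decoded:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            run.append(cp - 0xDC00)  # recover the original byte value
            continue
        if run:
            invalid.append((count, bytes(run)))
            run = bytearray()
        parts.append(ch)
        count += 1
    if run:
        invalid.append((count, bytes(run)))
    return SplitBuffer("".join(parts), invalid)


def save(buf: SplitBuffer, keep_invalid: bool = True) -> bytes:
    if not keep_invalid:
        # the "save without invalid bytes" option
        return buf.text.encode("utf-8")
    out = bytearray()
    prev = 0
    for offset, raw in buf.invalid:
        out += buf.text[prev:offset].encode("utf-8")
        out += raw  # re-insert the invalid run at its recorded position
        prev = offset
    out += buf.text[prev:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    data = b"\xff\xfeh\x00i\x00"  # UTF-16LE BOM followed by "hi"
    buf = load(data)
    print(repr(buf.text), buf.invalid)
    print(save(buf) == data)
```

Code that only reads buf.text never sees an invalid byte, and save()
gives back the original file byte for byte as long as the text is not
edited. A real editor would also have to shift the recorded offsets on
every insert and delete, which this sketch does not attempt.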
Should be easy, and should also be how the program is implemented.
At least, that's how a superior programmer would implement it ;)
See for reference:
https://zenaan.github.io/zen/javadoc/zen/lang/string.html
Regards, and thanks again,
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/1238#issuecomment-247966199