<blockquote>
<p>I'm not sure why we accept invalid UTF-8 (well, it's structurally valid, but contains reserved code points),</p>
</blockquote>

<p>From pickyweedia: "Not decoding surrogate halves makes it impossible to store invalid UTF-16, such as Windows filenames, as UTF-8. Therefore, detecting these as errors is often not implemented and there are attempts to define this behavior formally (see WTF-8 and CESU below)."</p>

<p>And GLib needs to round-trip Windows filenames to/from UTF-8, so it's reasonable that it doesn't object.</p>
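
<p>To make the "structurally valid, but contains reserved code points" case concrete, here is a rough sketch using Python's built-in codecs. It is not GLib's or Geany's code, and the helper names are made up for illustration; it only shows what a lone surrogate half looks like when pushed through UTF-8 encoding, and why a strict validator rejects bytes that a lenient one round-trips.</p>

```python
# Lone high surrogate U+D800: not a valid Unicode scalar value, but it can
# turn up in ill-formed UTF-16 such as a Windows filename.
LONE_SURROGATE = "\ud800"


def encode_lenient(text: str) -> bytes:
    """Encode with the UTF-8 bit layout, letting surrogate halves through."""
    return text.encode("utf-8", "surrogatepass")


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8-shaped bytes, accepting encoded surrogate halves."""
    return data.decode("utf-8", "surrogatepass")


def is_strict_utf8(data: bytes) -> bool:
    """Return True only for well-formed UTF-8 (encoded surrogates rejected)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


raw = encode_lenient(LONE_SURROGATE)

print(raw)                                    # the three-byte form ED A0 80
print(is_strict_utf8(raw))                    # strict check refuses it
print(decode_lenient(raw) == LONE_SURROGATE)  # lenient path round-trips
```

<p>The lenient pair is what lets an invalid UTF-16 name survive a trip through UTF-8 and back; the strict check is what a validator that follows the standard to the letter would do.</p>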

<blockquote>
<p>Or we don't use the same thing for UTF-8 (GLib) as for UTF-16 (iconv through GLib), and GLib is more forgiving.</p>
</blockquote>

<p>Well, for files that Geany thinks are UTF-8 (or is told by the user are UTF-8) we don't do a conversion, we just validate, so it's different in that way.</p>
