<p><code>grep</code>'ing for <code>'[^\x00-\x7F]'</code> shows you everything that is not in the ASCII range. Because you are negating the set, it will not show you <code>\x00</code>: that is the null byte, which is itself part of the ASCII range. Is there a reason there are so many non-ASCII characters in this file? If other code is writing to it, could that be "polluting" it? Beyond learning how to fix it, you should probably find out why this is happening.</p>
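<p>To get a rough idea of how much of each kind of byte is in the file, something like the sketch below may help. The function names <code>count_nul</code> and <code>count_non_ascii</code> are made up for illustration; they only rely on <code>tr</code> and <code>wc</code>, and treat the file as raw bytes rather than as text in any particular encoding.</p>

```shell
# Hypothetical helpers (names are illustrative, not from any tool).

# Count NUL bytes: delete every byte except NUL, then count what is left.
count_nul() {
    LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Count bytes outside the ASCII range: delete 0x00-0x7F, count the rest.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

<p>Usage would be <code>count_nul file</code> and <code>count_non_ascii file</code>. Note that a multi-byte UTF-8 character counts once per byte here, not once per character.</p>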
<p>You could try deleting the null bytes, to see if that helps:</p>
<pre><code># Strip NUL bytes (octal 000) into a copy of the file
tr -d '\000' &lt; file &gt; new_file

# Compare the copy against the original
diff file new_file
</code></pre>
<p>You can also try segmenting the file (for testing) into smaller chunks until you find the problem area. You may want to use <code>od</code> for a closer inspection.</p>
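<p>As a sketch of the chunking idea, the function below splits a file into fixed-size pieces and reports the byte offset of each piece that contains NUL bytes. <code>find_nul_chunks</code> and its default chunk size of 4096 are assumptions made for this example, not an existing tool.</p>

```shell
# Hypothetical helper: find_nul_chunks FILE [CHUNK_SIZE]
# Splits FILE into CHUNK_SIZE-byte pieces in a temporary directory and
# prints the starting offset of every piece that contains a NUL byte.
find_nul_chunks() {
    file=$1
    size=${2:-4096}
    dir=$(mktemp -d) || return 1
    split -b "$size" "$file" "$dir/chunk."
    offset=0
    for chunk in "$dir"/chunk.*; do
        [ -e "$chunk" ] || continue   # empty input: nothing was split
        # Keep only the NUL bytes of this piece and count them.
        n=$(LC_ALL=C tr -cd '\000' < "$chunk" | wc -c | tr -d ' ')
        if [ "$n" -gt 0 ]; then
            echo "offset $offset: $n NUL byte(s)"
        fi
        offset=$((offset + size))
    done
    rm -rf "$dir"
}
```

<p>To look closer at a reported region, <code>od -A d -c -j OFFSET -N SIZE file</code> dumps <code>SIZE</code> bytes starting at <code>OFFSET</code>, and NUL bytes show up as <code>\0</code>. For large files a bigger chunk size is probably needed, since <code>split</code> only has a limited number of default two-letter suffixes.</p>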
