Grepping for '[^\x00-\x7F]' shows you everything outside the ASCII range. Because the set is negated, it will not show you \x00: that is the null byte, which falls within the ASCII range. Is there a reason there are so many non-ASCII characters in this file? If other code is touching this file, could it be "polluting" it? Beyond learning how to fix it, you should probably find out why this is happening.
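
Since the negated set skips null bytes, it may help to count them separately. Here is a rough sketch using tr and wc; the sample file and the variable names are made up for illustration, so point it at your real file instead:

```shell
# Hypothetical sample: one plain line, one line with a two-byte UTF-8
# character (octal 303 251), and one line containing a NUL byte.
workdir=$(mktemp -d)
printf 'plain\ncaf\303\251\nnul\000here\n' > "$workdir/sample.txt"

# Delete everything except NUL bytes, then count what is left.
nul_count=$(tr -cd '\000' < "$workdir/sample.txt" | wc -c | tr -d ' ')

# Delete everything except bytes in 0x80-0xFF (outside ASCII), then count.
high_count=$(LC_ALL=C tr -cd '\200-\377' < "$workdir/sample.txt" | wc -c | tr -d ' ')

echo "NUL bytes: $nul_count, non-ASCII bytes: $high_count"
```

If the NUL count is zero, the null bytes are probably not your problem and the non-ASCII bytes deserve the closer look.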

You could try deleting the null bytes, to see if that helps:

tr -d '\000' < file > new_file

diff file new_file

You can also try segmenting the file (for testing) into smaller chunks until you find the problem area. You may want to use od for a closer inspection.
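
Something along these lines could do the segmenting; treat it as a sketch. The sample file, the 2-line chunk size, and the chunk_ prefix are all assumptions chosen for the demo, and you would want much larger chunks on a real file:

```shell
# Hypothetical sample: four lines, with a NUL byte hidden in the third.
workdir=$(mktemp -d)
printf 'line one\nline two\nbad\000byte\nline four\n' > "$workdir/sample.txt"

# Split into 2-line chunks named chunk_aa, chunk_ab, ...
split -l 2 "$workdir/sample.txt" "$workdir/chunk_"

# Check each chunk for NUL bytes and dump the ones that have any.
bad_chunk=""
for f in "$workdir"/chunk_*; do
    n=$(tr -cd '\000' < "$f" | wc -c | tr -d ' ')
    if [ "$n" -gt 0 ]; then
        echo "$f: $n NUL byte(s)"
        od -c "$f"    # od -c prints a NUL byte as \0
        bad_chunk=$f
    fi
done
```

Once a chunk is flagged, you can split it again with a smaller -l value to narrow down the exact lines.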

