On Thu, 13 Nov 2008 19:36:50 +0100, Colomban Wendling ban-ubuntu@club-internet.fr wrote:
Enrico Tröger wrote:
Oh, ok. Now that you say it, it's obvious. We always pass UTF-8 text to 'grep', but this doesn't match when the file is in any other encoding (and you are using non-ASCII characters).
I added some code in SVN r3221 to provide an encoding list in the Find in Files dialog. The selected encoding is used to convert the entered search text into that encoding and to display the search results. Actually, the entered search text can be in UTF-8 or in the specified encoding, though I only tested it with UTF-8 text since this is the most common case (everything you select or copy from within Geany is always UTF-8, even if the file encoding is something else).
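To illustrate the idea (this is only a rough Python sketch, not the actual Geany code, which is C; the function names `encode_search_text` and `decode_result_line` are made up for this example): the search text is converted into the chosen encoding before matching, and the matching lines are converted back for display.

```python
def encode_search_text(search_text: str, file_encoding: str) -> bytes:
    """Convert the entered search text into the bytes that appear in the file."""
    return search_text.encode(file_encoding)


def decode_result_line(raw_line: bytes, file_encoding: str) -> str:
    """Convert a matching line back to text for display in the results."""
    return raw_line.decode(file_encoding, errors="replace")


# A line as it is stored on disk in an ISO-8859-1 encoded file.
file_bytes = "un caf\u00e9 noir\n".encode("iso-8859-1")

# The same search text, once as UTF-8 bytes and once in the file's encoding.
pattern_utf8 = "caf\u00e9".encode("utf-8")
pattern_latin1 = encode_search_text("caf\u00e9", "iso-8859-1")

print(pattern_utf8 in file_bytes)    # the UTF-8 bytes are not found
print(pattern_latin1 in file_bytes)  # the converted bytes are found
print(decode_result_line(file_bytes, "iso-8859-1"))
```

The point is just that the byte sequences differ for non-ASCII characters, so the UTF-8 pattern cannot match until it is converted.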
Any feedback is welcome.
Regards, Enrico
Hi,
Stop me if I say anything stupid, but can't the search pattern be converted to the encoding of each file? That sounds better to me than only providing an encoding choice, because choosing an encoding won't really help if some files are in another
Well, of course it would be better if we could know the encoding of each file, convert the search text into this encoding and then do the search. But there are a few problems with that: we run 'grep [options] search text' in the chosen directory. So we run one command for all files in this directory (and maybe subdirectories), and therefore need one search text for all files. Additionally, searching every file with its own encoding would mean reading every file beforehand to detect its encoding. So we would read the file, detect its encoding and then search it with grep. Bah. Alternatively, to be more efficient, it would be better to search the file directly after opening it to detect the encoding. But this would mean rewriting almost all of the current code.
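Just to make the alternative concrete, here is a rough Python sketch of what a per-file search could look like. It is not what Geany does and not a proposal for the real implementation; `guess_encoding`, `search_tree` and the candidate list are invented for this example, and the detection is deliberately naive.

```python
from pathlib import Path

# Assumption for this sketch: only these encodings are tried, in this order.
CANDIDATE_ENCODINGS = ("utf-8", "iso-8859-1")


def guess_encoding(data: bytes) -> str:
    """Naive detection: return the first candidate that decodes without error."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "iso-8859-1"


def search_tree(root, search_text):
    """Read every file once, detect its encoding, then search it directly."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        enc = guess_encoding(data)
        try:
            needle = search_text.encode(enc)
        except UnicodeEncodeError:
            continue  # the search text cannot occur in this encoding
        for lineno, line in enumerate(data.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.decode(enc)))
    return hits
```

Note that this reads every file completely and no longer uses grep at all, which is exactly the "rewrite almost everything" problem.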
And last but not least, there is still our most loved problem of correctly detecting file encodings. This has never worked reliably (e.g. try to open a cp1251 encoded file in Geany: it opens as ISO-8859-1 unless your system locale is cp1251 too).
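A small Python illustration of why this kind of detection is hard (only a sketch of the general problem, not of Geany's detection code): bytes written as cp1251 are also perfectly valid ISO-8859-1, so decoding never fails and nothing signals the wrong guess.

```python
# "Privet" in Cyrillic letters, written with escapes to keep this example ASCII-only.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

raw = text.encode("cp1251")           # the bytes as stored in a cp1251 file

as_cp1251 = raw.decode("cp1251")      # correct interpretation
as_latin1 = raw.decode("iso-8859-1")  # wrong interpretation, but no error raised

print(as_cp1251 == text)  # round-trips correctly
print(as_latin1 == text)  # silently yields different characters
```

Every byte value is a valid character in ISO-8859-1, so a "try to decode and see if it fails" check cannot tell the two encodings apart.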
encoding. Furthermore, sometimes users don't know or don't care about file encodings, for example if they work with files created with another editor or on another system.
I completely agree with you on that, but I don't know a better way (see above).
Regards, Enrico