[Geany-devel] Search in files & encodings

Enrico Tröger enrico.troeger at xxxxx
Fri Nov 14 17:40:27 UTC 2008


On Thu, 13 Nov 2008 19:36:50 +0100, Colomban Wendling
<ban-ubuntu at club-internet.fr> wrote:

>Enrico Tröger wrote:
>> Oh, ok. Now that you say it, it's obvious. We always pass UTF-8 text
>> to 'grep', but this doesn't match when the file is in any other
>> encoding (and you are searching for non-ASCII characters).
>>
>> I added some code in SVN r3221 to provide an encoding list in the
>> Find in Files dialog.
>> The selected encoding is used to convert the entered search text and
>> to display the search results.
>> Actually, the entered search text can be in UTF-8 or in the specified
>> encoding, though I only tested it with UTF-8 text since this is the
>> most common case (everything you select or copy from within Geany is
>> always UTF-8, even if the file encoding is something else).
>>
>> Any feedback is welcome.
>>
>>
>> Regards,
>> Enrico
>>   
>Hi,
>
>Stop me if I say anything stupid, but couldn't the search pattern be
>translated to the encoding of each file so that it matches?
>That sounds better to me than only providing an encoding choice, because
>choosing an encoding won't really help if some files are in another
Well, of course it'd be better if we could know the encoding of
each file, convert the search text into this encoding and then do the
search.
But there are a few problems with that:
we run 'grep [options] search text' in the chosen directory, i.e. one
command for all files in this directory (and maybe subdirectories).
So we need one search text for all files.
Additionally, searching every file in its own encoding would mean
reading every file beforehand to detect its encoding. So we would read
the file, detect its encoding and then search it with grep. Bah.
Alternatively, and more efficiently, we could search the file directly
after opening it to detect the encoding. But this would mean rewriting
almost all of the current code.
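To illustrate why one search text cannot serve files in different encodings: the UTF-8 bytes for "é" are C3 A9, while an ISO-8859-1 file stores the single byte E9, so grep only matches if it is handed the pattern already converted to the file's encoding. The following is a minimal, hypothetical sketch that handles only the UTF-8 to ISO-8859-1 case; utf8_to_latin1() is an illustrative name and not Geany's code, and a real implementation would more likely use iconv or GLib's g_convert() than a hand-written converter.

```c
#include <stddef.h>

/* Hypothetical helper (not Geany's code): convert a NUL-terminated UTF-8
 * search pattern to ISO-8859-1 so its bytes match a Latin-1 encoded file.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1 if
 * the pattern contains a character outside U+0000..U+00FF, is malformed,
 * or does not fit into out_size bytes. */
static size_t utf8_to_latin1(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (out_size == 0)
        return (size_t)-1;

    while (*p != '\0') {
        unsigned int cp;

        if (p[0] < 0x80) {
            /* ASCII bytes are identical in UTF-8 and ISO-8859-1 */
            cp = p[0];
            p += 1;
        } else if ((p[0] == 0xC2 || p[0] == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* C2 80 .. C3 BF are the UTF-8 sequences for U+0080..U+00FF */
            cp = ((unsigned int)(p[0] & 0x1F) << 6) | (unsigned int)(p[1] & 0x3F);
            p += 2;
        } else {
            /* e.g. Cyrillic text: not representable in Latin-1 */
            return (size_t)-1;
        }

        if (n + 1 >= out_size)      /* keep one byte for the terminating NUL */
            return (size_t)-1;
        out[n++] = (char)cp;
    }
    out[n] = '\0';
    return n;
}
```

For example, the UTF-8 pattern "caf\xC3\xA9" ("café") comes out as the four bytes "caf\xE9", which is what grep would need to match a Latin-1 file. A cp1251 file would need a different conversion of the same pattern, which is why a single grep call over files in mixed encodings cannot get by with one search text.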

And last but not least, there is still our much-loved problem of
correctly detecting file encodings. This has never worked reliably
(e.g. try to open a cp1251-encoded file in Geany: it opens as
ISO-8859-1 unless your system locale is cp1251 too).
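As a rough illustration of why detection is unreliable: a UTF-8 validity check can tell UTF-8 apart from legacy encodings, but once it fails there is little left to go on, because in a single-byte encoding such as ISO-8859-1 every byte value decodes to some character, so a cp1251 file is also "valid" ISO-8859-1. The following is a hypothetical, minimal well-formedness check following RFC 3629; is_valid_utf8() is an illustrative name, not Geany's code (GLib already provides g_utf8_validate() for this).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not Geany's code): returns true if s[0..len) is
 * well-formed UTF-8 per RFC 3629, rejecting overlong forms, UTF-16
 * surrogates and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n = 0;                        /* number of continuation bytes */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of the 2nd byte */

        if (c < 0x80) { i++; continue; }     /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* no overlongs */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }   /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }   /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* max U+10FFFF */
        else return false;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i - 1 < n)
            return false;                    /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return false;
        i += n + 1;
    }
    return true;
}
```

Fed the cp1251 bytes of a Russian word (CF F0 E8 E2 E5 F2), the check correctly fails, but it gives no hint whether the text is cp1251, KOI8-R or ISO-8859-1; a detector that then falls back to ISO-8859-1 will accept those bytes without complaint, consistent with the behaviour described above.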


>encoding. Furthermore, users sometimes don't know or don't care
>about file encodings, for example if they work with files created
>with another editor or on another system.

I completely agree with you on that, but I don't know a better way;
see above.



Regards,
Enrico

-- 
Get my GPG key from http://www.uvena.de/pub.asc