[geany/geany] grep couldn't find anything if used non UTF8 codepage (#1321)

Mikhail

24 Nov 2016

11:56 a.m.

grep can't find anything when a non-UTF-8 codepage is used.

Example: https://youtu.be/OMs5XNwggTQ

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/issues/1321

Colomban Wendling

24 Nov

12:09 p.m.

Yeah, I know.

@eht16 I guess this is one of the reasons why *spawn* didn't use `W` variants (as it has a comment about encoding conversion):

> GLib converts the argument vector to UNICODE. For non-UTF8 arguments, the result is often "Invalid string in argument vector at %d: %s: Invalid byte sequence in conversion input" (YMMV). Our tools (make, grep, gcc, ...) are "ANSI", so converting to UNICODE and then back only causes problems.

The thing is that, to pass in the search string, we try to pass its binary representation in the encoding specified for the search. That works OK so long as the spawn code accepts arbitrary byte sequences (well, 0-terminated ones only), but not if it has to be a valid string. Not doing the input conversion on Windows seems to work OK for the input part, but it doesn't do much for the results or the output. Maybe we could mess a little with `LC_CTYPE` and set it to `LC_CTYPE=${LC_CTYPE%.*}.ourencoding`; maybe it'd help. Or maybe not, I don't know.
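To make the `LC_CTYPE` idea concrete, here is a minimal Python sketch, not Geany code. The helper names `with_codeset`, `grep_env` and `is_valid_utf8` are made up for illustration, and whether a locale such as `ru_RU.CP1251` actually exists on a given system (or changes grep's behaviour) is an open assumption.

```python
import os


def with_codeset(lc_ctype, codeset):
    """Mimic ${LC_CTYPE%.*}.codeset: drop the last ".suffix", append a new one."""
    base, dot, _old = lc_ctype.rpartition(".")
    if not dot:  # no "." in the value, e.g. "C"
        base = lc_ctype
    return f"{base}.{codeset}"


def grep_env(base_env, codeset):
    """Return a copy of base_env with LC_CTYPE switched to the given codeset."""
    current = base_env.get("LC_CTYPE", "C")  # assumption: fall back to "C"
    return dict(base_env, LC_CTYPE=with_codeset(current, codeset))


def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The search string in the encoding chosen for the search (CP1251 here).
needle_bytes = "привет".encode("cp1251")

# These bytes are a fine 0-terminated argument, but not valid UTF-8, which is
# what a spawn layer that converts the argument vector would trip over.
print(is_valid_utf8(needle_bytes))
print(grep_env(dict(os.environ), "CP1251")["LC_CTYPE"])
```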

I guess this issue is an argument for @codebrainz's proposition of rewriting this with custom code ^^


elextr

25 Nov

12:42 a.m.

Actually the whole FIF thing is broken if the files being searched are not in a known encoding. We know the locale encoding and we know our buffer is UTF-8, but grep doesn't know the encodings of the files it searches. Whichever of UTF-8 or locale we pick for the search string, it will only match in files that are encoded the same way. This is NOT something we can fix, even with custom search. Better to just document that it only works for files in UTF-8 or the locale encoding, whichever we choose.
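A small Python sketch of the mismatch described above, treating the search as a plain byte comparison (which is roughly what grep does when it knows nothing about the file's encoding). The helper `byte_search` and the sample strings are illustrative only.

```python
def byte_search(needle, needle_encoding, data):
    """Encode the needle and look for those exact bytes in the file contents."""
    return needle.encode(needle_encoding) in data


text = "строка: привет"
file_utf8 = text.encode("utf-8")      # same text saved as UTF-8
file_cp1251 = text.encode("cp1251")   # same text saved as CP1251

for needle_enc in ("utf-8", "cp1251"):
    for label, data in (("utf-8 file", file_utf8), ("cp1251 file", file_cp1251)):
        found = byte_search("привет", needle_enc, data)
        print(f"needle as {needle_enc:7} in {label:11}: {found}")
```

Only the pairs where the needle and the file share an encoding match, so any single choice of needle encoding misses files saved in the other one.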


Mikhail

5:50 a.m.

I believe the best way to fix this is to add a grep option that specifies the search string encoding. It would also fix this: ![25 11 2016 09-49-49-375](https://cloud.githubusercontent.com/assets/200750/20614792/8ae283e8-b2f4-11e...)


Matthew Brush

6:31 a.m.

The Find in Files dialog allows you to choose encoding, and also allows specifying additional options for grep, if that helps.


Mikhail

6:39 a.m.

> The Find in Files dialog allows you to choose encoding, and also allows specifying additional options for grep, if that helps.

Currently this option is not passed to grep (because grep does not support an encoding option). It is used to convert the search string to the target encoding, and also to decode grep's output into UTF-8 (Geany's internal encoding). But this is the wrong approach, because the returned file names get decoded too, and those are always returned in the locale encoding.
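One way to avoid decoding file names with the wrong encoding is to split each output record first and decode the parts separately. The Python sketch below is an illustration, not what Geany does: it assumes grep is run with `-n` and GNU grep's `-Z`/`--null`, so a NUL byte follows the file name, and `parse_grep_line` is a hypothetical helper.

```python
def parse_grep_line(raw, locale_encoding, search_encoding):
    """Split one `grep -nZ` record and decode each part with its own encoding."""
    # With -Z the file name is terminated by a NUL byte instead of ":".
    path_bytes, _nul, rest = raw.partition(b"\0")
    lineno_bytes, _colon, content = rest.partition(b":")
    return (
        path_bytes.decode(locale_encoding, errors="replace"),  # file name: locale
        int(lineno_bytes),
        content.decode(search_encoding, errors="replace"),     # match: file encoding
    )


# Simulated record: file name in a UTF-8 locale, matched line from a CP1251 file.
raw = "папка/файл.txt".encode("utf-8") + b"\0" + b"12:" + "привет".encode("cp1251")
print(parse_grep_line(raw, "utf-8", "cp1251"))
```

Decoding the whole record with the search encoding would garble the file name, which is the problem described above.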


Enrico Tröger

6:47 a.m.

I'm so tired of Windows, `grep` and weird encoding stuff. Not that it is a solution, but our lives would be so much easier if everyone could finally use UTF-8.

I agree with @b4n that some of the problems might go away with a custom implementation of the FiF feature (besides eliminating the dependency on `grep`).


Matthew Brush

6:50 a.m.

> I agree with @b4n some of the problems might be gone with a custom implementation of the FiF feature

I'm trying to psych myself up to do it. I think it will be relatively trivial to implement the actual search, but I'm very worried about how hard it will be to actually integrate it into the existing `search.c` code. I'd wager the integration will take 10x longer and result in more lines changed than all of the lines added in the actual new routines to perform the recursive search using GLib/GIO.
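For a rough sense of how small the search part can be, here is a Python sketch of a grep-free recursive search. It is only an illustration of the idea: the real thing would be C on top of GLib/GIO, and `find_in_files` and its parameters are invented for this example. It decodes every file with one caller-supplied encoding, so it does not solve the unknown-encoding problem.

```python
import os
import tempfile


def find_in_files(root, needle, encoding):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes become U+FFFD instead of aborting the search.
                with open(path, "r", encoding=encoding, errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            yield path, lineno, line.rstrip("\r\n")
            except OSError:
                continue  # unreadable file: skip it, keep searching


# Usage: search a throwaway directory containing one CP1251-encoded file.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "demo.txt"), "wb") as out:
        out.write("раз\nпривет мир\n".encode("cp1251"))
    for path, lineno, line in find_in_files(demo_root, "привет", "cp1251"):
        print(os.path.basename(path), lineno, line)
```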


Erick Paquin

29 Dec

11:36 p.m.

@NTMan, is this issue still valid? No updates since 2016. The YouTube link is no longer working.


elextr

11:51 p.m.

@erickpaquin nothing has changed.

