[Geany-devel] Find in files - Re: Patches required by gproject
Jiří Techet
techet at xxxxx
Tue Jun 22 13:19:02 UTC 2010
On Tue, Jun 22, 2010 at 13:46, Nick Treleaven
<nick.treleaven at btinternet.com> wrote:
> On Mon, 21 Jun 2010 20:53:58 +0200
> Jiří Techet <techet at gmail.com> wrote:
>
>> >> Having a filetype pattern in the find in files dialog could be useful.
>> >> Note that --include is a GNU grep extension, so a blank file pattern
>> >> should be the default and should not generate the option to grep so as
>> >> to maintain portability.
>> >
>> > Geany currently passes a list of all files and directories to Grep. I
>>
>> Does it? To me it looks like it only calls grep in the given directory.
>
> If the -r (GNU extension) recursive option is set, it just passes '.'
> as the path instead of any filenames.
>
> Without -r, I'm not sure how to call Grep without passing a list of
> files. Maybe I'm missing something, but that was why I implemented
> passing all filenames. (See how search_get_argv() is used).
>
I see, you meant the non-recursive version.
>> > think it may be best if Geany does the filtering (and hence also the
>> > recursing). Also we may want to always filter out hidden files and
>> > broken links.
>>
>> There is one problem here - the command line may be too long. POSIX
>> only guarantees an ARG_MAX of at least 4096, which is definitely too
>> little for thousands of files. There are three ways to solve this:
>>
>> 1. Call grep separately for every single file. This is too slow. I
>> tested something like that some time ago and for about 10000 source
>> files it takes 30 seconds just to execute grep so many times. To find
>> e.g. "torvalds" in all *.c;*.h files of the Linux kernel using -r and
>> --include it takes only 2 seconds (the first search is always slow
>> because the files have to be read from disk, but any subsequent search
>> is really fast as the files are cached by the OS). The speed is very
>> important for me.
>>
>> 2. Use xargs - this introduces one more external dependency for geany
>> so it probably isn't the preferable solution.
>>
>> 3. Implement some alternative to xargs and call grep repeatedly, each
>> time with only as many files as fit on the command line.
>>
>> Only (3) seems to be a reasonable solution but it means some extra
>> work. Right now I find it easiest to implement this using --include
>> - if no pattern or a * pattern is specified, --include will be omitted
>> (as Lex suggested) and no error will be reported if grep doesn't
>> support it.
>
> AFAICT, you still have to pass filenames even when using --include
> (when non-recursive).
>
> Personally we've not had complaints about the argument length limit, so
> I'm not too bothered about it yet. Also modern systems may extend the
> POSIX limit, but obviously we can't rely on that. Maybe wait and see if
> it's a problem?
The problem already exists for big projects. You don't see it if you
search a single directory - the number of files in one directory
usually isn't that large - but if you recursively search the whole
project directory structure, you may get a lot of source files. At work
I use SLES9, which has ARG_MAX=131072, and the project has about 8000
source files. If you assume the average path is about 40 characters (it
can be more for projects with a deep directory structure), you get
8000*40=320000 bytes and you're over the limit.
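To make the arithmetic above concrete, here is a rough sketch in Python (not Geany code) that estimates how many bytes a list of paths would take up on the command line and splits the list into batches that stay under a given limit, in the spirit of option (3). The function names and the one-NUL-byte-per-argument accounting are assumptions for illustration only; the real limit also has to cover the environment and differs between platforms.

```python
def estimated_argv_bytes(paths):
    """Rough size of the paths as command-line arguments.

    Assumption: each argument costs its UTF-8 length plus one
    terminating NUL byte. Environment variables and pointer tables,
    which also count against ARG_MAX on many systems, are ignored.
    """
    return sum(len(p.encode("utf-8")) + 1 for p in paths)


def batch_paths(paths, limit):
    """Yield lists of paths whose estimated size stays within `limit`.

    A single path larger than `limit` is still yielded on its own,
    since it cannot be split any further.
    """
    batch, used = [], 0
    for path in paths:
        size = len(path.encode("utf-8")) + 1
        # Start a new batch when adding this path would exceed the limit.
        if batch and used + size > limit:
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += size
    if batch:
        yield batch
```

With 8000 paths of 40 characters each and a limit of 131072, this estimate comes to more than twice the limit, so the file list would have to be spread over three grep invocations instead of one.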
Personally I find relying on the GNU extensions much safer. After
looking at grep's git history, I found that --include was introduced
in 2001 (GTK 2 didn't exist at that time and Linux kernel 2.4 had only
just been released). On the other hand, the ARG_MAX limit was increased
quite recently, in kernel 2.6.23 in 2007 (see
http://www.in-ulm.de/~mascheck/various/argmax/ for the limits on
other platforms), and older kernels are still in use. I'm also not
aware of any non-GNU implementations of grep (that might not implement
--include).
To me it looks much easier to use what's already present in grep right
now, and if this isn't satisfactory, it could be reimplemented the way
you propose in the future. In addition, if no patterns are specified,
Geany will behave the same way it did until now, so there is a sane
fallback for people with a very old grep.
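The fallback described above could look roughly like the following. This is only a sketch in Python, not Geany's actual search_get_argv(); build_grep_argv and its parameters are hypothetical names, and it covers only the recursive case, where '.' is passed as the path. Patterns are assumed to be separated by semicolons or spaces, as in the "*.c;*.h" example earlier in the thread.

```python
def build_grep_argv(search_text, patterns=""):
    """Build a recursive grep command line (hypothetical helper).

    --include is only emitted when the user gave real file patterns.
    An empty pattern list, or one containing a bare "*", means
    "all files", so the option is left out and grep is called the
    same way as before.
    """
    argv = ["grep", "-r"]  # -r is treated as a GNU extension in this thread
    globs = patterns.replace(";", " ").split()
    if globs and "*" not in globs:
        for glob in globs:
            argv.append("--include=" + glob)
    # -e keeps a search text starting with "-" from being read as an option.
    argv += ["-e", search_text, "."]
    return argv
```

Because the option is omitted whenever no restricting pattern is given, a grep without --include support would only be affected when the user explicitly asks for a file filter.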
Regards,
Jiri
>
> Regards,
> Nick