[Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins
Lex Trotman
elextr at xxxxx
Fri May 29 13:22:03 UTC 2015
>
>> Does it have any major drawbacks? I read it has to some kind
>> "prefix table" prior to running the search, but I guess that's
>> negligible for all reasonable search terms?
>
> I think KMP does not have drawbacks. The prefix table is of length m+1,
> m is the length of the string P for which you wish to find all occurrences
> in some string T. For example, P is "hello", m == 5, and you build a
> prefix table of length m+1 == 6. Then T may be any long string, for the
> long string you do not need any additional memory. You only need the extra
> m+1 size_t locations for the prefix table, m is length of P, the string you
> wish to search.
Its drawback is it is more complicated, and as I said you have to
benchmark to see if its worth it, things that were worthwhile back
when KMP was invented may not be worthwhile today.
>
>
> Have a great day,
> Marius Ioan Buzea
>
>
>
>
>
>
> --------------------------------------------
> On Fri, 5/29/15, Thomas Martitz <kugel at rockbox.org> wrote:
>
> Subject: Re: [Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins
> To: devel at lists.geany.org
> Date: Friday, May 29, 2015, 1:49 PM
>
> Am 29.05.2015 um 12:44
> schrieb marius buzea:
> > Hello,
> >
> >
> >
> With KMP it is possible to search all occurrences of a m
> length string, into a n length string,
> >
> using O(m+n) machine operations. Next page:
> > http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
> > describes the algorithm.
> >
> >
> >
> > The KMP works well
> with the utf-8 encoding of unicode. One property of
> utf8 is that
> > the encoding one unicode
> symbol is not a substring of another utf8 substring.
> This
> > property allows to take the utf-8
> encoding of the string you wish to search, and to
> > find this utf8 encoding string, in the
> utf8 encoding of the text string. Geany
> uses
> > scintilla, and scintilla uses utf8
> to encode the document it displays, and scintilla has
> > a command that gives the raw utf8 byte
> array for a [start, end) range. So, KMP
> > gives great speed for searching all
> occurrences, and may be used with the underlying
> > text representation of scintilla used by
> geany. The utf-8 encoding of a unicode
> > string of length n, is less than 6n, each
> utf8 encoding is at most 6 bytes.
> >
> >
> >
> >
> I also think that including this functionality/feature into
> Geany core would be a good choice.
> > It
> would be a small tradeoff between keeping the core small,
> and adding this new functionality,
> > but
> this is your choice.
> >
> >
> >
> >
> If you wish to extend automark, then this is good choice
> too. If you wish, and if it helps,
> > please reuse any part of the
> implementation provided here:
> > http://sourceforge.net/p/geanyhighlightselectedword/code/HEAD/tree/trunk/GeanyHighlightSelectedWord/GeanyHighlightSelectedWord.c
> > If needed, I would help.
> >
> >
> >
> What should I do next? Should I not do the
> pull request for GeanyHighlightSelectedWord?
> > It is okay with me.
> GeanyHighlightSelectedWord would then be still available at
> sourceforge until
> > Geany provides this
> functionality from its core, or from automark.
> >
>
>
> I wonder if this algorithm should be applied to
> all searches, and thus
> be integrated into
> scintilla. Does it have any major drawbacks? I read
> it has to some kind "prefix table"
> prior to running the search, but I
> guess
> that's negligible for all reasonable search terms?
>
> Best regards
> _______________________________________________
> Devel mailing list
> Devel at lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/devel
More information about the Devel
mailing list