[Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins

Fri May 29 10:49:52 UTC 2015

On 29.05.2015 at 12:44, marius buzea wrote:
> Hello,
>
>
> With KMP (Knuth-Morris-Pratt) it is possible to find all occurrences of a string of length m
> in a string of length n using O(m+n) machine operations.    The following page
>          http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
> describes the algorithm.
>
>
>
> KMP works well with the UTF-8 encoding of Unicode.    One property of UTF-8 is that
> the encoding of one Unicode symbol never occurs inside the encoding of another symbol.
> This property makes it possible to take the UTF-8 encoding of the string you wish to
> search for, and to find that byte sequence in the UTF-8 encoding of the text.
> Geany uses Scintilla, Scintilla uses UTF-8 to encode the document it displays, and
> Scintilla has a command that returns the raw UTF-8 byte array for a [start, end) range.
> So KMP finds all occurrences quickly, and it can be used directly on the underlying
> text representation of Scintilla used by Geany.     The UTF-8 encoding of a Unicode
> string of n symbols is at most 4n bytes, since each symbol encodes to at most 4 bytes
> (RFC 3629; the original UTF-8 definition allowed up to 6).
>
>
>
> I also think that including this functionality/feature into Geany core would be a good choice.
> It would be a small tradeoff between keeping the core small, and adding this new functionality,
> but this is your choice.
>
>
>
> If you wish to extend automark, then this is a good choice too.   If you wish, and if it helps,
> please reuse any part of the implementation provided here:
>    http://sourceforge.net/p/geanyhighlightselectedword/code/HEAD/tree/trunk/GeanyHighlightSelectedWord/GeanyHighlightSelectedWord.c
> If needed, I would help.
>
>
> What should I do next?     Should I drop the pull request for GeanyHighlightSelectedWord?
> That is okay with me.    GeanyHighlightSelectedWord would then still be available on SourceForge until
> Geany provides this functionality from its core, or from automark.
>

I wonder if this algorithm should be applied to all searches, and thus
be integrated into Scintilla. Does it have any major drawbacks? I read
that it has to build some kind of "prefix table" before running the
search, but I guess that cost is negligible for all reasonable search terms?
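
For reference, here is a minimal sketch of KMP over raw bytes in C, to
make the cost of the prefix table concrete. It is not taken from
GeanyHighlightSelectedWord, Geany, or Scintilla; the function names
kmp_build_table and kmp_find_all are hypothetical, and the text is
assumed to be already available as a byte buffer. The table needs one
pass over the pattern, O(m) time and m entries of memory, and the search
itself never moves backwards in the text.

```c
#include <stddef.h>
#include <stdlib.h>

/* Build the KMP prefix table for pat[0..m).
 * table[i] is the length of the longest proper prefix of pat[0..i]
 * that is also a suffix of pat[0..i]. Hypothetical helper name. */
static void kmp_build_table(const unsigned char *pat, size_t m, size_t *table)
{
    size_t k = 0;

    if (m == 0)
        return;
    table[0] = 0;
    for (size_t i = 1; i < m; i++) {
        /* Fall back to shorter borders until one can be extended. */
        while (k > 0 && pat[i] != pat[k])
            k = table[k - 1];
        if (pat[i] == pat[k])
            k++;
        table[i] = k;
    }
}

/* Find all occurrences (overlapping ones included) of pat in text.
 * Stores up to max_hits start offsets in hits and returns the total
 * number of matches, which may exceed max_hits.
 * Returns (size_t)-1 if the table cannot be allocated. */
static size_t kmp_find_all(const unsigned char *text, size_t n,
                           const unsigned char *pat, size_t m,
                           size_t *hits, size_t max_hits)
{
    size_t count = 0, k = 0;
    size_t *table;

    if (m == 0 || m > n)
        return 0;
    table = malloc(m * sizeof *table);
    if (table == NULL)
        return (size_t)-1;
    kmp_build_table(pat, m, table);

    for (size_t i = 0; i < n; i++) {
        /* On a mismatch, reuse the table instead of rescanning text. */
        while (k > 0 && text[i] != pat[k])
            k = table[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {
            if (count < max_hits)
                hits[count] = i + 1 - m;
            count++;
            k = table[k - 1];   /* continue, allowing overlaps */
        }
    }
    free(table);
    return count;
}
```

Because the comparison is byte-wise, a UTF-8 search term can be passed
in as-is: a match of a valid UTF-8 pattern always starts on a character
boundary in valid UTF-8 text. Case-insensitive or regex searches would
need more than this sketch.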

Best regards