[Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins

marius buzea magnetudinbuda at xxxxx
Fri May 29 11:40:10 UTC 2015




> I wonder if this algorithm should be applied to all searches, and thus 
> be integrated into scintilla. 

The KMP may be used to find all occurrences of a string P, into a string T.
Say P is "hello", then KMP may be used to find all occurrences of "hello",
in another string T, T may be the document scintilla is displaying.   You
could use KMP to also do case insensitive search, if first you change P to
be all upper case, and then when you use T[i], you change T[i] to be upper
case as well.     This are two use-cases that KMP may do.    I do not know
all possibilities scintilla defines for searching.    This may be analyzed.

KMP may be also used to find occurrences iteratively, once you compute the
prefix table, you can record the state of the KMP search in a structure,
instead of writing the KMP in one function.    I did this in
GeanyHighlighSelectedWord.



> Does it have any major drawbacks? I read  it has to some kind
> "prefix table" prior to running the search, but I  guess that's
> negligible for all reasonable search terms?

I think KMP does not have drawbacks.   The prefix table is of length m+1,
m is the length of the string P for which you wish to find all occurrences
in some string T.    For example, P is "hello", m == 5, and you build a
prefix table of length m+1 == 6.    Then T may be any long string, for the
long string you do not need any additional memory.   You only need the extra
m+1 size_t locations for the prefix table, m is length of P, the string you
wish to search.


Have a great day,
Marius Ioan Buzea






--------------------------------------------
On Fri, 5/29/15, Thomas Martitz <kugel at rockbox.org> wrote:

 Subject: Re: [Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins
 To: devel at lists.geany.org
 Date: Friday, May 29, 2015, 1:49 PM
 
 Am 29.05.2015 um 12:44
 schrieb marius buzea:
 > Hello,
 >
 >
 >
 With KMP it is possible to search all occurrences of a m
 length string, into a n length string,
 >
 using O(m+n) machine operations.    Next page:
 >          http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
 > describes the algorithm.
 >
 >
 >
 > The KMP works well
 with the utf-8 encoding of unicode.    One property of
 utf8 is that
 > the encoding one unicode
 symbol is not a substring of another utf8 substring.     
 This
 > property allows to take the utf-8
 encoding of the string you wish to search, and to
 > find this utf8 encoding string, in the
 utf8 encoding of the text string.       Geany
 uses
 > scintilla, and scintilla uses utf8
 to encode the document it displays, and scintilla has
 > a command that gives the raw utf8 byte
 array for a [start, end) range.      So, KMP
 > gives great speed for searching all
 occurrences, and may be used with the underlying
 > text representation of scintilla used by
 geany.     The utf-8 encoding of a unicode
 > string of length n, is less than 6n, each
 utf8 encoding is at most 6 bytes.
 >
 >
 >
 >
 I also think that including this functionality/feature into
 Geany core would be a good choice.
 > It
 would be a small tradeoff between keeping the core small,
 and adding this new functionality,
 > but
 this is your choice.
 >
 >
 >
 >
 If you wish to extend automark, then this is good choice
 too.   If you wish, and if it helps,
 > please reuse any part of the
 implementation provided here:
 >    http://sourceforge.net/p/geanyhighlightselectedword/code/HEAD/tree/trunk/GeanyHighlightSelectedWord/GeanyHighlightSelectedWord.c
 > If needed, I would help.
 >
 >
 >
 What should I do next?     Should I not do the
 pull request for GeanyHighlightSelectedWord?
 > It is okay with me.   
 GeanyHighlightSelectedWord would then be still available at
 sourceforge until
 > Geany provides this
 functionality from its core, or from automark.
 >
 
 
 I wonder if this algorithm should be applied to
 all searches, and thus 
 be integrated into
 scintilla. Does it have any major drawbacks? I read 
 it has to some kind "prefix table"
 prior to running the search, but I 
 guess
 that's negligible for all reasonable search terms?
 
 Best regards
 _______________________________________________
 Devel mailing list
 Devel at lists.geany.org
 https://lists.geany.org/cgi-bin/mailman/listinfo/devel
 


More information about the Devel mailing list