On 29 May 2015 at 23:22, Lex Trotman elextr@gmail.com wrote:
Does it have any major drawbacks? I read it has to some kind "prefix table" prior to running the search, but I guess that's negligible for all reasonable search terms?
I think KMP does not have drawbacks. The prefix table is of length m+1, m is the length of the string P for which you wish to find all occurrences in some string T. For example, P is "hello", m == 5, and you build a prefix table of length m+1 == 6. Then T may be any long string, for the long string you do not need any additional memory. You only need the extra m+1 size_t locations for the prefix table, m is length of P, the string you wish to search.
Its drawback is it is more complicated, and as I said you have to benchmark to see if its worth it, things that were worthwhile back when KMP was invented may not be worthwhile today.
I might add that the search time may be irrelevant compared to the time to display unless you have a huge buffer and a very small window.
Have a great day, Marius Ioan Buzea
On Fri, 5/29/15, Thomas Martitz kugel@rockbox.org wrote:
Subject: Re: [Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins To: devel@lists.geany.org Date: Friday, May 29, 2015, 1:49 PM
Am 29.05.2015 um 12:44 schrieb marius buzea:
Hello,
With KMP it is possible to search all occurrences of a m length string, into a n length string,
using O(m+n) machine operations. Next page:
http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
describes the algorithm.
The KMP works well
with the utf-8 encoding of unicode. One property of utf8 is that
the encoding one unicode
symbol is not a substring of another utf8 substring. This
property allows to take the utf-8
encoding of the string you wish to search, and to
find this utf8 encoding string, in the
utf8 encoding of the text string. Geany uses
scintilla, and scintilla uses utf8
to encode the document it displays, and scintilla has
a command that gives the raw utf8 byte
array for a [start, end) range. So, KMP
gives great speed for searching all
occurrences, and may be used with the underlying
text representation of scintilla used by
geany. The utf-8 encoding of a unicode
string of length n, is less than 6n, each
utf8 encoding is at most 6 bytes.
I also think that including this functionality/feature into Geany core would be a good choice.
It
would be a small tradeoff between keeping the core small, and adding this new functionality,
but
this is your choice.
If you wish to extend automark, then this is good choice too. If you wish, and if it helps,
please reuse any part of the
implementation provided here:
http://sourceforge.net/p/geanyhighlightselectedword/code/HEAD/tree/trunk/Gea... If needed, I would help.
What should I do next? Should I not do the pull request for GeanyHighlightSelectedWord?
It is okay with me.
GeanyHighlightSelectedWord would then be still available at sourceforge until
Geany provides this
functionality from its core, or from automark.
I wonder if this algorithm should be applied to all searches, and thus be integrated into scintilla. Does it have any major drawbacks? I read it has to some kind "prefix table" prior to running the search, but I guess that's negligible for all reasonable search terms?
Best regards _______________________________________________ Devel mailing list Devel@lists.geany.org https://lists.geany.org/cgi-bin/mailman/listinfo/devel
Devel mailing list Devel@lists.geany.org https://lists.geany.org/cgi-bin/mailman/listinfo/devel