[Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins
magnetudinbuda at xxxxx
Fri May 29 11:40:10 UTC 2015
> I wonder if this algorithm should be applied to all searches, and thus
> be integrated into scintilla.
The KMP may be used to find all occurrences of a string P, into a string T.
Say P is "hello", then KMP may be used to find all occurrences of "hello",
in another string T, T may be the document scintilla is displaying. You
could use KMP to also do case insensitive search, if first you change P to
be all upper case, and then when you use T[i], you change T[i] to be upper
case as well. This are two use-cases that KMP may do. I do not know
all possibilities scintilla defines for searching. This may be analyzed.
KMP may be also used to find occurrences iteratively, once you compute the
prefix table, you can record the state of the KMP search in a structure,
instead of writing the KMP in one function. I did this in
> Does it have any major drawbacks? I read it has to some kind
> "prefix table" prior to running the search, but I guess that's
> negligible for all reasonable search terms?
I think KMP does not have drawbacks. The prefix table is of length m+1,
m is the length of the string P for which you wish to find all occurrences
in some string T. For example, P is "hello", m == 5, and you build a
prefix table of length m+1 == 6. Then T may be any long string, for the
long string you do not need any additional memory. You only need the extra
m+1 size_t locations for the prefix table, m is length of P, the string you
wish to search.
Have a great day,
Marius Ioan Buzea
On Fri, 5/29/15, Thomas Martitz <kugel at rockbox.org> wrote:
Subject: Re: [Geany-Devel] pull request on GitHub, to add GeanyHighlightSelectedWords, into Geany Plugins
To: devel at lists.geany.org
Date: Friday, May 29, 2015, 1:49 PM
Am 29.05.2015 um 12:44
schrieb marius buzea:
With KMP it is possible to search all occurrences of a m
length string, into a n length string,
using O(m+n) machine operations. Next page:
> describes the algorithm.
> The KMP works well
with the utf-8 encoding of unicode. One property of
utf8 is that
> the encoding one unicode
symbol is not a substring of another utf8 substring.
> property allows to take the utf-8
encoding of the string you wish to search, and to
> find this utf8 encoding string, in the
utf8 encoding of the text string. Geany
> scintilla, and scintilla uses utf8
to encode the document it displays, and scintilla has
> a command that gives the raw utf8 byte
array for a [start, end) range. So, KMP
> gives great speed for searching all
occurrences, and may be used with the underlying
> text representation of scintilla used by
geany. The utf-8 encoding of a unicode
> string of length n, is less than 6n, each
utf8 encoding is at most 6 bytes.
I also think that including this functionality/feature into
Geany core would be a good choice.
would be a small tradeoff between keeping the core small,
and adding this new functionality,
this is your choice.
If you wish to extend automark, then this is good choice
too. If you wish, and if it helps,
> please reuse any part of the
implementation provided here:
> If needed, I would help.
What should I do next? Should I not do the
pull request for GeanyHighlightSelectedWord?
> It is okay with me.
GeanyHighlightSelectedWord would then be still available at
> Geany provides this
functionality from its core, or from automark.
I wonder if this algorithm should be applied to
all searches, and thus
be integrated into
scintilla. Does it have any major drawbacks? I read
it has to some kind "prefix table"
prior to running the search, but I
that's negligible for all reasonable search terms?
Devel mailing list
Devel at lists.geany.org
More information about the Devel