Hi!
I saw that "Remove duplicate lines - simple?" item on the Plugin wishlist, and thought I'd have a go at it this weekend.
The results are at https://github.com/unwind/geanyuniq, and I would love some feedback.
There is a huge known issue: because I tried to be clever(TM) and install the menu item in a suitable place (rather than just at the bottom of the Tools menu), this will likely break for localized versions of Geany. Not good.
My idea on how to fix that is to make it customizable where to insert the menu, so the user can enter the properly localized text to search for. Other ideas? Is it "by design" that it's so hard for plugins to add commands to arbitrary locations within Geany's menus? The GIMP does this differently, with an abstract "menu path" concept that makes it portable and easy to add items wherever.
Ignoring the menu item issue, the command is by default bound to Shift+Control+D, and always runs over the entire document, ignoring (and actually removing) the selection. Does this make sense? Should it run only over the selection, if one is present? It outputs a line of text to the Status window saying how many lines were deleted (if any were deleted), is that a good idea, or annoying/spammy?
Thanks!
Regards,
/Emil
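For readers following along, the behaviour described in this first message (drop a line when it equals the line directly before it, then report how many were dropped) can be sketched roughly as below. This is only an illustration in Python, not the plugin's code: the function name `remove_adjacent_duplicates` and the message wording are assumptions, and the real plugin works against the Scintilla widget.

```python
def remove_adjacent_duplicates(text):
    """Drop each line that is identical to the line directly before it.

    Returns the new text and the number of lines that were deleted.
    """
    kept = []
    deleted = 0
    for line in text.splitlines(keepends=True):
        # Compare without line endings so a final line lacking a newline still matches.
        if kept and kept[-1].rstrip("\r\n") == line.rstrip("\r\n"):
            deleted += 1
        else:
            kept.append(line)
    return "".join(kept), deleted


new_text, deleted = remove_adjacent_duplicates("foo\nfoo\nbaa\nfoo\n")
if deleted:
    # Hypothetical status message; the plugin's actual wording is not given in the thread.
    print(f"Deleted {deleted} duplicate line(s).")
```

Note that the last "foo" survives, because it is not adjacent to the first one.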
On Mon, Feb 20, 2012 at 10:14 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
Right. That does make sense, but there are a few intricacies involved that made me postpone it a bit:
1. Deleting lines in the middle of the selection will make it a bit less well-defined where the selection ends, basically.
2. The selection might both start in the middle of a line and end in the middle of a line, which makes it a bit weird for a strict line-oriented operation like this.
Still, I'll try to implement something that makes sense for selections, too. Thanks.
Regards,
/Emil
On 02/20/2012 10:57 AM, Emil Brink wrote:
On Mon, Feb 20, 2012 at 10:14 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
Right. That does make sense, but there are a few intricacies involved that made me postpone it a bit:
1. Deleting lines in the middle of the selection will make it a bit less well-defined where the selection ends, basically.
2. The selection might both start in the middle of a line and end in the middle of a line, which makes it a bit weird for a strict line-oriented operation like this.
Still, I'll try to implement something that makes sense for selections, too. Thanks.
Require the selection to be of whole lines. Else do nothing and properly inform the user, since it makes no sense (as you say).
Denis
On 20.02.2012 11:19, spir wrote:
On 02/20/2012 10:57 AM, Emil Brink wrote:
On Mon, Feb 20, 2012 at 10:14 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
Right. That does make sense, but there are a few intricacies involved that made me postpone it a bit:
1. Deleting lines in the middle of the selection will make it a bit less well-defined where the selection ends, basically.
2. The selection might both start in the middle of a line and end in the middle of a line, which makes it a bit weird for a strict line-oriented operation like this.
Still, I'll try to implement something that makes sense for selections, too. Thanks.
Require the selection to be of whole lines. Else do nothing and properly inform the user, since it makes no sense (as you say).
Right. I was going to suggest doing it like the toggle comment command. Don't consider the exact selection but the lines which are selected, even if only partially selected. I.e. expand the selection to the whole line internally.
E.g. (X indicates selection, - is an unselected character)
--------
---XX
XXX
XX---
-------
Lines 2, 3 and 4 would be counted.
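The "expand the selection to whole lines" idea can be sketched as follows. This is a rough Python illustration under stated assumptions: the selection is given as plain character offsets with an exclusive end, and `selected_line_range` is a hypothetical helper, not a Geany or Scintilla call.

```python
def selected_line_range(text, sel_start, sel_end):
    """Return (first_line, last_line), 1-based, touched by the selection.

    sel_start and sel_end are character offsets into text; sel_end is exclusive.
    """
    first = text.count("\n", 0, sel_start) + 1
    # Look at the last selected character, so that a selection ending exactly
    # at the start of a line does not pull that line in.
    last_char = max(sel_start, sel_end - 1)
    last = text.count("\n", 0, last_char) + 1
    return first, last


doc = "--------\n---XX\nXXX\nXX---\n-------\n"
start = doc.index("X")       # first selected character, on line 2
end = doc.rindex("X") + 1    # one past the last selected character, on line 4
print(selected_line_range(doc, start, end))  # (2, 4)
```

The line-oriented operation would then run over every line in that range, regardless of where inside the first and last line the selection actually starts or stops.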
On 02/21/2012 11:45 AM, Thomas Martitz wrote:
On 20.02.2012 11:19, spir wrote:
On 02/20/2012 10:57 AM, Emil Brink wrote:
On Mon, Feb 20, 2012 at 10:14 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
Right. That does make sense, but there are a few intricacies involved that made me postpone it a bit:
1. Deleting lines in the middle of the selection will make it a bit less well-defined where the selection ends, basically.
2. The selection might both start in the middle of a line and end in the middle of a line, which makes it a bit weird for a strict line-oriented operation like this.
Still, I'll try to implement something that makes sense for selections, too. Thanks.
Require the selection to be of whole lines. Else do nothing and properly inform the user, since it makes no sense (as you say).
Right. I was going to suggest doing it like the toggle comment command. Don't consider the exact selection but the lines which are selected, even if only partially selected. I.e. expand the selection to the whole line internally.
E.g. (X indicates selection, - is an unselected character)
--------
---XX
XXX
XX---
-------
Lines 2, 3 and 4 would be counted.
I guess if the selection is not of whole lines (probably selecting from the margin, then, instead of in text), you'd better do nothing (and at best inform the user).
Denis
On 21.02.2012 12:40, spir wrote:
On 02/21/2012 11:45 AM, Thomas Martitz wrote:
On 20.02.2012 11:19, spir wrote:
On 02/20/2012 10:57 AM, Emil Brink wrote:
On Mon, Feb 20, 2012 at 10:14 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
Right. That does make sense, but there are a few intricacies involved that made me postpone it a bit:
1. Deleting lines in the middle of the selection will make it a bit less well-defined where the selection ends, basically.
2. The selection might both start in the middle of a line and end in the middle of a line, which makes it a bit weird for a strict line-oriented operation like this.
Still, I'll try to implement something that makes sense for selections, too. Thanks.
Require the selection to be of whole lines. Else do nothing and properly inform the user, since it makes no sense (as you say).
Right. I was going to suggest doing it like the toggle comment command. Don't consider the exact selection but the lines which are selected, even if only partially selected. I.e. expand the selection to the whole line internally.
E.g. (X indicates selection, - is an unselected character)
--------
---XX
XXX
XX---
-------
Lines 2, 3 and 4 would be counted.
I guess if the selection is not of whole lines (probably selecting from the margin, then, instead of in text), you'd better do nothing (and at best inform the user).
This is not how toggle comment (and a number of other commands) works. And I like how it works. And I think consistency is a good thing.
Best regards.
[snip]
This is not how toggle comment (and a number of other commands) works. And I like how it works. And I think consistency is a good thing.
I seem to have unleashed a bike shed! :) Greatly enjoying the feedback, thanks a lot.
I agree with Thomas, that sounds like a reasonable behavior to just interpret the selection as being the whole line. Incidentally, that's quite trivial to implement. :)
I have already updated the code to remove the sneaky menu-insertion, now the plugin just appends to the Tools menu, and there is no longer a hardcoded key binding. Thanks.
Regards,
/Emil
I agree with Thomas, that sounds like a reasonable behavior to just interpret the selection as being the whole line. Incidentally, that's quite trivial to implement. :)
Done now, the code is up on GitHub. I even remembered to update the README to describe how the selection is treated.
Again, feedback very welcome; if someone could take the time to see if it builds and behaves the way they expect, that would be great.
Thanks!
/Emil
On 21.02.2012 20:19, Emil Brink wrote:
I agree with Thomas, that sounds like a reasonable behavior to just interpret the selection as being the whole line. Incidentally, that's quite trivial to implement. :)
Done now, the code is up on GitHub. I even remembered to update the README to describe how the selection is treated.
Again, feedback very welcome; if someone could take the time to see if it builds and behaves the way they expect, that would be great.
Just checked out the latest version (4a6751) and thought it could be useful if duplicates that are not directly adjacent could also be cleaned up. E.g. a list like
foo
baa
baa
foo
should become
foo
baa
Cheers, Frank
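What Frank asks for here (keep the first occurrence of each line, drop every later copy, wherever it appears) can be sketched as below. This is a minimal Python illustration with an exact set of seen lines; `remove_all_duplicates` is a hypothetical name and not taken from the plugin.

```python
def remove_all_duplicates(lines):
    """Keep the first occurrence of every line, preserving the original order."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


print(remove_all_duplicates(["foo", "baa", "baa", "foo"]))  # ['foo', 'baa']
```

An adjacent-only pass over the same input would leave "foo", "baa", "foo", which is the difference under discussion.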
[...]
Again, feedback very welcome; if someone could take the time to see if it builds and behaves the way they expect, that would be great.
Just checked out the latest version (4a6751) and thought it could be useful if duplicates that are not directly adjacent could also be cleaned up. E.g. a list like
foo
baa
baa
foo
should become
foo
baa
Thanks! That's an interesting idea for sure, but I think I consider that a different function altogether. Perhaps the menu item should be re-labelled to more clearly indicate what the action is, though. I can see how one might expect the operation you describe, the total removal of duplicate lines from the document/selection, from the current label. Hm. Suggestions welcome, here. :)
Regards,
/Emil
On 23.02.2012 17:59, Frank Lanitz wrote:
On 21.02.2012 20:19, Emil Brink wrote:
I agree with Thomas, that sounds like a reasonable behavior to just interpret the selection as being the whole line. Incidentally, that's quite trivial to implement. :)
Done now, the code is up on GitHub. I even remembered to update the README to describe how the selection is treated.
Again, feedback very welcome; if someone could take the time to see if it builds and behaves the way they expect, that would be great.
Just checked out the latest version (4a6751) and thought it could be useful if duplicates that are not directly adjacent could also be cleaned up. E.g. a list like
foo
baa
baa
foo
should become
foo
baa
Without having tried the plugin (but just reading this thread) I would have expected to be able to do this, actually.
Best regards
[delete only adjacent vs delete all duplicates globally]
Without having tried the plugin (but just reading this thread) I would have expected to be able to do this, actually.
I have now updated the code to add a second command ("Delete All Duplicate Lines") which does delete all duplicate lines, not only those that are adjacent. The implementation is based on a Bloom filter, which seems to work pretty well for the test cases I've used.
As always, any and all feedback is welcome.
Regards,
/Emil
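For context, a Bloom filter is a fixed-size bit set that can say "definitely not seen before" or "possibly seen before". The sketch below shows the general technique in Python; it is not the plugin's implementation, whose details are not given in this thread. The sizes, the SHA-256 based hashing, and all names are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit set answering 'definitely not seen' or 'possibly seen'."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def remove_all_duplicates_bloom(lines):
    """Keep a line only if the filter says it has definitely not been seen."""
    bloom = BloomFilter()
    kept = []
    for line in lines:
        # Caveat: a false positive here would drop a line that is actually unique.
        if not bloom.might_contain(line):
            bloom.add(line)
            kept.append(line)
    return kept


print(remove_all_duplicates_bloom(["foo", "baa", "baa", "foo"]))
```

The trade-off is memory: the filter never stores the lines themselves, but a false positive is possible, so a careful implementation might confirm a "possibly seen" answer with an exact comparison before deleting anything.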
On 02/20/2012 10:14 AM, Frank Lanitz wrote:
On 20.02.2012 09:49, Emil Brink wrote:
Should it run only over the selection, if one is present?
Yes. Selection given: only act on the selection. No selection given: whole document.
+1
Take user actions into account. If they don't want it, it's easy enough to un-select ;-)
Emil Brink wrote:
I saw that "Remove duplicate lines - simple?" item on the Plugin wishlist, and thought I'd have a go at it this weekend. [...]
Without trying it, it sounds like a nice tool. For the minimalist *nix hackers amongst us, however, the following does nicely as a command on the "Edit -> Format -> Send Selection to" menu:
uniq
But it doesn't have the features that you describe for your plugin, e.g. it only says this in the Messages window:
20:20:56: Passing data and executing custom command: uniq
On 20.02.2012 10:22, Ross McKay wrote:
Without trying it, it sounds like a nice tool. For the minimalist *nix hackers amongst us, however, the following does nicely as a command on the "Edit -> Format -> Send Selection to" menu:
uniq
sort -u does nearly the same, but also sorts in one go ;)
Cheers, Frank
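The difference between the two commands mentioned above can be sketched in a few lines. This is a Python approximation rather than the tools themselves: `uniq` only collapses runs of identical neighbouring lines, while `sort -u` sorts first and so removes every duplicate but loses the original order. The real `sort` collates according to the locale, whereas `sorted` here compares by code point.

```python
from itertools import groupby

lines = ["foo", "baa", "baa", "foo"]

# Like `uniq`: collapse runs of identical neighbouring lines only.
uniq_like = [key for key, _group in groupby(lines)]

# Like `sort -u`: sort first, then keep one copy of each line.
sort_u_like = sorted(set(lines))

print(uniq_like)    # ['foo', 'baa', 'foo']
print(sort_u_like)  # ['baa', 'foo']
```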
On 20.02.2012 09:49, Emil Brink wrote:
I saw that "Remove duplicate lines - simple?" item on the Plugin wishlist, and thought I'd have a go at it this weekend.
Nice ;)
The results are at https://github.com/unwind/geanyuniq, and I would love some feedback.
Without having taken too deep a look, just a few thoughts:
* You should consider using the PLUGIN_SET_TRANSLATABLE_INFO macro instead of PLUGIN_SET_INFO. It also allows the information shown in Geany's plugin manager to be translated.
* Why are you giving prev (l66) a fixed size? IIRC this will prevent lines from being checked if they are longer than 512 chars (or e.g. ~128 real UTF32 chars).
There is a huge known issue: because I tried to be clever(TM) and install the menu item in a suitable place (rather than just at the bottom of the Tools menu), this will likely break for localized versions of Geany. Not good.
My idea on how to fix that is to make it customizable where to insert the menu, so the user can enter the properly localized text to search for. Other ideas? Is it "by design" that it's so hard for plugins to add commands to arbitrary locations within Geany's menus? The GIMP does this differently, with an abstract "menu path" concept that makes it portable and easy to add items wherever.
At least as far as I know, this just hasn't been discussed so far. It's currently just some kind of a stack.
Ignoring the menu item issue, the command is by default bound to Shift+Control+D,
I'd remove default binding. E.g. I already have this one in another use.
and always runs over the entire document, ignoring (and actually removing) the selection. Does this make sense? Should it run only over the selection, if one is present?
As mentioned before: Yes.
It outputs a line of text to the Status window saying how many lines were deleted (if any were deleted), is that a good idea, or annoying/spammy?
I like that idea.
Are you thinking of adding it to geany-plugins project? ;)
Cheers, Frank
On Mon, Feb 20, 2012 at 10:40 AM, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 09:49, Emil Brink wrote:
I saw that "Remove duplicate lines - simple?" item on the Plugin wishlist, and thought I'd have a go at it this weekend.
Nice ;)
Thanks!
The results are at https://github.com/unwind/geanyuniq, and I would love some feedback.
Without having taken too deep a look, just a few thoughts:
- You should consider using the PLUGIN_SET_TRANSLATABLE_INFO macro instead of PLUGIN_SET_INFO. It also allows the information shown in Geany's plugin manager to be translated.
Yes, absolutely. I haven't looked into the details of translating plugins, but I did see that macro.
- Why are you giving prev (l66) a fixed size? IIRC this will prevent lines from being checked if they are longer than 512 chars (or e.g. ~128 real UTF32 chars).
No, glib's GStrings are always dynamic; this is just a way of trying to avoid re-allocating the string on the first assign(), since I expect many lines of "typical" text files to be shorter than 512 bytes.
[menu cleverness]
At least as far as I know, this just hasn't been discussed so far. It's currently just some kind of a stack.
Not sure what you're referring to here. The Tools menu?
Ignoring the menu item issue, the command is by default bound to Shift+Control+D,
I'd remove default binding. E.g. I already have this one in another use.
Aha, that might be a good point, leaving the assigning of the key to the user.
It outputs a line of text to the Status window saying how many lines were deleted (if any were deleted), is that a good idea, or annoying/spammy?
I like that idea.
Cool.
Are you thinking of adding it to geany-plugins project? ;)
Yes, this seems general-purpose enough to make sense for that project, and also since it was on the wishlist it might be useful/sought after.
To the people saying "you can just pipe through uniq/sort": yes, of course.
But that creates another process, and pipes the data (twice!) between the two. I haven't tested it, but I'd wager that doing it directly against the Scintilla widget is at least twice as fast, and uses way less memory too. That was the point, plus maybe making it a little bit more user-accessible (the mere existence of the item on the wishlist seems to signify that "most users" don't just think of piping through an external program). So I think it's worthwhile. :)
Regards,
/Emil
[...]
The results are at https://github.com/unwind/geanyuniq, and I would love some feedback.
Whilst I don't think this is an everyday use plugin, I am sure it can occasionally be useful and will be worthwhile putting in plugins somewhere.
I am not sure if it is big enough for its own plugin, but I suspect its use will be uncommon enough that it should not be in add-ons, so I suppose by itself is fine for now.
[...]
[menu cleverness]
At least as far as I know, this just hasn't been discussed so far. It's currently just some kind of a stack.
I am not sure, but I believe that there are several reasons why adding items into other menus is not encouraged:
1. It means that the standard menus then have to be fixed, so you know where you are putting the new item. This is the case even if using a path, or the path will be wrong.
2. What happens when several plugins all try the same place? It could make lousy menus, and then it also changes the menu and violates item 1.
3. "Hiding" the plugin menu item somewhere in the rest of the menus makes it hard to find.
So keeping plugin menu items constrained is a good idea.
Not sure what you're referring to here. The Tools menu?
Ignoring the menu item issue, the command is by default bound to Shift+Control+D,
I'd remove default binding. E.g. I already have this one in another use.
Aha, that might be a good point, leaving the assigning of the key to the user.
Agree, plugins shouldn't set default keybindings without user input because of the possibility of clashing with other user definitions.
@Frank, can this be made a general recommendation somewhere about plugins?
It outputs a line of text to the Status window saying how many lines were deleted (if any were deleted), is that a good idea, or annoying/spammy?
I like that idea.
Cool.
Don't worry too much over using the status window, 90% of users don't seem to notice *anything* that comes up there anyway, me included :)
Are you thinking of adding it to geany-plugins project? ;)
Yes, this seems general-purpose enough to make sense for that project, and also since it was on the wishlist it might be useful/sought after.
To the people saying "you can just pipe through uniq/sort": yes, of course.
But on *ix only.
But that creates another process, and pipes the data (twice!) between the two. I haven't tested it, but I'd wager that doing it directly against the Scintilla widget is at least twice as fast, and uses way less memory too. That was the point, plus maybe making it a little bit more user-accessible (the mere existence of the item on the wishlist seems to signify that "most users" don't just think of piping through an external program). So I think it's worthwhile. :)
I wouldn't guarantee the performance stuff without trying it, remember Unix is made for piping stuff around processes. But it is irrelevant anyway for a rarely used function.
Seems like an occasionally useful plugin, thanks for your efforts.
Cheers Lex
Regards,
/Emil
_______________________________________________
Geany mailing list
Geany@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany
On 20.02.2012 11:32, Lex Trotman wrote:
Ignoring the menu item issue, the command is by default bound to Shift+Control+D,
I'd remove default binding. E.g. I already have this one in another use.
Aha, that might be a good point, leaving the assigning of the key to the user.
Agree, plugins shouldn't set default keybindings without user input because of the possibility of clashing with other user definitions.
@Frank, can this be made a general recommendation somewhere about plugins?
If this is a common view, of course. I never thought about it, as I imagined it was only my personal feeling. ;) It would require a change to the Geany core plugin interface docs to point it out, as well as putting it into the docs for plugins...
Cheers, Frank
On 20 February 2012 21:45, Frank Lanitz frank@frank.uvena.de wrote:
On 20.02.2012 11:32, Lex Trotman wrote:
Ignoring the menu item issue, the command is by default bound to Shift+Control+D,
I'd remove default binding. E.g. I already have this one in another use.
Aha, that might be a good point, leaving the assigning of the key to the user.
Agree, plugins shouldn't set default keybindings without user input because of the possibility of clashing with other user definitions.
@Frank, can this be made a general recommendation somewhere about plugins?
If this is a common view, of course. I never thought about it, as I imagined it was only my personal feeling. ;)
Well there are two of us so far :)
It would require a change to the Geany core plugin interface docs to point it out, as well as putting it into the docs for plugins...
Oh, you want to do it *properly* ...
Ok, started another thread.
Cheers Lex
Lex Trotman wrote:
Whilst I don't think this is an everyday use plugin, I am sure it can occasionally be useful and will be worthwhile putting in plugins somewhere.
Anyone doing a bit of data wrangling will come across this requirement often enough to want to do it from their editor (grrr, but exactly why I love the send selection to command thing -- which of course allows any number of useful scripts to be added to the list).
[...]
To the people saying "you can just pipe through uniq/sort": yes, of course.
But on *ix only.
+1; I reckon the Windows mob would certainly want to do this often enough to make this a useful plugin.
In fact, the predominant Scintilla-based editor on Windows, Notepad++, has a nifty plugin called TextFX that does this plus dozens of other really nice text transforms because they can be so handy. Perhaps this plugin can one day become the Geany equivalent :)
On 21 February 2012 14:34, Ross McKay rosko@zeta.org.au wrote:
Lex Trotman wrote:
Whilst I don't think this is an everyday use plugin, I am sure it can occasionally be useful and will be worthwhile putting in plugins somewhere.
Anyone doing a bit of data wrangling will come across this requirement
Why do people still do data wrangling in text this many years after the invention of the day-tar base. (rhetorical question ;-)
[...]
In fact, the predominant Scintilla-based editor on Windows, Notepad++, has a nifty plugin called TextFX that does this plus dozens of other really nice text transforms because they can be so handy. Perhaps this plugin can one day become the Geany equivalent :)
Maybe Emil could take that as a challenge, at least he knows where to look for ideas :) and this plugin could be the start of an incremental implementation.
Cheers Lex
Lex Trotman wrote:
Why do people still do data wrangling in text this many years after the invention of the day-tar base. (rhetorical question ;-)
I often need to wrangle day-tar with a text editor to simplify migrating it from some truly horrid Microsoft Access pile to something less horrid, at least when it's only going to be a one-off migration and scripting it would take longer, but that aside...
I quite often grab chunks of text from such sources as XML files, SQL schemas, CSV nastiness, visually-impenetrable legacy code etc. and use Geany to strip them back to useful field names that I can then drop into code classes, HTML forms, etc. using a mix of regex and piped commands. I used to do that sort of thing with a variety of sed, awk and python scripting but generally find I can do it all much faster directly in Geany these days (especially with the most frequently used transforms bound to a couple of keystrokes).
[...]
In fact, the predominant Scintilla-based editor on Windows, Notepad++, has a nifty plugin called TextFX that does this plus dozens of other really nice text transforms because they can be so handy. Perhaps this plugin can one day become the Geany equivalent :)
Maybe Emil could take that as a challenge, at least he knows where to look for ideas :) and this plugin could be the start of an incremental implementation.
I was hoping so, or at least it might serve as a good base for the next hacker :)
On Mon, Feb 20, 2012 at 11:00 PM, Ross McKay rosko@zeta.org.au wrote:
Lex Trotman wrote:
Why do people still do data wrangling in text this many years after the invention of the day-tar base. (rhetorical question ;-)
He said it's rhetorical, yet you answer anyway. Which is just fine by me, because I was going to answer too, regardless if anyone else did!
My situation is very similar: Lots of data from disparate sources, data is going to have to pass through some lingua franca anyway, and more often than not, that's going to be some kind of plain text.
Compounding the issue for me is that I'm a Windows Weenie, so I don't have the fancy *nix command-line tools, though I do use Python quite a lot.
So editor/IDE support for day-tar wrangling is definitely a useful feature. In my opinion, a full-blown database is actually a more heavyweight and overengineered solution than a well-executed editor plug-in for plenty of use cases.
John