Hi guys,
TL;DR: code refactoring, boring stuff :D
I tried a rewrite of highlighting.c:styleset_*() functions for all this code to be better readable and hopefully more easily maintainable. I ended up with no logic change (yet at least ^^), but a big refactoring, in the form of arrays mapping the different values.
The code itself can be found here: https://github.com/b4n/geany/commits/highlighting-rewrite, the most important commit being https://github.com/b4n/geany/commit/ab2663e9aef8aa17af330c54eb7e6223647aa0f5
Basically, that's how I made this:
A filetype/styling definition [1] is split in 4 variables/arrays:
* highlighting_lexer_<LANG>: (guint) the SCI lexer, e.g. SCLEX_CPP * highlighting_styles_<LANG>: (HLStyle[]) SCI style/named style mappings, e.g. SCE_ASM_DEFAULT to "default" * highlighting_keywords_<LANG>: (HLKeyword[]) keywords ID/name mappings, e.g. 0 mapped to "primary" * highlighting_properties_<LANG>: (HLProperty[]) default SCI properties and their values, only used to disable fold.cpp.comment.explicit ATM, may be dropped maybe.
These definitions are enough to do all the initialization and SCI setup, and because of the naming scheme, can be hidden behind a macro:
STYLESET_INIT_FROM_MAPPING(ft_id, config, configh, <LANG>) STYLESET_FROM_MAPPING(sci, ft_id, <LANG>)
These does preprocessing "##" magic to hook the different idendifiers and call (roughly, there are actually array length too):
styleset_init_from_mapping(ft_id, config, configh, styles, keywords) styleset_from_mapping(sci, ft_id, lexer, styles, keywords, properties)
Finally, the old init_styleset_case(geany_ft_id, init_func) can be replaced by init_styleset_case_auto(<LANG>), since <LANG> is the same as in GEANY_FILETYPE_<LANG>.
OK, this may look a little complicated at first glance (though the old wasn't that straightforward either). But OTOH, the definitions are (IMO) far easier to read and update:
* SCI style/named styles are next to each other * No need to allocate enough space manually (new_styleset()), it's done automatically * All the repetitive calls are in proper loops, making the actually useful information more readable. * ...
What's left to do is porting DocBook, HTML, PHP and XML.
For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port don't support yet (could easily with one more column);
For HTML, PHP and XML it's a bit harder, because they use a weird mixed setup (that I didn't try to understand too deeply yet). IMO, first thing to do would be to make each have its own styles, and inherit XML's one in filetypes definitions ([styling=XML]) rather than in-code. There wouldn't be manual duplication with the proposed setup, just alias XML definition to HTML and PHP. Not sure what to do with the Python filetype dependency though.
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Regards, Colomban
[1] currently in new highlightingmappings.h header, but may go back to highlighting.c or to another .c since it's actually array definitions, not declarations
On 11/08/2011 02:55 PM, Colomban Wendling wrote:
Hi guys,
TL;DR: code refactoring, boring stuff :D
I tried a rewrite of highlighting.c:styleset_*() functions for all this code to be better readable and hopefully more easily maintainable. I ended up with no logic change (yet at least ^^), but a big refactoring, in the form of arrays mapping the different values.
The code itself can be found here: https://github.com/b4n/geany/commits/highlighting-rewrite, the most important commit being https://github.com/b4n/geany/commit/ab2663e9aef8aa17af330c54eb7e6223647aa0f5
Basically, that's how I made this:
A filetype/styling definition [1] is split in 4 variables/arrays:
- highlighting_lexer_<LANG>: (guint) the SCI lexer, e.g. SCLEX_CPP
- highlighting_styles_<LANG>: (HLStyle[]) SCI style/named style mappings, e.g. SCE_ASM_DEFAULT to "default"
- highlighting_keywords_<LANG>: (HLKeyword[]) keywords ID/name mappings, e.g. 0 mapped to "primary"
- highlighting_properties_<LANG>: (HLProperty[]) default SCI properties and their values, only used to disable fold.cpp.comment.explicit ATM, may be dropped maybe.
These definitions are enough to do all the initialization and SCI setup, and because of the naming scheme, can be hidden behind a macro:
STYLESET_INIT_FROM_MAPPING(ft_id, config, configh,<LANG>) STYLESET_FROM_MAPPING(sci, ft_id,<LANG>)
These does preprocessing "##" magic to hook the different idendifiers and call (roughly, there are actually array length too):
styleset_init_from_mapping(ft_id, config, configh, styles, keywords) styleset_from_mapping(sci, ft_id, lexer, styles, keywords, properties)
Finally, the old init_styleset_case(geany_ft_id, init_func) can be replaced by init_styleset_case_auto(<LANG>), since<LANG> is the same as in GEANY_FILETYPE_<LANG>.
OK, this may look a little complicated at first glance (though the old wasn't that straightforward either). But OTOH, the definitions are (IMO) far easier to read and update:
- SCI style/named styles are next to each other
- No need to allocate enough space manually (new_styleset()), it's done
automatically
- All the repetitive calls are in proper loops, making the actually
useful information more readable.
- ...
These changes solve my biggest gripes with highlighting.c :) I *think* there might be some performance advantages to doing it this way too since all of the data is static/const and known upfront. I think the compiler can treat it differently now (but I'm not a GCC expert, so I could be wrong).
What's left to do is porting DocBook, HTML, PHP and XML.
For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port don't support yet (could easily with one more column);
Is it possible to add the extra column, and just not use it in all the other arrays where it's not required?
For HTML, PHP and XML it's a bit harder, because they use a weird mixed setup (that I didn't try to understand too deeply yet). IMO, first thing to do would be to make each have its own styles, and inherit XML's one in filetypes definitions ([styling=XML]) rather than in-code. There wouldn't be manual duplication with the proposed setup, just alias XML definition to HTML and PHP. Not sure what to do with the Python filetype dependency though.
I *hate* how all these languages are mixed into a single lexer :)
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Heh, I actually did a tiny bit of work (mostly thinking) about basically the exact same thing. I have a script I wrote in Python that builds the lookup table/array code (nearly identical to yours) from Scintilla.iface, we could perhaps look at using that to re-generate the tables when Scintilla updates, or maybe even just for checking after updates. I don't know if it's necessary though now that you've done the bulk of the tables and they don't change *that* much between Scintilla versions. Just a thought.
As we had discussed on IRC, I'd still rather not be storing duplicate name-mapping in (hard)code like this. It seems cleaner to me for example, to have the filetypes.* containing a '[style_map]' section which maps named styles to scintilla style/state numbers. The problem of course is that it would need comments everywhere that have the SCE_LANG_FOO constants, otherwise there would just be plain magic integers everywhere. I guess this approach is similar to SciTE properties files.
Overall I think your changes are a great improvement in maintainability and readability (and maybe performance). Nice work!
BTW, should we test your current repo or wait for the other filetypes to be implemented?
Cheers, Matthew Brush
Le 09/11/2011 02:46, Matthew Brush a écrit :
On 11/08/2011 02:55 PM, Colomban Wendling wrote:
[...]
These changes solve my biggest gripes with highlighting.c :) I *think* there might be some performance advantages to doing it this way too since all of the data is static/const and known upfront. I think the compiler can treat it differently now (but I'm not a GCC expert, so I could be wrong).
I doubt it, the compiler already knew the names (literals, so data) and styles (contsants, easy to optimize), and the code was inlined. Now it uses loops, and I doubt the compiler will inline them since they are used multiple times, so I guess there would be a small performance loss -- though I honestly doubt it's noticeable.
I'm no a GCC/ELF expert either, just my guess.
What's left to do is porting DocBook, HTML, PHP and XML.
For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port don't support yet (could easily with one more column);
Is it possible to add the extra column, and just not use it in all the other arrays where it's not required?
Hehe, good idea! C implicitly initializes to 0/NULL/whateverfits missing members, so it's just fine. I did it for the already existing HLKeyword.merge, and used it for the HLStyle.fill_eol.
For HTML, PHP and XML it's a bit harder, because they use a weird mixed setup (that I didn't try to understand too deeply yet). IMO, first thing to do would be to make each have its own styles, and inherit XML's one in filetypes definitions ([styling=XML]) rather than in-code. There wouldn't be manual duplication with the proposed setup, just alias XML definition to HTML and PHP. Not sure what to do with the Python filetype dependency though.
I *hate* how all these languages are mixed into a single lexer :)
Unfortunately I won't fix this to make Scintilla call the appropriate lexer at the appropriate point in the appropriate file, and us set the appropriate styles at the appropriate moment [...] :-'
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Heh, I actually did a tiny bit of work (mostly thinking) about basically the exact same thing. I have a script I wrote in Python that builds the lookup table/array code (nearly identical to yours) from Scintilla.iface, we could perhaps look at using that to re-generate the tables when Scintilla updates, or maybe even just for checking after updates. I don't know if it's necessary though now that you've done the bulk of the tables and they don't change *that* much between Scintilla versions. Just a thought.
It'd need adjustments from one side or another because there are a few "mappings" done in the code directly, like i.e. both SCE_D_TYPEDEF and SCE_D_WORD5 mapping to "typedef"; and as you said we anyway would need to update some code manually at some point.
But maybe it'd be interesting, maybe it'd help not to miss any new styles -- though actually checking the new Scintilla version's diff should be enough.
As we had discussed on IRC, I'd still rather not be storing duplicate name-mapping in (hard)code like this. It seems cleaner to me for example, to have the filetypes.* containing a '[style_map]' section which maps named styles to scintilla style/state numbers. The problem of course is that it would need comments everywhere that have the SCE_LANG_FOO constants, otherwise there would just be plain magic integers everywhere. I guess this approach is similar to SciTE properties files.
I agree we better not map twice, but I don't really like the plain integers like in SciTE .properties files -- even with the comments they aren't that readable. So... not sure.
Overall I think your changes are a great improvement in maintainability and readability (and maybe performance). Nice work!
Thanks :)
BTW, should we test your current repo or wait for the other filetypes to be implemented?
You could've done both, there was no point where a filetype stopped working. Now all are ported to the new system, so you can test without any more questions :)
BTW, I think XML/HTML/PHP might need more attention. Although I did my best and thin I didn't missed anything (and tried it a little), it changed the code far more than with the other filtypes, so maybe I *did* miss something.
Other filtypes should be just fine, it wasn't really hard to convert (C&P, plus replacing a few things -- just a little boring) and I doubt I made much mistakes. However, this doesn't mean testing is not useful, better not trust a big change like this before testing it :p
Cheers, Colomban
On 11/08/2011 02:55 PM, Colomban Wendling wrote:
[...]
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Ok, now that I have tried it and hacked it a little, I'd say it's quite a bit easier to follow and seems to work well with my minimal testing.
Just a few comments to make after playing with it a little.
The first is that it would be better to move the use of filetype lexer ids[1] in highlighting.c highlighting_init_styles() and highlighting_set_styles() into highlightingmappings.h. I'm not sure if it's possible, but it would make adding filetypes more straightforward since you wouldn't need to touch highlighting.c at all anymore.
The second comment is that use of all those cpp macros makes it a little confusing when getting errors/warnings from the compiler. I guess it's unavoidable, but I thought I'd mention it since it caused me a little trouble :)
The third comment is just a reminder[2] to update the HACKING file for the information about adding filetypes.
Cheers, Matthew Brush
[1] Is this the same as the SciLexer.h constants? [2] You probably just didn't get to doing this yet
Le 10/11/2011 04:16, Matthew Brush a écrit :
On 11/08/2011 02:55 PM, Colomban Wendling wrote:
[...]
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Ok, now that I have tried it and hacked it a little, I'd say it's quite a bit easier to follow and seems to work well with my minimal testing.
Just a few comments to make after playing with it a little.
The first is that it would be better to move the use of filetype lexer ids[1] in highlighting.c highlighting_init_styles() and highlighting_set_styles() into highlightingmappings.h. I'm not sure if it's possible, but it would make adding filetypes more straightforward since you wouldn't need to touch highlighting.c at all anymore.
You mean SCLEX_FOO? If yes, I did it already .
The second comment is that use of all those cpp macros makes it a little confusing when getting errors/warnings from the compiler. I guess it's unavoidable, but I thought I'd mention it since it caused me a little trouble :)
Yeah I know, it's the one thing I though some maybe wouldn't like. I just dropped one no more useful macro indirection, so it's a *little* better, but will still be cryptic if one sets lexer_FOO to EMPTY_PROPERTIES (:-').
The question is: is the improvement worth the magic macro? I think yes, but I want(ed) you opinion.
The third comment is just a reminder[2] to update the HACKING file for the information about adding filetypes.
Yeah I know, will do :)
So, is anybody against me merging it after making the last few adjustments (like doc update)?
Another question is whether I should keep the highlightingmappings.h file as-is, or rename it (to what?)/merge it back in highlighting.c?
Cheers, Colomban
[...]
So, is anybody against me merging it after making the last few adjustments (like doc update)?
Another question is whether I should keep the highlightingmappings.h file as-is, or rename it (to what?)/merge it back in highlighting.c?
Personally I think this is a worthwhile improvement in maintainability so am in favour of committing it. Keeping the mappings file separate is worthwhile, name is fine.
Cheers Lex
Le 12/11/2011 00:32, Lex Trotman a écrit :
[...]
So, is anybody against me merging it after making the last few adjustments (like doc update)?
Another question is whether I should keep the highlightingmappings.h file as-is, or rename it (to what?)/merge it back in highlighting.c?
Personally I think this is a worthwhile improvement in maintainability so am in favour of committing it. Keeping the mappings file separate is worthwhile, name is fine.
It's now merged :)
Cheers, Colomban
Hi,
I updated Elias's patch[1] to add Objective-C support on top of the new highlighting changes from Colomban[2]. I added a few things that were missing from the original patch too.
It's causing a warning that I'm having trouble tracking down because of the cpp macro problem I mentioned in the original thread:
CC highlighting.o highlighting.c: In function ‘highlighting_set_styles’: highlighting.c:1086:3: warning: passing argument 3 of ‘styleset_from_mapping’ makes integer from pointer without a cast [enabled by default] highlighting.c:833:13: note: expected ‘guint’ but argument is of type ‘const struct HLProperty *’
But otherwise it seems to work OK in some basic tests, at least considering I don't know Objective-C besides the C parts.
Cheers, Matthew Brush
[1] https://sourceforge.net/tracker/?func=detail&aid=3325139&group_id=15... [2] I tried to make a pull request but you can't make a fork of a fork, and then I was going to make an "Issue" but those are disabled, so I'm sending to the ML :)
Never mind about the warning, I'm stupid.
Fixed patch is attached.
Cheers, Matthew Brush
Am 10.11.2011 05:06, schrieb Matthew Brush:
+++ b/tagmanager/objc.c @@ -0,0 +1,1146 @@
+/* +* Copyright (c) 2010, Vincent Berthoux +* +* This source code is released for free distribution under the terms of the +* GNU General Public License. +* +* This module contains functions for generating tags for Objective C +* language files. +*/ +/* +* INCLUDE FILES +*/
The license is quite unclear. Also no standard disclaimer.
Best regards.
On Thu, Nov 10, 2011 at 7:05 PM, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 10.11.2011 05:06, schrieb Matthew Brush:
+++ b/tagmanager/objc.c @@ -0,0 +1,1146 @@
+/* +* Copyright (c) 2010, Vincent Berthoux +* +* This source code is released for free distribution under the terms of the +* GNU General Public License. +* +* This module contains functions for generating tags for Objective C +* language files. +*/ +/* +* INCLUDE FILES +*/
The license is quite unclear. Also no standard disclaimer.
Good point Thomas, we need to know where this file came from. Maybe that project has a central LICENSE file with the details. We can copy that into the file. Also should acknowledge the source. Not the least so we can find it in circumstances like this.
Cheers Lex
Best regards.
Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
Le 10/11/2011 09:22, Lex Trotman a écrit :
On Thu, Nov 10, 2011 at 7:05 PM, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 10.11.2011 05:06, schrieb Matthew Brush:
+++ b/tagmanager/objc.c @@ -0,0 +1,1146 @@
+/* +* Copyright (c) 2010, Vincent Berthoux +* +* This source code is released for free distribution under the terms of the +* GNU General Public License. +* +* This module contains functions for generating tags for Objective C +* language files. +*/ +/* +* INCLUDE FILES +*/
The license is quite unclear. Also no standard disclaimer.
Good point Thomas, we need to know where this file came from. Maybe that project has a central LICENSE file with the details. We can copy that into the file. Also should acknowledge the source. Not the least so we can find it in circumstances like this.
Although I agree the header misses some infos, it's simply from the CTags project [1] like all other parsers, which is licensed under GPLv2 if I trust the COPYING file.
I then think this patch is OK, or at least isn't worst than the current status. But if you want to make licensing clearer, please, go ahead :)
Cheers, Colomban
On Sun, Nov 13, 2011 at 11:58 AM, Colomban Wendling lists.ban@herbesfolles.org wrote:
Le 10/11/2011 09:22, Lex Trotman a écrit :
On Thu, Nov 10, 2011 at 7:05 PM, Thomas Martitz thomas.martitz@student.htw-berlin.de wrote:
Am 10.11.2011 05:06, schrieb Matthew Brush:
+++ b/tagmanager/objc.c @@ -0,0 +1,1146 @@
+/* +* Copyright (c) 2010, Vincent Berthoux +* +* This source code is released for free distribution under the terms of the +* GNU General Public License. +* +* This module contains functions for generating tags for Objective C +* language files. +*/ +/* +* INCLUDE FILES +*/
The license is quite unclear. Also no standard disclaimer.
Good point Thomas, we need to know where this file came from. Maybe that project has a central LICENSE file with the details. We can copy that into the file. Also should acknowledge the source. Not the least so we can find it in circumstances like this.
Although I agree the header misses some infos, it's simply from the CTags project [1] like all other parsers, which is licensed under GPLv2 if I trust the COPYING file.
Right, I just wasn't which project it came from.
Cheers Lex
I then think this patch is OK, or at least isn't worst than the current status. But if you want to make licensing clearer, please, go ahead :)
Meh, our copying contains the same license, so no change needed.
Cheers, Colomban
[1] http://ctags.svn.sf.net/viewvc/ctags/trunk/ _______________________________________________ Geany-devel mailing list Geany-devel@uvena.de https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
Le 10/11/2011 05:06, Matthew Brush a écrit :
Never mind about the warning, I'm stupid.
Fixed patch is attached.
Committed as you asked on IRC. I fixed a few things, but was almost OK.
Cheers, Colomban