On 11/08/2011 02:55 PM, Colomban Wendling wrote:
Hi guys,
TL;DR: code refactoring, boring stuff :D
I tried a rewrite of highlighting.c:styleset_*() functions for all this code to be better readable and hopefully more easily maintainable. I ended up with no logic change (yet at least ^^), but a big refactoring, in the form of arrays mapping the different values.
The code itself can be found here: https://github.com/b4n/geany/commits/highlighting-rewrite, the most important commit being https://github.com/b4n/geany/commit/ab2663e9aef8aa17af330c54eb7e6223647aa0f5
Basically, that's how I made this:
A filetype/styling definition [1] is split in 4 variables/arrays:
- highlighting_lexer_<LANG>: (guint) the SCI lexer, e.g. SCLEX_CPP
- highlighting_styles_<LANG>: (HLStyle[]) SCI style/named style mappings, e.g. SCE_ASM_DEFAULT to "default"
- highlighting_keywords_<LANG>: (HLKeyword[]) keywords ID/name mappings, e.g. 0 mapped to "primary"
- highlighting_properties_<LANG>: (HLProperty[]) default SCI properties and their values, only used to disable fold.cpp.comment.explicit ATM, may be dropped maybe.
These definitions are enough to do all the initialization and SCI setup, and because of the naming scheme, can be hidden behind a macro:
STYLESET_INIT_FROM_MAPPING(ft_id, config, configh,<LANG>) STYLESET_FROM_MAPPING(sci, ft_id,<LANG>)
These does preprocessing "##" magic to hook the different idendifiers and call (roughly, there are actually array length too):
styleset_init_from_mapping(ft_id, config, configh, styles, keywords) styleset_from_mapping(sci, ft_id, lexer, styles, keywords, properties)
Finally, the old init_styleset_case(geany_ft_id, init_func) can be replaced by init_styleset_case_auto(<LANG>), since<LANG> is the same as in GEANY_FILETYPE_<LANG>.
OK, this may look a little complicated at first glance (though the old wasn't that straightforward either). But OTOH, the definitions are (IMO) far easier to read and update:
- SCI style/named styles are next to each other
- No need to allocate enough space manually (new_styleset()), it's done
automatically
- All the repetitive calls are in proper loops, making the actually
useful information more readable.
- ...
These changes solve my biggest gripes with highlighting.c :) I *think* there might be some performance advantages to doing it this way too since all of the data is static/const and known upfront. I think the compiler can treat it differently now (but I'm not a GCC expert, so I could be wrong).
What's left to do is porting DocBook, HTML, PHP and XML.
For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port don't support yet (could easily with one more column);
Is it possible to add the extra column, and just not use it in all the other arrays where it's not required?
For HTML, PHP and XML it's a bit harder, because they use a weird mixed setup (that I didn't try to understand too deeply yet). IMO, first thing to do would be to make each have its own styles, and inherit XML's one in filetypes definitions ([styling=XML]) rather than in-code. There wouldn't be manual duplication with the proposed setup, just alias XML definition to HTML and PHP. Not sure what to do with the Python filetype dependency though.
I *hate* how all these languages are mixed into a single lexer :)
Anyway, I'd like your opinion on this, what to improve, what you feel good and bad about.
Heh, I actually did a tiny bit of work (mostly thinking) about basically the exact same thing. I have a script I wrote in Python that builds the lookup table/array code (nearly identical to yours) from Scintilla.iface, we could perhaps look at using that to re-generate the tables when Scintilla updates, or maybe even just for checking after updates. I don't know if it's necessary though now that you've done the bulk of the tables and they don't change *that* much between Scintilla versions. Just a thought.
As we had discussed on IRC, I'd still rather not be storing duplicate name-mapping in (hard)code like this. It seems cleaner to me for example, to have the filetypes.* containing a '[style_map]' section which maps named styles to scintilla style/state numbers. The problem of course is that it would need comments everywhere that have the SCE_LANG_FOO constants, otherwise there would just be plain magic integers everywhere. I guess this approach is similar to SciTE properties files.
Overall I think your changes are a great improvement in maintainability and readability (and maybe performance). Nice work!
BTW, should we test your current repo or wait for the other filetypes to be implemented?
Cheers, Matthew Brush