[Geany-devel] Highlighting setup refactoring?

Wed Nov 9 01:46:03 UTC 2011

On 11/08/2011 02:55 PM, Colomban Wendling wrote:
> Hi guys,
>
> TL;DR: code refactoring, boring stuff :D
>
>
> I tried a rewrite of highlighting.c:styleset_*() functions for all this
> code to be better readable and hopefully more easily maintainable.  I
> ended up with no logic change (yet at least ^^), but a big refactoring,
> in the form of arrays mapping the different values.
>
> The code itself can be found here:
> https://github.com/b4n/geany/commits/highlighting-rewrite, the most
> important commit being
> https://github.com/b4n/geany/commit/ab2663e9aef8aa17af330c54eb7e6223647aa0f5
>
> Basically, that's how I made this:
>
>
> A filetype/styling definition [1] is split in 4 variables/arrays:
>
> * highlighting_lexer_<LANG>: (guint)
> 	the SCI lexer, e.g. SCLEX_CPP
> * highlighting_styles_<LANG>: (HLStyle[])
> 	SCI style/named style mappings, e.g. SCE_ASM_DEFAULT to
> 	"default"
> * highlighting_keywords_<LANG>: (HLKeyword[])
> 	keywords ID/name mappings, e.g. 0 mapped to "primary"
> * highlighting_properties_<LANG>: (HLProperty[])
> 	default SCI properties and their values, only used to disable
> 	fold.cpp.comment.explicit ATM, may be dropped maybe.
>
> These definitions are enough to do all the initialization and SCI setup,
> and because of the naming scheme, can be hidden behind a macro:
>
> STYLESET_INIT_FROM_MAPPING(ft_id, config, configh,<LANG>)
> STYLESET_FROM_MAPPING(sci, ft_id,<LANG>)
>
> These does preprocessing "##" magic to hook the different idendifiers
> and call (roughly, there are actually array length too):
>
> styleset_init_from_mapping(ft_id, config, configh, styles, keywords)
> styleset_from_mapping(sci, ft_id, lexer, styles, keywords, properties)
>
> Finally, the old init_styleset_case(geany_ft_id, init_func) can be
> replaced by init_styleset_case_auto(<LANG>), since<LANG>  is the same as
> in GEANY_FILETYPE_<LANG>.
>
>
> OK, this may look a little complicated at first glance (though the old
> wasn't that straightforward either).  But OTOH, the definitions are
> (IMO) far easier to read and update:
>
> * SCI style/named styles are next to each other
> * No need to allocate enough space manually (new_styleset()), it's done
> automatically
> * All the repetitive calls are in proper loops, making the actually
> useful information more readable.
> * ...
>

These changes solve my biggest gripes with highlighting.c :)  I *think* 
there might be some performance advantages to doing it this way too 
since all of the data is static/const and known upfront.  I think the 
compiler can treat it differently now (but I'm not a GCC expert, so I 
could be wrong).

>
> What's left to do is porting DocBook, HTML, PHP and XML.
>
> For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port
> don't support yet (could easily with one more column);
>

Is it possible to add the extra column, and just not use it in all the 
other arrays where it's not required?

> For HTML, PHP and XML it's a bit harder, because they use a weird mixed
> setup (that I didn't try to understand too deeply yet).  IMO, first
> thing to do would be to make each have its own styles, and inherit XML's
> one in filetypes definitions ([styling=XML]) rather than in-code.  There
> wouldn't be manual duplication with the proposed setup, just alias XML
> definition to HTML and PHP.
> Not sure what to do with the Python filetype dependency though.
>

I *hate* how all these languages are mixed into a single lexer :)

>
> Anyway, I'd like your opinion on this, what to improve, what you feel
> good and bad about.
>

Heh, I actually did a tiny bit of work (mostly thinking) about basically 
the exact same thing.  I have a script I wrote in Python that builds the 
lookup table/array code (nearly identical to yours) from 
Scintilla.iface, we could perhaps look at using that to re-generate the 
tables when Scintilla updates, or maybe even just for checking after 
updates.  I don't know if it's necessary though now that you've done the 
bulk of the tables and they don't change *that* much between Scintilla 
versions.  Just a thought.

As we had discussed on IRC, I'd still rather not be storing duplicate 
name-mapping in (hard)code like this.  It seems cleaner to me for 
example, to have the filetypes.* containing a '[style_map]' section 
which maps named styles to scintilla style/state numbers.  The problem 
of course is that it would need comments everywhere that have the 
SCE_LANG_FOO constants, otherwise there would just be plain magic 
integers everywhere.  I guess this approach is similar to SciTE 
properties files.

Overall I think your changes are a great improvement in maintainability 
and readability (and maybe performance).  Nice work!

BTW, should we test your current repo or wait for the other filetypes to 
be implemented?

Cheers,
Matthew Brush