[Geany-devel] Highlighting setup refactoring?

Matthew Brush mbrush at xxxxx
Wed Nov 9 01:46:03 UTC 2011

On 11/08/2011 02:55 PM, Colomban Wendling wrote:
> Hi guys,
> TL;DR: code refactoring, boring stuff :D
> I tried a rewrite of highlighting.c:styleset_*() functions for all this
> code to be better readable and hopefully more easily maintainable.  I
> ended up with no logic change (yet at least ^^), but a big refactoring,
> in the form of arrays mapping the different values.
> The code itself can be found here:
> https://github.com/b4n/geany/commits/highlighting-rewrite, the most
> important commit being
> https://github.com/b4n/geany/commit/ab2663e9aef8aa17af330c54eb7e6223647aa0f5
> Basically, that's how I made this:
> A filetype/styling definition [1] is split in 4 variables/arrays:
> * highlighting_lexer_<LANG>: (guint)
> 	the SCI lexer, e.g. SCLEX_CPP
> * highlighting_styles_<LANG>: (HLStyle[])
> 	SCI style/named style mappings, e.g. SCE_ASM_DEFAULT to
> 	"default"
> * highlighting_keywords_<LANG>: (HLKeyword[])
> 	keywords ID/name mappings, e.g. 0 mapped to "primary"
> * highlighting_properties_<LANG>: (HLProperty[])
> 	default SCI properties and their values, only used to disable
> 	fold.cpp.comment.explicit ATM, may be dropped maybe.
> These definitions are enough to do all the initialization and SCI setup,
> and because of the naming scheme, can be hidden behind a macro:
> STYLESET_INIT_FROM_MAPPING(ft_id, config, configh,<LANG>)
> These does preprocessing "##" magic to hook the different idendifiers
> and call (roughly, there are actually array length too):
> styleset_init_from_mapping(ft_id, config, configh, styles, keywords)
> styleset_from_mapping(sci, ft_id, lexer, styles, keywords, properties)
> Finally, the old init_styleset_case(geany_ft_id, init_func) can be
> replaced by init_styleset_case_auto(<LANG>), since<LANG>  is the same as
> OK, this may look a little complicated at first glance (though the old
> wasn't that straightforward either).  But OTOH, the definitions are
> (IMO) far easier to read and update:
> * SCI style/named styles are next to each other
> * No need to allocate enough space manually (new_styleset()), it's done
> automatically
> * All the repetitive calls are in proper loops, making the actually
> useful information more readable.
> * ...

These changes solve my biggest gripes with highlighting.c :)  I *think* 
there might be some performance advantages to doing it this way too 
since all of the data is static/const and known upfront.  I think the 
compiler can treat it differently now (but I'm not a GCC expert, so I 
could be wrong).

> What's left to do is porting DocBook, HTML, PHP and XML.
> For DocBook it is because it uses SCI_STYLESETEOLFILLED that my port
> don't support yet (could easily with one more column);

Is it possible to add the extra column, and just not use it in all the 
other arrays where it's not required?

> For HTML, PHP and XML it's a bit harder, because they use a weird mixed
> setup (that I didn't try to understand too deeply yet).  IMO, first
> thing to do would be to make each have its own styles, and inherit XML's
> one in filetypes definitions ([styling=XML]) rather than in-code.  There
> wouldn't be manual duplication with the proposed setup, just alias XML
> definition to HTML and PHP.
> Not sure what to do with the Python filetype dependency though.

I *hate* how all these languages are mixed into a single lexer :)

> Anyway, I'd like your opinion on this, what to improve, what you feel
> good and bad about.

Heh, I actually did a tiny bit of work (mostly thinking) about basically 
the exact same thing.  I have a script I wrote in Python that builds the 
lookup table/array code (nearly identical to yours) from 
Scintilla.iface, we could perhaps look at using that to re-generate the 
tables when Scintilla updates, or maybe even just for checking after 
updates.  I don't know if it's necessary though now that you've done the 
bulk of the tables and they don't change *that* much between Scintilla 
versions.  Just a thought.

As we had discussed on IRC, I'd still rather not be storing duplicate 
name-mapping in (hard)code like this.  It seems cleaner to me for 
example, to have the filetypes.* containing a '[style_map]' section 
which maps named styles to scintilla style/state numbers.  The problem 
of course is that it would need comments everywhere that have the 
SCE_LANG_FOO constants, otherwise there would just be plain magic 
integers everywhere.  I guess this approach is similar to SciTE 
properties files.

Overall I think your changes are a great improvement in maintainability 
and readability (and maybe performance).  Nice work!

BTW, should we test your current repo or wait for the other filetypes to 
be implemented?

Matthew Brush

More information about the Devel mailing list