Hi, in highlighting.c there is a comment: `/* ... keyword_idx is used for both style_sets[].keywords and scintilla keyword style number */`
Is it possible to make these independent? That would be nice. I'm writing a new lexer that needs its (ordered) keyword lists to be independent of its style lists (see `HLStyle` vs. `HLKeyword` in highlightingmappings.h).
The definitions are as follows:
```
#define highlighting_lexer_UNI SCLEX_UNI
static const HLStyle highlighting_styles_UNI[] =
{                                                          // list index
    { SCE_UNI_S_DEFAULT,      "Default",          FALSE},  // 0
    { SCE_UNI_S_COMMENT,      "CommentOneline",   FALSE},  // 1
    { SCE_UNI_S_COMMENT_ML,   "CommentMultiline", FALSE},  // 2
    { SCE_UNI_S_COMMENT_DOC,  "CommentDocument",  FALSE},  // 3
    { SCE_UNI_S_PRIMARYKEY,   "PrimaryKey",       FALSE},  // 4
    { SCE_UNI_S_SECONDARYKEY, "SecondaryKey",     FALSE},  // 5
    { SCE_UNI_S_KEYLIST3,     "KeyList3",         FALSE},  // 6
    { SCE_UNI_S_KEYLIST4,     "KeyList4",         FALSE},  // 7
    { SCE_UNI_S_KEYLIST5,     "KeyList5",         FALSE},  // 8
    ....
```
```
static const HLKeyword highlighting_keywords_UNI[] =
{                                                               // list index
    { SCE_UNI_K_DEFAULT,       "Default",               FALSE}, // 0
    { SCE_UNI_K_COMMENT,       "CommentOneline",        FALSE}, // 1
    { SCE_UNI_K_COMMENT_ML_S,  "CommentMultilineStart", FALSE}, // 2
    { SCE_UNI_K_COMMENT_ML_E,  "CommentMultilineEnd",   FALSE}, // 3
    { SCE_UNI_K_COMMENT_DOC_S, "CommentDocumentStart",  FALSE}, // 4
    { SCE_UNI_K_COMMENT_DOC_E, "CommentDocumentEnd",    FALSE}, // 5
    { SCE_UNI_K_PRIMARYKEY,    "PrimaryKey",            FALSE}, // 6
    { SCE_UNI_K_SECONDARYKEY,  "SecondaryKey",          FALSE}, // 7
    { SCE_UNI_K_KEYLIST3,      "KeyList3",              FALSE}, // 8
    { SCE_UNI_K_KEYLIST4,      "KeyList4",              FALSE}, // 9
    { SCE_UNI_K_KEYLIST5,      "KeyList5",              FALSE}, // 10
    ....
```
As you can see, keywords and styles cannot be matched by list index (e.g. the multiline comment styles each have two keyword lists, one for the start and one for the end of the comment). So I plan to assign to SCE_UNI_K_.. values from 100 and to SCE_UNI_S_.. values from 200.
In any case, when highlighting.c fills these from filetypes.xxx, each entry should be matched to the correct style_sets.styleset element by searching for its `guint style;` (HLStyle) or `guint id;` (HLKeyword) value rather than by array position.
br HoTschir
> in highlighting.c there is a comment: `/* ... keyword_idx is used for both style_sets[].keywords and scintilla keyword style number */`

I think you are confused about what the comment says. It says that the index to `style_sets[].keywords`, which in your case is `highlighting_keywords_UNI`, has to be the same as the index of the keyword list defined in the Scintilla lexer. In other words, the first number in each `highlighting_keywords_UNI` entry has to correspond to the keyword list index in the lexer. Indices to `highlighting_styles_UNI` are completely independent.
Also, the comment is for `merge_type_keywords()`, which is used to inject ctags types into Scintilla and is irrelevant for your case.
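To make the distinction concrete, here is a minimal sketch. It uses simplified stand-ins for `HLStyle` and `HLKeyword` (the struct layouts, the `UNI_STYLE_*` values and both lookup helpers are assumptions for illustration, not Geany code). The only point taken from the discussion is that the first field of a keyword entry is the lexer's keyword list index, while style numbers live in a separate table.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for Geany's mapping structs (assumed layout). */
typedef struct { unsigned style; const char *name; int fill_eol; } StyleEntry;
typedef struct { unsigned id; const char *key; int merge; } KeywordEntry;

/* Hypothetical style numbers, as a lexer header might define them. */
enum { UNI_STYLE_DEFAULT = 0, UNI_STYLE_COMMENT = 1, UNI_STYLE_COMMENT_ML = 2 };

static const StyleEntry uni_styles[] = {
    { UNI_STYLE_DEFAULT,    "Default",          0 },
    { UNI_STYLE_COMMENT,    "CommentOneline",   0 },
    { UNI_STYLE_COMMENT_ML, "CommentMultiline", 0 },
};

/* First field: the keyword list index the lexer expects, not a style number. */
static const KeywordEntry uni_keywords[] = {
    { 0, "Default",               0 },
    { 1, "CommentOneline",        0 },
    { 2, "CommentMultilineStart", 0 },
    { 3, "CommentMultilineEnd",   0 },
};

/* Look up the lexer keyword list index for a config key, or -1 if unknown. */
static int uni_keyword_list_index(const char *key)
{
    for (size_t i = 0; i < sizeof uni_keywords / sizeof uni_keywords[0]; i++)
        if (strcmp(uni_keywords[i].key, key) == 0)
            return (int)uni_keywords[i].id;
    return -1;
}

/* Look up the style number for a config name, or -1 if unknown. */
static int uni_style_number(const char *name)
{
    for (size_t i = 0; i < sizeof uni_styles / sizeof uni_styles[0]; i++)
        if (strcmp(uni_styles[i].name, name) == 0)
            return (int)uni_styles[i].style;
    return -1;
}
```

In this sketch the two tables have different lengths and different numbering, and nothing ties a position in one to a position in the other.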
> So I plan to assign to SCE_UNI_K_.. values from 100 and to SCE_UNI_S_.. values from 200.
You can't do that (at least if you plan to submit your lexer to Lexilla) - `SCE_UNI_S_` are generated automatically by a script in lexilla:
https://github.com/ScintillaOrg/lexilla/blob/master/include/SciLexer.h
There's no exported `SCE_UNI_K_` (you can of course use it within your lexer). `highlighting_keywords_UNI` will just have to use the correct indices.
> `/* TODO: style_id might not be the real array index (Scintilla styles are not always synced with array indices) */`

To be honest, I have no idea what this comment is trying to say or whether it's related to lexers at all.
In any case, if you plan to submit a pull request with the lexer to Geany, you have to get it merged to the Lexilla project first - we only accept official lexers.
Hello.

> ...Indices to `highlighting_styles_UNI` are completely independent.
You are right. I was confused by the comment.
> > So I plan to assign to SCE_UNI_K_.. values from 100 and to SCE_UNI_S_.. values from 200.
>
> You can't do that (at least if you plan to submit your lexer to Lexilla) ...

If keyword indices are completely independent of style indices, as you said above, that's OK for me. There is no need to number them from 100 or 200 in the real world. It would only have proved the independence to me.
> `highlighting_keywords_UNI` will just have to use the correct indices.

For me, that's a major drawback of how the Geany source code relates to Lexilla's. Every time a lexer feature is added to Lexilla, `highlightingmappings.h` has to be updated **manually**, and keeping the order in sync is troublesome. I'm not sure, but there seem to be differences even now.
A summary of what is defined for each lexer in `highlightingmappings.h` is attached: [geany_2.0.0_20241219_lexer_definitions.zip](https://github.com/user-attachments/files/18197627/geany_2.0.0_20241219_lexe...)
Examples of discrepancies:

* The Pascal lexer has the properties `lexer.pascal.smart.highlighting`, `fold.comment`, `fold.preprocessor` and `fold.compact`. Geany does use `lexer.pascal.smart.highlighting` (filetype.pascal sets `lexer.pascal.smart.highlighting=1`), but `highlightingmappings.h` contains `#define highlighting_properties_PASCAL EMPTY_PROPERTIES`.
* The CPP lexer has about 20 properties defined. Two of them are used in Geany (filetype.c sets `styling.within.preprocessor=1` and `lexer.cpp.track.preprocessor=0`), but the entry in `highlightingmappings.h` is `"fold.cpp.comment.explicit", "0"`.
* The CPP lexer currently defines 6 keyword lists. `highlightingmappings.h` only has entries for `0 primary`, `1 secondary` and `2 docComment`.

The drawback extends deep into the source code. For example, Geany uses wordlist 3 here: https://github.com/geany/geany/blob/7a017c764038bcdfcb99db7365c3196fd8aebdbf... A bare constant `3` is used there as well. If you read only this part of the source, you have no idea what `3` means.
I propose introducing keyword list constants `SCE_[LANG]_K_[DESCRIPTION]` and renaming the existing style constants from `SCE_[LANG]_[DESCRIPTION]` to `SCE_[LANG]_S_[DESCRIPTION]`. The keyword list constants could be used in `highlighting_keywords_[LANG]` and in the source code (e.g. in document.c, see above), which would improve readability and reduce the risk of errors, e.g. `SciLexer.h: SCE_C_K_GLOBALCLASSESANDTYPES = 3;` and `document.c: keyword_idx = SCE_C_K_GLOBALCLASSESANDTYPES;`.
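A minimal sketch of what such named keyword list indices could look like. All names here are hypothetical: SciLexer.h does not export keyword list constants, and the numbering only mirrors the `0 primary`, `1 secondary`, `2 docComment` and wordlist `3` values mentioned above.

```c
/* Hypothetical keyword list indices for the C lexer (not part of SciLexer.h). */
enum {
    C_KEYWORDS_PRIMARY      = 0,
    C_KEYWORDS_SECONDARY    = 1,
    C_KEYWORDS_DOCCOMMENT   = 2,
    C_KEYWORDS_GLOBAL_TYPES = 3   /* the list that receives type names */
};

/* Hypothetical call site: a name replaces the bare constant 3. */
static int type_keyword_list(void)
{
    return C_KEYWORDS_GLOBAL_TYPES;
}
```

The behaviour is unchanged. Only the call site becomes self-describing.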
> In any case, if you plan to submit a pull request with the lexer to Geany, you have to get it merged to the Lexilla project first - we only accept official lexers.

That's the plan. A feature update to LexPascal will be offered to the Lexilla project within the next few days. After that, I'll finish work on the new lexer and offer it to Lexilla, too.
Thank you for your answer and time. br HoTschir
> For me, that's a major drawback of how the Geany source code relates to Lexilla's. Every time a lexer feature is added to Lexilla, `highlightingmappings.h` has to be updated **manually**, and keeping the order in sync is troublesome. I'm not sure, but there seem to be differences even now.
If you mean things like `lexer.cpp.track.preprocessor=0` then no, it can be done dynamically in the config file independently of highlightingmappings.h.
But if a new highlighting style is introduced by a lexer, Geany indeed has to update `highlightingmappings.h`. I think the reason for this was to have nice names in the config file like e.g. `commentline` for Python instead of `style.python.1` in SciTE but this is at the cost of some extra maintenance. But it's probably too late to change that for all the languages.
> Examples of discrepancies:
Not discrepancies IMO, see below.
> The Pascal lexer has the properties `lexer.pascal.smart.highlighting`, `fold.comment`, `fold.preprocessor` and `fold.compact`. Geany does use `lexer.pascal.smart.highlighting` (filetype.pascal sets `lexer.pascal.smart.highlighting=1`), but `highlightingmappings.h` contains `#define highlighting_properties_PASCAL EMPTY_PROPERTIES`.
That's alright. In general, properties should be defined in the config file instead of being hard-coded.
> The CPP lexer has about 20 properties defined. Two of them are used in Geany (filetype.c sets `styling.within.preprocessor=1` and `lexer.cpp.track.preprocessor=0`), but the entry in `highlightingmappings.h` is `"fold.cpp.comment.explicit", "0"`.

And if users wish, they can add any of those 20 properties to the config file and customize the lexer to their needs. I don't know why `fold.cpp.comment.explicit` is hard-coded - possibly some things in Geany don't work if this property is set to 1.
> The CPP lexer currently defines 6 keyword lists. `highlightingmappings.h` only has entries for `0 primary`, `1 secondary` and `2 docComment`. The drawback extends deep into the source code. For example, Geany uses wordlist 3 here:

Basically, I think none of the Geany developers (or even users) needs 6 keyword groups, so only 3 are defined. If there's a need, we may add more.
The wordlist 3 is indeed "magic" as it serves for injecting types from ctags to Scintilla and their colorization. If I'm not mistaken, we do this only for a few languages now, mostly using the C lexer so there hasn't been much need to define it dynamically.
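As a rough illustration of what "injecting types" into a keyword list amounts to, the sketch below joins a configured keyword string with tag-derived type names into one whitespace-separated list. The helper `merge_keywords` is hypothetical and is not Geany's actual `merge_type_keywords()`. It only assumes that Scintilla keyword lists are plain whitespace-separated strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append tag-derived type names to a configured
 * keyword string. Returns a newly allocated string (caller frees),
 * or NULL if allocation fails. */
static char *merge_keywords(const char *configured, const char *types)
{
    size_t a = strlen(configured);
    size_t b = strlen(types);
    char *out = malloc(a + b + 2);   /* room for separator and terminator */
    size_t n = a;

    if (out == NULL)
        return NULL;
    memcpy(out, configured, a);
    if (a > 0 && b > 0)
        out[n++] = ' ';              /* separate the two lists */
    memcpy(out + n, types, b);
    out[n + b] = '\0';
    return out;
}
```

The merged string would then be handed to the lexer as the contents of the type wordlist (index 3 in the C lexer case discussed above).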
> I propose introducing keyword list constants `SCE_[LANG]_K_[DESCRIPTION]` and renaming the existing style constants from `SCE_[LANG]_[DESCRIPTION]` to `SCE_[LANG]_S_[DESCRIPTION]`. The keyword list constants could be used in `highlighting_keywords_[LANG]` and in the source code (e.g. in document.c, see above), which would improve readability and reduce the risk of errors, e.g. `SciLexer.h: SCE_C_K_GLOBALCLASSESANDTYPES = 3;` and `document.c: keyword_idx = SCE_C_K_GLOBALCLASSESANDTYPES;`.
If you want such a change, you must do it in Scintilla - "SciLexer.h" isn't our code and as I said, it's generated dynamically using a script in Scintilla. I slightly doubt it will be accepted though.
> If you mean things like `lexer.cpp.track.preprocessor=0` then no, it can be done dynamically in the config file independently of highlightingmappings.h.

But for that, the Geany user has to know the Lexilla source code :-( Ever since I started using Geany, I have missed a function that shows which wordlists, styles and properties are available for configuration in the config files.
> But if a new highlighting style is introduced by a lexer, Geany indeed has to update `highlightingmappings.h`. I think the reason for this was to have nice names in the config file like e.g. `commentline` for Python instead of `style.python.1` in SciTE but this is at the cost of some extra maintenance. But it's probably too late to change that for all the languages.
see https://github.com/ScintillaOrg/lexilla/issues/296, just created
> > The Pascal lexer has the properties `lexer.pascal.smart.highlighting`, `fold.comment`, `fold.preprocessor` and `fold.compact`. Geany does use `lexer.pascal.smart.highlighting` (filetype.pascal sets `lexer.pascal.smart.highlighting=1`), but `highlightingmappings.h` contains `#define highlighting_properties_PASCAL EMPTY_PROPERTIES`.
>
> That's alright. In general, properties should be defined in the config file instead of being hard-coded.

For all of this, the user has to know which properties are available, which again means reading the Lexilla source.
> > The CPP lexer currently defines 6 keyword lists. `highlightingmappings.h` only has entries for `0 primary`, `1 secondary` and `2 docComment`. The drawback extends deep into the source code. For example, Geany uses wordlist 3 here:
>
> Basically I think nobody from Geany developers (or even users) needs 6 keyword groups so only 3 are defined. If there's a need, we may add more.

This makes me worry about my needs. I've been working on a kind of universal lexer that supports a few dozen styles and word lists. Did I miss something? Your statement makes me hesitate to finish it for publication.
The initial question (independence of keyword and style indices) has been clarified.
Closed #4154 as resolved.
> But for that, the Geany user has to know the Lexilla source code :-( Ever since I started using Geany, I have missed a function that shows which wordlists, styles and properties are available for configuration in the config files.
I think the problem exists only for properties, right? For wordlists and styles users see what's available.
> For all of this, the user has to know which properties are available, which again means reading the Lexilla source.
For properties I agree and I think it would be good to list all of them somewhere - we could e.g. put all of them in filetype config files, commented out, or, if I remember correctly, @eht16 made some script to list them in documentation in some PR that got long forgotten. Maybe good idea to revive it.
> > > The CPP lexer currently defines 6 keyword lists. `highlightingmappings.h` only has entries for `0 primary`, `1 secondary` and `2 docComment`. The drawback extends deep into the source code. For example, Geany uses wordlist 3 here:
> >
> > Basically I think nobody from Geany developers (or even users) needs 6 keyword groups so only 3 are defined. If there's a need, we may add more.
>
> This makes me worry about my needs. I've been working on a kind of universal lexer that supports a few dozen styles and word lists. Did I miss something? Your statement makes me hesitate to finish it for publication.
We were talking about the C lexer which defines more wordlists than Geany uses and where I don't see much need to add more. (But of course if there's such a need, more can be added, I just haven't seen any user's requests in this regard.)
If you have a lexer where there's a legitimate need for 100 keywordlists, those can be added. I'm just wondering, if that is a good approach.
Suppose you are creating a universal lexer that uses e.g. grammars for lexing. Then, instead of having tons of keywords from different languages in a single (fake) filetype for this lexer, it would be better to use this lexer for one filetype only and define all other filetypes as external, similar to what Geany does e.g. for JSON, which reuses the Javascript lexer:
https://github.com/geany/geany/blob/master/data/filedefs/filetypes.JSON.conf
You can inherit stuff from the parent lexer, but you can also completely redefine styles, properties, or wordlists in such filetype definitions. Then, you don't need a universal wordlist set containing keywords from all languages, but only a set of say 4 wordlists which each language redefines based on its needs.
By the way, have you seen
https://github.com/geany/geany/blob/master/HACKING#L528
It might help you when adding a new lexer. Also, I'd suggest that you check some existing PR adding a new lexer such as https://github.com/geany/geany/pull/3934 so you know what needs to be changed (just ignore the tagmanager and ctags part).
Finally, if you plan to develop a "non-standard" lexer like your universal lexer, I'd strongly suggest discussing this with Neil first, so that your idea is compatible with what Neil wants to have in Lexilla and you don't waste much time in case he doesn't like it for some reason.
> Assuming you are creating some universal lexer that uses e.g. some grammars for lexing. Then, instead of having tons of keywords from different languages in a single (fake) filetype for this lexer, it would be better to use this lexer for one filetype only, and define all other filetypes as external,

Exactly, that is how I implemented it: there is one file called `filetypes.Uni`, but it is almost empty (just one comment saying that it is a dummy file). Then, for each language to be lexed (say, Gcode files), one custom filetype file (in that case filetypes.gcode.conf) is generated, containing the line `lexer_filetype=Uni` alongside all keyword list and style definitions.
To give you an impression of why so many keyword lists are necessary: a universal lexer cannot hard-code which characters delimit strings or comments, so you must provide a keyword list for each of them: `StringDelimiter="` or `StringDelimiter='` or even `StringDelimiter=' "`. The same goes for comments:

* C-like: `CommentMultilineStart=/*` `CommentMultilineEnd=*/`
* Pascal-like: `CommentMultilineStart={` `CommentMultilineEnd=}`
* Forth-like: `CommentMultilineStart=( ` `CommentMultilineEnd= )`
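One way a lexer could consume such delimiter lists is sketched below. This is not LexUni's code: the function name and the matching rule (first whitespace-separated entry that prefixes the text wins) are assumptions for illustration. Note that treating whitespace purely as a separator cannot express the Forth-like case, where the space is part of the delimiter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the first whitespace-separated
 * entry in `list` that `text` starts with, or 0 if none matches. */
static size_t match_delimiter(const char *list, const char *text)
{
    const char *p = list;

    while (*p != '\0') {
        const char *start;
        size_t len;

        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                          /* skip separators */
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                          /* scan one entry */
        len = (size_t)(p - start);
        if (len > 0 && strncmp(text, start, len) == 0)
            return len;
    }
    return 0;
}
```

With `StringDelimiter=' "` the list is `' "`, so both quote characters are accepted as string openers, while `CommentMultilineStart=/*` yields a two-character match.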
For instance in markdown files, if you want to style each level of headline (# ## ### ####), you need to have a separate keyword list and style for each level.
Currently LexUni supports 64 custom keyword lists and styles in addition to the built-in lists (like `StringDelimiter`, `CommentMultilineStart`, `HexNumber`, ..).
Hmm, but those things on the right-hand side aren't really keywords, are they? If I understand it correctly, you basically misuse keyword lists for defining syntactic rules - that's not a very clean approach. I'd really suggest discussing this with Neil first.
github-comments@lists.geany.org