[Geany-Devel] Syntax highlighting, folding, etc for a "new" language.

Wed Jan 29 20:29:09 UTC 2014

Colomban:

Many thanks for your guidance. 

There is a Jal IDE called PicShell, written in Python. It had a
rudimentary Jal parser that extracted variables, etc., to build
a tree pane like Geany's. (It uses the Scintilla editor.) I rewrote the
parser to use regular expressions to get the data, so I know how to do
it.

In Geany, I didn't know where to start. 

Once I understand how Geany works, I'll play around. Then I can worry
about submitting it for possible inclusion into Geany, although I'm not
sure how wide an audience it will appeal to.

I'll be back with more questions, I'm sure!

-- 
Larry Bradley  
Orleans (Ottawa) Canada 

On Wed, 2014-01-29 at 18:34 +0100, Colomban Wendling wrote:

> Hi,
> 
> Le 29/01/2014 17:37, Larry Bradley a écrit :
> > [...]
> >
> > I have the geany 1.23 source, and I've actually made some changes to
> > the VHDL scintilla lexer and filetypes.vhdl to handle folding and
> > syntax highlighting properly.
> 
> You should use the development version (Git repository), so your changes
> would be easier to merge later.
> 
> > However, I would like to do a better job of supporting Jal.
> 
> First of all, you should take a look at the file named HACKING in the
> source tree.  It contains a lot of generic and specific guidance on how
> to hack on the Geany source, and has a specific section for new filetypes.
> 
> > In particular, I would like the symbol tree to be able to show the 
> > variables and constants defined in a Jal program. Using the VHDL 
> > filetype, Geany shows the functions and procedures (I did nothing to 
> > cause this to happen), but not the variables.
> >
> > Only some filetypes actually display variables. Basic, for example, 
> > does, while Pascal does not.
> 
> The symbols are extracted with a CTags parser, e.g.
> tagmanager/ctags/vhdl.c.  Whether or not a particular type of symbol
> appears in Geany depends on basically two things:
> 
> 1) the ability of the relevant parser to generate "tags" for those symbols;
> 
> 2) whether or not the type of those generated tags is mapped to a
> category displayed in the symbols list.
> 
> The first point obviously requires the parser to be tuned to handle a
> particular thing.  The second depends both on what type the parser
> reports for the tags, and on whether this type is mapped to something for
> this language in src/symbols.c:add_top_level_items().
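
To make the two conditions concrete, here is a minimal sketch in Python. It is
not Geany's actual code: the function name, the tag tuples and the category
names are all made up for illustration. The parser output stands for
condition 1, and the kind-to-category mapping stands for condition 2.

```python
def build_symbol_tree(tags, kind_to_category):
    """Group parser-generated tags by category; unmapped kinds are dropped.

    tags: list of (name, kind) pairs the (hypothetical) parser emitted.
    kind_to_category: per-language mapping from tag kind to tree category.
    """
    tree = {}
    for name, kind in tags:
        category = kind_to_category.get(kind)
        if category is None:
            continue  # condition 2 fails: kind is not mapped for this language
        tree.setdefault(category, []).append(name)
    return tree


# The parser generated a variable tag, but only procedures are mapped,
# so the variable never reaches the symbol tree.
tags = [("blink", "procedure"), ("counter", "variable")]
print(build_symbol_tree(tags, {"procedure": "Functions"}))
# prints {'Functions': ['blink']}
```

So a missing variable in the tree can mean either that the parser never
produced the tag, or that it did and the kind is simply not mapped.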
> 
> > I've no problems with making changes to the Geany code, but I've no
> > idea where to start with the display of variables and constants. The 
> > scintilla lexers that I've seen, and the scintilla documentation do
> > not make it really obvious how one writes a lexer.
> 
> Scintilla lexers do not generate tags; that is the job of the CTags parsers.
> 
> The true difference between those in how they work is that the goal of a
> Scintilla lexer is only to highlight the code properly, which generally
> requires only basic knowledge of the syntax (e.g. what is a
> string, a comment, etc.) -- basically, only the first step of
> general language understanding is required: identifying tokens.  Having
> a very tolerant Scintilla lexer is a good thing, since it's definitely
> meant to highlight a document during modification.
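
As an illustration of "identifying tokens" in a tolerant way, here is a small
Python sketch. Real Scintilla lexers are written in C++ against Scintilla's
own interfaces, so this only shows the idea. The token classes, the keyword
list and the comment markers (`--` and `;`) are assumptions about a Jal-like
syntax, not taken from any existing lexer.

```python
import re

# Hypothetical token classes for a Jal-like syntax (assumptions).
TOKEN_RE = re.compile(r"""
    (?P<comment>--[^\n]*|;[^\n]*)   # line comment, runs to end of line
  | (?P<string>"[^"\n]*"?)          # string; the closing quote is optional
  | (?P<number>\d\w*)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<other>.)                    # anything else: operators, punctuation
""", re.VERBOSE)

# Assumed keyword list, deliberately incomplete.
KEYWORDS = {"var", "const", "procedure", "function", "is", "end", "return"}


def tokenize(text):
    """Yield (style, lexeme) pairs; malformed input never raises."""
    for m in TOKEN_RE.finditer(text):
        style, lexeme = m.lastgroup, m.group()
        if style == "space":
            continue
        if style == "word" and lexeme.lower() in KEYWORDS:
            style = "keyword"
        yield style, lexeme


print(list(tokenize('x = "abc')))
# prints [('word', 'x'), ('other', '='), ('string', '"abc')]
```

The unterminated string is still styled as a string rather than rejected,
which is the kind of tolerance wanted while the user is in the middle of
typing.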
> 
> On the other hand, since the CTags parser has to extract particular
> information from the data, it has to understand some parts of it.  In general,
> this requires the first step (dividing into tokens), although sometimes
> only very basic differentiation is required [1];  but also the second
> step: understanding what those tokens actually mean to some extent.
> Whether or not it has to understand the whole language depends on
> how the language is constructed and how clever the programmer of that
> parser is at finding tricks.  For example, for a language that uses keywords to
> introduce everything the parser wants to extract (PHP and Python pretty
> much fit), the parser can simply search for those keywords, start
> extracting the relevant information from there, and not care much about
> what is in between.  On the other hand, for languages with a more "free"
> syntax (like C, C++ and other crazy languages :), the parser may have to
> care more just to be able to find what is interesting (e.g. one could
> imagine a C or C++ parser cutting the input into statements, and then
> analyzing the content of those statements).
> 
> In practice, however, one will generally take as a basis an existing parser
> or lexer for a language similar to the one she wants to support.
> 
> For writing your Scintilla lexer, pick one for a similar language (here
> you picked VHDL IIUC), copy it and modify it.  If the language in
> question really only has small differences from an existing one, one
> might even simply tweak an existing lexer to handle both languages --
> but this should be done with caution so as not to make things hard to follow,
> and should only be used for very similar languages.
> Note that in the context of a Scintilla lexer, "very similar" is more about
> which syntactic elements exist and what their syntax is (comments,
> strings, etc.) than about how the language works.  For example C, C++, Java,
> JavaScript and a few others all use the same lexer, because most of their
> syntactic elements are the same.
> Also, Scintilla is a separate project, and we prefer new lexers to be
> integrated into it before we add them, so we don't diverge.  But don't
> worry, Scintilla easily accepts new lexers.
> 
> For the CTags parser, it's less commonly a good idea to have one single
> parser for different languages, because generally the changes are larger
> -- unless of course one language is a perfect superset or subset
> of another one.
> There are 2 types of CTags parsers:  regular-expression based parsers,
> and plain C ones.
> 
> 1) Regular expression parsers are quite simple, and simply consist of a
> set of line-based regular expressions that extract the tags.  These are
> limited (impossible to really handle multi-line constructs like comments
> or multi-line strings), but really simple.
> 
> 2) Plain C parsers are more complex, but can handle anything the
> programmer can handle.  They are just normal C code reading the data and
> handling it in any appropriate manner.
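
For a feel of what the first kind (line-based regular expressions) amounts to,
here is a rough Python sketch. It does not use CTags' own regex-parser
machinery, and the patterns are guesses at a Jal-like syntax (`var <type>
<name>`, `const [<type>] <name> = ...`, `procedure <name>`, `function
<name>`), so treat all of them as assumptions.

```python
import re

# Hypothetical line-based patterns for a Jal-like syntax (assumptions).
PATTERNS = [
    ("procedure", re.compile(r"^\s*procedure\s+(\w+)", re.IGNORECASE)),
    ("function", re.compile(r"^\s*function\s+(\w+)", re.IGNORECASE)),
    ("variable", re.compile(r"^\s*var\s+\w+\s+(\w+)", re.IGNORECASE)),
    ("constant", re.compile(r"^\s*const\s+(?:\w+\s+)?(\w+)\s*=", re.IGNORECASE)),
]


def extract_tags(source):
    """Return (name, kind, line number) for each line matching a pattern."""
    tags = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS:
            m = pattern.match(line)
            if m:
                tags.append((m.group(1), kind, lineno))
                break  # at most one tag per line in this sketch
    return tags


SAMPLE = """const byte LED = 5
var byte counter
procedure blink() is
end procedure
function twice(byte in x) return byte is
"""
for tag in extract_tags(SAMPLE):
    print(tag)
```

Each line is matched on its own, which is exactly why this approach cannot
cope with constructs that span several lines.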
> 
> Ah, and don't take the C parser (c.c) as an example -- don't even look
> at it if you don't want to go crazy ;)
> If you want a nice, complete (and complex) parser for a relatively easy
> language, you can look at the ones for PHP and Rust.
> 
> 
> 
> Anyway, don't hesitate to ask any further questions you have.
> 
> Regards,
> Colomban
> 
> 
> [1] e.g. it's generally not important to differentiate an identifier
> from a number constant, because in most languages they are used the
> same, and if a number appears where an identifier is expected it only
> means malformed input.