Many thanks for your guidance.
There is a Jal IDE called PicShell, written in Python. It had a
rudimentary Jal parser to extract variables, etc., to build a tree
pane like Geany's. (It uses the Scintilla editor.) I rewrote the
parser to use regular expressions to get the data, so I know how to do
it.
In Geany, I didn't know where to start.
Once I understand how Geany works, I'll play around. Then I can worry
about submitting it for possible inclusion in Geany, although I'm not
sure how wide an audience it will appeal to.
--
Larry Bradley
Orleans (Ottawa) Canada
On Wed, 2014-01-29 at 18:34 +0100, Colomban Wendling wrote:
> Hi,
>
> On 29/01/2014 17:37, Larry Bradley wrote:
> > [...]
> >
> > I have the Geany 1.23 source, and I've actually made some changes to
> > the VHDL Scintilla lexer and filetypes.vhdl to handle folding and
> > syntax highlighting properly.
>
> You should use the development version (Git repository), so your changes
> would be easier to merge later.
>
> > However, I would like to do a better job of supporting Jal.
>
> First of all, you should take a look at the file named HACKING in the
> source tree. It contains a lot of general and specific guidance on how
> to hack on the Geany source, and has a section dedicated to adding new
> filetypes.
>
> > In particular, I would like the symbol tree to be able to show the
> > variables and constants defined in a Jal program. Using the VHDL
> > filetype, Geany shows the functions and procedures (I did nothing to
> > cause this to happen), but not the variables.
> >
> > Only some filetypes actually display variables. Basic, for example,
> > does, while Pascal does not.
>
> The symbols are extracted with a CTags parser, e.g.
> tagmanager/ctags/vhdl.c. Whether or not a particular type of symbol
> appears in Geany depends on basically two things:
>
> 1) the ability of the relevant parser to generate "tags" for those symbols;
>
> 2) whether or not the type of those generated tags is mapped to a
> category displayed in the symbols list.
>
> The first point obviously requires the parser to be written to handle
> that particular construct. The second depends both on what type the
> parser reports for the tags, and on whether this type is mapped to
> something for this language in src/symbols.c:add_top_level_items().
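A hypothetical C sketch of those two conditions: a parser reports tags
with a kind, and a per-language table decides which kinds get a
category in the symbols list. The enum, the table and both functions
are invented for illustration; Geany's real mapping lives in
src/symbols.c:add_top_level_items() and looks different.

```c
#include <stddef.h>

/* Hypothetical tag kinds a parser might report (not Geany's real types). */
typedef enum {
    KIND_FUNCTION,
    KIND_PROCEDURE,
    KIND_VARIABLE,
    KIND_CONSTANT
} TagKind;

/* A tag as produced by the parser (condition 1). */
typedef struct {
    const char *name;
    TagKind kind;
} Tag;

/* One entry of a hypothetical per-language kind -> category table
 * (condition 2). */
typedef struct {
    TagKind kind;
    const char *category;
} KindMapping;

/* Only routines are mapped: variable and constant tags may still be
 * produced by the parser, but they have nowhere to be displayed. */
static const KindMapping jal_mapping[] = {
    { KIND_FUNCTION,  "Functions" },
    { KIND_PROCEDURE, "Functions" },
};
#define JAL_MAPPING_LEN (sizeof jal_mapping / sizeof jal_mapping[0])

/* Returns the category for `kind`, or NULL if the kind is not mapped. */
static const char *category_for_kind(const KindMapping *map, size_t n,
                                     TagKind kind)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].kind == kind)
            return map[i].category;
    return NULL;
}

/* Counts how many of the parser's tags would appear in the symbols list. */
static size_t count_displayed(const Tag *tags, size_t ntags,
                              const KindMapping *map, size_t nmap)
{
    size_t shown = 0;
    for (size_t i = 0; i < ntags; i++)
        if (category_for_kind(map, nmap, tags[i].kind) != NULL)
            shown++;
    return shown;
}
```

With such a table, a procedure tag is listed while variable and
constant tags are silently dropped, so supporting variables means
touching both the parser and the mapping.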
>
> > I've no problems with making changes to the Geany code, but I've no
> > idea where to start with the display of variables and constants. The
> > Scintilla lexers that I've seen and the Scintilla documentation do
> > not make it really obvious how one writes a lexer.
>
> Scintilla lexers do not generate tags; CTags parsers do.
>
> The real difference in how they work is that the goal of a Scintilla
> lexer is only to highlight the code properly, which generally requires
> only basic knowledge of the syntax (e.g. what is a string, a comment,
> etc.); basically, only the first step of general language
> understanding is required: identifying tokens. A very tolerant
> Scintilla lexer is a good thing, since it is meant to highlight a
> document while it is being edited.
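To show how little a highlighter needs to know, here is a hypothetical
C sketch that assigns a style to each character, assuming Jal-style
line comments starting with `--` or `;` and double-quoted strings. It
is not Scintilla's lexer interface; `style_text()` and the style codes
are made up. Note that it never fails: an unterminated string simply
runs to the end of the line.

```c
#include <string.h>

/* Hypothetical one-letter style codes (Scintilla uses numeric styles). */
enum { STYLE_DEFAULT = 'd', STYLE_COMMENT = 'c', STYLE_STRING = 's' };

/* Writes one style code per character of `src` into `styles`, which must
 * hold strlen(src) + 1 bytes. Only token classes are recognized; nothing
 * about the meaning of the code is needed. */
static void style_text(const char *src, char *styles)
{
    size_t i = 0, len = strlen(src);

    while (i < len) {
        if ((src[i] == '-' && src[i + 1] == '-') || src[i] == ';') {
            /* Line comment: style up to, but not including, the newline. */
            while (i < len && src[i] != '\n')
                styles[i++] = STYLE_COMMENT;
        } else if (src[i] == '"') {
            styles[i++] = STYLE_STRING;          /* opening quote */
            while (i < len && src[i] != '"' && src[i] != '\n')
                styles[i++] = STYLE_STRING;
            if (i < len && src[i] == '"')
                styles[i++] = STYLE_STRING;      /* closing quote, if typed yet */
        } else {
            styles[i++] = STYLE_DEFAULT;
        }
    }
    styles[len] = '\0';
}
```

Tolerance here is a design choice: a half-typed line still gets
sensible colors instead of an error.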
>
> On the other hand, since a CTags parser has to extract particular
> information from the data, it has to understand some parts of it. In
> general, this requires the first step (dividing into tokens), although
> sometimes only very basic differentiation is required [1]; but it also
> requires the second step: understanding, to some extent, what those
> tokens actually mean.
> Whether or not it has to understand the whole language depends on how
> the language is constructed and on how clever the parser's programmer
> is at finding tricks. For example, in a language that uses keywords to
> introduce everything the parser wants to extract (PHP and Python fit
> this pretty well), the parser can simply search for those keywords,
> extract the relevant information from there, and not care much about
> what is in between. On the other hand, for languages with a more
> "free" syntax (like C, C++ and other crazy languages :), the parser may
> have to work harder just to find what is interesting (e.g. one could
> imagine a C or C++ parser splitting the input into statements, and
> then analyzing the content of each statement).
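A minimal sketch of the keyword-driven approach in plain C, assuming
Jal-like syntax where `procedure NAME` or `function NAME` introduces a
routine and `end procedure` / `end function` closes it. `next_word()`
and `extract_routines()` are hypothetical helpers, not CTags functions,
and keywords are matched case-sensitively for simplicity. Everything
between keywords is skipped without being understood, and, as footnote
[1] suggests, identifiers and numbers are read as the same kind of word.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64

/* Reads the next word (letters, digits, '_') from *p into buf, skipping
 * punctuation, "--"/";" line comments and double-quoted strings.
 * Returns 0 when the input is exhausted. */
static int next_word(const char **p, char *buf)
{
    const char *s = *p;
    size_t n = 0;

    for (;;) {
        if (*s == '\0') {
            *p = s;
            return 0;
        }
        if ((s[0] == '-' && s[1] == '-') || s[0] == ';') {
            while (*s != '\0' && *s != '\n')      /* comment */
                s++;
        } else if (*s == '"') {
            s++;                                  /* string literal */
            while (*s != '\0' && *s != '"' && *s != '\n')
                s++;
            if (*s == '"')
                s++;
        } else if (isalnum((unsigned char)*s) || *s == '_') {
            break;                                /* start of a word */
        } else {
            s++;                                  /* anything else */
        }
    }
    while (isalnum((unsigned char)*s) || *s == '_') {
        if (n < MAX_WORD - 1)
            buf[n++] = *s;
        s++;
    }
    buf[n] = '\0';
    *p = s;
    return 1;
}

/* Stores the name following each "procedure"/"function" keyword into
 * names (up to max entries), ignoring the "end ..." closers.
 * Returns the number of names found. */
static size_t extract_routines(const char *src, char names[][MAX_WORD],
                               size_t max)
{
    char word[MAX_WORD], prev[MAX_WORD] = "";
    const char *p = src;
    size_t count = 0;

    while (next_word(&p, word)) {
        int is_kw = strcmp(word, "procedure") == 0 ||
                    strcmp(word, "function") == 0;
        if (is_kw && strcmp(prev, "end") != 0) {
            if (!next_word(&p, word))             /* the routine's name */
                break;
            if (count < max)
                strcpy(names[count++], word);
        }
        strcpy(prev, word);
    }
    return count;
}
```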
>
> In practice, however, one will generally take as a basis an existing
> parser or lexer for a language similar to the one she wants to support.
>
> For writing your Scintilla lexer, pick one for a similar language (here
> you picked VHDL IIUC), copy it and modify it. If the language in
> question really has only small differences from an existing one, one
> might even simply tweak an existing lexer to handle both languages,
> but this should be done with caution so as not to make things hard to
> follow, and only for very similar languages.
> Note that in the context of a Scintilla lexer, "very similar" refers
> more to which syntactic elements exist and what their syntax is
> (comments, strings, etc.) than to how the language works. For example
> C, C++, Java, JavaScript and a few others all use the same lexer,
> because most of their syntactic elements are the same.
> Also, Scintilla is a separate project, and we prefer new lexers to be
> integrated into it before we add them, so we don't diverge. But don't
> worry, Scintilla readily accepts new lexers.
>
> For the CTags parser, it is less often a good idea to have one single
> parser for different languages, because the differences are generally
> larger, unless of course one language is a strict superset or subset
> of another one.
> There are two types of CTags parsers: regular-expression-based ones
> and plain C ones.
>
> 1) Regular expression parsers are quite simple: each one just consists
> of a set of line-based regular expressions that extract the tags. They
> are limited (it is impossible to really handle multi-line constructs
> like comments or multi-line strings), but really simple.
>
> 2) Plain C parsers are more complex, but can handle anything the
> programmer can handle. They are just normal C code reading the data and
> handling it in any appropriate manner.
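The line-based approach of item 1) can be pictured with a small,
self-contained C sketch that runs POSIX `regcomp()`/`regexec()` on each
line to pick up Jal-style `var`/`const` declarations, taking the last
word before `=` (or the end of the match) as the name. This is not how
Geany's regex-based CTags parsers are actually declared;
`extract_decls()` and `DeclTag` are hypothetical names, and the Jal
declaration syntax is assumed.

```c
#include <regex.h>
#include <string.h>

#define MAX_NAME 64

typedef struct {
    char kind;            /* 'v' for var, 'c' for const */
    char name[MAX_NAME];
    int line;             /* 1-based line number */
} DeclTag;

/* Matches each line against one extended regex. Group 1 is the keyword,
 * group 3 the declared name. Returns the number of tags stored, or -1 if
 * the regex fails to compile. */
static int extract_decls(const char *src, DeclTag *tags, int max)
{
    static const char pattern[] =
        "^[[:space:]]*(var|const)[[:space:]]+"
        "([[:alnum:]_]+[[:space:]]+)*([[:alnum:]_]+)";
    regex_t re;
    regmatch_t m[4];
    char line[256];
    int count = 0, lineno = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (*src != '\0' && count < max) {
        /* Copy one line (truncated to the buffer) and advance src. */
        size_t len = strcspn(src, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, src, copy);
        line[copy] = '\0';
        src += len;
        if (*src == '\n')
            src++;
        lineno++;

        if (regexec(&re, line, 4, m, 0) == 0) {
            size_t n = (size_t)(m[3].rm_eo - m[3].rm_so);
            if (n > MAX_NAME - 1)
                n = MAX_NAME - 1;
            memcpy(tags[count].name, line + m[3].rm_so, n);
            tags[count].name[n] = '\0';
            tags[count].kind = line[m[1].rm_so] == 'v' ? 'v' : 'c';
            tags[count].line = lineno;
            count++;
        }
    }
    regfree(&re);
    return count;
}
```

Being line-based, it shows the limits mentioned above: a declaration
split over two lines is missed, `var byte a, b` only yields `a`, and a
`var` inside a procedure looks exactly like a global one, because the
regex has no memory of the surrounding lines. Tracking that kind of
context is where a plain C parser pays off.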
>
> Ah, and don't take the C parser (c.c) as an example; don't even look
> at it if you don't want to go crazy ;)
> If you want a nice complete (and complex) parser for a relatively easy
> language, you can look at the ones for PHP and Rust.
>
>
>
> Anyway, don't hesitate to ask any further questions you have.
>
> Regards,
> Colomban
>
>
> [1] e.g. it's generally not important to differentiate an identifier
> from a number constant, because in most languages they are used the
> same way, and if a number appears where an identifier is expected it
> only means the input is malformed.
> _______________________________________________
> Devel mailing list
> Devel@lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/devel