[Geany-devel] Indentation using regex (was [PATCH 14/19] Rewrite tab switching queue)

Lex Trotman elextr at xxxxx
Tue Dec 6 06:20:54 UTC 2011


[...]

> First, note that I wasn't able to find the patch, so I'm only guessing
> from reading the thread and from my own (much less complete) attempt.
>

I'm afraid that if I still have the patch, it is on my broken hard drive :-S

And anyway we never got it to work satisfactorily.

>
> So.  This looks pretty good for line-based indents (but not brace match
> :(), but I ran into a really annoying problem with SH:
>
> SH should indent after "then", "do", etc. and unindent "fi", "esac",
> etc.  The problem is that you expect the "fi" line to be unindented
> (e.g. use unindent_this_line), but if you type "file" for example, it'd
> wrongly unindent that line too!
>
> I thought about unindenting the previous line when entering the \n, but
> this isn't a real solution either since re-adding a newline after a well
> indented line would unindent it again.  Crap.
>
> So I haven't yet found a sensible solution for this problem -- which
> wouldn't apply for '}' since it's very unlikely it's part of a bigger
> word -- and would like to know if anybody has super clever ideas, or how
> other editors you know handle this.
>
> This said, I really like the idea of configurable indentation rules that
> could handle languages like SH, Pascal, Ruby, Ada, etc. without the need
> to hard-code it.
>

WARNING, complex topic, big post :)

Quick summary of the ones I know:

Emacs has language-specific elisp; for C, the CC Mode manual says:

"It analyzes the line to determine its syntactic symbol(s) (the kind
of language construct it's looking at) and its anchor position (the
position earlier in the file that CC Mode will indent the line
relative to). The anchor position might be the location of an opening
brace in the previous line, for example. See Syntactic Analysis.
It looks up the syntactic symbol(s) in the configuration to get the
corresponding offset(s). The symbol +, which means “indent this line
one more level” is a typical offset. CC Mode then applies these
offset(s) to the anchor position, giving the indentation for the line.
The different sorts of offsets are described in c-offsets-alist."

And it admits that even then it gets it wrong sometimes :(

Eclipse and Netbeans also use parser results for the indent guidance.

I don't think parsing the source for indent guidance is in the Geany
light and small spirit, so I reject that.

Instead I propose the following "correct most of the time" but simple
option based on a combination of Jiri's and Emacs' methods:

1. Each line N has an initial indentation, which is the indentation of
line N-1 plus the increments/decrements for all matches to "indent
next line" regexes that occur in line N-1.  (Note that each regex has
a signed count of columns to indent/exdent.)

2. The final indentation of line N is the initial indentation adjusted
by the increments/decrements for all matches to "indent this line"
regexes that occur in line N.

Note that this is the indent, not a delta like Jiri's algorithm.  It
is therefore stable no matter how many times it is calculated.
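
As a minimal sketch of the two steps above (Python, not Geany code; the
rule tables, the 4-column width and all function names are hypothetical,
and indentation is assumed to be spaces only):

```python
import re

# Hypothetical rule tables for SH: (regex, signed column count).
INDENT_NEXT_LINE = [(re.compile(r"\b(then|do)\b"), +4)]
INDENT_THIS_LINE = [(re.compile(r"^\s*(fi|done)\b"), -4)]


def leading_columns(line):
    """Number of leading spaces on the line."""
    return len(line) - len(line.lstrip(" "))


def sum_matches(rules, text):
    """Add up the signed counts of every match of every rule in text."""
    total = 0
    for pattern, delta in rules:
        total += delta * sum(1 for _ in pattern.finditer(text))
    return total


def final_indent(prev_line, this_line):
    """Absolute indent (in columns) of line N, given line N-1 and line N."""
    # Step 1: initial indent comes from line N-1 only.
    initial = leading_columns(prev_line) + sum_matches(INDENT_NEXT_LINE, prev_line)
    # Step 2: adjust by the "indent this line" matches in line N.
    return max(initial + sum_matches(INDENT_THIS_LINE, this_line), 0)
```

The result depends on the content of line N but not on its current
indentation, so recomputing it for an already indented "fi" line gives
the same answer.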

The question is then when to calculate and apply this indent.  Clearly,
when a line is first created by pressing enter, the indent should be
applied.

But what about when line content changes?  Should we:

1. calculate the indent on each change, and then ripple that through the file
2. calculate the indent on each change and only apply it to this line
3. calculate and apply the indent to lines N and N-1 only on new line
or user command
4. calculate and apply the indent only on user command

Option 1 is rejected because it is expensive, because it will destroy
manually adjusted indentation when editing an existing line, and
because indentation can change as you type, causing distracting effects
(this happens with some Emacs indentation styles).

Option 2 is rejected for the same reasons.

Option 4 is rejected because auto new line indent is really the
minimum required to be called "auto" indentation.

So that leaves option 3.  The upside is that new lines get a sensible
indentation automatically, the downside is that lines that should be
unindented won't be until enter or user command.  I have used another
editor that worked this way and after a while I became used to it.
Note that editing an existing line won't destroy manual indentation
unless you tell it to or create a new line after.
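
A rough sketch of how option 3 might behave when enter is pressed
(Python, not Geany code; the rules, the 4-column width and the names
on_newline and indent_for are hypothetical, and spaces-only indentation
is assumed):

```python
import re

# Hypothetical SH rules: (regex, signed column count).
NEXT_RULES = [(re.compile(r"\b(then|do)\b"), +4)]
THIS_RULES = [(re.compile(r"^\s*(fi|done)\b"), -4)]


def indent_for(prev, line):
    """Absolute indent of line, computed from the previous line and its own content."""
    base = len(prev) - len(prev.lstrip(" "))
    for pattern, delta in NEXT_RULES:
        base += delta * len(pattern.findall(prev))
    for pattern, delta in THIS_RULES:
        base += delta * len(pattern.findall(line))
    return max(base, 0)


def on_newline(lines, n):
    """Enter has just created line n: fix line n-1 first, then indent line n."""
    out = list(lines)
    for i in (n - 1, n):
        if i < 0:
            continue
        prev = out[i - 1] if i > 0 else ""
        out[i] = " " * indent_for(prev, out[i]) + out[i].lstrip(" ")
    return out
```

With these rules a line "    fi" is pulled back only when enter is
pressed after it, a line "    file=x" is left alone because the regex
requires a word boundary after "fi", and pressing enter again after an
already correct "fi" changes nothing.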

The settings are two lists ("indent this line" and "indent next line"),
each holding pairs of a regex and a signed column count.

These settings are per language so they should come from the filetype files.
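
To make the shape of those settings concrete, here is one way they could
be written down and loaded.  This is purely illustrative: the
[indent_rules] section, its two keys and the "count regex" line format
are assumptions made up for this sketch, not existing Geany filetype
settings, and the loader is Python rather than Geany code.

```python
import configparser
import re

# Hypothetical fragment of a filetype file for SH.
SAMPLE = r"""
[indent_rules]
indent_next_line =
    +4 \b(then|do)\b
indent_this_line =
    -4 ^\s*(fi|done)\b
"""


def load_rules(text):
    """Return {key: [(compiled_regex, signed_count), ...]} for both lists."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rules = {}
    for key in ("indent_next_line", "indent_this_line"):
        pairs = []
        for line in parser.get("indent_rules", key, fallback="").splitlines():
            line = line.strip()
            if not line:
                continue
            # Each non-empty line is "<signed count> <regex>".
            count, pattern = line.split(None, 1)
            pairs.append((re.compile(pattern), int(count)))
        rules[key] = pairs
    return rules
```

Calling load_rules(SAMPLE) yields one (regex, count) pair per list; a
language such as Pascal or Ruby would only need a different fragment.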

A final thought: since there is now an "apply auto indent" command, if
there is a selection the auto indent should ripple through the whole
selection.
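
Rippling through a selection could look roughly like this (a Python
sketch, not Geany code; the rules and the name reindent_selection are
hypothetical, indentation is assumed to be spaces only, and blank lines
are not treated specially):

```python
import re

# Hypothetical SH rules: (regex, signed column count).
NEXT_RULES = [(re.compile(r"\b(then|do)\b"), +4)]
THIS_RULES = [(re.compile(r"^\s*(fi|done)\b"), -4)]


def _delta(rules, text):
    """Sum of signed counts over all matches of all rules in text."""
    return sum(d * len(p.findall(text)) for p, d in rules)


def reindent_selection(lines, first, last):
    """Return a copy of lines with lines[first..last] re-indented top to bottom."""
    out = list(lines)
    for n in range(first, last + 1):
        # Use the already re-indented previous line so changes ripple down.
        prev = out[n - 1] if n > 0 else ""
        base = len(prev) - len(prev.lstrip(" "))
        cols = max(base + _delta(NEXT_RULES, prev) + _delta(THIS_RULES, out[n]), 0)
        out[n] = " " * cols + out[n].lstrip(" ")
    return out
```

Each line takes the freshly computed indent of the line above it as its
base, so a single pass from the top of the selection is enough.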

Cheers
Lex

>
> Regards,
> Colomban
> _______________________________________________
> Geany-devel mailing list
> Geany-devel at uvena.de
> https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


