[Geany-devel] Short function factoring

Wed Nov 9 02:35:24 UTC 2011

On 11/08/2011 01:21 PM, Lex Trotman wrote:
> On Wed, Nov 9, 2011 at 3:40 AM, Nick Treleaven
> <nick.treleaven at btinternet.com>  wrote:
>> On 04/11/2011 00:21, Lex Trotman wrote:
>>>
>>> Geany has many places where a short function then calls
>>> another short function which calls another short function, none of
>>> which are re-used.  Personally I find this way of writing code less
>>> efficient and very hard to follow and understand as a whole, but
>>> others find it easier to think only in terms of each little piece.
>>> YMMV
>>
>> Here may be somewhere where we disagree fundamentally, because:
>
> Probably :)
>

I certainly do :)

>>
>> * long functions cause bugs
>
> Rubbish, wrong functions cause bugs no matter how long, and being
> unable to see the whole logic leads to wrong design.
>

+1

>> * too many variables in one place cause bugs
>

Can you elaborate on this assertion?

> Thats what declarations in inner blocks are for :)
>
>>
>> I'm sure there are statisics to back this up, it's well known in code
>> analysis/reliability circles.
>
> And there are also statistics that say the opposite, that breaking it
> up too far over complicates things and causes unreliability.
>

+1

>
>>
>> Breaking up logical tasks into functions is crucial to writing maintainable
>> code, functions *are not just about code reuse*.
>

But there's lots of places in Geany's code where logical tasks are 
actually further broken into many little sub-tasks (see below for an 
example), so small and trivial that the code would be a lot easier to 
follow if it was just in one place and achieved it's primary task.

> Yes, the key word here is *logical*, too big is bad and too small is
> bad, but what is too big and what is too small depends on the person
> and what the code is doing.  It isn't a one size fits all.
>
> I just noted that for me parts of Geany err on the too small side.
>

I think you've summed that up pretty well.

Just to give an example, in encodings.c, there's a "logical" task 
towards the end of the code, which is to "guess encoding from a buffer 
and convert it".  It starts out in encodings_convert_to_utf8_auto(), 
which is meant to do the task, but instead of that function doing what 
it's meant to do, it instead calls handle_buffer() to do it.

Now handle_buffer() is supposed to "guess encoding from a buffer and 
convert it" instead of the original function, so it first tries to 
detect the Unicode BOM, it calls encodings_scan_unicde_bom() which is 
fine, since it's logically a separate task and has a general usefulness 
(in fact in this case it's also re-used elsewhere, but pretend it wasn't).

Once the Unicode BOM is scanned, it continues on and if the encoding is 
to be forced, it calls yet another function, handle_forced_encoding() 
which could easily be inline in handle_buffer().  If not forcing the 
encoding, handle_buffer() will then call handle_encoding(), which 
actually does what handle_buffer() is supposed to do also; convert the 
encoding.  handle_encoding() then calls out to the actual encoding 
conversion functions, with some logic/conditions around it. 
handle_encoding() could go into handle_buffer() as well, even though 
it's slightly on the longer side.

Continuing on, after handle_buffer() has called out to these other 
functions, it then shifts the buffer's data to overwrite the BOM bytes 
at the beginning, which is basically part of what handle_buffer() is 
supposed to be doing, but instead it calls yet another function named 
handle_bom() which is so short and simple that it should very much be in 
the handle_buffer().

Speaking only for myself, this is type of "chase the sub-logic" stuff is 
*so* confusing, and the problem becomes much worse when the "mini 
functions" are not directly near the "real function" in the file or 
worse yet in a whole other file.  I'd have a *much* easier time reading 
a 200 line function that just does the one thing all these separate 
sub-functions combined are meant to do: automatically convert the 
encoding of the buffer.

> I guess we won't ever all agree on this, so we all have to be tolerant.
>

Sadly, probably not :(

Cheers,
Matthew Brush