OS: Arch Linux x86_64
Kernel: 4.15.1-2-ARCH
Geany version: 1.32 ("built on or after 2018-01-29")
CPU: Intel i5-8600K (6 cores @ 3.6 GHz)
DE: Xfce
I have [a pretty big JSON file](https://github.com/geany/geany/files/1713744/json-file.txt) on my computer that I formatted with terminal colours (see [this page](https://misc.flogisoft.com/bash/tip_colors_and_formatting#colors2) for what I mean by terminal colours). I can display that file in my terminal with colours without problems, but when I open it with Geany, it immediately crashes with "Segmentation fault (core dumped)". I experimented a little, trying to pinpoint the exact problem with that file, and I noticed a few things:

- The error occurs only when the filetype is set to JSON.
- The file has to be saved for the error to occur; pasting the contents into a new "untitled" document does nothing.
- It IS possible to delete parts of the file so that the weird behaviour stays, but there is no single line that causes the problem. If I remember correctly, the error goes away when I delete everything from the start up to about line 16000, or everything from about line 40000 to the end of the file (line numbers may vary by ~1000 either way). There could be more regions of that sort in between those two, though.
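For context, "terminal colours" here means raw ANSI escape sequences embedded in the file. A minimal illustration (not taken from the actual file) of what such a line looks like as bytes:

```c
#include <stdio.h>

/* Illustrative only: a "JSON" line decorated with ANSI colour codes.
 * The \x1b[...m sequences render as colours in a terminal, but in the
 * file they are raw control bytes, so the content is not valid JSON. */
int main (void)
{
    fputs ("\x1b[32m\"name\"\x1b[0m: \x1b[33m42\x1b[0m\n", stdout);
    return 0;
}
```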
Your "JSON" file isn't JSON, its full of escape character rubbish that are terminal control characters. Of course it will display in the terminal, but its not legal [JSON](https://www.json.org/). The crap includes many `[`s and as the traceback shows the parser is looking for the matching `]`s recursively and running out of stack because all the `[`s make it nested too deeply.
This is therefore not a bug.
```
#0  0x00007ffff7b39bad in matchRegex (line=0x13b9930, language=48) at main/lregex.c:509
#1  0x00007ffff7b44dcb in iFileGetLine () at main/read.c:449
#2  0x00007ffff7b44eb0 in getcFromInputFile () at main/read.c:484
#3  0x00007ffff7b4528f in getcFromInputFile () at main/read.c:469
#4  0x00007ffff7b37385 in readTokenFull (token=token@entry=0x1442050, includeStringRepr=includeStringRepr@entry=false) at parsers/json.c:149
#5  0x00007ffff7b37645 in skipToOneOf3 (token=token@entry=0x1442050, type1=TOKEN_CLOSE_SQUARE, type2=TOKEN_EOF, type3=TOKEN_EOF) at parsers/json.c:246
#6  0x00007ffff7b37662 in skipToOneOf3 (token=token@entry=0x1442050, type1=TOKEN_CLOSE_SQUARE, type2=TOKEN_EOF, type3=TOKEN_EOF) at parsers/json.c:254
#7  0x00007ffff7b37662 in skipToOneOf3 (token=token@entry=0x1442050, type1=TOKEN_CLOSE_SQUARE, type2=TOKEN_EOF, type3=TOKEN_EOF) at parsers/json.c:254
#8  ...
```

The `skipToOneOf3` frames repeat until the stack is full.
IMO, under no circumstances should the program crash. That said, it's a bug in the bundled ctags JSON parser, not in Geany itself.
There is no way to prevent a recursive program from running out of stack when it's given pathological input like this, unless you impose some arbitrary limit on nesting depth that isn't part of the language (JSON). Any such limit may be too small on large systems that have big stacks, and too big on small systems with small stacks.
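To make that trade-off concrete, here is a minimal, self-contained sketch (not the actual ctags code; the function names and the `MAX_NESTING` value are invented) of a recursive bracket skipper with a hard nesting cap:

```c
#include <stdbool.h>
#include <stdio.h>

/* MAX_NESTING is an arbitrary guess, which is exactly the problem described
 * above: too small wastes capacity on big-stack systems, too large can still
 * overflow on small-stack systems. */
#define MAX_NESTING 512

static const char *skip_brackets (const char *p, unsigned depth, bool *failed)
{
    if (depth > MAX_NESTING)
    {
        *failed = true;                 /* bail out instead of overflowing */
        return p;
    }
    while (*p != '\0' && !*failed)
    {
        char c = *p++;
        if (c == '[')
            p = skip_brackets (p, depth + 1, failed);   /* one frame per '[' */
        else if (c == ']')
            return p;                   /* just past the matching close */
    }
    return p;
}

int main (void)
{
    bool failed = false;
    const char *input = "[[[deep]]] tail";
    const char *rest = skip_brackets (input + 1, 1, &failed);
    printf ("%s: \"%s\"\n", failed ? "aborted" : "ok", rest);
    return 0;
}
```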
@codebrainz if you can get Linux, MS, and the BSDs to accept a new API that reports the real stack size (since the existing ones report address space, not physical limits), and then measure how much stack each recursion level takes and divide one by the other to get a nesting limit, then we can prevent the overflow.
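For illustration, a rough sketch of the "divide the stack by the per-frame cost" arithmetic using POSIX `getrlimit`. As noted above, `RLIMIT_STACK` is an address-space limit rather than a physical guarantee, and `BYTES_PER_FRAME` is a made-up estimate:

```c
#include <stdio.h>
#include <sys/resource.h>

#define BYTES_PER_FRAME 256     /* hypothetical cost of one parser frame */

int main (void)
{
    struct rlimit rl;
    if (getrlimit (RLIMIT_STACK, &rl) != 0)
    {
        perror ("getrlimit");
        return 1;
    }
    rlim_t cur = rl.rlim_cur;
    if (cur == RLIM_INFINITY)
        cur = 8UL * 1024 * 1024;    /* fall back to a common 8 MiB default */

    /* Leave half the stack as headroom for everything else. */
    unsigned long depth = (unsigned long) cur / 2 / BYTES_PER_FRAME;
    printf ("approximate safe nesting depth: %lu\n", depth);
    return 0;
}
```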
Or you can unwind the stack on overflow, recognise that it's a recursive parser problem, unwind back to where the loop started, terminate the loop, and return to Geany with stack available again. That might "only" take patches to Glibc.
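The closest existing approximation on Linux is the classic `sigaltstack` trick: catch the overflow's SIGSEGV on an alternate stack and `siglongjmp` back out. This is a sketch only; jumping out of a SIGSEGV handler is formally undefined behaviour and leaves the parser's state inconsistent, which is roughly why proper support would need libc help:

```c
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf recover;
static char altstack[64 * 1024];    /* handler runs here, not on the blown stack */

static void on_segv (int sig)
{
    (void) sig;
    siglongjmp (recover, 1);        /* unwind back to the guarded call site */
}

static void recurse_forever (unsigned long depth)
{
    volatile char pad[256];         /* force each frame to occupy real stack */
    pad[0] = (char) depth;
    recurse_forever (depth + 1);
    pad[1] = pad[0];                /* code after the call defeats tail-call optimisation */
}

int main (void)
{
    stack_t ss;
    memset (&ss, 0, sizeof ss);
    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    sigaltstack (&ss, NULL);

    struct sigaction sa;
    memset (&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;       /* deliver SIGSEGV on the alternate stack */
    sigaction (SIGSEGV, &sa, NULL);

    if (sigsetjmp (recover, 1) == 0)
        recurse_forever (0);
    else
        puts ("overflow caught; back in main with the stack unwound");
    return 0;
}
```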
Or you can make the parser non-recursive and control how much memory it uses at each nesting level, so you don't blow the heap instead. Pull requests are welcome :grin:
A program that accepts arbitrarily large inputs should probably not use recursion like that. The JSON parser should use iteration, as even valid JSON inputs could cause a stack overflow the way it is now, IIUC.
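For the bracket-skipping case in the backtrace, the iterative rewrite doesn't even need an explicit stack; a depth counter is enough. A minimal, self-contained sketch (invented names, not the real ctags code):

```c
#include <stdio.h>

/* Iterative version of the skip: instead of one C stack frame per '[',
 * track nesting with a counter, so a million open brackets cost no extra
 * stack at all. */
static const char *skip_to_matching_close (const char *p, char open, char close)
{
    unsigned long depth = 1;            /* caller has already consumed `open` */
    for (; *p != '\0'; p++)
    {
        if (*p == open)
            depth++;                    /* deeper nesting, constant memory */
        else if (*p == close && --depth == 0)
            return p + 1;               /* just past the matching close */
    }
    return p;                           /* EOF before match: malformed input */
}

int main (void)
{
    const char *input = "[1, [2, [3]], 4] rest";
    const char *rest = skip_to_matching_close (input + 1, '[', ']');
    printf ("remaining input: \"%s\"\n", rest);   /* prints " rest" */
    return 0;
}
```

For full JSON (objects nested in arrays nested in objects), you would track the kind of each open bracket on a heap-allocated stack instead, which is the "control what memory it uses at each nesting level" point above.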
> even valid JSON inputs could cause a stack overflow the way it is now, IIUC.
Yep, the JSON language is defined recursively, so it will take growing memory to parse, either stack or heap, and at some point it's going to run out. Unfortunately, current OSes don't make it easy to find out when you are about to run out.
> so it will take growing memory to parse, either stack or heap, and at some point it's going to run out
Stack size on my machine is 8MB, heap is somewhere in the neighborhood of 32GB...