Ninkasi Bugs 2
Fixed one more Ninkasi bug today!
Functions in Ninkasi have three undocumented variables. Named function
arguments just refer to stack positions relative to the current stack
frame, so it was pretty simple to add variables that refer to the
function ID itself, the return pointer, and the argument count,
because all of these values exist on the stack just like the arguments
to the function itself.
These are considered debugging-only features, but they should at least
be not-broken debugging-only features.
These variables are:
-
_functionId
- The function being called. This can be called as a
function, itself. There are a few things to watch out for with it,
however. If used with callable objects, the value will be the
object and not the function. This is because the 'call'
instruction accepts that object as its function argument and
then alters the argument list and argument count.
-
_argumentCount
- The number of arguments. This will include the
object's _data
field when usedwith a callable object, which will
be different from the actual argument count from the call. This
is less useful because Ninkasi does not (yet) support varargs
function calls for functions defined in the scripting system
though it someday could. Altering this will likely cause stack
corruption.
-
_returnPointer
- The instruction pointer to return to when the
"return" instruction executes. Not super useful from a scripting
standpoint, unless you want some insight about the caller. If you
want to live dangerously, you can even alter it before returning!
(But I don't recommend it. Will also cause stack corruption,
probably.)
The Bug
At some point along the way, I had designed the 'call' instruction to
expect a stack that looked like this (example 3-parameter function):
index | value | type
---------------------------------
-6 | ... |
-5 | arg0 | any type
-4 | arg1 | any type
-3 | arg2 | any type
-2 | _functionId | function
-1 | _argumentCount | integer
Later down the road, I had to change it to this:
index | value | type
---------------------------------
-6 | ... |
-5 | _functionId | function
-4 | arg0 | any type
-3 | arg1 | any type
-2 | arg2 | any type
-1 | _argumentCount | integer
I'm not sure off the top of my head why I had to do this, but my guess
is that it was simply the order that arguments needed to be evaluated
and then pushed onto the stack.
Anyway, when I made this change, I apparently forgot to update the
position of _functionId
. The arguments still needed a +1 offset in
the stack that I always considered a sort of mystery. Well, the +1
offset was to skip the _functionId
, and the functionId
was
pointing to one of the arguments.
The Fix
This was a pretty simple fix. Just remove the offset, and move the
_functionId
setup code to before the argument setup loop.
Sometimes it really is that easy!
Posted: 2021-09-06
Ninkasi Bug Fixes
I spent a bit of time hacking on Ninkasi code this weekend!
I got rid of one terrible idea I implemented, replaced it with
something much better, fixed one crash bug, and two potential
coroutine-related memory leaks!
Removing One Ugly Anti-feature
First order of business was to change the object-function-call
stupidity I put in there forever ago. It's an anti-feature I never
should have considered. Here's how it went:
An object can have values inside of it assigned to functions, like so:
function testFunc(testValue)
{
print("Value passed in: ", testValue, "\n");
}
var ob = object();
ob["foo"] = testFunc;
Previously, these functions can be called like so:
ob.foo();
When called in this manner, the object itself is inserted as the first
argument to the function. This is roughly equivalent to the following
code:
{
var _temp = ob.foo;
_temp(ob);
}
And the output of this is, predictably:
Value passed in: <object:0>
The problem with this is that there's no way to distinguish between
when you want the object to be inserted as the first argument to a
function and when you do not.
My reasoning at the time I developed this "feature" was sketchy at
best, so I finally got around to cutting it out. Now there is no
special case considered for calling a function directly from an
object's field using '.'.
Now this code...
function testFunc(testValue)
{
print("Value passed in: ", testValue, "\n");
}
var ob = object();
ob["foo"] = testFunc;
ob.foo();
... is a runtime error, as it should be:
error:
test.nks:14: Incorrect argument count for function call. Expected: 1 Found: 0
To replace this feature, because still like some aspects of it, I
have stolen and repurposed C's indirection operator. The following
code acts the way ob.foo()
acted before:
ob->foo();
Which is equivalent to:
ob.foo(ob);
This gets us back to the original result:
Value passed in: <object:0>
But now you get to decide if you want to call a function on an object
without the object itself being added in as an argument, by using the
->
or the .
.
One Crash Bug
Callable Objects
First, an explanation of the feature this affected:
Callable objects in Ninkasi are a little oddity I thought it would be
useful to throw in. It's sort of a halfway point to actual closures.
Objects may be called as functions through the use of
specially-named fields.
Take the following code:
var ob = newobject;
ob._exec = function(foo) {
print("foo: ", foo, "\n");
};
ob._data = "blah blah blah";
ob();
The output of this code is:
foo: blah blah blah
It's that simple. Set an object's _exec
field to a function, set an
object's _data
field to a value, and you may call the object itself
as though it were a function.
This code...
ob();
... is equivalent to this code...
ob._exec(ob._data);
You can even add extra arguments after the _data
one, which is inserted first:
var ob = object();
ob._exec = function(foo, bar) {
print("foo: ", foo, ", ", bar, "\n");
};
ob._data = "blah blah blah";
ob("whatever");
Output:
foo: blah blah blah, whatever
The Bug
This feature does, unfortunately, involve some weird shifting-around
of stack data.
This part wasn't thought out super-well, so bare with me here.
The setup for the function-call operator involves pushing the function
ID onto the stack, followed by all the function arguments, followed by
an integer indicating the number of arguments there are.
Here's what the stack will look like for a normal 3-parameter function
call, at the time the call opcode executes (with stack indices
relative to the top of the stack):
index | value | type
------------------------------
-6 | ... |
-5 | function-ID | function
-4 | arg0 | any type
-3 | arg1 | any type
-2 | arg2 | any type
-1 | 3 | integer
We have the function ID, followed by the arguments, followed by an
argument count.
In order for the _data
field to always be first, we need the stack
to look like this:
index | value | type
------------------------------
-7 | ... |
-6 | function-ID | function
-5 | _data | any type
-4 | arg0 | any type
-3 | arg1 | any type
-2 | arg2 | any type
-1 | 4 | integer
That just involves inserting the _data
into the stack, shifting
everything forward by 1, and adding 1 to the argument count entry.
I know this is probably tickling everyone's inefficient-VM sense, but
this feature was kind of an afterthought for my personal toy language,
so be nice. :)
Anyway, our bug today was in the code to shift the stack data around.
The argument count coming into the function call opcode could be
larger than the stack, and it would happily underrun the bottom of the
stack, leading to your usual C buffer overrun/underrun type of error.
That code is also not the prettiest code in the world, leading me to
think I was either tired or in a rush when I wrote it. Let's have a
look...
// Make room for the _data contents on the stack.
nkiVmStackPush_internal(vm);
// Shift everything up the stack.
for(i = argumentCount; i >= 1; i--) {
vm->currentExecutionContext->stack.values[
vm->currentExecutionContext->stack.size - argumentCount - 2 + i + 1] =
vm->currentExecutionContext->stack.values[
vm->currentExecutionContext->stack.size - argumentCount - 2 + i];
}
Yikes.
How to Setup This Bug
So here's an interesting thing: You can't.
Well, let me be more clear, you can't if you're using the scripting
language directly. This is purely a vulnerability exposed through the
binary state snapshots.
I haven't actually reverse-engineered AFL's crash output for this one,
to be entirely honest, but my thinking is that the bad executable just
pushes a bad argument count onto the stack before doing a normal
callable-object call.
The Fix
First, I know what you're going to say. How dare I not have a system
in place to protect against these kinds of overrun/underruns, when
it's so easy to check for!? That usually comes after "why the hell are
you using C89 for this project when it's absolutely prone to these
kinds of errors!?" followed by a recommendation that I switch to Rust.
The thing is, though, I do have a way of safely interacting with the
stack. I'm just not using it here because... uh... reasons?
The stoage space allocated to the stack only doubles in size when it
reallocates, so, given that the stack capacity is always a power of
two, it's pretty easy to just mask off the lowest some-number-of bits
for stack indices going into the array. I even store that stack mask,
and use it like I actually know what I'm doing in most spots!
The fix (along with some much-needed cleanup) look like this:
struct NKVMStack *stack = &(vm->currentExecutionContext->stack);
struct NKValue *stackValues;
nkuint32_t stackSize;
nkuint32_t stackMask;
nkuint32_t stackSizeMinusArguments;
// Make room for the _data contents on the stack.
nkiVmStackPush_internal(vm);
stackValues = stack->values;
stackSize = stack->size;
stackMask = stack->indexMask;
stackSizeMinusArguments = stackSize - argumentCount - 2;
// Shift everything up the stack.
for(i = argumentCount; i >= 1; i--) {
stackValues[(stackSizeMinusArguments + i + 1) & stackMask] =
stackValues[(stackSizeMinusArguments + i) & stackMask];
}
This also throws in a few extra variables to cut down on those giant
globs of indirection and indexing inside the loop, potentially
offering a speedup on compilers that can't optimize out those
redundant operations (not that it's a particularly fast piece of
code to begin with).
Memory Leaks
I'm not going to get into too much detail with this one, because it's
pretty simple. A failure during coroutine creation would make the
program allocate the storage space needed for the coroutine's
execution context (including stack, program counter, etc) and then
fail to set the pointer to store it, causing it to leak.
The fix is just to check for those failures and clean up the mess when
it does.
Memory still Tracked
So with this leak still have a backup system in place for these.
Allocations inside the VM are actually tracked and easily cleaned up
during deallocation of the VM object itself. It's not a C memory leak,
it's just a leak of VM address space, more or less.
When the VM's address space is exhausted, it will bail out with its
own internal allocation error and can still be cleaned up by the
hosting application. So as far as the stability of the hosting
application is concerned, and in relation to potentially hostile
script code, this isn't a "real" threat. It does suck for long-running
scripts, though.
Final Thoughts
AFL is great. Valgrind is awesome, too. All these issues found
today were through those two tools. Thinking you can write secure code
without them or some equivalent to them is hubris in its purest form.
The issues with coroutine setup here are definitely showing my lack of
testing after writing the coroutine code. The callable object issues
are something I wasn't expecting, because that code's been in there
forever.
Ninkasi may never be perfect and vulnerability-proof, but at least it
got a little bit closer today, and I'm pretty happy with that.
Posted: 2021-09-05