It's a relatively small change that adds support for argument hinting in CUDA kernel calls, of the form
MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);
as well as in C# generic function calls, of the form
Swap<int>(ref a, ref b);
I've never submitted a pull request before; is there anything else I need to do to make a contribution?