I am no test fanatic myself, but reproducing a problem is always the first step to understanding (and then solving) it. If it were easy to "code" the steps to reproduce as a test, we'd get some regression coverage for free. I don't want to belabor the point, though: we can also just close this (or wait for other feedback, at your option).
(NOTE: I have never used Autopilot myself, but I am familiar with other UI test suites.) From what I understood in my initial research, Autopilot already has "introspection" features (see https://developer.ubuntu.com/api/devel/ubuntu-14.04/autopilot/api/introspect...), so one could access GTK objects without relying on the application's A11Y features.
As for the plugins, you're right: any discussion of a "plugins testsuite" for 3rd parties is probably worth keeping separate.