Adds a small amount of stats analysis, mostly max callback time, with a
simple display in the UI.
Also improves the pow calculation to use a lookup-table (LUT)
implementation instead of math.h pow(), for a speedup of roughly 20-30%.
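
For illustration only, here is a minimal sketch of how a table-driven
power-of-two function can replace a pow() call. It is not the code from
this patch: the names Exp2Table, Exp2Lookup and kLgTableSize are
hypothetical, and the table size and linear interpolation are assumed
choices.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical table size; the real patch may use a different layout.
constexpr int kLgTableSize = 10;
constexpr int kTableSize = 1 << kLgTableSize;

// Holds 2^(i / kTableSize) for i in [0, kTableSize]. The extra entry
// lets the interpolation read values[index + 1] without a bounds check.
struct Exp2Table {
  float values[kTableSize + 1];
  Exp2Table() {
    for (int i = 0; i <= kTableSize; ++i) {
      values[i] = std::exp2(static_cast<float>(i) / kTableSize);
    }
  }
};

// Approximates 2^x: the fractional part of x goes through the table
// with linear interpolation, the integer part is applied with ldexp.
inline float Exp2Lookup(const Exp2Table& table, float x) {
  assert(std::isfinite(x));
  float floor_x = std::floor(x);
  float frac = (x - floor_x) * kTableSize;
  int index = static_cast<int>(frac);
  if (index >= kTableSize) {
    // x - floor_x can round up to 1.0 for tiny negative x.
    index = kTableSize - 1;
  }
  float t = frac - index;
  float y = table.values[index] +
            t * (table.values[index + 1] - table.values[index]);
  return std::ldexp(y, static_cast<int>(floor_x));
}
```

A general pow(b, e) with a fixed base can be expressed through such a
function as Exp2Lookup(table, e * log2(b)), with log2(b) precomputed.
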
The FM kernel lends itself well to speedup using NEON assembler. This
patch contains the NEON assembly code, plus C integration code
(including making sure that buffers are aligned to 16 bytes).
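
As a rough sketch of the alignment requirement (not the integration code
from this patch), the snippet below shows two common ways to obtain a
16-byte aligned buffer in C++11. AlignedBlock, kBlockSize, IsAligned16
and AlignUp16 are hypothetical names, and the block size is an assumed
value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical block size, in samples.
constexpr std::size_t kBlockSize = 64;

// Option 1: let the compiler place the buffer on a 16-byte boundary.
struct alignas(16) AlignedBlock {
  std::int32_t samples[kBlockSize];
};

// Returns true if p is a multiple of 16 bytes.
inline bool IsAligned16(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Option 2: round a pointer into an oversized raw buffer up to the next
// 16-byte boundary. The caller must reserve at least 15 spare bytes.
inline std::int32_t* AlignUp16(void* raw) {
  assert(raw != nullptr);
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t aligned = (addr + 15u) & ~static_cast<std::uintptr_t>(15u);
  return reinterpret_cast<std::int32_t*>(aligned);
}
```

A check such as IsAligned16 can guard the call into the assembly routine,
with a fallback to the plain C kernel when a caller passes an unaligned
buffer.
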
We were using only a few STL functions (min, max, and iostream for
debug logging). This patch gets rid of those dependencies (implementing
the needed functions in synth.h), and turns on the "all" ABI target, so
that it works with all native architectures supported by the NDK.
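
Replacements for min and max can be very small. The sketch below is an
assumption about what such helpers might look like, not a copy of
synth.h; the namespace synth_util is hypothetical.

```cpp
// Hypothetical stand-ins for std::min and std::max that need no STL
// headers. Declared constexpr so they also work in constant expressions.
namespace synth_util {

template <typename T>
constexpr const T& min(const T& a, const T& b) {
  return (b < a) ? b : a;
}

template <typename T>
constexpr const T& max(const T& a, const T& b) {
  return (a < b) ? b : a;
}

}  // namespace synth_util
```

Like their standard counterparts, both helpers return the first argument
when the two compare equal.
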