Highly optimized implementations of the ladder filter in Neon
assembler. The audio loops (both linear and nonlinear) are hooked up
and running. Matrix generation still runs in scalar code, although
its Neon version has been tested and benchmarked.
Performance numbers on Nexus 10: linear audio loop = 22.5 cycles,
nonlinear audio loop = 62 cycles, matrix generation = 580 cycles.
Note that the current code will crash on ARMv7 devices without Neon
(for example, the Motorola Xoom).
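
One possible way to avoid that crash is to pick the kernel at runtime
and fall back to scalar code when Neon is absent. The sketch below is
not part of this change. It assumes the NDK cpufeatures module is
available on Android builds, and FilterKernel, filter_scalar,
filter_neon, cpu_has_neon and select_kernel are hypothetical names
with placeholder bodies, not the real filter routines.

```cpp
#include <cstddef>

#if defined(__ANDROID__) && defined(__arm__)
#include <cpu-features.h>  // NDK cpufeatures module (assumed to be linked in)
#endif

// Hypothetical kernel signature; the real routines' arguments may differ.
using FilterKernel = void (*)(const float* in, float* out, std::size_t n);

// Placeholder for the scalar code path (a plain copy, not a real filter).
void filter_scalar(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i];
}

// Placeholder standing in for the Neon assembler entry point.
void filter_neon(const float* in, float* out, std::size_t n) {
    filter_scalar(in, out, n);
}

// Reports Neon support on 32-bit ARM Android; conservatively returns
// false on every other target in this sketch.
bool cpu_has_neon() {
#if defined(__ANDROID__) && defined(__arm__)
    return android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
           (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
#else
    return false;
#endif
}

// Chooses the kernel once, so the audio loop never executes Neon
// instructions on a CPU that lacks them.
FilterKernel select_kernel(bool has_neon) {
    return has_neon ? filter_neon : filter_scalar;
}
```

A caller would evaluate select_kernel(cpu_has_neon()) once at startup
and keep the returned pointer for the audio callback, so the check
stays out of the inner loop.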