speed up fft_un16
speed up fft_untwopass
speed up real convolutions
deal with d-cache associativity and linker issues

provide asm for various machines
tweak asm for better scheduling
tweak asm for register parameter passing
consider splitting forward and inverse code; try to improve i-cache use

do Pentium Pro/II version
do simultaneous-two-pass implementation
speed up out-of-cache transforms

autogenerate fft_r* at compile time
split multiplyr4_*, multiplyr8_* into separate .o files

use better random-number generator in accuconv.c
