cleanup: typedef double small[...] for all processors
cleanup: split linear operations out of ecdouble, ecadd
cleanup: unify temporary small definitions for all processors
cleanup: unify 96127, ecdouble, ecadd for all processors
cleanup: move spill variables out of opt-*-*.c into globals
cleanup: unify c2d, d2c to the extent possible
cleanup: unify order of subroutines for generic i-cache friendliness

do, and publish, all necessary numerical analysis

consider karatsuba: looks bad for x86 squaring, close to break-even otherwise
speed up t[] initialization in nistp224(): take advantage of 1's
maybe precompute only odd powers; interleave add with doubles in main loop
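
sketch of the odd-powers idea above, as a toy only: the group here is
integers mod a small prime under addition, standing in for curve points,
so toy_add/toy_double replace ecadd/ecdouble. the names toy_add,
toy_double, toy_scalarmult, the window width W, and the modulus M are all
assumptions for illustration, not the package's code. the table holds only
1P, 3P, 5P, ...; each add is preceded by just the doublings its window
needs. this version branches on the scalar, so it is not constant-time.

```c
#include <assert.h>
#include <stdint.h>

#define W 4                     /* window width in bits (assumed) */
#define TABLE (1 << (W - 1))    /* odd multiples only: 1P, 3P, ..., (2^W - 1)P */

static const uint64_t M = 1000000007ULL;  /* toy modulus standing in for the group */

static uint64_t toy_add(uint64_t a, uint64_t b) { return (a + b) % M; }
static uint64_t toy_double(uint64_t a) { return (2 * a) % M; }

/* returns k*p mod M using a sliding window over k with odd-only table */
uint64_t toy_scalarmult(uint64_t k, uint64_t p)
{
  uint64_t t[TABLE];
  uint64_t p2;
  uint64_t acc = 0;
  int i, j;

  p %= M;
  p2 = toy_double(p);
  t[0] = p;
  for (j = 1; j < TABLE; ++j)
    t[j] = toy_add(t[j - 1], p2);          /* t[j] = (2j+1)P */

  i = 63;
  while (i >= 0) {
    int lo, width;
    uint64_t digit;

    if (!((k >> i) & 1)) {                 /* zero bit: one doubling, no add */
      acc = toy_double(acc);
      --i;
      continue;
    }

    /* widest window of at most W bits that starts at bit i and ends in a 1 */
    lo = i - W + 1;
    if (lo < 0) lo = 0;
    while (!((k >> lo) & 1)) ++lo;
    width = i - lo + 1;

    digit = (k >> lo) & ((1ULL << width) - 1);
    assert(digit & 1);                     /* window digits are always odd */

    for (j = 0; j < width; ++j)
      acc = toy_double(acc);               /* doublings for this window */
    acc = toy_add(acc, t[digit >> 1]);     /* one add from the odd table */

    i = lo - 1;
  }
  return acc;
}
```

for W = 4 the table has 8 entries instead of 15, at the cost of one extra
doubling (p2) during setup. in the toy group, toy_scalarmult(k, p) should
equal (k mod M) * (p mod M) mod M, which gives an easy check.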

athlon: deal with 8-byte instruction alignment
ppro/pii/piii: need a better simulator
p4: investigate sse2 instructions
powerpc: deal with the callee-save insanity; consider f2/fg/fg8h2
