1 - segment overrides/16-bit addressing for string ops
2 - optimize translated cache chaining (DLL PLT-like system)
6 - make it self-runnable (use the same trick as ld.so: include its own relocator and libc)
7 - improve 16-bit support
8 - fix FPU exceptions (in particular: gen_op_fpush must not be done before the mem load)