Rework ecp_mod_p192()

On x86_64, this makes it 5x faster, and ecp_mul() 17% faster for this curve.
The code is shorter too.
2 files changed