Make SHA-2 implementation smaller

Adjust the size/performance trade-off:
* Reduces size of sha256_process() from 7.4KB to 2KB on ARMv7-M
* Reduces performance by less than 14% on Cortex-M4
* Seems to even improve performance on my Core i7
1 file changed