After going over the algorithm again and again, I find the bitwise operations less daunting. They're just part of the scrambling process, and a quick and clever part too.
More interesting is how every round performs the same operations on the same registers, yet produces a completely different result each time, because it mixes in values that were calculated in previous rounds.
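To make those bitwise operations a bit more concrete, here is a rough Python sketch of three typical ones. It assumes the algorithm in question is SHA-256 (the 64 rounds and 32-bit registers point that way), and the names rotr, ch and maj are just labels picked for illustration.

```python
MASK32 = 0xFFFFFFFF  # keeps every result inside 32 bits


def rotr(x, n):
    """Rotate the 32-bit value x right by n bits (bits falling off the right re-enter on the left)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(e, f, g):
    """Choose: where a bit of e is 1 take the bit from f, otherwise take it from g."""
    return (e & f) ^ (~e & g & MASK32)


def maj(a, b, c):
    """Majority: each output bit is the value at least two of the three inputs agree on."""
    return (a & b) ^ (a & c) ^ (b & c)


print(hex(rotr(0x00000001, 1)))
print(hex(ch(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))
print(hex(maj(0xF0F0F0F0, 0xFF00FF00, 0x00000000)))
```

None of these is complicated on its own. The scrambling comes from stacking them and repeating them many times.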
Breaking the main loop into three high-level steps seems like a better approach than trying to explain everything at once. There's just too much going on for that.
step 0: initialize the first intermediate hash and pad the message, then break it up into 512-bit blocks, each made of sixteen 32-bit words.
In the for loop:
step 1: initialize the temporary registers to the old intermediate hash sub-values
step 2: the heart of the algorithm. Repeat 64 times: apply a bunch of bitwise operations to the registers and incorporate previously calculated values from other variables.
step 3: make the new intermediate hash sub-values by adding the registers' final values from step 2 to the old sub-values.
do it all N times, where N is the number of 512-bit blocks in the padded message.
concatenate the final hash sub-values to get the actual digest.
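
The steps above can be sketched in Python roughly as follows. This is only a sketch, assuming the algorithm is SHA-256 (64 rounds, 32-bit registers). The helper names (rotr, first_primes, icbrt, pad) are made up for illustration, and the constants are derived from the square and cube roots of the first primes instead of being typed out. The step comments match the numbering in the list.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]
# First 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK32 for p in PRIMES]


def pad(message):
    """Append 0x80, zero bytes, and the 64-bit bit length; result is a multiple of 64 bytes."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)


def sha256(message):
    h = list(H0)                                   # step 0: first intermediate hash
    padded = pad(message)                          # step 0: pad the message
    for start in range(0, len(padded), 64):        # one pass per 512-bit block
        # Sixteen 32-bit words from the block, extended to 64 schedule words.
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)

        a, b, c, d, e, f, g, hh = h                # step 1: registers <- old hash values

        for t in range(64):                        # step 2: the 64 rounds
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            choose = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + choose + K[t] + w[t]) & MASK32
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            majority = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + majority) & MASK32
            a, b, c, d, e, f, g, hh = (
                (t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)

        # step 3: new intermediate hash = old hash + registers (mod 2**32)
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e, f, g, hh))]

    # Concatenate the eight final sub-values into the digest.
    return struct.pack(">8I", *h).hex()


# Sanity check against the standard library implementation.
assert sha256(b"abc") == hashlib.sha256(b"abc").hexdigest()
```

Checking the result against hashlib.sha256 is an easy way to confirm that every step was followed correctly. For anything real, use hashlib itself.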