Here are a few routines that I've put together to make up for Coldfire's lack of 32 x 32 -> 64 multiply and 64 / 32 -> 32 divide instructions.
mulu64.S A basic sum-of-partial-products multiply using four MULU.W (16 x 16 -> 32) instructions. The main problem is getting the 16-bit pieces into the proper places so they can be summed.
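The partial-product decomposition can be sketched in C. This is an illustration of the arithmetic, not a transcription of mulu64.S. The function name mulu64_sketch is hypothetical. It splits each operand into 16-bit halves, forms the four 16 x 16 -> 32 products that MULU.W can produce, and then shows where each one lands and where the carries go.

```c
#include <stdint.h>

/* Hypothetical helper: 32 x 32 -> 64 unsigned multiply built from four
   16 x 16 -> 32 partial products (the operation MULU.W provides). */
uint64_t mulu64_sketch(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products; each one fits in 32 bits. */
    uint32_t ll = a_lo * b_lo;   /* weight 2^0  */
    uint32_t lh = a_lo * b_hi;   /* weight 2^16 */
    uint32_t hl = a_hi * b_lo;   /* weight 2^16 */
    uint32_t hh = a_hi * b_hi;   /* weight 2^32 */

    /* The two middle products share a weight; their sum can carry out. */
    uint32_t mid = lh + hl;
    uint32_t mid_carry = (mid < lh);        /* carry is worth 2^48 */

    /* Low word: ll plus the low half of mid moved up 16 bits. */
    uint32_t lo = ll + (mid << 16);
    uint32_t lo_carry = (lo < ll);          /* carry is worth 2^32 */

    /* High word: hh, the high half of mid, and both carries. */
    uint32_t hi = hh + (mid >> 16) + (mid_carry << 16) + lo_carry;

    return ((uint64_t)hi << 32) | lo;
}
```

The two carries are the part that is easy to get wrong. The middle sum can exceed 32 bits, and adding the shifted middle term to the low product can also overflow the low word.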
muls64_m.S Uses the new fractional mode of the MAC to get the upper 32 bits of the signed product. (Requires a "J" mask 5307 or 5407.) Squaring 0x80000000 is a special case for the MAC. In fractional mode that operand is -1, so the result is -1 * -1 = +1. Fractional mode has no representation for +1, so the MAC sets its V (overflow) bit.
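The link between a fractional product and the high word of the integer product can be modelled in C. This sketch rests on an assumption: the fractional result is taken to be the signed product shifted left one bit, with the upper 32 bits kept and no rounding. It is not a description of the MAC hardware or of the instruction sequence in muls64_m.S. The names frac_mul_model and muls64_hi_sketch are hypothetical.

```c
#include <stdint.h>

/* Assumed model of a 32-bit fractional (Q31) multiply: the full signed
   product doubled, upper 32 bits kept, truncated. Not valid for
   a == b == INT32_MIN, where the doubled product (2^63) does not fit. */
static int32_t frac_mul_model(int32_t a, int32_t b)
{
    int64_t doubled = (int64_t)a * b * 2;   /* |a*b| < 2^62 here, no overflow */
    return (int32_t)(doubled >> 32);        /* assumes arithmetic right shift */
}

/* Hypothetical: upper 32 bits of the 64-bit signed product, recovered
   from the fractional high word. That word holds product bits 62..31;
   an arithmetic shift right by one yields bits 62..32 sign-extended,
   which equals bits 63..32 whenever bits 63 and 62 agree. */
int32_t muls64_hi_sketch(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return 0x40000000;   /* 2^62 >> 32: the case that sets the MAC V bit */
    return frac_mul_model(a, b) >> 1;
}
```

The shift by one works because the product is always above -2^62 and, apart from the special case, below 2^62. Within that range, bits 63 and 62 of the product are equal.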
mulu64_m.S This is the unsigned version of muls64_m.S. It applies a correction to convert the signed product into an unsigned one. (The same correction can also be applied to mulu64.S.)
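The signed-to-unsigned correction follows from a standard identity. Reading a 32-bit operand as signed subtracts 2^32 from it when its top bit is set. The unsigned high word is therefore the signed high word, plus b if a's top bit is set, plus a if b's top bit is set, all modulo 2^32. The low word is the same in both cases. The C sketch below is an illustration only. The name mulu64_hi_from_signed is hypothetical, and a native signed multiply stands in for the MAC result.

```c
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of the unsigned 32 x 32 product,
   derived from the upper 32 bits of the signed product. The signed
   product is computed natively here as a stand-in for the MAC. */
uint32_t mulu64_hi_from_signed(uint32_t a, uint32_t b)
{
    /* Reinterpret as signed (two's complement assumed). */
    int64_t sp = (int64_t)(int32_t)a * (int32_t)b;
    uint32_t hi = (uint32_t)((uint64_t)sp >> 32);

    /* A set sign bit means the signed view lost 2^32 from that operand,
       so the unsigned product is larger by 2^32 times the other one. */
    if (a & 0x80000000u) hi += b;
    if (b & 0x80000000u) hi += a;
    return hi;   /* unsigned arithmetic, so sums wrap modulo 2^32 */
}
```

The 2^64 cross term that appears when both sign bits are set drops out modulo 2^64. That is why two conditional adds are enough.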
divu64_1.S A basic, bit-by-bit, unsigned divide routine. It uses a few tricks to speed things up a bit.
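Bit-by-bit (restoring) 64 / 32 -> 32 division can be sketched in C as follows. This is a plain version of the algorithm without the speed tricks, which are not described here. The function name divu64_bitwise is hypothetical. The dividend's upper word must be less than the divisor, so that the quotient fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 64 / 32 -> 32 unsigned restoring division,
   producing one quotient bit per iteration. Requires hi < d so the
   quotient fits in 32 bits. */
uint32_t divu64_bitwise(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(d != 0 && hi < d);
    uint32_t r = hi;   /* running remainder, always < d between steps */
    uint32_t q = 0;
    for (int i = 0; i < 32; i++) {
        /* Shift the next dividend bit into the remainder. The bit shifted
           out of r is kept in top, making the compare effectively 33 bits. */
        uint32_t top = r >> 31;
        r = (r << 1) | (lo >> 31);
        lo <<= 1;
        q <<= 1;
        if (top || r >= d) {
            r -= d;    /* wraps to the correct value when top is set */
            q |= 1;
        }
    }
    *rem = r;
    return q;
}
```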
divu64_2.S Much the same as divu64_1.S, but uses Scc, Coldfire's only conditional data instruction, to eliminate some conditional branches.
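The idea of replacing a branch with a value derived from a condition can be sketched in C. The comparison result is turned into a 0 or all-ones mask, which plays a role similar to Scc (Scc writes 0xFF or 0x00 to a byte depending on a condition). The mask then selects whether the divisor is subtracted. The function name divu64_branchless is hypothetical. The sketch shows the general technique, not the instruction sequence in divu64_2.S.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the bit-by-bit 64 / 32 -> 32 divide with its
   data-dependent branch replaced by a mask. Requires hi < d. */
uint32_t divu64_branchless(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(d != 0 && hi < d);
    uint32_t r = hi, q = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t top = r >> 31;                   /* bit shifted out of r */
        r = (r << 1) | (lo >> 31);
        lo <<= 1;
        uint32_t take = top | (uint32_t)(r >= d); /* 1 if d fits, else 0 */
        uint32_t mask = 0u - take;                /* 0 or 0xFFFFFFFF */
        r -= d & mask;                            /* subtract only if d fits */
        q = (q << 1) | take;                      /* quotient bit = condition */
    }
    *rem = r;
    return q;
}
```

Only the loop counter still branches. Whether this is faster depends on how costly a mispredicted or taken branch is compared with the extra mask arithmetic.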
divs64_1.S A basic signed divide. Calculates abs(a) / abs(b) and the remainder, then fixes up the signs.
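The magnitude-then-fix-up approach can be sketched in C. The sketch assumes truncating division, as in C: the quotient is negative when the operand signs differ, and the remainder takes the sign of the dividend. The name divs64_sketch is hypothetical. The native unsigned divide stands in for an unsigned routine such as divu64_1.S.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 64 / 32 -> 32 signed divide done as an unsigned
   divide on magnitudes, followed by sign fix-up (truncating division
   assumed). The caller must ensure the quotient fits in 32 bits. */
int32_t divs64_sketch(int64_t n, int32_t d, int32_t *rem)
{
    assert(d != 0);
    /* Magnitudes in unsigned arithmetic, so INT64_MIN and INT32_MIN are safe. */
    uint64_t un = n < 0 ? 0u - (uint64_t)n : (uint64_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;

    /* Stand-in for the unsigned divide routine. */
    uint64_t uq = un / ud;
    uint32_t ur = (uint32_t)(un % ud);

    /* Quotient is negative when the signs differ; check it fits. */
    int neg_q = (n < 0) != (d < 0);
    assert(uq <= (neg_q ? 0x80000000u : 0x7FFFFFFFu));
    int64_t q = neg_q ? -(int64_t)uq : (int64_t)uq;

    /* Remainder takes the sign of the dividend. */
    int64_t r = n < 0 ? -(int64_t)ur : (int64_t)ur;
    *rem = (int32_t)r;
    return (int32_t)q;
}
```

With these rules, q * d + r == n always holds, and abs(r) < abs(d).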
divs64_2.S Another signed divide routine. This one is based on divu64_1.S, but builds up the quotient in a separate register.
divu64_c.S An unsigned "long" division routine. Uses the DIVU.W (32 / 16 -> 16) instruction to speed things up. This runs in about 6 us on an SBC5206elite board with the cache enabled, vs about 9 us for the bit-by-bit versions.
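The core idea of long division with a 32 / 16 -> 16 divide step can be shown in C for the simple case of a 16-bit divisor. Each step divides the carried remainder, with the next 16-bit digit appended, by the divisor. Handling a full 32-bit divisor this way needs extra work: normalization and quotient-digit correction, as in Knuth's Algorithm D. This sketch does not cover that case and is not necessarily how divu64_c.S handles it. The names divu_w and divu64_by16_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of DIVU.W: 32-bit dividend / 16-bit divisor giving
   a 16-bit quotient and 16-bit remainder. Quotient overflow is not
   modelled; the caller below guarantees it cannot occur. */
static uint16_t divu_w(uint32_t n, uint16_t d, uint16_t *rem)
{
    *rem = (uint16_t)(n % d);
    return (uint16_t)(n / d);
}

/* Long division of a 64-bit value by a 16-bit divisor, one 16-bit digit
   at a time, most significant first. The carried remainder is < d, so
   each partial dividend is < d * 65536 and every quotient digit fits
   in 16 bits. */
uint64_t divu64_by16_sketch(uint64_t n, uint16_t d, uint16_t *rem)
{
    assert(d != 0);
    uint64_t q = 0;
    uint16_t r = 0;
    for (int shift = 48; shift >= 0; shift -= 16) {
        uint32_t digit = (uint32_t)(n >> shift) & 0xFFFFu;
        uint32_t part = ((uint32_t)r << 16) | digit;  /* remainder:digit */
        uint16_t qd = divu_w(part, d, &r);
        q = (q << 16) | qd;                           /* append quotient digit */
    }
    *rem = r;
    return q;
}
```

Four hardware divides replace 64 shift-and-subtract iterations here, which is the source of the speedup over a bit-by-bit loop.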
Last modified: April 4, 2003
Wayne Deeter - wrd@deetour.net