Re: [stella] optimization : sneaky fractional bit representation

Subject: Re: [stella] optimization : sneaky fractional bit representation
From: emooney@xxxxxxxxxxxxxxxx (Erik Mooney)
Date: Thu, 04 Jun 1998 01:25:02 GMT
>Oh, yea of little faith
>
>Multiplying by 11 is the same as multiplying by 12 and subtracting the
>original.
>That's 2 shifts ( = x4 ) + another shift ( = *8) - the original.  Or, to be
>really back-to-basics, what is wrong with a loop which adds the number 11
>times?  The assumption in the code following is that the result will not
>exceed 40 bits, and that "original..." holds the original number.  Result is
>left in "temp..."
[snip]
Sure, I know you can do routines to multiply by any number with only a few
shifts, but for a password thingy like this, you'd need to be able to
multiply by any number that's the size of any value you want to encode.

>There we go, that wasn't too hard.  Slow?  Well, slowish... but speed isn't
>the issue when we're encoding/decoding a password.  Too many cycles in one
>whack?  Do one of the loops every frame.  Think laterally... instead of
>making the 40 bits in one monolithic block, break it down and generate the
>40 bit stream 5 bits at a time.  Sure, a bit more complex to generate... but
>you really aren't limited to doing it the way my walkthrough suggested.
>
>Its about time we started talking about efficient multiply and divide
>routines - which are certainly possible.  This is about the third time
>someone has balked at the word "multiply" or "divide".  Would anybody care
>to post some multiply / divide code?

Wonder of wonders, we *just* went over an algorithm for that 30 minutes ago
in my computers class.  To multiply C by B, where both are eight digit
registers, you need one extra 8-bit storage area (we'll call it A) and one
1-bit storage area (we'll call it E.).  C gets destroyed, but B remains
intact.  Initialize C and E to 0.
Repeat 8 times:
  If bit 0 of C is 1, then add B to C, carrying into E if necessary.
  Shift the 17-bit quantity EAC to the right.

This puts the product as a 16-bit quantity in AC.

I don't quite know how it works, but it does.  In 6502 code, I'd imagine it
looks like this, with MulB and MulC being two memory locations for the
numbers we're multiplying, MulA for A, and the carry for E.

    LDX #8
beginloop
    CLC
    LDA MulC
    AND #01
    BEQ noadd
    LDA MulA
    ADC MulB
    STA MulA
noadd
    ROR MulA    ;first part of the 17-bit shift
    ROR MulC    ;complete the 17-bit shift
    DEX
    BNE beginloop

Multiplies any two 8-bit numbers in MulB and MulC, leaving the product as a
16-bit value in MulA:MulC.  Maximum possible cycles is 2 + 8 * (2 + 3 + 2 +
2 + 3 + 3 + 3 + 5 + 5 + 2 + 3) = 266.  Minimum possible is 56 less (the
middle BEQ never taken), so it's fairly consistent as well.  And it's only
22 bytes.

I tested this in the PCAE debugger, and it seems to work.  There's a
similar divide algorithm, but we aren't doing that till Monday :)

--
Archives (includes files) at http://www.biglist.com/lists/stella/archives/
Unsub & more at http://www.biglist.com/lists/stella/stella.html

Current Thread