Subject: Re: [stella] optimization : sneaky fractional bit representation From: emooney@xxxxxxxxxxxxxxxx (Erik Mooney) Date: Thu, 04 Jun 1998 01:25:02 GMT |
>Oh, yea of little faith > >Multiplying by 11 is the same as multiplying by 12 and subtracting the >original. >That's 2 shifts ( = x4 ) + another shift ( = *8) - the original. Or, to be >really back-to-basics, what is wrong with a loop which adds the number 11 >times? The assumption in the code following is that the result will not >exceed 40 bits, and that "original..." holds the original number. Result is >left in "temp..." [snip] Sure, I know you can do routines to multiply by any number with only a few shifts, but for a password thingy like this, you'd need to be able to multiply by any number that's the size of any value you want to encode. >There we go, that wasn't too hard. Slow? Well, slowish... but speed isn't >the issue when we're encoding/decoding a password. Too many cycles in one >whack? Do one of the loops every frame. Think laterally... instead of >making the 40 bits in one monolithic block, break it down and generate the >40 bit stream 5 bits at a time. Sure, a bit more complex to generate... but >you really aren't limited to doing it the way my walkthrough suggested. > >Its about time we started talking about efficient multiply and divide >routines - which are certainly possible. This is about the third time >someone has balked at the word "multiply" or "divide". Would anybody care >to post some multiply / divide code? Wonder of wonders, we *just* went over an algorithm for that 30 minutes ago in my computers class. To multiply C by B, where both are eight digit registers, you need one extra 8-bit storage area (we'll call it A) and one 1-bit storage area (we'll call it E.). C gets destroyed, but B remains intact. Initialize C and E to 0. Repeat 8 times: If bit 0 of C is 1, then add B to C, carrying into E if necessary. Shift the 17-bit quantity EAC to the right. This puts the product as a 16-bit quantity in AC. I don't quite know how it works, but it does. In 6502 code, I'd imagine it looks like this, with MulB and MulC being two memory locations for the numbers we're multiplying, MulA for A, and the carry for E. LDX #8 beginloop CLC LDA MulC AND #01 BEQ noadd LDA MulA ADC MulB STA MulA noadd ROR MulA ;first part of the 17-bit shift ROR MulC ;complete the 17-bit shift DEX BNE beginloop Multiplies any two 8-bit numbers in MulB and MulC, leaving the product as a 16-bit value in MulA:MulC. Maximum possible cycles is 2 + 8 * (2 + 3 + 2 + 2 + 3 + 3 + 3 + 5 + 5 + 2 + 3) = 266. Minimum possible is 56 less (the middle BEQ never taken), so it's fairly consistent as well. And it's only 22 bytes. I tested this in the PCAE debugger, and it seems to work. There's a similar divide algorithm, but we aren't doing that till Monday :) -- Archives (includes files) at http://www.biglist.com/lists/stella/archives/ Unsub & more at http://www.biglist.com/lists/stella/stella.html
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [stella] optimization : sneaky , Andrew Davie | Thread | Re: [stella] optimization : sneaky , Chris Wilkson |
Re: [stella] optimization : sneaky , Andrew Davie | Date | Re: [stella] optimization : sneaky , Chris Wilkson |
Month |