Subject: Re: [stella] some more optimization tips From: "Andrew Davie" <adavie@xxxxxxxxxxxxxxxxx> Date: Sun, 10 May 1998 15:57:10 +1000 |
OK guys and gals... here's another. Let's say we're looping, and every pass through the loop we add some value to an index register, then loop only if it doesn't exceed a certain value... LDX #0 LOOP ; do something TXA CLC ADC #6 TAX ; increase index by 6 every loop CPX #24 BCC LOOP ; keep going until we hit 24 RTS Here we're skipping by 6 - but these methods will work for other values. Seems simple enough, but that ugly bit, doing the looping, at the end costs us 13 cycles and 9 bytes every time we loop. The real killer is the 13 cycles. In loops in general, a handy optimisation is to start at the end and work backwards. This is because the flags will be set for us automatically when we pass 0, removing the need for a compare. LDX #18 LOOP ; do something TXA SEC SBC #6 TAX BCS LOOP OK, that reduces it to 11 cycles, 7 bytes. Not bad. Have a look at this next bit of code... LDA #24 ; start at the one AFTER the last one LOOP TAX ; do something here ; your index register is 6 more than you "want" ; so use indexing like this... ; LDA MYVARIABLE-6,X LDA NEXT-6,X ; next index BNE LOOP RTS NEXT .BYTE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 .BYTE 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 ... OK, the table is costly, but you have 4K of ROM - and it gets cheaper - read on! It needs to be big enough to ensure that your use of this method won't overflow it. But, how many cycles do YOU have to spare? What we're doing is letting the assembler do the branch calculation effectively ahead of run-time, and just looking up the new value in a table. We're also using the assembler to adjust the indexing inside the loop for us (that's the MYTABLE-6,X bit). By rearranging things (note the TAX at the beginning, to save doing it needlessly on the last iteration of the loop - saving two cycles over the entire loop), we've reduced the "ugly bit at the end" to just 9 cycles per loop, and the CODE part to only 6 bytes. As we can re-use that NEXT table all over our code, using this method becomes cheaper the more you use it (we save 1 byte of code, and 2 cycles per loop iteration on the previous example, and a whopping - well, whopping is small on a 2600 :) - 4 cycles and 3 bytes per iteration on the original method). Here's some sample calculations - lets take a 32 x loop. I've calculated all cycles relevant to the looping, setup... Original = 9 bytes and 417 cycles ( 2 + (13*32) - 1) 2nd version = 11 bytes and 353 cycles final version = 6 bytes + table, and 289 cycles. There is a big big difference in the 2600 universe between 289 and 417 cycles. That's a 30% speedup, by my calculations. We're rich! Best of all, it makes your code even more unreadable. Isn't that what we all aspire for? Go on, tell me you love it. Enjoy! A -- Archives (includes files) at http://www.biglist.com/lists/stella/archives/ Unsub & more at http://www.biglist.com/lists/stella/stella.html
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [stella] some more optimization, Retroware | Thread | Re: [stella] some more optimization, Andrew Davie |
[stella] 1998 Atari Survey, Songbird | Date | Re: [stella] some more optimization, Andrew Davie |
Month |