[stella] "Zombie" optimizations

Subject: [stella] "Zombie" optimizations
From: Thomas Jentzsch <tjentzsch@xxxxxx>
Date: Fri, 17 Jan 2003 09:16:59 +0100
Hi,
while "desperately" searching for free cycles to fulfill Glenn's Death
Derby wishes, I remembered some code I used in Thrust more than two
years ago.

The known 'elegant' DCP solution needs 26 cycles, and 11 of those cycles
are overhead just for determining whether to draw or not. The new
solution reduces that overhead to only 5 cycles!

The main difference (and disadvantage) is that it requires a 2nd RAM
register which stores the end of the vertically displayed area of the
Zombie in a very special way. The trick is to use a *signed* compare.

Here is a brief explanation:

Above the displayed area, M0_Y (which is simply loaded with the value
that the Y-register will have when drawing starts) is compared with Y
using a signed compare. As long as the result is positive and not equal,
the draw routine is skipped.

When the result is equal, the 2nd RAM variable is copied into M0_Y. This
variable holds the Y value at the end of the display area with bit 7 set
(ORA #$80). Therefore, as long as Y is inside the display area, the
signed compare with Y will always be negative and the Zombie is drawn.

Below the displayed area the compare will become positive again until
the end of the kernel.
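
The three cases can be checked with a small model. The following Python
sketch is not kernel code; it only mimics how CPY sets the N and Z flags
(from Y minus the operand, modulo 256). The names cpy_flags and dispatch
and the sample values ($40 as start line, $39 as end line) are made up
for illustration.

```python
def cpy_flags(y, operand):
    """Return (N, Z) as CPY sets them: taken from (Y - operand) mod 256."""
    diff = (y - operand) & 0xFF
    return bool(diff & 0x80), diff == 0


def dispatch(y, m0_y):
    """Classify one kernel line: negative -> draw, equal -> enable, else disable."""
    n, z = cpy_flags(y, m0_y)
    if n:
        return "draw"
    if z:
        return "enable"
    return "disable"


START = 0x40             # hypothetical first line of the Zombie
END_MARK = 0x39 | 0x80   # hypothetical end line with bit 7 set (ORA #$80)

print(dispatch(0x50, START))     # above the area
print(dispatch(0x40, START))     # first line: time to swap in END_MARK
print(dispatch(0x3F, END_MARK))  # inside the area
print(dispatch(0x38, END_MARK))  # below the area
```

Setting bit 7 shifts the compare value by 128, so inside the area the
difference wraps into the negative range, and below it the difference
becomes positive again.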


Due to the signed compare, this trick works only in kernels with fewer
than 128 kernel loops, which is no problem for a 2LK.
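
To see where the limit comes from, here is a rough Python check (again
only a model, with hypothetical names n_flag and false_draws). It looks
only at the lines above the displayed area and lists the Y values where
the N flag would already be set although drawing has not started yet.

```python
def n_flag(y, operand):
    """N flag after CPY: bit 7 of (Y - operand) mod 256."""
    return ((y - operand) & 0xFF) >= 0x80


def false_draws(top_y, m0_y):
    """Y values above the start line m0_y that would wrongly take the draw branch."""
    return [y for y in range(top_y, m0_y, -1) if n_flag(y, m0_y)]


# Kernel counting down from $7f: no start line causes a false draw.
print(all(false_draws(0x7F, m) == [] for m in range(0x80)))

# Kernel counting down from $bf with a start line near the bottom:
# every line that is 128 or more above the start line looks "negative".
bad = false_draws(0xBF, 0x10)
print(len(bad), hex(bad[-1]), hex(bad[0]))
```

Once Y can be 128 or more above M0_Y, the difference no longer fits into
a signed byte and the sign bit stops telling "above" from "inside".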

For adapting the new routine to DD, I had to rearrange the code a bit
(because of the length of the equal branch), so I "only" gain 5 cycles:

    cpy     M0_Y            ; 3
    bmi     .drawM0         ; 2³      a taken branch costs one extra cycle
    bne     .disableM0      ; 2³
;.enableM0:
    lda     M0_Y2           ; 3       copy the end value into
    sta     M0_Y            ; 3        the compared one
    lda     #2              ; 2       enable the missile
    sta     ENAM0           ; 3
    bpl     .continueM0     ; 3 = 21

.disableM0:                 ; 8       this branch is executed above
    SLEEP   5               ; 5        and below the displayed area
    lda     #0              ; 2       disable the missile
    sta     ENAM0           ; 3
    beq     .continueM0     ; 3 = 21

.drawM0:                    ; 6
    lda     (M0_Ptr),y      ; 5       bytes stored as: mmmmss00
    sta     HMM0            ; 3
    asl                     ; 2
    asl                     ; 2
    sta     NUSIZ0          ; 3 = 21
.continueM0:

M0_Y is initialized with the Zombie's distance from the bottom of the
kernel, M0_Y2 with M0_Y - ZOMBIE_HEIGHT + $81.

Example:
M0_Y = $40, ZOMBIE_HEIGHT = 8, M0_Y2 = ($40 - (8-1)) | $80 = $b9

Y = $7f..$41: N-Flag = 0, Z-Flag = 0 -> skip drawing
Y = $40     : N-Flag = 0, Z-Flag = 1 -> copy M0_Y2 into M0_Y
Y = $3f..$39: N-Flag = 1, Z-Flag = ? -> draw
Y = $38..$00: N-Flag = 0, Z-Flag = ? -> skip drawing
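
The example can be replayed line by line with a small Python model. This
is only a sketch under the assumption that the kernel counts Y down from
$7f to $00 and that the three branches behave like the routine above;
run_kernel and the constant names are hypothetical, and M0_Y2 is
computed from the formula M0_Y - ZOMBIE_HEIGHT + $81.

```python
ZOMBIE_HEIGHT = 8
M0_Y_START = 0x40
# Initial value of the 2nd RAM variable, per the formula in the text.
M0_Y2 = (M0_Y_START - ZOMBIE_HEIGHT + 0x81) & 0xFF


def run_kernel(top_y=0x7F):
    """Walk Y from top_y down to 0 and record which branch each line takes."""
    m0_y = M0_Y_START
    actions = {}
    for y in range(top_y, -1, -1):
        diff = (y - m0_y) & 0xFF      # what CPY M0_Y evaluates
        if diff & 0x80:               # N = 1 -> bmi .drawM0
            actions[y] = "draw"
        elif diff != 0:               # Z = 0 -> bne .disableM0
            actions[y] = "disable"
        else:                         # equal -> copy M0_Y2 into M0_Y, enable
            m0_y = M0_Y2
            actions[y] = "enable"
    return actions


actions = run_kernel()
visible = sorted(y for y, a in actions.items() if a != "disable")
print(hex(M0_Y2))
print([hex(y) for y in visible])
```

In this model the missile is on for exactly ZOMBIE_HEIGHT lines: the one
line that takes the enable branch plus the lines that take the draw
branch.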

Have fun!
Thomas
_______________________________________________________
Thomas Jentzsch         | *** Every bit is sacred ! ***
tjentzsch at web dot de |

----------------------------------------------------------------------------------------------
Archives (includes files) at http://www.biglist.com/lists/stella/archives/
Unsub & more at http://www.biglist.com/lists/stella/

