Re: [stella] MultiSpriteDemo update (source+binary)

Subject: Re: [stella] MultiSpriteDemo update (source+binary)
From: emooney@xxxxxxxxxxxxxxxx (Erik Mooney)
Date: Tue, 08 Apr 1997 07:29:53 GMT
>>>I wonder why are you asking this :) ...maybe you want to use all the
>>>available cycles during a scanline, without even doing the STA WSYNC?
>>I had a routine that needed to rewrite both GRP registers each line while
>>updating all three PF registers twice per scanline (non-repeating
>hmmm... looks like you're working on something very interesting... :)

I was working on a kernel for an Arkanoid-type game, using the playfield
for the walls and the GRP registers for the capsules (and of course the
ball for the ball).  It wasn't working out the way I had it; there just
weren't enough cycles.  What did work was trimming it down to 32 blocks
wide, setting PF-Reflect so it didn't use PF0 (see Super Breakout), and
restricting the capsules so that only one could be in any vertical zone,
which means only one GRP write per scanline.  That left just barely enough
time to check the ball (not using the PHP trick, though).  Positioning sprites
while making four playfield writes on the same line is just about
impossible, though, so I leave a one-scanline gap between rows of blocks
(again see Super Breakout) during which I can reposition RESP1 for the next

>>and checking the ball's Y coordinate and writing to ENABL if necessary
>There's that incredible PHP trick for this... I always wonder if Stella
>designers had already it in mind when they created the hardware.

That's the trick that points the stack pointer at ENAMx or ENABL, compares
two memory locations (or any register and one memory location) and then
just does PHP, because the "equal" (zero) flag is bit 1 of P, which matches
the data bit that ENAxx is looking for, right?  It is a very nice routine:
fast, and it does not branch.  It works as long as you only want a
one-scanline-high object, or if you only run the routine every N lines,
where N is the height of the object in scanlines.
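
As a rough illustration of why the single-line version works, here is a
small Python model of "compare, then PHP".  It is only a sketch: the
function names are made up for this example, the 192-line visible frame is
an assumption, and only the Z flag is modelled (the other flags a compare
affects are ignored because ENAxx only looks at bit 1).

```python
Z_FLAG = 0x02  # the zero ("equal") flag is bit 1 of the 6502 status register P


def php_enable_value(scanline, object_y):
    """Model a compare followed by PHP; return the byte PHP would write."""
    p = 0x30  # PHP always pushes bits 4 and 5 as 1
    if scanline == object_y:
        p |= Z_FLAG  # a compare sets Z when the two operands are equal
    return p


def is_enabled(enaxx_byte):
    # ENAM0, ENAM1 and ENABL only look at bit 1 of the byte written to them.
    return bool(enaxx_byte & 0x02)


# Assumed 192 visible scanlines, object on line 50.
lines_on = [y for y in range(192) if is_enabled(php_enable_value(y, 50))]
print(lines_on)  # [50]
```

The object is enabled on exactly one line, which is why the plain trick
only suits one-scanline-high objects.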

I think the Stella designers must have had this in mind, because what other
reason is there for ENAxx using bit 1 instead of bit 0 for data?  This
works very well for single-height objects.  And considering this routine is
used in Combat, I'm pretty sure it was intentional.

This would not quite work for games such as Centipede/Millipede, which seem
to run the enable-player-bullet routine every three lines even though the
bullet is a six-line object.  Is there an easy way to modify the PHP routine
for a multiple-height object?  It looks like you'd SBC the two numbers and
enable the object if the result is less than N.  That would work given an
easy way of setting the Z flag from the carry flag without a branch.

Warning, untested code!  For the following code, MissileY equals the LAST
scanline on which you want the missile to display, Scanline equals the
current scanline number, and the code assumes you want a missile of height
4 scanlines.  (so if MissileY = 50, the missile will be enabled for
scanlines 47 through 50.)

LDX #$1E        ;for ENAM1 - could also be used for ENABL or ENAM0
TXS             ;point the stack at ENAM1 so PHP writes to it
LDA MissileY
SEC             ;no borrow going into the subtraction
SBC Scanline    ;A has (MissileY - Scanline).  If it is >=0 but <4,
                ;we want the carry clear.
CLC
ADC #252        ;If 0 <= A <= 3, the carry will now be clear.
LDA #00
ADC #00         ;If the carry was clear, A now = 0, so Z is set.
PHP             ;Plug it into ENAM1.

This can be optimized to

LDX #$1E        ;+2  2
TXS             ;+2  4
LDA MissileY    ;+3  7
SEC             ;+2  9
SBC Scanline    ;+3 12
ADC #251        ;+2 14
LDA #00         ;+2 16
ADC #00         ;+2 18
PHP             ;+3 21

because if MissileY >= Scanline, the carry is still set after the SBC, so
ADC #251 does the same thing as CLC+ADC #252 did before.  And if MissileY <
Scanline, A is a large unsigned number after the SBC, so the carry will be
set whether we add 251 or 252 (as long as Scanline is fewer than 252 lines
past MissileY).
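
Since the routine is flagged as untested, one way to check the carry
reasoning without hardware is to model the 8-bit arithmetic in Python.
This is a sketch under assumptions: the helper names are invented for the
example, decimal mode is assumed to be off, and the scanline counter is
assumed to run from 0 to 191.

```python
def adc(a, operand, carry):
    """Binary-mode ADC: return (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def sbc(a, operand, carry):
    """Binary-mode SBC is ADC with the operand's bits inverted."""
    return adc(a, operand ^ 0xFF, carry)


def enable_first(missile_y, scanline):
    a, c = sbc(missile_y, scanline, 1)  # SEC / SBC Scanline
    a, c = adc(a, 252, 0)               # CLC / ADC #252
    a, c = adc(0, 0, c)                 # LDA #0 / ADC #0
    return a == 0                       # Z flag, which PHP pushes as bit 1


def enable_optimized(missile_y, scanline):
    a, c = sbc(missile_y, scanline, 1)  # SEC / SBC Scanline
    a, c = adc(a, 251, c)               # ADC #251 reusing the carry from SBC
    a, c = adc(0, 0, c)                 # LDA #0 / ADC #0
    return a == 0


# With MissileY = 50 both versions should enable lines 47 through 50.
print([y for y in range(192) if enable_first(50, y)])      # [47, 48, 49, 50]
print([y for y in range(192) if enable_optimized(50, y)])  # [47, 48, 49, 50]
```

Over the assumed 0-191 range the two versions agree for every pair of
MissileY and Scanline, and both enable the missile exactly when
0 <= MissileY - Scanline <= 3.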

Not bad: 21 cycles to handle an object of any height.  For more than one
object, you have to set S for the first but not afterward (each PHP
decrements S to the next-lower register), making 38 cycles for two objects
and 55 for three.  With three objects, it leaves enough time to rewrite
either the playfield or GRPx, but probably not both.
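
The cycle totals can be tallied with a few lines of Python.  This is only
bookkeeping: the names are made up for the example, each object is assumed
to keep its Y value in zero page (hence 3-cycle loads), and 76 is the
number of CPU cycles in one scanline.

```python
CYCLES_PER_SCANLINE = 76

SETUP = 2 + 2                             # LDX #imm, TXS (first object only)
PER_OBJECT = 3 + 2 + 3 + 2 + 2 + 2 + 3    # LDA, SEC, SBC, ADC, LDA, ADC, PHP


def php_routine_cycles(objects):
    """Total cycles for the PHP routine handling `objects` objects."""
    return SETUP + objects * PER_OBJECT


for n in (1, 2, 3):
    used = php_routine_cycles(n)
    print(n, used, CYCLES_PER_SCANLINE - used)
# 1 21 55
# 2 38 38
# 3 55 21
```

The last column is what remains of the scanline for everything else,
including playfield and GRPx writes and the loop overhead.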

Comparing with an alternate approach, using branching:

LDX #0          ;+2  2
LDA MissileY    ;+3  5
SEC             ;+2  7
SBC Scanline    ;+3 10
CMP #4          ;+2 12  (carry set unless 0 <= A <= 3, for height 4)
BCS L1          ;+2 14
LDX #2          ;+2 16
L1 STX ENAM1    ;+3 19(18 if branch taken)

This takes 18-19 for one object, 36-38 for two and 54-57 for three, so the
timing is pretty close to the same.  Unless someone can optimize one
instruction out of either routine?
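
For comparison, here is the same kind of Python model for the branching
routine, returning both the value stored and the cycles spent.  It is a
sketch: the function name and the height parameter (standing in for the
CMP operand) are invented for the example, and the branch is assumed not
to cross a page boundary, which would cost an extra cycle when taken.

```python
def branching_routine(missile_y, scanline, height):
    """Return (value stored to ENAM1, cycles used) for one object."""
    x = 0                                  # LDX #0
    a = (missile_y - scanline) & 0xFF      # LDA MissileY / SEC / SBC Scanline
    cycles = 2 + 3 + 2 + 3 + 2             # the above plus CMP #height
    if a >= height:                        # BCS taken (carry set by CMP)
        cycles += 3
    else:                                  # BCS not taken, then LDX #2
        cycles += 2 + 2
        x = 2
    cycles += 3                            # STX ENAM1
    return x, cycles


print(branching_routine(50, 48, 4))  # (2, 19)
print(branching_routine(50, 10, 4))  # (0, 18)
```

The missile is enabled on `height` consecutive lines ending at MissileY,
and the cost is 19 cycles on lines where it is enabled and 18 elsewhere.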
