Re: [stella] I did it!(SFCaves 2600)

Subject: Re: [stella] I did it!(SFCaves 2600)
From: Mark De Smet <de-smet@xxxxxxxxxxxxxxx>
Date: Mon, 2 Oct 2000 23:30:40 -0500 (CDT)

> I had a bit of a look at this code.

I appreciate the help :-)

> Why not include the LDA #0 inside the loop, at the start, instead of
> duplicating it at the end....

It doesn't really hurt, as it only adds a few bytes to the code space,
which I'm not concerned about.  However, I had to do it in order to get my
second writes to PFx to occur before the registers are used.  I am
bit-mapping the playfield, so I have to update the PF registers mid-line.
If I move that LDA as you suggest, then the write will be too late.  I had
started out with it as you describe, but was forced to do this to make the
timing work.

>  loopcave1
>         lda #0
>         sta WSYNC
>         ... [snip/snip]
>         ;LDA #00 ;2                << get rid of this = 2 bytes saving

Well, that doesn't save 2 bytes, it simply moves them.

>         DEX
>         BNE loopcave1        << restore branch, if <128 bytes... branches
> are quicker!
> > BEQ noloopcave1 ;34;check if done, otherwise, loop.
> > JMP loopcave1 ;3

I wish...  I had the BNE, but the loop exceeded the 128-byte branch range,
and I don't know how to make it shorter.
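
For reference, here is a sketch of the usual workaround when a loop body is
too long for a relative branch: invert the test and branch over an absolute
JMP.  The labels match the ones quoted above; the processor and org lines and
the elided kernel body are assumptions added only so the fragment assembles
under DASM.  Cycle counts are the standard 6502 figures and assume the branch
does not cross a page boundary.

```asm
        processor 6502
        org $F000           ; assumed ROM origin, for illustration only

loopcave1
        ; ... kernel body, assumed too long for a relative branch to span ...
        DEX                 ;2
        BEQ noloopcave1     ;2 while looping (not taken), 3 on exit (taken)
        JMP loopcave1       ;3 absolute jump, no distance limit
noloopcave1
        ; Loop path: BEQ (2) + JMP (3) = 5 cycles, 5 bytes.
        ; A plain BNE loopcave1 would be 3 cycles when taken, 2 bytes,
        ; but it only reaches -128..+127 bytes from the next instruction.
```

So the long form costs 2 extra cycles per pass and 3 extra bytes compared
with a BNE that is in range.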

> from memory, JMP instructions are 4 cycles, not 3.  They're nearly always
> slower than branches

Ya, but I didn't really have a choice.  Bensema's guide to cycle counting
says 3.

> I also recall writes to memory locations being 3 cycles for zero page, 4
> cycles otherwise.

Yes, did I have any STAs marked as something besides 3?  All writes are
zero page (that's where all the memory and registers are located).
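
As a quick reference, a sketch of the store timings in question.  The
addresses are the usual 2600 ones (TIA registers at the bottom of zero page,
RAM at $80-$FF); the RAM label temp is hypothetical.  The .w suffix is, as
far as I know, DASM's way of forcing absolute addressing, which can be
handy when one more cycle needs to be burned.

```asm
        processor 6502
PF0     = $0D               ; TIA playfield register 0
temp    = $80               ; hypothetical RAM location, first byte of RAM

        org $F000           ; assumed ROM origin, for illustration only
        STA PF0             ;3 cycles, 2 bytes: zero page
        STA temp            ;3 cycles, 2 bytes: zero page
        STA temp,X          ;4 cycles, 2 bytes: zero page indexed
        STA.w PF0           ;4 cycles, 3 bytes: same register, absolute form
```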

> You don't need the 1st wsync in the loop, either, if you get your timing
> exactly right.  another 6 cycles and 5 bytes saving per loop.

That is what I want to do.

> > TXA ;2
> > ADC height ;3
> > TAY ;2
> And here (above) you have forgotten to clear the carry.  It's probably in a
> known state, given your earlier subtraction wouldn't overflow... but this is
> something you have to fix, or make SURE you know the state of the carry
> before you do the addition.  You could also consider using a table for the
> above...

It is known, it is 1 because as you point out, the previous subtraction
won't overflow.  However, to save the time, I have adjusted for the extra
subtraction by adding 1 to the data.
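
To spell out the arithmetic, here is a minimal sketch, not the actual kernel
code.  ADC always computes A + operand + carry, and an SBC that does not
borrow leaves the carry set.  The values 5 and 2 and the RAM address for
height are hypothetical; only the ADC sequence comes from the code under
discussion.

```asm
        processor 6502
height  = $80               ; hypothetical RAM location for the variable

        org $F000           ; assumed ROM origin, for illustration only
        SEC                 ;2 carry set = no borrow pending
        LDA #5              ;2 hypothetical values: 5 - 2 does not borrow,
        SBC #2              ;2 so the carry is still 1 afterwards

        TXA                 ;2
        ADC height          ;3 carry is 1, so A = X + height + 1
        TAY                 ;2 7 cycles; a CLC before the ADC would make it 9
```

The stray +1 then has to be absorbed somewhere else, for example by
offsetting the stored data by one, which avoids spending 2 cycles on a CLC
in every pass.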

>     ldy xplusheight,x            ; 5  (or ldx blah,y  ... i can't recall
> which is OK .. if either!)
> an alternative
>     lda xplusheight,x
>     tay
> Sometimes this is a better way to do it (that is, using a table to do
> constant additions for you)... if you needed to set the carry to do the
> first method, this is smaller in code size and cycles.  Cost is the bytes
> required for your table.  I'm unsure if your "height" is constant.

This is the sort of thing I'm looking for.  I think I have more ROM space
than kernel time, so I'd like to make the trade.  As you suspected, though,
height is a variable.  It is limited in range, so I'll think about it and
see if I can replace it with a table anyway.

> > LDA #00 ;2
> >
> > SEC ;2
> > ;74
> > ; STA WSYNC ;3
> >
> >
> > ;74

> Avoid stringing out the SEC so far away from its needed use...  unless you
> really need it there for cycle counting.
> It's too easy for it to get misplaced/lost/changed before it is actually
> needed.

Ya, I should probably clean that up ;-)

> The code seems to load one batch of data, mask out (AND) some bits, save it
> all, then load it all up again and OR in some bits, then write to the
> registers.  It might be possible to improve the efficiency by using both
> index registers and doing the and/or in one hit.  I haven't spent enough
> time with the code to look at the pros/cons... especially regarding the scan
> line timing.  But here's the gist of it...

The AND comes in the first two lines of the kernel, and the OR in the
second.  They manipulate different sets of data (well, the same data in a
different order, which makes all the difference).

>     phx
>  CLC
>  LDA currleftcol
>  ADC leftcave,X
>  STA currleftcol
>  TAY
>  SEC
>  LDA currrightcol     ;3 ;load the column from last line.
>  SBC rightcave,X      ;45 ;add on the shift.
>  tax
>  LDA cavedata1a,Y
>  and cavedata2a,x
>  STA PF0
> ... etc all the way through the bytes
>     plx
> This is probably another of those "nice code, but you can't do it that way
> (timing!)" things.
> But, something to think about :)

Good idea.  I'm sure I did it the way I did because I did the first half
of the code, then added in the second half after the first worked.  I
think this will work, but I'll have to see if it saves time.  Of course
there is only one way to find out :^P  I think the trick will be timing
the placement of the writes to PFx so that they work out right.

Thanks for your help!

