> Hm, if that helps, I found you a few cycles...
You've got some good ideas. I'm in the middle of converting to 32k and
rewriting part of the level data structures so I can spread them across
banks. And I just found out that none of the emulators support standard
32k mode without extra RAM. It does work on the Cuttle Cart,
though. After I finish this, I'll go over the kernel again and see if I
can use your ideas to free up a few more cycles. Thanks for taking a look at it.
The idea of interleaving the color data is great. Originally I had 24
color bytes which didn't require the ASL, but I needed more RAM so I
changed it to 12. I probably would have made it interleaved if it had been
12 from the beginning. I just didn't think about it when I did the conversion.
> And why did you have to code 26(!) kernel lines? Can you describe
> what is going on in each of those lines? It looks like you are doing the
> same thing over and over and using the remaining cycles for
> paddles/colors/looping etc. Maybe with some optimizing you can reduce
> the number of kernel lines too.
I got the idea from examining the Maze Craze kernel. This type of kernel
helps when you are displaying an asymmetric playfield on each line, but it
only changes every few lines (13 in my case). It's especially helpful if
you have to do any logic on the playfield data before displaying it. And I
do since PF0-left and PF0-right are stored in the high and low nibble of a
byte.
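
To illustrate the nibble packing: the TIA only looks at the upper four
bits of PF0, so the left-half value can be written as stored, while the
right-half value has to be moved up out of the low nibble first. Here is
a rough model in Python, not the actual kernel code; the function name
and the sample byte are made up for illustration.

```python
def unpack_pf0(packed):
    """Split one packed byte into the two values written to PF0.

    The TIA uses only bits 4-7 of PF0, so the left value can keep its
    low nibble (it is ignored), while the right value is shifted up
    four bits, which is what four ASL instructions do on the 6502.
    """
    pf0_left = packed & 0xFF           # high nibble is already in place
    pf0_right = (packed << 4) & 0xFF   # four ASLs, truncated to 8 bits
    return pf0_left, pf0_right


left, right = unpack_pf0(0xA5)  # hypothetical sample byte
# Mask with 0xF0 to show only the bits the TIA actually uses.
print(hex(left & 0xF0), hex(right & 0xF0))
```

Doing the shift once while filling the buffer, instead of on every
scanline, is where part of the per-line saving comes from.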
The idea is that you store each of the indexed playfield bytes in a buffer
when you have extra time towards the start of the loop, and a few scanlines
into the loop, you'll have them all in a buffer. Before they're in the
buffer you have to read them all with indexed loads, which cost an extra
cycle each. Plus I have to use four ASLs on the PF0 data. But once I get
everything in the buffer I'm saving something like 12 cycles per line. And
after everything is in the buffer I don't need X to index the playfield
data any more, so I save it in the stack register and transfer one of the
playfield buffer values into X which saves me another 3 cycles per
line. By the end of the loop you've got tons of extra cycles to do other
stuff and get ready for the extra cycles that are needed to loop and fill
the buffer again.
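
The cycle accounting above can be sketched roughly as follows. The
cycle counts are the standard 6502 ones, but the instruction sequences
are only an assumption about what a six-write asymmetric playfield line
might look like, so the totals will not match the real kernel exactly.

```python
# Standard 6502 cycle counts (no page crossing).
CYCLES = {
    "LDA abs,X": 4,  # indexed read from the level data
    "LDA zp": 3,     # read from the zero-page buffer
    "STA zp": 3,     # write to a TIA playfield register
    "STX zp": 3,     # write a value already sitting in X
    "ASL A": 2,      # one shift of the accumulator
}


def line_cost(instructions):
    """Total CPU cycles for a list of instruction names."""
    return sum(CYCLES[name] for name in instructions)


# Assumed sequence before the buffer is filled: six indexed reads and
# writes, plus four ASLs for the right-hand PF0 value.
unbuffered = ["LDA abs,X", "STA zp"] * 6 + ["ASL A"] * 4

# Assumed sequence once the buffer holds pre-shifted values.
buffered = ["LDA zp", "STA zp"] * 6

# Same, but one value is kept in X, so one load disappears.
buffered_x = ["LDA zp", "STA zp"] * 5 + ["STX zp"]

print(line_cost(unbuffered), line_cost(buffered), line_cost(buffered_x))
```

In this model the last step saves 3 cycles, the cost of the one load
that is no longer needed.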
I later realized I needed the X register again, which is why there is all
that messy X manipulation going on. I should probably see if I can rework
that whole thing.
The reason there are two 13 line kernels is to alternate which playfield
lines are colored, and to alternate which paddles are read. The kernel is
so fragile that I couldn't figure out a way to use one kernel to do
this. The hardest part is the coloring. The color change in the middle of
the screen has to be timed perfectly.
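
To give a feel for how tight the mid-screen timing is, here is a rough
model of where the beam is at a given CPU cycle. It uses the usual 2600
figures (3 color clocks per CPU cycle, 68 clocks of horizontal blank,
160 visible pixels) and ignores exactly when inside a store instruction
the TIA sees the write, so treat it as an approximation only. The
function name is made up for this sketch.

```python
COLOR_CLOCKS_PER_CYCLE = 3   # TIA color clocks per CPU cycle
HBLANK_CLOCKS = 68           # color clocks before the visible area
VISIBLE_PIXELS = 160         # visible pixels per scanline


def pixel_at_cycle(cycle):
    """Visible pixel column reached at a CPU cycle, or None if off-screen."""
    x = cycle * COLOR_CLOCKS_PER_CYCLE - HBLANK_CLOCKS
    return x if 0 <= x < VISIBLE_PIXELS else None


# Print the beam position around the middle of the screen (pixel 80).
for cycle in (48, 49, 50):
    print(cycle, pixel_at_cycle(cycle))
```

Each CPU cycle moves the beam three pixels, so a write that lands one
cycle early or late shifts the change by three pixels on screen.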
> Even with your current code, the full 256 bytes make no sense, because
> the values for Y are not ranging from 0..255 when drawing the ball. You
> initialize Y with 156, so that's the maximum number of bytes you should
> have to waste.
The problem is that to display it at the top and bottom I need roughly 140
empty bytes above and below the marble image. That's fine, except there's
no way to do it without indexing across a page boundary, which causes an
extra cycle on each line. My kernel totally breaks with an extra cycle per
line.
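
The page-crossing penalty can be modeled like this. On the 6502 an
absolute indexed read takes 4 cycles, plus 1 if the base address plus
the index ends up in a different 256-byte page. The table address $F000
and the padding of 139 bytes below are hypothetical values picked for
the example, not taken from the real code.

```python
def crosses_page(base, index):
    """True if base + index lands in a different 256-byte page than base."""
    return (base & 0xFF) + index > 0xFF


def read_cycles(base, index):
    """Cycles for an absolute indexed load: 4, plus 1 on a page crossing."""
    return 4 + (1 if crosses_page(base, index) else 0)


TABLE = 0xF000   # hypothetical page-aligned start of the padded image
MAX_Y = 156      # largest index value mentioned above

# Pointer at the very start of the padding: no crossing at any index.
print(read_cycles(TABLE, MAX_Y))
# Pointer moved 139 bytes into the padding: the highest indexes cross.
print(read_cycles(TABLE + 139, MAX_Y))
```

Since the pointer has to move through the padding as the marble moves
up and down the screen, some positions end up crossing the boundary no
matter where the table is placed.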
-Paul