Re: [stella] Re: Thrust 1.0

Subject: Re: [stella] Re: Thrust 1.0
From: "Andrew Davie" <adavie@xxxxxxxxxxxxx>
Date: Fri, 8 Sep 2000 00:38:30 +1000
Since Thomas specifically mentioned me in his recent email regarding the
optimization of source code - and since I have a highly important multi-part
assignment due tomorrow morning - and it is almost midnight here - well, I'm
obviously looking for anything but assignment work to do.  I'll have a go.

Thomas's original post is copied below.

> .digitLoop:
>     dey                     ; 2  61
>     lda     (digcolPtr),y   ; 5  66
>     and     digcolMask      ; 3  69
>     sta     COLUP0          ; 3  72
>     sta     COLUP1          ; 3  75
>     lda     (digitPtr+$a),y ; 5   4
>     sta     GRP0            ; 3   7
>     lda     (digitPtr+$8),y ; 5  12
>     sta     GRP1            ; 3  15
>     lda     (digitPtr+$6),y ; 5  20
>     sta     GRP0            ; 3  23
>     lax     (digitPtr+$0),y ; 5  28     illegal
>     txs                     ; 2  30
>     lax     (digitPtr+$4),y ; 5  35     illegal
>     lda     (digitPtr+$2),y ; 5  40
>     stx     GRP1            ; 3  43
>     sta     GRP0            ; 3  46
>     tsx                     ; 2  48
>     stx     GRP1            ; 3  51
>     sty     GRP0            ; 3  54
>     tya                     ; 2  56
>     bne     .digitLoop      ; 2³ 59     when loop ends: y=0

The aim is to remove the "illegal" lax instructions, but leave the code at
least as efficient.

By the way, it's not necessary to use "$8" or "$4" - hex and decimal values
are equivalent here - just "8" or "4" are exactly the same.  And, of course,
+$0 is only there for clarity :)  Thomas's branching structure is probably
better achieved by jumping (or branching, if possible) to the end of the
loop on the first iteration, then you don't have to have an extra "tya" just
to set the Z flag.  See below...

OK, I'm aware there are some timing issues here, and I don't have a setup
where I can really check this out - or the time.  But here's the loop again,
with no illegal opcodes.

    jmp     .digitEnd                      ; use a branch instead, if you can

.digitLoop:

    lda (digcolPtr),y                      ; 5
    and digcolMask                         ; 3
    sta COLUP0                             ; 3
    sta COLUP1                             ; 3

    lda (digitPtr+10),y                    ; 5
    sta GRP0                               ; 3
    lda (digitPtr+8),y                     ; 5
    sta GRP1                               ; 3
    lda (digitPtr+6),y                     ; 5
    sta GRP0                               ; 3

    lda (digitPtr+4),y                     ; 5
    sta GRP1                               ; 3
    lda (digitPtr+2),y                     ; 5
    sta GRP0                               ; 3
    lda (digitPtr),y                       ; 5
    sta GRP1                               ; 3
    sty GRP0                               ; 3

.digitEnd:
    dey                                    ; 2
    bpl .digitLoop                         ; 3

That was pretty simple.  This is a logical (but not functional or timing)
equivalent loop.  Except it won't trash the stack pointer :)  Obviously it
won't work as it is, because the writes to GRP0/1 have to happen at a
certain time (don't they?).  Now, what I don't know at this stage are the
constraints for when writes to GRP0 and GRP1 can occur.  If somebody would
care to post the exact cycle ranges under which these writes can occur, we
may have a bit more fun working on optimising the loop WITHOUT illegal
opcodes.
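
For comparison, a loop that borrows S as scratch (as the quoted original
does with txs/tsx) normally has to be bracketed by a save and restore if
anything afterwards uses the stack.  A minimal sketch, assuming a spare
zero-page byte called savedSP here (a hypothetical name, not from
Thomas's source):

    tsx                     ; copy S into X
    stx savedSP             ; park it in zero page (savedSP is hypothetical)

    ; ... loop that clobbers S goes here ...

    ldx savedSP             ; fetch the saved value
    txs                     ; restore S before any jsr/rts/pha/pla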

At this stage we have the X and S "registers" free for our use.

There is a fair bit of leeway for shifting things back and forth - the
critical section is how much time we have to write GRP0/GRP1 for the
2nd time.   There are a few things I don't understand about Thomas's code...
(I have no manuals on hand, and it's been a few years since I really looked
at the 2600 manual), so I may goof here and there.  Bear with me :)

Looking at the timing, the actual loop in my rewrite takes 70 cycles,
compared to 76 for the original. That's a saving of 6 cycles - plus or
minus - but of course we need to get the timing right.  Obviously having the
extra cycles gives us room to do more stuff - but the trick will be the
tightness of the writing of those registers for the 2nd time.
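
To make the arithmetic visible, here is the same rewrite with a running
total in the second comment column, padded with three NOPs (6 cycles) back
up to a full 76-cycle scanline.  This is only a sketch: where the NOPs sit
is a placeholder guess, since the real position depends on the GRPx write
windows.  The counts assume none of the (zp),y loads crosses a page
(otherwise 6 cycles, not 5) and that the taken branch stays within a page
(otherwise 4, not 3).

    jmp     .digitEnd      ;         enter at the bottom, as before

.digitLoop:
    lda (digcolPtr),y      ; 5   5
    and digcolMask         ; 3   8
    sta COLUP0             ; 3  11
    sta COLUP1             ; 3  14
    lda (digitPtr+10),y    ; 5  19
    sta GRP0               ; 3  22
    lda (digitPtr+8),y     ; 5  27
    sta GRP1               ; 3  30
    lda (digitPtr+6),y     ; 5  35
    sta GRP0               ; 3  38
    lda (digitPtr+4),y     ; 5  43
    sta GRP1               ; 3  46
    lda (digitPtr+2),y     ; 5  51
    sta GRP0               ; 3  54
    lda (digitPtr),y       ; 5  59
    sta GRP1               ; 3  62
    sty GRP0               ; 3  65     unpadded loop body ends here
    nop                    ; 2  67     padding (placement is a guess)
    nop                    ; 2  69
    nop                    ; 2  71
.digitEnd:
    dey                    ; 2  73
    bpl .digitLoop         ; 3  76     taken, same page

Without the three NOPs the total is 70, which is where the 6 spare cycles
come from.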

So, Thomas... anyone... what exactly are the times under which writes to
each of the GRPx registers are allowable?  Obviously we have to put in a few
NOPs here or there (or other code!) anyway - to get timing for the line
correct.  Shifting the actual GRPx stores around to stay within the
flicker/hardware timing constraints MAY be possible.  As I said, I need more
information to do any better.

Hey, at least I'm having a stab at it :)

Cheers
A

--
Andrew Davie adavie@xxxxxxxxxxxxx & adavie@xxxxxxxxxxxxxxxxx ICQ #3297382
Museum of Soviet Calculators @ www.taswegian.com/MOSCOW/soviet.html
FAQ @ www.taswegian.com/TwoHeaded/faq.html  Work @ www.bde3d.com


----- Original Message -----
From: "Thomas Jentzsch" <tjentzsch@xxxxxx>
To: <stella@xxxxxxxxxxx>
Sent: Friday, August 04, 2000 2:40 AM
Subject: [stella] Re: Thrust 1.0


> Andrew Davie wrote:
> > Anyway, Thomas... how about it?  How about sharing those areas where you
> > feel you *must* use the illegal opcodes and let those of us who are on
> > higher moral ground :))  tell you why you don't need to :)
>
> Ok, actually the score-displaying routine is the only place where i
> really need the illegal opcodes.
>
> I used them to get some free time (16 cycles) for other effects. The
> result is actually only a *47* pixel routine, because the right pixel
> of the first and the third digit (if i remember it correct) are always
> the same. That works for me, the score display has blank pixels there,
> and THRUST too (except for the long T-line). Here is the result:
>
> .digitLoop:
>     nop                     ; 2  61
>     nop                     ; 2  63
>     nop                     ; 2  65
>     nop                     ; 2  67
>     nop                     ; 2  69
>     nop                     ; 2  71
>     nop                     ; 2  73
>     nop                     ; 2  75
>     lda     (digitPtr+$a),y ; 5   4
>     sta     GRP0            ; 3   7
>     lda     (digitPtr+$8),y ; 5  12
>     sta     GRP1            ; 3  15
>     lda     (digitPtr+$6),y ; 5  20
>     sta     GRP0            ; 3  23
>     lax     (digitPtr+$0),y ; 5  28     illegal
>     txs                     ; 2  30
>     lax     (digitPtr+$4),y ; 5  35     illegal
>     lda     (digitPtr+$2),y ; 5  40
>     stx     GRP1            ; 3  43
>     sta     GRP0            ; 3  46
>     tsx                     ; 2  48
>     stx     GRP1            ; 3  51
>     sty     GRP0            ; 3  54
>     dey                     ; 2  56
>     bpl     .digitLoop      ; 2³ 59
>
> As you can see, i saved the time by using LAX and the stack pointer.
>
> Then i added the color-scrolling effect, wasted some time (AND, TYA) to
> get the code working for my needs, and that's how it looks like now:
>
> .digitLoop:
>     dey                     ; 2  61
>     lda     (digcolPtr),y   ; 5  66
>     and     digcolMask      ; 3  69
>     sta     COLUP0          ; 3  72
>     sta     COLUP1          ; 3  75
>     lda     (digitPtr+$a),y ; 5   4
>     sta     GRP0            ; 3   7
>     lda     (digitPtr+$8),y ; 5  12
>     sta     GRP1            ; 3  15
>     lda     (digitPtr+$6),y ; 5  20
>     sta     GRP0            ; 3  23
>     lax     (digitPtr+$0),y ; 5  28     illegal
>     txs                     ; 2  30
>     lax     (digitPtr+$4),y ; 5  35     illegal
>     lda     (digitPtr+$2),y ; 5  40
>     stx     GRP1            ; 3  43
>     sta     GRP0            ; 3  46
>     tsx                     ; 2  48
>     stx     GRP1            ; 3  51
>     sty     GRP0            ; 3  54
>     tya                     ; 2  56
>     bne     .digitLoop      ; 2³ 59     when loop ends: y=0
>
> There are a few other places in the code where i saved some bytes and
> cycles with illegal opcodes (LAX to get a value in two registers, a
> long NOP instead of BIT to skip one instruction and keep the flags,...),
> but i could make them legal again.
>
> Thomas Jentzsch         | *** Every bit is sacred ! ***
> tjentzsch at web dot de |
> _______________________________________________________________________
> 1.000.000 DM gewinnen - kostenlos tippen - http://millionenklick.web.de
> IhrName@xxxxxx, 8MB Speicher, Verschluesselung - http://freemail.web.de
>
>
> --
> Archives (includes files) at http://www.biglist.com/lists/stella/archives/
> Unsub & more at http://www.biglist.com/lists/stella/
>


