
Subject: Re: [stella] Finally underway: 2600 Cookbook
From: Paul Slocum <paul-stella@xxxxxxxxxxxxxx>
Date: Sun, 11 Apr 2004 23:36:05 -0500

> I started work on 2600 Cookbook, what might end up being my
> biggest contribution to the 2600 homebrew community.

I started on pretty much the same thing a while back. I've attached what I did in hopes that you can integrate it into your guide. Some of the sections are pretty much finished, and I have notes on quite a few other subjects I thought should eventually be covered, although you've already listed a lot of them.


> So, I'd love to get some feedback on the format (which like I said,
> I've found super friendly in the past)

When I was writing my guide, my goal was to have a document where I could quickly find the trick I needed and drop it right into my program. I'm more interested in short generic code snippets than full examples. I'd prefer to have a code snippet on the webpage that I can cut and paste into my code, and then maybe have a link to a full example that I can download and play with.


Also, I'd like to see some of the important tables and diagrams in text format rather than images or HTML tables so they can be pasted into the code as a comment for quick reference.

-paul
==============================================================
ATARI 2600 ADVANCED PROGRAMMING GUIDE

02-02-03
Compiled and edited by Paul Slocum
Written by the Atari 2600 programming community
==============================================================

This guide is intended to be a supplement to the standard Stella Programmer's Guide.  For more information:

The Stella mailing list:
http://www.biglist.com/lists/stella/

The Stella list archives (used to compile much of this document):
http://www.biglist.com/lists/stella/archives/

The Dig (selected material from the Stella list archives):
http://www.neonghost.com/the-dig/index.html


==============================================================
TABLE OF CONTENTS
==============================================================
USING BRK WITH RESXX
BRK SUBROUTINE TRICK
BANKSWITCHING
SHOWING MISSILES USING PHP
SOUND AND MUSIC
ILLEGAL OPCODES
CONSTANT CYCLE COUNT TO AVOID WSYNC
HIGH RESOLUTION 48-PIXEL WIDE GRAPHICS
INSURANCE AGAINST TOO MANY VBLANK/OVERSCAN CYCLES
CHECKING THE NUMBER OF SCANLINES
COUNTING DOWN WHEN LOOPING
PADDLES
WASTING CYCLES
SKIPDRAW





- naming conventions
- usage of DASM (SUBROUTINE, MACRO, defines/includes)
- avoid "magic numbers"
- separating code and data


- aligning (incl. calculations with .)
- using segments (especially for the zeropage)
- bankswitching
- variables reusage
- things to avoid during VSYNC ...


= Buffered Playfield kernel

= Atari 2600 sample playback?

- usage of VDEL in general

- BIT tricks

- register and single bits reusing

- avoiding page penalties

- early HMOVEs
http://www.biglist.com/lists/stella/archives/199804/msg00186.html 

- shifting vs. ANDing

- branching out of the kernel flow to keep the slowest part as fast as
possible

- normal and Fatal Run positioning routines

- exact timings for writing to PFx (in both modes)

- multiple RESP tricks

- constant carry status (e.g. EOR instead of CMP) to avoid CLC/SEC

- trying to preserve X and Y inside the kernel

- using branches instead of JMPs, using JMP,...RTS instead of
JSR,...RTS,RTS at the end of a subroutine

- multiplication tricks with shifts vs. using a table

- using TXS, TSX to save CPU cycles





==============================================================
ILLEGAL OPCODES
Thomas Jentzsch
 Todo: Explain and give DASM setup
==============================================================

    OPCODE   HEX       OPERATION                  ADDRESSING       BYTES/CYCLES

    DCP      $C3       M <- (M)-1, (A-M) -> NZC   (Ind,X)          2/8
    DOP      $04       [no operation]             (Z-Page)         2/3
    LAX      $AF       A <- M, X <- M             (Absolute)       3/4




==============================================================
HIGH RESOLUTION 48-PIXEL WIDE GRAPHICS
==============================================================





==============================================================
PADDLES
Thomas Jentzsch
 Todo: Explain and include how to discharge cap
==============================================================

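Assumes Y is your kernel line counter.  Bit 7 of INPT0 reads 0 while the paddle's capacitor is still charging and 1 once it has charged, so storing Y on every line until then leaves padVal1 holding the last line on which the capacitor was still charging.  Both paths take 9 cycles.

          lda INPT0               ;3
          bpl paddles1            ;2 or 3   still charging: go store Y
          .byte $2c               ;4    bit abs opcode: swallows the sty below once charged
 paddles1:
          sty padVal1             ;3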
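The `.byte $2c` works because $2C is the opcode for BIT absolute: when the branch falls through, the CPU executes a BIT whose two operand bytes are the `sty padVal1` instruction, so the store is skipped without a second branch. This Python sketch models only that decoding and the cycle count of each path (assuming the branch doesn't cross a page); `padVal1 = $85` is a hypothetical RAM address picked for illustration.

```python
# 6502 cycle costs for the instructions in the paddle snippet.
LDA_ZP = 3
BRANCH_NOT_TAKEN = 2
BRANCH_TAKEN = 3          # assumes the branch does not cross a page
BIT_ABS = 4
STY_ZP = 3

STY_ZP_OPCODE = 0x84      # opcode byte of "sty zeropage"

def path_cycles(branch_taken):
    """Cycles from 'lda INPT0' through the store (or the skipped store)."""
    if branch_taken:
        # The branch lands directly on the sty.
        return LDA_ZP + BRANCH_TAKEN + STY_ZP
    # Fall through into BIT abs, which eats the sty's two bytes as its operand.
    return LDA_ZP + BRANCH_NOT_TAKEN + BIT_ABS

def bit_operand(pad_val_addr):
    """Address the swallowing BIT reads: operand bytes are $84, padVal1 (little-endian)."""
    low, high = STY_ZP_OPCODE, pad_val_addr
    return (high << 8) | low

PADVAL1 = 0x85  # hypothetical zero-page address of padVal1

print(path_cycles(True), path_cycles(False))   # 9 9
print(hex(bit_operand(PADVAL1)))               # 0x8584
```

Because both paths cost the same, the read doesn't disturb the timing of the kernel line. Keep in mind that the swallowed BIT still performs a read (here from $8584).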



==============================================================
WASTING CYCLES
Christopher Tumber, Chris Wilkson, Andrew Davie
 Todo: Verify Andrew's
==============================================================

These are the most efficient ways to waste processor cycles.  Note that
locations $2D-$3F do nothing and aren't decoded.  The only case where this might be a problem is if you're using an unusual bankswitching setup.


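  1 Cycle (0 or 1 byte)
    .w  (Change a zero page instruction to absolute, e.g. lda.w $80; adds 1 byte of code)
    ,x  (Change a zero page instruction, or an absolute store, to its indexed form.
	 Make sure x=0.  Can also use Y where that addressing mode exists.  Indexed
	 absolute reads only take the extra cycle when they cross a page, so with
	 x=0 they don't waste anything)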

  2 Cycles (1 byte)
    nop

  3 Cycles (2 bytes)
    sta $2D
    - or -
    lda $2D
    - or -
    dop (Double NOP illegal opcode)

  4 Cycles (2 bytes)
    nop
    nop

  5 Cycles (2 bytes)
    dec $2D
    - or - (3 bytes)
    sta $1800,X  ; assumes you can write to ROM without problems

  6 Cycles (2 bytes)
    lda ($80,X)  ; assumes possible reads from 0-$7f have no effect


  6 Cycles (3 bytes)
    nop
    nop
    nop

  7 Cycles (2 bytes, need 1 byte free on stack)
    pha
    pla

  8 Cycles (3 bytes)
    lda ($80,X)  ; assumes possible reads from 0-$7f have no effect
    nop

  9 Cycles (3 bytes, need 1 byte free on stack)
    pha
    pla
    nop

  9 Cycles (4 bytes)
    dec $2D
    nop
    nop

  10 Cycles (4 bytes)
    dec $2D
    dec $2D
    - or -
    rol $80
    ror $80 ; leaves $80 and the carry unchanged (N and Z are affected)

  11 Cycles (5 bytes, assumes you can safely write to ROM with nothing disastrous happening)
    sta $8000,X
    lda ($80,X)  ; assumes possible reads from 0-$7f have no effect

  12 Cycles (3 bytes, need 2 bytes free on stack)
    jsr return
    ; somewhere else
  return:   
    rts

  12 Cycles (4 bytes, no stack needed)
    lda ($80,X)  ; assumes possible reads from 0-$7f have no effect
    lda ($80,X)  ; assumes possible reads from 0-$7f have no effect





  Also:
    You can use PHA/PHP (1 byte, 3 cycles) or PLA/PLP (1 byte, 4 cycles) alone, but you have to be careful not to mess up your stack, and PLA/PLP overwrite A or the flags.  (These are handy if your program doesn't use the stack at all!)
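The cycle and byte totals in this list are easy to get wrong, so here is a small Python sketch that adds them up from standard 6502 timings (no page crossings assumed) and checks the ROL/ROR pairing on every 8-bit value. The cost table only covers the instructions used in this section.

```python
# (cycles, bytes) for the instructions used above; no page crossings assumed.
COST = {
    "nop": (2, 1),
    "sta zp": (3, 2),
    "lda zp": (3, 2),
    "dop zp": (3, 2),          # illegal opcode $04
    "dec zp": (5, 2),
    "rol zp": (5, 2),
    "ror zp": (5, 2),
    "sta abs,x": (5, 3),
    "lda (zp,x)": (6, 2),
    "pha": (3, 1),
    "pla": (4, 1),
}

def total(seq):
    """Return (cycles, bytes) for a sequence of instruction names."""
    cycles = sum(COST[op][0] for op in seq)
    size = sum(COST[op][1] for op in seq)
    return cycles, size

def rol(value, carry):
    """ROL: shift left, old carry into bit 0, old bit 7 into carry."""
    return ((value << 1) | carry) & 0xFF, value >> 7

def ror(value, carry):
    """ROR: shift right, old carry into bit 7, old bit 0 into carry."""
    return (value >> 1) | (carry << 7), value & 1

print(total(["dec zp", "nop", "nop"]))      # (9, 4)
print(total(["sta abs,x", "lda (zp,x)"]))   # (11, 5)
print(total(["pha", "pla"]))                # (7, 2)

# ROL then ROR restores both the byte and the carry for every input...
assert all(ror(*rol(v, c)) == (v, c) for v in range(256) for c in (0, 1))
# ...while two ROLs in a row generally do not: $81 becomes $05.
print(rol(*rol(0x81, 0)))                   # (5, 0)
```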

  

==============================================================
SKIPDRAW
Thomas Jentzsch
 Todo: Explain and clean up
==============================================================

The best way I knew of until now was (with Y holding the line counter):
  tya                   ; 2
 (sec                   ; 2) <- this can sometimes be avoided
  sbc SpriteEnd         ; 3
  adc #SPRITEHEIGHT     ; 2
  bcc .skipDraw         ; 2 = 9-11 cycles (carry clear: Y is outside the sprite)
  ...

If you like using illegal opcodes, you can use dcp (dec, cmp) here:
  lda #SPRITEHEIGHT     ; 2
  dcp SpriteEnd         ; 5     initial value has to be adjusted
  bcc .skipDraw         ; 2 = 9 (carry clear: counter is above SPRITEHEIGHT)
  ...
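To see which way each branch goes, this Python sketch emulates the 8-bit arithmetic of both versions over a frame and records the lines where the carry ends up set (draw) rather than clear (skip). `HEIGHT`, `SPRITE_END`, and `FIRST_LINE` are hypothetical values, and presetting the DCP counter as shown is just one way to do the "initial value has to be adjusted" step. Note that the DCP compare is inclusive (carry set when A >= M), so with A = SPRITEHEIGHT the carry is set for SPRITEHEIGHT + 1 consecutive lines, one more than the first version.

```python
HEIGHT = 8          # hypothetical SPRITEHEIGHT
SPRITE_END = 100    # hypothetical SpriteEnd for the first version
LINES = 192         # kernel lines, with Y counting down from 191 to 0

def sbc(a, m, carry):
    """6502 SBC in binary mode: returns (result, carry); carry set = no borrow."""
    diff = a - m - (1 - carry)
    return diff & 0xFF, int(diff >= 0)

def adc(a, m, carry):
    """6502 ADC in binary mode: returns (result, carry)."""
    total = a + m + carry
    return total & 0xFF, int(total > 0xFF)

def dcp(a, m):
    """Illegal DCP: M <- M - 1, then CMP A with M. Returns (new M, carry)."""
    m = (m - 1) & 0xFF
    return m, int(a >= m)

def draw_classic(y):
    """tya / sec / sbc SpriteEnd / adc #SPRITEHEIGHT; carry set means draw."""
    a, carry = sbc(y, SPRITE_END, 1)
    a, carry = adc(a, HEIGHT, carry)
    return carry == 1

classic = [y for y in range(LINES - 1, -1, -1) if draw_classic(y)]

# DCP version: A = HEIGHT and the RAM counter is decremented every line.
# Preset it (hypothetical choice) so it equals HEIGHT right after the
# decrement on line FIRST_LINE.
FIRST_LINE = 150
counter = (HEIGHT + 1 + (LINES - 1 - FIRST_LINE)) & 0xFF
dcp_lines = []
for y in range(LINES - 1, -1, -1):
    counter, carry = dcp(HEIGHT, counter)
    if carry:
        dcp_lines.append(y)

print(classic)     # [99, 98, 97, 96, 95, 94, 93, 92]
print(dcp_lines)   # [150, 149, 148, 147, 146, 145, 144, 143, 142]
```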
---------------------------------------------------------------------
LAX
---------------------------------------------------------------------
Using stack as temp storage
TXS
TSX
---------------------------------------------------------------------
long NOP
 .byte   $0c ; NOP absolute (illegal opcode): swallows the next 2 bytes, 4 cycles



==============================================================
==============================================================
==============================================================
FINISHED SECTIONS
==============================================================
==============================================================
==============================================================




==============================================================
SHOWING MISSILES USING PHP
==============================================================

This trick is originally from Combat and is probably the most efficient way to display the missiles and/or ball.  This trick just requires that you don't use the stack during your kernel.  Recall that:

  ENABL = $1F
  ENAM1 = $1E
  ENAM0 = $1D

In this example I'll show how to use the trick for both missiles.  You can easily adapt it for the ball too.  To set the trick up, before your kernel save the stack pointer and then point the stack pointer at ENAM1.  The 6502 stores a pushed byte at the address in the stack pointer and then decrements it, and because the 2600 doesn't decode address line A8, pushes to $011E and $011D land on ENAM1 and ENAM0.

  tsx                    ; Transfer stack pointer to X
  stx SavedStackPointer  ; Store it in RAM
  ldx #ENAM1
  txs                    ; First push writes ENAM1, the next one ENAM0

Now during the kernel you can compare your scanline counter to your missile position register, which sets the processor's zero flag when they're equal.  Then to enable/disable the missile for that scanline, just push the processor flags onto the stack.  The ENAxx registers use bit 1 to enable/disable, which corresponds to the zero flag in the processor status register, so the enable/disable is automatic.  It takes only a few cycles and, unlike branching, always takes the same number of cycles whatever the result.

  ; On each line of your kernel...
  cpy MissilePos1        ; Assumes Y is your kernel line counter
  php                    ; Writes ENAM1
  cpy MissilePos0
  php                    ; Writes ENAM0

Then, before you do it again, somewhere on each scanline you need to pull the two bytes back off the stack with two PLAs or PLPs, or you can manually reset the stack pointer with ldx #ENAM1, txs.

After your kernal, restore the stack pointer:

  ldx SavedStackPointer
  txs
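A quick way to check the stack arithmetic is to model it. In this Python sketch a push stores at $0100+SP and then decrements SP, and $01xx is folded onto the TIA's $00xx because A8 isn't decoded. Only the Z bit of the pushed status byte is modeled (the real byte also carries N, C and other bits, which the ENAxx registers ignore). The line number and missile positions are made-up values; the sketch shows where each push lands for a given starting SP.

```python
ENAM0, ENAM1, ENABL = 0x1D, 0x1E, 0x1F
NAMES = {ENAM0: "ENAM0", ENAM1: "ENAM1", ENABL: "ENABL"}
Z_FLAG = 0x02           # bit 1 of the 6502 status register
ENABLE_BIT = 0x02       # bit 1 of ENAM0/ENAM1/ENABL

def php(sp, status, writes):
    """6502 PHP: store status at $0100+SP, then decrement SP (8-bit wrap)."""
    register = (0x0100 + sp) & 0xFF   # A8 isn't decoded, so $011x reaches TIA $1x
    writes[NAMES.get(register, hex(register))] = status
    return (sp - 1) & 0xFF

def cpy_z(y, value):
    """The part of CPY's result this trick uses: Z is set when Y == value."""
    return Z_FLAG if y == value else 0

def kernel_line(y, missile_pos1, missile_pos0, sp=ENAM1):
    """One line of the trick (cpy/php twice). Returns enables written and new SP."""
    writes = {}
    sp = php(sp, cpy_z(y, missile_pos1), writes)
    sp = php(sp, cpy_z(y, missile_pos0), writes)
    enables = {name: bool(value & ENABLE_BIT) for name, value in writes.items()}
    return enables, sp

print(kernel_line(40, 40, 75))                # ({'ENAM1': True, 'ENAM0': False}, 28)
print(kernel_line(40, 40, 75, sp=ENAM1 + 1))  # ({'ENABL': True, 'ENAM1': False}, 29)
```

Starting one higher (SP = ENAM1+1) shifts every write up by one register, so the first compare lands on ENABL and ENAM0 is never written.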




==============================================================
BANKSWITCHING
==============================================================

For a bankswitching reference you'll want Kevin Horton's sizes.txt reference:
http://www.tripoint.org/kevtris/files/sizes.txt

One thing you'll probably want to know for any kind of bankswitching is the RORG assembler directive.  RORG is like ORG except it only affects the way label addresses are handled, not where the code is placed in the ROM.  So let's say you're working on an 8K ROM.  The first bank will start with ORG $1000 and all the code and data for the first bank will follow.  At the start of the second 4K bank, you'll want:
	ORG $2000
	RORG $1000

If you don't use RORG, all your label addresses in the second bank will be in the $2000-$2FFF range which won't work.  With RORG, they will continue to address the $1000-$1FFF range.

Here's a basic template for doing F8 bankswitching.  This can easily be modified to work with similar bankswitching methods (F6, F4, etc.).  This code allows you to call "Bank2Subroutine" from bank 1 using jsr CallBank2Subroutine.


  ;------------------------------------
  ;This code in bank 1
  ;Switches to bank 2, where the subroutine is called.
      org $1FE0
  CallBank2Subroutine
      ldx $1FF9	  ; switch to bank 2
      nop         ; 1FE3 jsr Bank2Subroutine
      nop         ; .
      nop         ; .
      nop         ; 1FE6 ldx $1FF8  (Switch back to bank 1)
      nop         ; .
      nop         ; .
      rts

  ;------------------------------------
  ;This code in bank 2:
  ;Calls the subroutine and returns to bank 1
      org $2FE3
      rorg $1FE3

      jsr Bank2Subroutine
      ldx $1FF8    ;(Switch back to bank 1)        


It's good practice to assume that your multi-bank ROM could start up in any bank.  In each bank, set up the startup vector so it points to code that switches to the correct startup bank and then jumps to the start of your program.
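To see why the NOPs and the matching addresses matter, here is a minimal Python model of an 8K F8 cartridge: reading $1FF8 or $1FF9 selects a bank, and the CPU keeps fetching at the next address, now from the other bank. The trace only follows opcode fetches and the hotspot reads (operand fetches are skipped for brevity). Banks are numbered 0 and 1 here (the guide's bank 1 and bank 2), and the Bank2Subroutine address is hypothetical.

```python
BANK_SIZE = 0x1000
# F8 hotspots: reading $1FF8 selects the first bank, $1FF9 the second.
HOTSPOTS = {0x1FF8: 0, 0x1FF9: 1}

def rom_offset(bank, address):
    """File offset of `address` (an ORG/RORG $1xxx address) in a 0-based bank."""
    return bank * BANK_SIZE + (address & 0x0FFF)

class F8Cart:
    """Minimal model of an 8K F8 cartridge as seen from the CPU."""

    def __init__(self, rom, bank=0):
        assert len(rom) == 2 * BANK_SIZE
        self.rom = rom
        self.bank = bank

    def read(self, address):
        address &= 0x1FFF                    # the 6507 has 13 address lines
        value = self.rom[rom_offset(self.bank, address)]
        if address in HOTSPOTS:              # reading a hotspot switches banks
            self.bank = HOTSPOTS[address]
        return value

rom = bytearray(2 * BANK_SIZE)
# First bank at $1FE0: ldx $1FF9 / six NOPs / rts
rom[rom_offset(0, 0x1FE0):rom_offset(0, 0x1FEA)] = bytes(
    [0xAE, 0xF9, 0x1F] + [0xEA] * 6 + [0x60])
# Second bank at $1FE3 (org $2FE3, rorg $1FE3): jsr Bank2Subroutine / ldx $1FF8
BANK2_SUBROUTINE = 0x1800                    # hypothetical address
rom[rom_offset(1, 0x1FE3):rom_offset(1, 0x1FE9)] = bytes(
    [0x20, BANK2_SUBROUTINE & 0xFF, BANK2_SUBROUTINE >> 8, 0xAE, 0xF8, 0x1F])

cart = F8Cart(bytes(rom), bank=0)
opcodes = []
opcodes.append(cart.read(0x1FE0))            # LDX abs, first bank
cart.read(0x1FF9)                            # its operand read: switch banks
opcodes.append(cart.read(0x1FE3))            # JSR, fetched from the second bank
# ... Bank2Subroutine runs and returns to $1FE6 ...
opcodes.append(cart.read(0x1FE6))            # LDX abs, second bank
cart.read(0x1FF8)                            # its operand read: switch back
opcodes.append(cart.read(0x1FE9))            # RTS, fetched from the first bank

print([hex(op) for op in opcodes])           # ['0xae', '0x20', '0xae', '0x60']
print(hex(rom_offset(1, 0x1FE3)))            # 0x1fe3
```

The last line is the RORG point in file terms: code assembled with org $2FE3 sits at file offset $1FE3 but is addressed as $1FE3 in the running machine.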



==============================================================
USING BRK WITH RESXX
Eckhard Stolberg
==============================================================

(I'm not sure what you'd do with this trick but it's pretty interesting.  Maybe somebody will figure out how to use it.)

Pole Position points the stack pointer at the RESxx registers and then does a BRK. There are three write cycles in a BRK instruction, so the three position registers for the objects that make up the road in PP get written in three consecutive cycles. This is how PP managed to get the road to meet so closely at the horizon.
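The timing can be sketched in Python. BRK takes 7 cycles: cycle 1 fetches the opcode, cycle 2 the padding byte, cycles 3 to 5 push PCH, PCL and the flags to $0100+SP (decrementing SP each time), and cycles 6 and 7 fetch the vector. On the 2600 those stack addresses reach the TIA, and one CPU cycle is 3 color clocks, so the three writes are 3 color clocks apart. Which RESxx registers Pole Position actually targets isn't stated here, so starting at RESBL is only a hypothetical example.

```python
RESP0, RESP1, RESM0, RESM1, RESBL = 0x10, 0x11, 0x12, 0x13, 0x14
NAMES = {RESP0: "RESP0", RESP1: "RESP1", RESM0: "RESM0",
         RESM1: "RESM1", RESBL: "RESBL"}
COLOR_CLOCKS_PER_CYCLE = 3

def brk_writes(sp):
    """(cycle, register) for BRK's three pushes, numbering BRK's cycles 1-7."""
    writes = []
    for cycle in (3, 4, 5):                    # PCH, PCL, flags
        register = (0x0100 + sp) & 0xFF        # A8 not decoded: $01xx -> TIA $xx
        writes.append((cycle, NAMES.get(register, hex(register))))
        sp = (sp - 1) & 0xFF
    return writes

# Hypothetical: stack pointer aimed at RESBL before the BRK.
for cycle, register in brk_writes(RESBL):
    offset = (cycle - 3) * COLOR_CLOCKS_PER_CYCLE
    print(f"cycle {cycle}: {register} (+{offset} color clocks)")
```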




==============================================================
INSURANCE AGAINST TOO MANY VBLANK/OVERSCAN CYCLES
Paul Slocum
==============================================================

It's difficult to make sure that there is no unusual case where your game logic in VBlank and Overscan will use too many cycles and cause the number of scanlines in a frame to fluctuate.  To insure against this problem, pad your VBlank and Overscan with a few STA WSYNCs to add extra cycles during development.  Optimize your game so that it runs fine with these in.  Then when the game is finished and ready for release, comment out those extra WSYNCs.  This is also an easy way to estimate how much time you have left in VBlank and Overscan: keep adding WSYNCs (each line is 76 cycles) until the screen jumps.
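If you use the WSYNC padding to estimate your headroom, it helps to know what a run of WSYNCs costs: the first one only waits out the rest of the current line, and each one after it costs a full 76-cycle line. This Python sketch turns "stable with k extra WSYNCs, jumps with k + 1" into rough bounds on the spare time; it assumes the extra WSYNCs sit back to back, and the result is an estimate, not an exact cycle count.

```python
CYCLES_PER_LINE = 76

def spare_cycles_range(stable_wsyncs):
    """Rough (lower, upper) bounds on spare VBLANK/overscan cycles.

    k back-to-back WSYNCs cost between about (k - 1) * 76 and k * 76 cycles.
    If k of them fit, the spare time is at least about (k - 1) * 76; if
    k + 1 don't fit, it is less than (k + 1) * 76.
    """
    k = stable_wsyncs
    lower = max(k - 1, 0) * CYCLES_PER_LINE
    upper = (k + 1) * CYCLES_PER_LINE
    return lower, upper

print(spare_cycles_range(4))   # (228, 380)
```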



==============================================================
CHECKING THE NUMBER OF SCANLINES
Eckhard Stolberg
==============================================================

You'll want to verify that your program is drawing the desired number of scanlines (around 262 for NTSC and 312 for PAL) and that the number doesn't vary while running.  To do this, use the Z26 emulator in video mode 9.  While Z26 is running, press ALT-9 to enter this mode, which displays the number of scanlines in the upper right corner.  The -v9 switch starts Z26 in this mode.
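For reference, the commonly used split of a frame adds up to those totals; the per-section counts below are typical defaults, not requirements. The second function is the same check you're doing by eye in Z26: every frame should have the same count. The list of frame counts is made up.

```python
# Typical frame layouts in scanlines: VSYNC, VBLANK, visible picture, overscan.
NTSC = {"vsync": 3, "vblank": 37, "picture": 192, "overscan": 30}
PAL = {"vsync": 3, "vblank": 45, "picture": 228, "overscan": 36}

def frame_lines(layout):
    """Total scanlines in one frame of the given layout."""
    return sum(layout.values())

def unstable_frames(line_counts, expected):
    """Indices of frames whose scanline count differs from the expected total."""
    return [i for i, n in enumerate(line_counts) if n != expected]

print(frame_lines(NTSC), frame_lines(PAL))          # 262 312
print(unstable_frames([262, 262, 263, 262], 262))   # [2]
```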


==============================================================
SOUND AND MUSIC
==============================================================

Atari 2600 Music Programming Guide and music driver code:

http://qotile.net/sequencer.html

Eckhard Stolberg's Frequency and Waveform Guide:

http://buerger.metropolis.de/estolberg



==============================================================
COUNTING DOWN WHEN LOOPING
==============================================================

Have loops count down whenever possible.  DEX/DEY set the zero flag when the counter reaches 0, so the loop can drop its compare instruction, saving 2 cycles per iteration and 2 bytes of ROM.

------------------------------------------

  Counting up requires a compare:

      ldx #0
  Loop
      [your code...]
      inx
      cpx #20
      bne Loop

------------------------------------------

  Counting down allows you to get rid of the compare:

      ldx #20
  Loop
      [your code...]
      dex
      bne Loop
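Here is the loop overhead of the two versions for 20 iterations, counted in Python from standard 6502 timings (assuming the `bne` doesn't cross a page, which would add a cycle each time it's taken). The body's own cycles are left out since they're the same in both loops.

```python
LDX_IMM = 2
INX = DEX = 2
CPX_IMM = 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2

def overhead_up(n):
    """ldx #0 ... inx / cpx #n / bne Loop"""
    return LDX_IMM + n * (INX + CPX_IMM) + (n - 1) * BNE_TAKEN + BNE_NOT_TAKEN

def overhead_down(n):
    """ldx #n ... dex / bne Loop"""
    return LDX_IMM + n * DEX + (n - 1) * BNE_TAKEN + BNE_NOT_TAKEN

def x_values_down(n):
    """Values of X seen by the loop body when counting down from n."""
    seen, x = [], n
    while True:
        seen.append(x)
        x = (x - 1) & 0xFF   # dex
        if x == 0:           # bne falls through once X reaches zero
            break
    return seen

print(overhead_up(20), overhead_down(20))             # 141 101
print(x_values_down(20)[:3], x_values_down(20)[-1])   # [20, 19, 18] 1
```

One thing to watch: the counting-down body sees X from 20 down to 1, not 0 to 19, so any table indexed with X needs to be laid out (or offset) to match.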



==============================================================
CONSTANT CYCLE COUNT TO AVOID WSYNC
==============================================================

Normally you use STA WSYNC towards the end of each scanline in the kernel to stay in sync with the TV.  But by carefully programming your kernel so each line consistently takes exactly 76 cycles, you can avoid using STA WSYNC at all in your kernel loop.  This will save you at least 3 cycles per scanline.
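The 76 comes from the TIA: a scanline is 228 color clocks, and the CPU runs one cycle per 3 color clocks. This Python sketch totals one hypothetical path through a WSYNC-free kernel line (the instruction list is made up for illustration, no page crossings assumed) and shows how fast a line that's off by one cycle drifts against the beam.

```python
COLOR_CLOCKS_PER_LINE = 228
COLOR_CLOCKS_PER_CYCLE = 3
CYCLES_PER_LINE = COLOR_CLOCKS_PER_LINE // COLOR_CLOCKS_PER_CYCLE   # 76

# Cycle costs (no page crossings) for the instructions in the example line.
CYCLES = {"lda (zp),y": 5, "sta zp": 3, "nop": 2, "dey": 2, "bne taken": 3}

# Hypothetical kernel line: eight loads and TIA stores, padding, then the
# loop control. With no WSYNC, this path alone has to add up to 76.
EXAMPLE_LINE = (["lda (zp),y", "sta zp"] * 8
                + ["nop", "nop", "sta zp"]
                + ["dey", "bne taken"])

def line_cycles(path):
    """Total CPU cycles for one path through a kernel line."""
    return sum(CYCLES[op] for op in path)

def drift_color_clocks(cycles_per_line, lines):
    """How far (in color clocks) a write moves relative to the beam after
    `lines` WSYNC-free lines of `cycles_per_line` cycles each.
    Negative means the write happens earlier, i.e. further left."""
    return (cycles_per_line - CYCLES_PER_LINE) * COLOR_CLOCKS_PER_CYCLE * lines

print(CYCLES_PER_LINE, line_cycles(EXAMPLE_LINE))   # 76 76
print(drift_color_clocks(75, 10))                   # -30
```

Every branch path through the line has to be counted this way, since a single path that comes out at 75 or 77 cycles shifts everything after it by 3 color clocks per line.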



==============================================================
BRK SUBROUTINE TRICK
Mark Lesser, Thomas Jentzsch
==============================================================

Thomas found this trick in Mark Lesser's Lord of the Rings prototype: you can use BRK to call a subroutine that needs to be called often and save ROM space.  If you aren't familiar with BRK, it pushes the return address (the address of the BRK plus 2) and the flags onto the stack and jumps to wherever the vector at $FFFE points.

Thomas found BRK commands like this scattered through Lord of the Rings:

    brk
    .byte $0e        ; id-byte
    lda    $e3       ; <- here we will continue
    ora    #$04
  
And the BRK vector was pointing to this routine:

BrkRoutine:
    plp             ; remove flags from stack (not needed)
    tsx             ; load x with stackpointer
    inx             ; x++
    dec    $00,x    ; adjust return address (low byte only) to point at the id byte
    lda    ($00,x)  ; read break-id...
    tay             ; ...and store in y
Subroutine:
    [subroutine code...]
    rts       

So it ended up being the equivalent of passing a value to a subroutine similar to this:

   ldy     #value
   jsr     Subroutine

But it saves 3 bytes with each call and the overhead is only 8 bytes.
After only 3 subroutine calls (Lord of the Rings has about 20) you are saving ROM space.

In the case of Lord of the Rings, the subroutine was sound related code that selected the sound effect to be played based on a priority system.
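The return-address juggling and the ROM savings can both be checked in Python. This sketch models RAM as a dict keyed by the low address byte (the stack page mirrors zero page on the 2600), does BRK's pushes, runs the stack-related steps of BrkRoutine, and then does what RTS does (pull the address, add 1). The code addresses and the starting stack pointer are hypothetical. The cycle figures are computed from standard 6502 timings and aren't from the original write-up.

```python
ram = {}   # RAM keyed by low address byte; on the 2600 $01xx mirrors $00xx

def push(sp, value):
    """6502 push: store at $0100+SP (here ram[SP]), then decrement SP."""
    ram[sp] = value
    return (sp - 1) & 0xFF

def brk(pc, sp, flags=0x30):
    """BRK at `pc`: push PC+2 (high byte, then low byte) and the flags."""
    ret = (pc + 2) & 0xFFFF
    sp = push(sp, ret >> 8)
    sp = push(sp, ret & 0xFF)
    return push(sp, flags)

def brk_routine(sp, code):
    """plp / tsx / inx / dec $00,x / lda ($00,x) / tay. Returns (SP, Y)."""
    sp = (sp + 1) & 0xFF                   # plp: discard the pushed flags
    x = (sp + 1) & 0xFF                    # tsx, inx: X points at the pushed PCL
    ram[x] = (ram[x] - 1) & 0xFF           # dec $00,x: low byte only, no borrow
    pointer = ram[x] | (ram[(x + 1) & 0xFF] << 8)
    return sp, code[pointer]               # lda ($00,x), tay: Y = id byte

def rts(sp):
    """RTS: pull low then high byte, jump to that address + 1."""
    low, high = ram[(sp + 1) & 0xFF], ram[(sp + 2) & 0xFF]
    return (((high << 8) | low) + 1) & 0xFFFF, (sp + 2) & 0xFF

# Hypothetical code: brk at $F123, id byte $0E at $F124, lda $e3 at $F125.
code = {0xF123: 0x00, 0xF124: 0x0E, 0xF125: 0xA5, 0xF126: 0xE3}
sp = brk(0xF123, 0xFF)
sp, y = brk_routine(sp, code)
return_to, sp = rts(sp)
print(hex(y), hex(return_to), hex(sp))       # 0xe 0xf125 0xff

def bytes_saved(calls):
    """brk + id (2 bytes) vs. ldy #value + jsr (5 bytes), minus the 8-byte preamble."""
    return calls * (5 - 2) - 8

# Per call: brk 7 + plp 4 + tsx 2 + inx 2 + dec 6 + lda 6 + tay 2 vs. ldy 2 + jsr 6.
CYCLES_BRK, CYCLES_JSR = 7 + 4 + 2 + 2 + 6 + 6 + 2, 2 + 6
print(bytes_saved(2), bytes_saved(3), bytes_saved(20))   # -2 1 52
print(CYCLES_BRK, CYCLES_JSR)                            # 29 8
```

Two things the model makes visible: the trick trades speed for ROM (29 cycles against 8 per call), and because DEC only touches the low byte, an id byte at an address ending in $FF would produce a pointer one page off.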
