Subject: Re: [stella] Finally underway: 2600 Cookbook
From: Paul Slocum <paul-stella@xxxxxxxxxxxxxx>
Date: Sun, 11 Apr 2004 23:36:05 -0500
I started work on the 2600 Cookbook, which might end up being my biggest contribution to the 2600 homebrew community.
So, I'd love to get some feedback on the format (which, like I said, I've found super friendly in the past).

==============================================================
ATARI 2600 ADVANCED PROGRAMMING GUIDE
02-02-03
Compiled and edited by Paul Slocum
Written by the Atari 2600 programming community
==============================================================

This guide is intended to be a supplement to the standard Stella
Programmer's Guide.

For more information:

The Stella mailing list:
http://www.biglist.com/lists/stella/

The Stella list archives (used to compile much of this document):
http://www.biglist.com/lists/stella/archives/

The Dig (selected material from the Stella list archives):
http://www.neonghost.com/the-dig/index.html

==============================================================
TABLE OF CONTENTS
==============================================================

USING BRK WITH RESXX
BRK SUBROUTINE TRICK
BANKSWITCHING
SHOWING MISSILES USING PHP
SOUND AND MUSIC
ILLEGAL OPCODES
CONSTANT CYCLE COUNT TO AVOID WSYNC
HIGH RESOLUTION 48-PIXEL WIDE GRAPHICS
INSURANCE AGAINST TOO MANY VBLANK/OVERSCAN CYCLES
CHECKING THE NUMBER OF SCANLINES
COUNTING DOWN WHEN LOOPING
PADDLES
WASTING CYCLES
SKIPDRAW

- naming conventions
- usage of DASM (SUBROUTINE, MACRO, defines/includes)
- avoid "magic numbers"
- separating code and data
- aligning (incl. calculations with .)
- using segments (especially for the zeropage)
- bankswitching
- variable reuse
- things to avoid during VSYNC ...
= Buffered Playfield kernel
= Atari 2600 sample playback?
- usage of VDEL in general
- BIT tricks
- register and single bits reusing
- avoiding page penalties
- early HMOVEs
  http://www.biglist.com/lists/stella/archives/199804/msg00186.html
- shifting vs. ANDing
- branching out of the kernel flow to keep the slowest part as
  fast as possible
- normal and Fatal Run positioning routines
- exact timings for writing to PFx (in both modes)
- multiple RESP tricks
- constant carry status (e.g.
  EOR instead of CMP) to avoid CLC/SEC
- trying to preserve X and Y inside the kernel
- using branches instead of JMPs, using JMP,...RTS instead of
  JSR,...RTS,RTS at the end of a subroutine
- multiplication tricks with shifts vs. using a table
- using TXS, TSX to save CPU cycles

==============================================================
ILLEGAL OPCODES
Thomas Jentzsch
Todo: Explain and give DASM setup
==============================================================

Name  Opcode  Operation                 Mode        Bytes/Cycles
DCP   $C3     M <- (M)-1, (A-M) -> NZC  (Ind,X)     2/8
DOP   $04     [no operation]            (Z-Page)    2/3
LAX   $AF     A <- M, X <- M            (Absolute)  3/4

==============================================================
HIGH RESOLUTION 48-PIXEL WIDE GRAPHICS
==============================================================

==============================================================
PADDLES
Thomas Jentzsch
Todo: Explain and include how to discharge cap
==============================================================

Assumes Y is your kernel line counter.

    lda INPT0       ;3
    bmi paddles1    ;2 or 3
    .byte $2c       ;1  BIT abs opcode: swallows the sty (4 cycles
                    ;   instead of 3, so both paths take 9 cycles)
paddles1:
    sty padVal1     ;3

==============================================================
WASTING CYCLES
Christopher Tumber, Chris Wilkson, Andrew Davie
Todo: Verify Andrew's
==============================================================

These are the most efficient ways to waste processor cycles.
Note that locations $2D-$3F do nothing and aren't decoded. The
only case where this might be a problem is if you're using an
unusual bankswitching setup.

1 Cycle (0 or 1 byte)
    .w  (Change a zero page instruction to absolute, adds 1 byte
        of code)
    ,x  (Change a zero page or absolute instruction to an indexed
        instruction. Make sure x=0.
        Can also use Y)

2 Cycles (1 byte)
    nop

3 Cycles (2 bytes)
    sta $2D
      - or -
    lda $2D
      - or -
    dop             (Double NOP illegal opcode)

4 Cycles (2 bytes)
    nop
    nop

5 Cycles (2 bytes)
    dec $2D
      - or -
    sta $1800,X     ; 3 bytes, assumes you can write to ROM
                    ; without problems

6 Cycles (2 bytes)
    lda ($80,X)     ; assumes possible reads from 0-$7f have no effect

6 Cycles (3 bytes)
    nop
    nop
    nop

7 Cycles (2 bytes, need 1 byte free on stack)
    pha
    pla

8 Cycles (3 bytes)
    lda ($80,X)     ; assumes possible reads from 0-$7f have no effect
    nop

9 Cycles (3 bytes, need 1 byte free on stack)
    pha
    pla
    nop

9 Cycles (4 bytes)
    dec $2D
    nop
    nop

10 Cycles (4 bytes)
    dec $2D
    dec $2D
      - or -
    rol $80
    ror $80         ; rol then ror leaves $80 unchanged

11 Cycles (5 bytes) .. a few assumptions here
    ASSUMING we can safely write to ROM and have nothing disastrous
    STA $8000,X
    LDA ($80,X)     ; assumes possible reads from 0-$7f have no effect

12 Cycles (3 bytes, need 2 bytes free on stack)
    jsr return
    ; somewhere else
return:
    rts

12 Cycles (4 bytes, without stack)
    LDA ($80,X)     ; assumes possible reads from 0-$7f have no effect
    LDA ($80,X)     ; assumes possible reads from 0-$7f have no effect

Also: You can use PHA/PHP (1 byte, 3 cycles) or PLA/PLP (1 byte,
4 cycles) alone, but you have to be careful not to mess up your
stack. (PLP/PHA would be useful if you have no stack!)

==============================================================
SKIPDRAW
Thomas Jentzsch
Todo: Explain and clean up
==============================================================

The best way I knew until now was (if y contains linecounter):

    tya                 ; 2
   (sec                 ; 2) <- this can sometimes be avoided
    sbc SpriteEnd       ; 3
    adc #SPRITEHEIGHT   ; 2
    bcx .skipDraw       ; 2 = 9-11 cycles
    ...

If you like using illegal opcodes, you can use dcp (dec,cmp) here:

    lda #SPRITEHEIGHT   ; 2
    dcp SpriteEnd       ; 5 initial value has to be adjusted
    bcx .skipDraw       ; 2 = 9
    ...

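If your assembler setup does not accept the dcp mnemonic, the dcp
line above can be written out by hand. The sketch below is only an
illustration: the values of SPRITEHEIGHT and SpriteEnd and the ORG
address are made up for the example, and the branch is left out
because its direction depends on how SpriteEnd is initialized. It
relies on DCP zero page being opcode $C7 (2 bytes, 5 cycles), which
decrements memory and then compares A against the result.

```asm
        processor 6502

SPRITEHEIGHT = 8                ; hypothetical sprite height
SpriteEnd    = $80              ; hypothetical zero-page variable

        ORG $F000               ; hypothetical 4K cartridge address

; Legal instructions only: 2 + 5 + 3 = 10 cycles, 6 bytes
        lda #SPRITEHEIGHT       ; 2
        dec SpriteEnd           ; 5
        cmp SpriteEnd           ; 3

; Same result with DCP zero page assembled by hand:
; 2 + 5 = 7 cycles, 4 bytes
        lda #SPRITEHEIGHT       ; 2
        .byte $C7,SpriteEnd     ; 5  dcp SpriteEnd
```

Both sequences leave N, Z and C set from comparing A with the
decremented SpriteEnd, so the same branch can follow either one.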
---------------------------------------------------------------------
LAX
---------------------------------------------------------------------
Using stack as temp storage
TXS
TSX
---------------------------------------------------------------------
long NOP

    .byte $0c ; NOOP

==============================================================
==============================================================
==============================================================
FINISHED SECTIONS
==============================================================
==============================================================
==============================================================

==============================================================
SHOWING MISSILES USING PHP
==============================================================

This trick is originally from Combat and is probably the most
efficient way to display the missiles and/or ball. This trick
just requires that you don't use the stack during your kernel.

Recall that:

    ENABL = $1F
    ENAM1 = $1E
    ENAM0 = $1D

In this example I'll show how to use the trick for both missiles.
You can easily adapt it for the ball too. To set the trick up,
before your kernel save the stack pointer and point the stack at
ENAM1. A push writes to the address the stack pointer holds and
then decrements the pointer, so the first PHP lands on ENAM1 and
the second on ENAM0. (Starting one higher, at ENAM1+1, would make
the first PHP hit ENABL instead.)

    tsx                     ; Transfer stack pointer to X
    stx SavedStackPointer   ; Store it in RAM
    ldx #ENAM1
    txs                     ; Point the stack at ENAM1

Now during the kernel you can compare your scanline counter to
your missile position register and this will set the zero flag in
the processor. Then to enable/disable the missile for that
scanline, just push the processor flags onto the stack. The ENAxx
registers use bit 1 to enable/disable, which corresponds with the
zero flag in the processor, so the enable/disable will be
automatic. It takes few cycles and doesn't vary the number of
cycles depending on the result like branching usually does.

    ; On each line of your kernel...
    cpy MissilePos1         ; Assumes Y is your kernel line counter
    php
    cpy MissilePos0
    php

Then before you do it again, somewhere on each scanline you need
to pull off the stack again using two PLA's or PLP's, or you can
manually reset the stack pointer with ldx #ENAM1, txs.

After your kernel, restore the stack pointer:

    ldx SavedStackPointer
    txs

==============================================================
BANKSWITCHING
==============================================================

For a bankswitching reference you'll want Kevin Horton's sizes.txt
reference:
http://www.tripoint.org/kevtris/files/sizes.txt

One thing you'll probably want to know for any kind of
bankswitching is the RORG assembler directive. RORG is like ORG
except it only affects the way label addresses are handled, not
where the code is placed in the ROM. So let's say you're working
on an 8K ROM. The first bank will start with ORG $1000 and all
the code and data for the first bank will follow. At the start of
the second 4K bank, you'll want:

    ORG $2000
    RORG $1000

If you don't use RORG, all your label addresses in the second
bank will be in the $2000-$2FFF range, which won't work. With
RORG, they will continue to address the $1000-$1FFF range.

Here's a basic template for doing F8 bankswitching. This can
easily be modified to work with similar bankswitching methods
(F4, F6, etc). This code allows you to call "Bank2Subroutine"
from bank 1 using jsr CallBank2Subroutine.

;------------------------------------
;This code in bank 1
;Switches to bank 2 where the subroutine is called.
    org $1FE0
CallBank2Subroutine
    ldx $1FF9       ; switch to bank 2
    nop             ; 1FE3 jsr Bank2Subroutine
    nop             ; .
    nop             ; .
    nop             ; 1FE6 ldx $1FF8 (Switch back to bank 1)
    nop             ; .
    nop             ; .
    rts
;------------------------------------
;This code in bank 2:
;Calls the subroutine and returns to bank 1
    org $2FE3
    rorg $1FE3
    jsr Bank2Subroutine
    ldx $1FF8       ;(Switch back to bank 1)

It's good practice to assume that your multi-bank ROM could start
up in any bank.
In each bank, set up the startup vector so it points to code that
switches to the correct startup bank and then jumps to the start
of your program.

==============================================================
USING BRK WITH RESXX
Eckhard Stolberg
==============================================================

(I'm not sure what you'd do with this trick but it's pretty
interesting. Maybe somebody will figure out how to use it.)

Pole Position puts the stack pointer over the RESxx registers and
then does a BRK. There are three write cycles in a BRK
instruction, so the three position registers for the objects that
make up the road in PP get accessed in three consecutive cycles.
This is how PP managed to get the road to meet so closely at the
horizon.

==============================================================
INSURANCE AGAINST TOO MANY VBLANK/OVERSCAN CYCLES
Paul Slocum
==============================================================

It's difficult to make sure that there is no unusual case where
your game logic in VBlank and Overscan will use too many cycles
and cause the number of scanlines in a frame to fluctuate. To
insure against this problem, pad your VBlank and Overscan with a
few STA WSYNCs to add extra cycles during development. Optimize
your game so that it runs fine with these in. Then when the game
is finished and ready for release, comment out those extra
WSYNCs.

This is also an easy way to estimate how much time you have left
in VBlank and Overscan: keep adding WSYNCs (each line is 76
cycles) until the screen jumps.

==============================================================
CHECKING THE NUMBER OF SCANLINES
Eckhard Stolberg
==============================================================

You'll want to verify that your program is drawing the desired
number of scanlines (around 262 NTSC and 312 PAL) and is not
varying that number while running. To do this, use the Z26
emulator in video mode 9.
While Z26 is running, press ALT-9 to enter this mode, which will
display the number of scanlines in the upper right corner. The
-v9 switch will start Z26 in this mode.

==============================================================
SOUND AND MUSIC
==============================================================

Atari 2600 Music Programming Guide and music driver code:
http://qotile.net/sequencer.html

Eckhard Stolberg's Frequency and Waveform Guide:
http://buerger.metropolis.de/estolberg

==============================================================
COUNTING DOWN WHEN LOOPING
==============================================================

Have loops count down whenever possible since this can save a lot
of cycles and a little bit of ROM too.

------------------------------------------
Counting up requires a compare:

    ldx #0
Loop
    [your code...]
    inx
    cpx #20
    bne Loop

------------------------------------------
Counting down allows you to get rid of the compare:

    ldx #20
Loop
    [your code...]
    dex
    bne Loop

==============================================================
CONSTANT CYCLE COUNT TO AVOID WSYNC
==============================================================

Normally you use STA WSYNC towards the end of each scanline in
the kernel to stay in sync with the TV. But by carefully
programming your kernel so each line consistently takes exactly
76 cycles, you can avoid using STA WSYNC at all in your kernel
loop. This will save you at least 3 cycles per scanline.

==============================================================
BRK SUBROUTINE TRICK
Mark Lesser, Thomas Jentzsch
==============================================================

Thomas found this trick in Mark Lesser's Lord of the Rings
prototype: You can use BRK to call a subroutine that needs to be
called often and save ROM space. If you aren't familiar with BRK,
it pushes the flags and PC on the stack and jumps to wherever the
vector $FFFE is pointing.
Thomas found BRK commands like this scattered through Lord of the
Rings:

    brk
    .byte $0e       ; id-byte
    lda $e3         ; <- here we will continue
    ora #$04

And the BRK vector was pointing to this routine:

BrkRoutine:
    plp             ; remove flags from stack (not needed)
    tsx             ; load x with stackpointer
    inx             ; x++
    dec $00,x       ; adjust return address
    lda ($00,x)     ; read break-id...
    tay             ; ...and store in y
Subroutine:
    [subroutine code...]
    rts

So it ended up being the equivalent of passing a value to a
subroutine similar to this:

    ldy #value
    jsr Subroutine

But it saves 3 bytes with each call and the overhead is only 8
bytes. After only 3 subroutine calls (Lord of the Rings has about
20) you are saving ROM space. In the case of Lord of the Rings,
the subroutine was sound related code that selected the sound
effect to be played based on a priority system.
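
To try the trick in your own code, the BRK vector at $FFFE has to
point at the routine. The following is a minimal sketch of how that
might look in DASM, not the actual Lord of the Rings code: the
labels Start and MainLoop, the $F000 origin and the empty subroutine
body are assumptions made up for the example.

```asm
        processor 6502

        ORG $F000               ; hypothetical 4K cartridge

Start:                          ; hypothetical entry point
        sei
        cld
        ldx #$ff
        txs                     ; BRK needs 3 free bytes on the stack
MainLoop:
        brk                     ; 1 byte, jumps through the vector at $FFFE
        .byte $0e               ; 1 byte, id-byte read by BrkRoutine
        jmp MainLoop            ; execution continues here after the rts

BrkRoutine:
        plp                     ; remove flags from stack (not needed)
        tsx
        inx                     ; x now indexes the pushed return address
        dec $00,x               ; return address now points at the id-byte
        lda ($00,x)             ; read the id-byte
        tay                     ; y = id-byte ($0e here)
Subroutine:
        rts                     ; placeholder body, resumes after the id-byte

        ORG $FFFA
        .word Start             ; NMI (the 6507 has no NMI pin)
        .word Start             ; RESET
        .word BrkRoutine        ; IRQ/BRK
```

One thing to watch with this routine: dec $00,x only adjusts the low
byte of the return address, so the id-byte must not sit at an address
ending in $FF, or the adjusted address will point into the wrong
page.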