I've always counted cycles when I code, mostly because I was always
trying to get the most out of a machine, but on the 2600 it's vital when
you're working out your kernel. After a while you will probably have
every common instruction memorized, but if you don't, or if you get
rusty, here's a method you may be able to use instead when your chart
has fallen off the wall behind your desk... or at least it might be
interesting to consider where the cycles go...
First, an observation: on the 6502, reading or writing one byte takes
one cycle.
The basic tip is to just sum the number of bytes read/written during an
instruction to get your total cycles. So for example, LDA $F200 has 3
bytes that are read from the instruction stream ($AD $00 $F2), and of
course one byte is read from location $F200. That is 4 bytes across the
data bus, and the instruction time is 4 cycles.
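As a rough sketch of that basic tip, before any of the extra rules are
applied, the sum can be written out like this. The function name and its
parameters are my own, purely for illustration.

```python
def naive_cycles(instruction_bytes, data_reads=0, data_writes=0):
    """Basic tip only: one cycle per byte that crosses the data bus."""
    return instruction_bytes + data_reads + data_writes


# LDA $F200: three instruction bytes ($AD $00 $F2) plus one read of $F200
print(naive_cycles(3, data_reads=1))  # 4
```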
In practice, because of pipelining (faster) and operation overhead
(slower) you will also need to combine this simplistic approach with
extra rules.
The rules you need to remember are:
- No instruction takes fewer than 2 cycles. So a NOP or an LSR A will
take 2 cycles even though only one instruction byte is read.
- Instructions LDx/STx ADC/SBC AND/ORA/EOR do not take an extra
operation cycle when used on memory. BIT is a special case of AND and
the compare instructions are special cases of SBC. Instructions that
modify memory in place (ASL/LSR/ROL/ROR, INC/DEC) do take one, on top of
their read and their write.
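- Adding an index register in the address calculation takes an extra
cycle, except on reads that use absolute,X / absolute,Y or (zp),Y
addressing, where the add is free (nice). As a follow-on rule, on those
same reads you add a cycle if adding the index register causes the
address to cross a page boundary. Stores and read-modify-write
instructions always pay the one index cycle, page crossing or not.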
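- You add a cycle if you change the Program Counter (PC) in a branch
instruction, so a branch is 2 cycles when not taken and 3 when taken
(plus one more if the target is on a different page), but JMP is able
to manage it in 3 cycles.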
- You just have to remember the timing of instructions that use the
stack pointer, as I don't have a good theory on where the cycles go:
JSR (6), RTS (6), PHx (3), PLx (4).
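For anyone who would rather see the rules as code, here is a minimal
sketch of them in Python. All names (estimate_cycles, branch_cycles,
STACK_CYCLES and the parameters) are hypothetical, made up for this
illustration. It assumes the page-crossing penalty applies only to reads
through absolute,X / absolute,Y / (zp),Y, and that a taken branch costs
one more cycle when its target is on another page.

```python
# Stack instructions are simply memorized, as in the last rule.
STACK_CYCLES = {"JSR": 6, "RTS": 6, "PHA": 3, "PHP": 3, "PLA": 4, "PLP": 4}


def estimate_cycles(instr_bytes, reads=0, writes=0, modify=False,
                    indexed=False, free_index=False, page_cross=False):
    """Estimate cycles for a non-branch, non-stack 6502 instruction.

    instr_bytes -- opcode plus operand bytes
    reads       -- memory reads, including pointer bytes for indirect modes
    writes      -- memory writes
    modify      -- True for read-modify-write on memory (ASL, ROL, INC, ...)
    indexed     -- True if X or Y is added to the address
    free_index  -- True for reads via abs,X / abs,Y / (zp),Y
    page_cross  -- True if adding the index crosses a page boundary
    """
    cycles = instr_bytes + reads + writes   # one cycle per bus byte
    if modify:
        cycles += 1                         # operation overhead
    if indexed:
        if free_index:
            cycles += 1 if page_cross else 0
        else:
            cycles += 1                     # always paid, never more
    return max(cycles, 2)                   # nothing is under 2 cycles


def branch_cycles(taken, page_cross=False):
    """2 cycles not taken, 3 taken, 4 if taken onto another page."""
    if not taken:
        return 2
    return 4 if page_cross else 3


print(estimate_cycles(1))                               # NOP
print(estimate_cycles(3, writes=1, indexed=True))       # STA $1234,X
print(estimate_cycles(3, reads=1, indexed=True,
                      free_index=True, page_cross=True))  # LDA $12F0,X
```

The three calls print 2, 5 and 5. The page_cross flag is left for the
caller to work out, since it depends on the run-time value of X or Y.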
Well, those are the rules; here are some examples.
Examples:
A9 FF : LDA #$FF
2 bytes = 2 cycles (the load into the accumulator is done directly out
of the instruction stream)
6D F0 FF : ADC $FFF0
3 bytes, 1 read = 4 cycles
3E F0 01 : ROL $01F0,X
3 bytes, 1 for read, 1 for rotate operation, 1 for write, 1 for adding
the X index register = 7 cycles (always 7: a read-modify-write
instruction pays the index cycle whether or not adding X crosses a page
boundary, so there is no 8-cycle case)
D1 80 : CMP ($80),Y
2 bytes, 2 reads of the pointer (locations $80 and $81), 1 read of the
data byte = 5 cycles (or 6 if the pointer address plus Y crosses a page
boundary; adding Y is otherwise free on this read)
-------------
The stack exceptions
Thinking about JSR
JSR has 3 bytes and two bytes get saved on the stack. That would be 5
cycles. I don't know where the extra cycle comes from to get 6. Could it
be the update to the Stack Pointer?
Thinking about RTS
RTS is 1 byte and two bytes get read from the stack. That would be 3
cycles, but RTS is actually 6 cycles. Where do the extra 3 cycles go?
Could an update to the Stack Pointer account for part of it? Another
part might be that the way the PC is changed doesn't allow for the same
pipelining that happens in JMP and JSR?
No warranty on this information. ;-) I apologize in advance for typos
and goofs.
- David Galloway