Optimized Flags Code for 6502

When I disassembled Steve Wozniak’s Apple I BASIC, I found a 6502 trick that I had never seen before, although I had read a lot of 6502 code, including the very well-written Commodore BASIC (i.e. Microsoft BASIC for 6502).

What is the most optimized way (shortest code) to set, clear and test a flag?

Normally, you would do this (“flag” is a byte in the zero page):

function   code          bytes   cycles
set        lda #1        4       5
           sta flag
clear      lda #0        4       5
           sta flag
test       lda flag      4       5/6
           beq cleared

Woz did it like this:

function   code          bytes   cycles
set        lda #$80      4       5
           sta flag
clear      lsr flag      2       5
test       bit flag      4       5/6
           bpl cleared
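To cross-check both tables, here is a minimal Python sketch that adds up per-instruction costs for the NMOS 6502 using zero-page and immediate addressing. The names `COSTS`, `BRANCHES` and `sequence_cost` are made up for illustration. Branch timings assume the target is on the same page; a taken branch that crosses a page costs one extra cycle.

```python
# (bytes, cycles) per instruction on the NMOS 6502, zero-page or immediate
# addressing only. For branches the cycle count is the "not taken" case;
# a taken branch to a target on the same page costs one more cycle.
COSTS = {
    "lda #imm": (2, 2),
    "lda zp": (2, 3),
    "sta zp": (2, 3),
    "lsr zp": (2, 5),
    "bit zp": (2, 3),
    "sec": (1, 2),
    "ror zp": (2, 5),
    "beq": (2, 2),
    "bpl": (2, 2),
}
BRANCHES = {"beq", "bpl"}


def sequence_cost(instructions):
    """Return (bytes, cycles if the final branch is not taken, cycles if taken).

    Sequences that do not end in a branch report the same cycle count twice.
    """
    size = sum(COSTS[name][0] for name in instructions)
    cycles = sum(COSTS[name][1] for name in instructions)
    taken = cycles + 1 if instructions[-1] in BRANCHES else cycles
    return size, cycles, taken


SEQUENCES = {
    "conventional set/clear": ["lda #imm", "sta zp"],
    "conventional test": ["lda zp", "beq"],
    "Woz set": ["lda #imm", "sta zp"],
    "Woz clear": ["lsr zp"],
    "Woz test": ["bit zp", "bpl"],
}

if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label, sequence_cost(seq))
```

It reproduces 4 bytes and 5 cycles for each LDA/STA pair, 2 bytes and 5 cycles for LSR, and 4 bytes and 5/6 cycles for either test sequence.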

The trick is to store the flag in bit #7. Clearing bit #7 is easy: Just do a logical shift right, which will always write a zero into the MSB; we don’t care about the contents of the other bits. The “lsr zp” is the same speed as the “lda/sta zp” combination, but only occupies 2 bytes instead of four.
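A minimal Python model of LSR on an 8-bit byte makes this argument concrete. The helpers `lsr` and `flag_is_set` are illustrative names and do not come from any 6502 toolchain. The exhaustive check confirms that bit 7 is zero after the shift for every possible starting value.

```python
def lsr(value):
    """Model 6502 LSR on an 8-bit value; returns (result, carry_out)."""
    value &= 0xFF
    return value >> 1, value & 0x01


def flag_is_set(value):
    """The flag lives in bit 7 (the MSB); the other bits are ignored."""
    return bool(value & 0x80)


# LSR always shifts a zero into bit 7, whatever the other bits hold,
# so one "lsr flag" clears the flag for every possible byte value.
assert all(not flag_is_set(lsr(v)[0]) for v in range(256))

flag = 0x80            # "lda #$80 / sta flag"
assert flag_is_set(flag)
flag, _ = lsr(flag)    # "lsr flag": $80 becomes $40
assert not flag_is_set(flag)
assert flag != 0       # the byte is not zero, so BEQ would not see "cleared"
```

Because the cleared byte is $40 rather than zero, the flag has to be tested on bit 7 (with BIT/BPL or LDA/BMI), not with BEQ/BNE.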

The other advantage of this method is that the flag can be tested without destroying a register. The “bit” instruction copies the MSB of the memory operand into the negative flag (and bit 6 into the overflow flag) and leaves the accumulator unchanged, whereas “lda” would have overwritten the accumulator.
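The register-preserving property can be modeled in a few lines of Python. This sketch uses hypothetical helpers `bit` and `bpl_taken` and follows the documented NMOS 6502 behavior of BIT with a memory operand: N receives bit 7, V receives bit 6, Z is set when A AND memory is zero, and A itself is never written.

```python
def bit(a, memory):
    """Model the 6502 BIT instruction (zero-page/absolute addressing).

    Returns (a, n, v, z). The accumulator comes back unchanged:
    N is bit 7 of memory, V is bit 6 of memory, and Z is set when
    (A AND memory) == 0.
    """
    a &= 0xFF
    memory &= 0xFF
    n = bool(memory & 0x80)
    v = bool(memory & 0x40)
    z = (a & memory) == 0
    return a, n, v, z


def bpl_taken(n):
    """BPL branches when the negative flag is clear."""
    return not n


# Testing the flag while A holds an unrelated value that must survive.
a = 0x3C
for flag in (0x80, 0x40, 0x00):  # set, just cleared by LSR, never set
    a_after, n, v, z = bit(a, flag)
    assert a_after == a  # accumulator untouched
    print(f"flag=${flag:02X}  N={n}  branch to 'cleared': {bpl_taken(n)}")
```

Since Z depends on whatever happens to be in the accumulator, only N (and V) are meaningful for flag tests here, which is why the test uses BPL rather than BEQ.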

I can’t see why this couldn’t be adapted to other (CISC, RMW-capable) CPUs as well. If you are fluent in any other assembly language, please post a comment.

15 thoughts on “Optimized Flags Code for 6502”

  1. The ROMs of the Commodore drives (at least the 1540/1541 and 1570/1571) use these “tricks”, too, and IIRC so does the C64 KERNAL. As for the BIT instruction: this is why it was invented in the first place. Shifting to set or clear flags is very easy; Commodore also used a shift (ROR) to set bit 7 when the carry was known to be set by some earlier instruction.

    BTW, people tend to disagree about whether Commodore BASIC is well-written. There are also many claims that the BASIC (v2, at least) is an almost complete rewrite by Commodore, as the original MS BASIC was not good enough. However, I do not know whether this is true.

  2. Michael, I haven’t seen the LSR trick for clearing a flag but the 1541 floppy ROM does use BIT/BPL as a quick way of accessing a flag. One thing I thought was neat was they tied the “sync detected” line to the overflow bit in the CPU. So a loop to wait until sync has passed before writing the sector header is “BVS self”.

  3. @strik:

    I looked for “LSR” in the 1541 disassembly and could not find an instance of this trick.

    I also went through all “LSR” instructions in the Commodore 64 ROM Listing (BASIC V2 and KERNAL), and couldn’t find this trick either. KERNAL uses LSR to clear the “quote flag” in $D4, but the value of the byte is 0 or 1, so in order to test it, it has to be loaded. BASIC uses values of 0 and $FF for the “string flag” in $0D, and seems to clear it with an LSR at $AE07, but it always tests it with LDA/BNE.

  4. @Nate:

    Right, the 1541 ROM uses BIT/BPL to test the flag in $0298 (“silently ignore some errors”), which can be 0 or $FF, but uses LDA/STA to clear it.

    This is very interesting: The developers of KERNAL used LSR to clear a flag in bit #0, and the Commodore DOS developers used BIT/BPL to test a flag in bit #7, but nobody at Commodore came up with the trick Woz used.

    Also, Nate, having the overflow flag exposed as an input pin on the 6502 is the single greatest feature of any device I’ve seen. :-)

  5. Interesting… I’m trying to think of how one might efficiently handle bits on an Intel 8086. One problem is that there aren’t many things that can be done with memory operands, so the best code I’ve been able to come up with doesn’t really improve on the naive method… Using nasm notation:

    ; set (bit 0 of flag)
    mov byte [flag],1 ; 5 bytes

    ; clear (bit 0 of flag, trash rest)
    shl byte [flag],1 ; 4 bytes, just 1 less than simple mov

    ; test without clobbering
    test byte [flag],1 ; 5 bytes, no win over a simple cmp… ugh
    jz cleared ; 2 bytes

    ; test and clobber
    rcr byte [flag],1 ; 4 bytes, 1 less
    jnc cleared ; 2 bytes

    Putting the flag in bit 7 instead of bit 0 just means switching left and right around, so it doesn’t really win much…

  6. Sounds to me like there’s another advantage: you can “unclear” with ASL, but only if you’re aware of the context when programming.

    I’m specifically thinking of things that toggle (normally you’d use EOR) but this sounds good too :P

  7. The x86 processors allow branching based on all sorts of flag states: parity, overflow, sign, carry and zero.

    With a single test operation, you could check the state of multiple bits simultaneously, branching as appropriate:

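    test al, 11000000b  ; ZF, SF and PF now reflect bits 7 and 6 of AL
    jz  BothClear       ; result is zero: neither bit set
    jpe BothSet         ; even parity: both bits set
    js  Bit7Set         ; sign flag set: only bit 7 set

    Bit6Set:            ; fall through: only bit 6 set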

    At one point, I was working on a 6502 emulator in x86 assembler. I wanted the behavior of the Accumulator and Flags register to match an actual 6502 processor performing decimal-mode arithmetic with every value for both operands from 0 to 255.

    If I only wanted to handle valid input values, I’d have simply used the x86 decimal adjust instructions (DAA/DAS). I wanted 100%-accurate emulation, so I threw everything at the 6502 to get the complete picture.

    The addition behavior was pretty straightforward, but it took a while to figure out subtraction. If anyone is interested, I may be able to dig up the x86 implementations I came up with.

  8. That trick implies that there is a shorter way to set that flag:

    sec
    ror flag ; 3 bytes, but 7 cycles!

    However, that trick shines when you combine it with a test:

    jsr query ; Ask a yes/no question; then, input only “y” or “n”.
    cmp #'y' ; Test that answer.
    ror flag ; flag is set for “yes”; clear for “no”.
    bpl label1
    … ; first yes-code
    label1:
    … ; other code

    bit flag
    bpl label2
    … ; second yes-code
    label2:

  9. More assorted observations from the 64:

    ; print kernal message indexed by Y

    F12B 24 9D BIT $9D
    F12D 10 0D BPL $F13C

    You can also use ASL to clear the flag, if you make sure only bit 7 is ever set.
    Then you can branch on the carry to check the original value.

    Another trick is to have 2 flag bits in a byte: bit 7 and 6. The latter will be put in the V status bit with a BIT instruction. An example of that can be seen in the 64’s Basic:

    A594 C9 22 CMP #$22 ; quote mark
    A596 F0 56 BEQ $A5EE
    A598 24 0F BIT $0F
    A59A 70 2D BVS $A5C9
    A59C C9 3F CMP #$3F ; question mark
    A59E D0 04 BNE $A5A4

    I’m pretty sure I have seen this trick in more places. $9D is also used in this way in error output routines.

    C64 again:

    ; allocate number of bytes in A

    B4F4 46 0F LSR $0F
    B4F6 48 PHA

    B518 A5 0F LDA $0F
    B51A 30 B6 BMI $B4D2

    Strangely enough, here LSR seems to be used to clear a flag bit, but the flag is then tested with LDA instead of BIT. I suppose it makes little difference if you don’t care about the value in A.

    • I’m surprised this hasn’t been mentioned yet, but the floating-point routines handle the sign flag this way. In the extended floating-point representation (FAC #1), the sign is held in a separate byte ($66), and bit 7 indicates the sign. Some sample references:

      .,BC58 46 66 LSR $66 ABS FUNCTION: CHANGE SIGN TO +

      .,BCA2 24 66 BIT $66 CHECK SIGN OF FAC

      Woz’ contribution may not be as extraordinary as it looks. Having read several ROM listings in the mid-’80s, I got the feeling that such things are part of a common 6502 programming paradigm. ;)

  10. The Acorn BBC Micro kernel uses the “Bit immediate” opcode as a 1-byte skip instruction. This means that disassemblers generally fail because there is a valid instruction start address where the operand to Bit would usually be. Makes life interesting for automated recompilers!

    • @Graham Toal
      > The Acorn BBC Micro kernel uses the “Bit immediate” opcode as a 1-byte skip instruction.
      I think “BIT zeropage” is meant; BIT immediate does not exist (on the vanilla 6502). In addition, “BIT absolute” is used to skip 2 bytes. This is a very common technique, at least in Commodore’s ROM code.

  11. Instead of using

    lda #$80
    sta flag

    on occasion you can use the carry (if it is already set) or explicitly set the carry flag:

    sec
    ror flag

    without destroying any of the registers.

  12. The designers of the Atari 2600 designed hardware around the fact that the BIT instruction sets both the sign and overflow flags based upon bits 6-7 of the operand. In my own code for that platform, I have a number of flag bytes that hold meaningful things in both of the top two bits. The need for packing two bits per byte may not be as great on other 6502 platforms which have more memory than the 2600, but being able to use one “read” instruction and then two branches is still helpful.
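The comments above mention several variations: SEC/ROR to set the flag without touching a register, CMP followed by ROR to record a comparison result, ASL to clear a flag that only uses bit 7 while moving it into the carry, and packing a second flag into bit 6 so that one BIT feeds both N and V. The following Python sketch models only those instructions on 8-bit values so the claims can be checked. The helper names (`ror`, `asl`, `cmp_carry`, `bit_nv`) are hypothetical, they model just the bits that matter here, and the character codes assume ASCII, where 'n' sorts below 'y'.

```python
def ror(value, carry_in):
    """Model 6502 ROR: carry enters bit 7, bit 0 leaves as the new carry."""
    value &= 0xFF
    return (value >> 1) | (0x80 if carry_in else 0), value & 0x01


def asl(value):
    """Model 6502 ASL: zero enters bit 0, bit 7 leaves as the new carry."""
    value &= 0xFF
    return (value << 1) & 0xFF, value >> 7


def cmp_carry(a, operand):
    """CMP leaves the carry set when A >= operand (unsigned compare)."""
    return (a & 0xFF) >= (operand & 0xFF)


def bit_nv(memory):
    """BIT copies bit 7 of memory into N and bit 6 into V."""
    return bool(memory & 0x80), bool(memory & 0x40)


# SEC / ROR flag: sets bit 7 without touching A, X or Y.
flag, _ = ror(0x00, carry_in=True)
assert flag == 0x80

# CMP #'y' / ROR flag: records the answer in bit 7 (only 'y' or 'n' expected).
for key, expected in (("y", True), ("n", False)):
    answer, _ = ror(0x00, cmp_carry(ord(key), ord("y")))
    assert bit_nv(answer)[0] == expected

# ASL flag: clears a flag that only ever uses bit 7; old value lands in carry.
assert asl(0x80) == (0x00, 1)
assert asl(0x00) == (0x00, 0)

# Two independent flags in bits 7 and 6, both read by a single BIT.
for byte in (0x00, 0x40, 0x80, 0xC0):
    n, v = bit_nv(byte)
    print(f"${byte:02X}: N={n} (bit 7), V={v} (bit 6)")
```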
