Optimized Flags Code for 6502

2008-07-15 by Michael Steil

When I disassembled Steve Wozniak’s Apple I BASIC, I found a 6502 trick that I had never seen before, although I had read a lot of 6502 code, including the very well-written Commodore BASIC (i.e. Microsoft BASIC for 6502).

What is the most optimized way (shortest code) to set, clear and test a flag?

Normally, you would do this (“flag” is a byte in the zero page):

function  code                    bytes  cycles
set       lda #1 / sta flag       4      5
clear     lda #0 / sta flag       4      5
test      lda flag / beq cleared  4      5/6

Woz did it like this:

function  code                    bytes  cycles
set       lda #$80 / sta flag     4      5
clear     lsr flag                2      5
test      bit flag / bpl cleared  4      5/6

The trick is to store the flag in bit #7. Clearing bit #7 is easy: just do a logical shift right, which always writes a zero into the MSB; we don’t care about the contents of the other bits. The “lsr zp” takes the same 5 cycles as the “lda/sta zp” combination, but occupies only 2 bytes instead of 4.
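
To check the claim that a logical shift right clears bit #7 whatever else the byte holds, here is a minimal Python sketch of what LSR does to one memory byte. The function name `lsr` and its tuple return value are assumptions made for illustration; it models only the result and the carry, not the timing.

```python
def lsr(value):
    """Model 6502 LSR on one byte: returns (result, carry_out)."""
    value &= 0xFF
    carry_out = value & 1        # bit 0 falls into the carry flag
    result = value >> 1          # a zero is shifted into bit 7
    return result, carry_out


# Bit 7 is clear after LSR for every possible byte value,
# so the flag reads as "cleared" whatever the low bits held.
assert all(lsr(v)[0] & 0x80 == 0 for v in range(256))

flag = 0x80                      # flag as left by lda #$80, sta flag
flag, _ = lsr(flag)              # lsr flag
print(hex(flag))                 # 0x40
```

The leftover $40 is the "contents of the other bits" that nobody looks at: only bit #7 carries meaning.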

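The other advantage of this method is that it is possible to test the flag without destroying a register. The “bit” instruction copies the MSB of the operand into the negative flag (it also copies bit 6 into the overflow flag and sets the zero flag from A AND the operand) and leaves the accumulator untouched, whereas “lda” would have overwritten the accumulator.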
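
The non-destructive test can be sketched the same way. This is a rough Python model of BIT on a zero page operand; the function name `bit` and the returned flag tuple are assumptions for illustration. BIT also sets V from bit 6 and Z from A AND the operand, which are modelled here only for completeness.

```python
def bit(a, operand):
    """Model 6502 BIT: returns the N, V and Z flags; A is left unchanged."""
    n = (operand >> 7) & 1               # N takes bit 7 of the operand
    v = (operand >> 6) & 1               # V takes bit 6 of the operand
    z = int((a & operand & 0xFF) == 0)   # Z is set when A AND operand is zero
    return n, v, z


a = 0x42                   # some value the caller wants to keep in A
n, v, z = bit(a, 0x80)     # bit flag, with the flag set
print(n, a == 0x42)        # 1 True

n, _, _ = bit(a, 0x00)     # flag cleared, so bpl cleared would branch
print(n)                   # 0
```

Because the function only reads `a`, the model makes the point of the paragraph above explicit: the branch decision comes from N alone, and the accumulator survives the test.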

I can’t see why this couldn’t be adapted to other (CISC, RMW-capable) CPUs as well. If you are fluent in any other assembly language, please post a comment.

15 thoughts on “Optimized Flags Code for 6502”

strik

2008-07-15 at 23:34

The ROM of the Commodore drives (at least, 1540/1541 and 1570/1571) uses these “tricks”, too. IIRC, the C64 KERNAL, too. Regarding the BIT instruction: This is why it was invented in the first place. Shifting to set or unset flags is very easy. Commodore also used shifts for setting bit 7 if the Carry was known to be set because of some other instructions.

BTW, people tend to disagree about whether the Commodore BASIC is well-written or not. Additionally, there are many claims that the BASIC (V2, at least) is an almost complete rewrite by Commodore, as the original MS BASIC was not good enough. However, I do not know if this is true or not.
Nate

2008-07-16 at 07:40

Michael, I haven’t seen the LSR trick for clearing a flag but the 1541 floppy ROM does use BIT/BPL as a quick way of accessing a flag. One thing I thought was neat was they tied the “sync detected” line to the overflow bit in the CPU. So a loop to wait until sync has passed before writing the sector header is “BVS self”.
Michael Steil

2008-07-16 at 08:08

@strik:

I looked for “LSR” in the 1541 disassembly and could not find an instance of this trick.

I also went through all “LSR” instructions in the Commodore 64 ROM Listing (BASIC V2 and KERNAL), and couldn’t find this trick either. KERNAL uses LSR to clear the “quote flag” in $D4, but the value of the byte is 0 or 1, so in order to test it, it has to be loaded. BASIC uses values of 0 and $FF for the “string flag” in $0D, and seems to clear it with an LSR at $AE07, but it always tests it with LDA/BNE.
Michael Steil

2008-07-16 at 08:20

@Nate:

Right, the 1541 ROM uses BIT/BPL to test the flag in $0298 (“silently ignore some errors”), which can be 0 or $FF, but uses LDA/STA to clear it.

This is very interesting: The developers of KERNAL used LSR to clear a flag in bit #0, and the Commodore DOS developers used BIT/BPL to test a flag in bit #7, but nobody at Commodore came up with the trick Woz used.

Also, Nate, having the overflow flag exposed as an input pin on the 6502 is the single greatest feature of any device I’ve seen. 🙂
bi -- International Journal of Inactivism

2008-07-16 at 10:48

Interesting… I’m trying to think of how one might efficiently handle bits on an Intel 8086. One problem is that there’s not a lot of things that can be done with memory operands, so the best code I’ve been able to come up with doesn’t really improve on the naive method… Using nasm notation:

; set (bit 0 of flag)
mov byte [flag],1 ; 5 bytes

; clear (bit 0 of flag, trash rest)
shl byte [flag],1 ; 4 bytes, just 1 less than simple mov

; test without clobbering
test byte [flag],1 ; 5 bytes, no win over a simple cmp… ugh
jz cleared ; 2 bytes

; test and clobber
rcr byte [flag],1 ; 4 bytes, 1 less
jnc cleared ; 2 bytes

Putting the flag in bit 7 instead of bit 0 just means switching left and right around, so it doesn’t really win much…
Erik

2008-07-21 at 04:09

Is it possible to recompile the other way? I386 code to ARM processors? That would be amazing.
Chuck

2008-07-31 at 06:18

Sounds to me like there’s another advantage: you can “unclear” with ASL (the 6502 has no LSL mnemonic), but only if you’re aware of context when programming.

I’m specifically thinking of things that toggle (normally you’d use EOR) but this sounds good too 😛
ScoBa

2008-10-31 at 11:08

The x86 processors allow branching based on all sorts of flag states: parity, overflow, sign, carry, zero.

With a single test operation, you could check the state of multiple bits simultaneously, branching as appropriate:

test al, 11000000b
jz BothClear
jpe BothSet
js Bit7Set

Bit6Set:

At one point, I was working on a 6502 emulator in x86 assembler. I wanted the behavior of the Accumulator and Flags register to match an actual 6502 processor performing decimal-mode arithmetic with every value for both operands from 0 to 255.

If I only wanted to handle valid input values, I’d have simply used the x86 decimal adjust instructions (DAA/DAS). I wanted 100%-accurate emulation, so I threw everything at the 6502 to get the complete picture.

The addition behavior was pretty straightforward, but it took a while to figure out subtraction. If anyone is interested, I may be able to dig up the x86 implementations I came up with.
Greg

2008-11-06 at 10:36

That trick implies that there is a shorter way to set that flag:

sec
ror flag ; 3 bytes, but 7 cycles!

However, that trick shines when you combine it with a test:

jsr query ; Ask a yes/no question; then, input only “y” or “n”.
cmp #'y' ; Test that answer.
ror flag ; flag is set for “yes”; clear for “no”.
bpl label1
… ; first yes-code
label1:
… ; other code
…
bit flag
bpl label2
… ; second yes-code
label2:
…
Rhialto

2009-04-02 at 15:28

More assorted observations from the 64:

; print kernal message indexed by Y

F12B 24 9D BIT $9D
F12D 10 0D BPL $F13C

You can also use ASL to clear the flag, if you make sure only bit 7 is ever set.
Then you can branch on the carry to check the original value.

Another trick is to have 2 flag bits in a byte: bit 7 and 6. The latter will be put in the V status bit with a BIT instruction. An example of that can be seen in the 64’s Basic:

A594 C9 22 CMP #$22 ; quote mark
A596 F0 56 BEQ $A5EE
A598 24 0F BIT $0F
A59A 70 2D BVS $A5C9
A59C C9 3F CMP #$3F ; question mark
A59E D0 04 BNE $A5A4

I’m pretty sure I have seen this trick in more places. $9D is also used in this way in error output routines.

C64 again:

; allocate number of bytes in A

B4F4 46 0F LSR $0F
B4F6 48 PHA
…
B518 A5 0F LDA $0F
B51A 30 B6 BMI $B4D2

Strangely enough here LSR seems to be used to clear a flag bit, but then it is tested with LDA instead of BIT. I suppose that it makes little difference if you don’t care about the value in A.
- Johann Klasek
  
  2017-05-11 at 10:44
  
  I’m surprised that this hasn’t been mentioned before, but the floating point routines handle the sign flag this way. For the extended floating point representation the sign bit (FAC #1) is held in a separate byte ($66), where bit 7 indicates the sign. Some sample references:
  
  .,BC58 46 66 LSR $66 ABS FUNCTION: CHANGE SIGN TO +
  
  .,BCA2 24 66 BIT $66 CHECK SIGN OF FAC
  
  Woz’s contribution may not be as extraordinary as it looks. Since I read several ROM listings in the mid-80s, I got the feeling that such things are part of a common 6502 programming paradigm. 😉
Graham Toal

2009-07-03 at 05:18

The Acorn BBC Micro kernel uses the “Bit immediate” opcode as a 1-byte skip instruction. This means that disassemblers generally fail because there is a valid instruction start address where the operand to Bit would usually be. Makes life interesting for automated recompilers!
- Johann Klasek
  
  2017-05-11 at 08:47
  
  @Graham Toal
  > The Acorn BBC Micro kernel uses the “Bit immediate” opcode as a 1-byte skip instruction.
  I think, “Bit zeropage” is meant. Bit immediate does not exist (on vanilla 6502). In addition, “Bit absolute” is used to skip 2 bytes. A very common technique at least in Commodore’s ROM code.
Johann Klasek

2017-05-19 at 09:00

Instead of using
lda #$80
sta flag
you can sometimes use the carry (if it is already set) or explicitly set the carry flag:
sec
ror flag
without destroying one of the registers.
John Payson

2017-10-28 at 11:08

The designers of the Atari 2600 designed hardware around the fact that the BIT instruction sets both the sign and overflow flags based upon bits 6-7 of the operand. In my own code for that platform, I have a number of flag bytes that hold meaningful things in both of the top two bits. The need for packing two bits per byte may not be as great on other 6502 platforms which have more memory than the 2600, but being able to use one “read” instruction and then two branches is still helpful.
