Bitfields are very common in low level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this.
The straightforward way to deal with bitfields is to do the Boolean logic by hand:
Boolean Magic
#define FLAG_USER (1 << 0)
#define FLAG_ZERO (1 << 1)
#define FLAG_FORCE (1 << 2)
/* bits 3-30 reserved */
#define FLAG_COMPAT (1 << 31)
int
create_object(int flags)
{
int is_compat = (flags & FLAG_COMPAT);
if (is_compat)
flags &= ~FLAGS_FORCE;
if (flags & FLAG_FORCE) {
[...]
}
[...]
}
int
create_object_zero(int flags)
{
create_object(flags | FLAGS_ZERO);
}
void
caller()
{
create_object(FLAG_FORCE | FLAG_COMPAT);
}
You can see code like this everywhere, so most C programmers can probably read and understand this quite easily. But unfortunately, this method is very error-prone. Mixing up "&" and "&&" and omitting the "~" when doing "&=" are common oversights, and since the compiler only sees "int" types, this also doesn't give you any kind of type-safety.
Bitfields
Let us look at the same code using bitfields instead:
typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
struct {
boolean_t user:1;
boolean_t zero:1;
boolean_t force:1;
int :28; /* unused */
boolean_t compat:1; /* bit 31 */
};
int raw;
} flags_t;
int
create_object(flags_t flags)
{
boolean_t is_compat = flags.compat;
if (is_compat)
flags.force = FALSE;
if (flags.force) {
[...]
}
[...]
}
int
create_object_zero(flags_t flags)
{
flags.zero = TRUE;
create_object(flags);
}
void
caller()
{
create_object((flags_t) { { .force = TRUE, .compat = TRUE } });
}
Flags can just be used like any variables. The compiler abstracts the Boolean logic away. The only downside is that the code with the static initializer requires some advanced syntax.
Endianness
Bitfields in C always start at bit 0. While this is the least significant bit (LSB) on Little Endian (bit 0 has a weight of 2^0), most compilers on Big Endian systems inconveniently consider the most significant bit (MSB) bit 0 (bit 0 has a weight of 2^31, 2^63 etc. depending on the word size), so in case your bitfield needs to be binary-compatible across machines with different endianness, you will need to define two versions of the bitfield.
The Raw Bitfield
Did you notice the "int raw" in the union? It lets us conveniently (and type-safely) export the raw bit value without having to use a cast.
printf("raw flags: 0x%x\n", flags.raw);
We can use this to reconstruct the FLAG_* constants in the original example, in case some code needs it:
#define FLAG_USER (((flags_t) { { .user = TRUE } }).raw)
#define FLAG_ZERO (((flags_t) { { .zero = TRUE } }).raw)
#define FLAG_FORCE (((flags_t) { { .force = TRUE } }).raw)
#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)
This code constructs a temporary instance of the bitfield, sets one bit, and converts it into a raw integer - all at compile time.
Bitfield Access from Assembly
With the same trick, you can also access your bitfield from assembly, for example if the bitfield is part of the Thread Control Block in your operating system kernel, and the low level context switch code needs to access one of the bits. The "int raw" can be used to statically convert a flag into the corresponding raw mask:
typedef unsigned int boolean_t;
typedef union {
struct {
boolean_t bit0:1;
boolean_t bit1:1;
int :19;
boolean_t bit31:1;
};
int raw;
} bitfield_t;
int test()
{
int param = -1;
int result;
__asm__ volatile (
"test %2, %1 \n"
"xor %0, %0 \n"
"setcc %0 \n"
: "=r" (result)
: "r" (param),
"i" (((bitfield_t) { { .bit31 = TRUE } }).raw)
);
return result;
}
The corresponding x86 assembly code looks like this:
.text
.align 4,0x90
.globl _test
_test:
pushl %ebp
movl %esp, %ebp
movl $-1, %eax
## InlineAsm Start
test $0x80000000, %eax
xor %eax, %eax
setcc %eax
## InlineAsm End
popl %ebp
ret
.subsections_via_symbols
This works fine with LLVM, but unfortunately GCC (4.2.1) has problems detecting that the raw value is available at compile time, so the "i" has to be replaced with an "r": GCC will then pre-assign a register with the raw value instead of being able to use an immediate with the "test" instruction.
How to Not Do It
I have seen C++ code doing this:
enum {
FLAG_USER,
FLAG_ZERO,
FLAG_FORCE,
FLAG_COMPAT = 31
}
int
create_object(bitfield_t flags)
{
bool is_compat = flags.is_set(FLAG_COMPAT);
if (is_compat)
flags -= FLAGS_FORCE;
if (flags.is_set(FLAG_FORCE)) {
[...]
}
[...]
}
int
create_object_zero(int flags)
{
create_object(flags + FLAGS_ZERO);
}
void
caller()
{
create_object(((bitfield_t)FLAG_FORCE) + FLAG_COMPAT);
}
This all looks quite weird. The constants are bit index values, and they are added and subtracted. The reason is C++ operator overloading:
class bitmask_t
{
word_t maskvalue;
public:
[...]
inline bitmask_t operator -= (int n)
{
maskvalue = maskvalue & ~(1UL << n);
return (bitmask_t) maskvalue;
}
[...]
}
This is horrible. The code that uses this class makes no sense unless you read and understand the implementation of the class. And you have to be very careful: While it is possible to "add" a flag to an existing bitfield, you cannot just add two flags - it would do the arithmetic and add the two values.
Mapping the setting and clearing of bits onto the addition and subtraction operators is clearly wrong in the first place: Flags in a bitfield are equivalent to elements in a set. Setting a flag is equivalent to the "union" operation, which even in Mathematics has its own symbol instead of overloading the "+" operator.
Question
If you compile code that does something like "((bitfield_t) { { .bit31 = TRUE } }).raw" with GCC in C++ mode, it fails. Why?