Readable and Maintainable Bitfields in C

Bitfields are very common in low-level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this.

The straightforward way to deal with bitfields is to do the Boolean logic by hand:

Boolean Magic

#define FLAG_USER   (1u << 0)
#define FLAG_ZERO   (1u << 1)
#define FLAG_FORCE  (1u << 2)
/* bits 3-30 reserved */
#define FLAG_COMPAT (1u << 31)  /* unsigned: 1 << 31 overflows a 32-bit signed int */

int
create_object(unsigned int flags)
{
        int is_compat = (flags & FLAG_COMPAT) != 0;

        if (is_compat)
                flags &= ~FLAG_FORCE;

        if (flags & FLAG_FORCE) {
                [...]
        }
        [...]
}

int
create_object_zero(unsigned int flags)
{
        return create_object(flags | FLAG_ZERO);
}

void
caller()
{
        create_object(FLAG_FORCE | FLAG_COMPAT);
}

You can see code like this everywhere, so most C programmers can probably read and understand it quite easily. Unfortunately, this method is very error-prone. Mixing up "&" and "&&" and omitting the "~" when doing "&=" are common oversights, and since the compiler only sees plain integer types, it cannot give you any kind of type-safety either.
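
One way to reduce (though not eliminate) these slips while staying with masks is to keep the operators inside a few small helper functions and to use an unsigned type. The following is only a minimal sketch: the names flagword_t, flag_set, flag_clear and flag_test are invented for illustration and are not part of the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type name; unsigned so that bit 31 can be used safely. */
typedef uint32_t flagword_t;

#define FLAG_USER   (UINT32_C(1) << 0)
#define FLAG_ZERO   (UINT32_C(1) << 1)
#define FLAG_FORCE  (UINT32_C(1) << 2)
#define FLAG_COMPAT (UINT32_C(1) << 31)

/* Return flags with every bit of mask set. */
static inline flagword_t flag_set(flagword_t flags, flagword_t mask)
{
        return flags | mask;
}

/* Return flags with every bit of mask cleared; the "~" lives only here. */
static inline flagword_t flag_clear(flagword_t flags, flagword_t mask)
{
        return flags & ~mask;
}

/* Return true if all bits of mask are set in flags. */
static inline bool flag_test(flagword_t flags, flagword_t mask)
{
        assert(mask != 0);      /* testing an empty mask is most likely a bug */
        return (flags & mask) == mask;
}
```

The compiler still sees only integers, so this does not add type-safety. It merely confines the error-prone operators to three places.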

Bitfields

Let us look at the same code using bitfields instead:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

int
create_object_zero(flags_t flags)
{
	flags.zero = TRUE;
	return create_object(flags);
}

void
caller()
{
        create_object((flags_t) { { .force = TRUE, .compat = TRUE } });
}

Flags can just be used like any variables. The compiler abstracts the Boolean logic away. The only downside is that the code with the static initializer requires some advanced syntax.

Endianness

Bitfields in C always start at bit 0. On Little Endian systems this is the least significant bit (LSB), so bit 0 has a weight of 2^0. Most compilers on Big Endian systems, however, inconveniently treat the most significant bit (MSB) as bit 0, giving it a weight of 2^31, 2^63 etc. depending on the word size. If your bitfield needs to be binary-compatible across machines with different endianness, you will therefore need to define two versions of the bitfield.
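
A sketch of what such a pair of definitions might look like. It assumes a compiler that predefines __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ (GCC and Clang do), that supports C11 anonymous structs, and that allocates bitfields from the MSB on Big Endian targets and from the LSB otherwise. The names portable_flags_t and portable_flags_layout_ok are made up for this example. Because the allocation order is up to the compiler, the helper checks the assumption at run time instead of trusting it.

```c
#include <assert.h>
#include <stdbool.h>

typedef union {
        struct {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
                /* assumed: the first member lands in the most significant bit */
                unsigned int compat:1;  /* bit 31 */
                unsigned int :28;       /* unused */
                unsigned int force:1;
                unsigned int zero:1;
                unsigned int user:1;    /* bit 0 */
#else
                /* assumed: the first member lands in the least significant bit */
                unsigned int user:1;    /* bit 0 */
                unsigned int zero:1;
                unsigned int force:1;
                unsigned int :28;       /* unused */
                unsigned int compat:1;  /* bit 31 */
#endif
        };
        unsigned int raw;
} portable_flags_t;

/* The sketch only makes sense if all fields share one 32-bit unsigned int. */
static_assert(sizeof(portable_flags_t) == sizeof(unsigned int),
              "bitfields do not fit into a single unsigned int");

/* Return true if the compiler laid the fields out as assumed above. */
static bool portable_flags_layout_ok(void)
{
        portable_flags_t f = { .raw = 0 };

        f.user = 1;
        f.compat = 1;
        return f.raw == 0x80000001u;
}
```

Calling portable_flags_layout_ok() once at start-up (or in a unit test) turns a silent layout mismatch into a visible failure.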

The Raw Bitfield

Did you notice the "int raw" in the union? It lets us conveniently (and type-safely) export the raw bit value without having to use a cast.

	printf("raw flags: 0x%x\n", flags.raw);

We can use this to reconstruct the FLAG_* constants in the original example, in case some code needs it:

#define FLAG_USER   (((flags_t) { { .user   = TRUE } }).raw)
#define FLAG_ZERO   (((flags_t) { { .zero   = TRUE } }).raw)
#define FLAG_FORCE  (((flags_t) { { .force  = TRUE } }).raw)
#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)

This code constructs a temporary instance of the bitfield, sets one bit, and converts it into a raw integer - all at compile time.
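
Whether the conversion really happens at compile time is up to the optimizer. In ISO C the expression is not an integer constant expression, so a compiler may reject it in places such as case labels or static initializers. The sketch below therefore checks the reconstructed masks at run time. It assumes C11 anonymous structs and a 32-bit unsigned int, and the names demo_flags_t, DEMO_FLAG_* and is_single_bit are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

typedef union {
        struct {
                unsigned int user:1;
                unsigned int zero:1;
                unsigned int force:1;
                unsigned int :28;       /* unused */
                unsigned int compat:1;
        };
        unsigned int raw;
} demo_flags_t;

static_assert(sizeof(demo_flags_t) == sizeof(unsigned int),
              "bitfields do not fit into a single unsigned int");

/* Same construction as the FLAG_* macros above, under different names. */
#define DEMO_FLAG_USER   (((demo_flags_t) { { .user   = 1 } }).raw)
#define DEMO_FLAG_ZERO   (((demo_flags_t) { { .zero   = 1 } }).raw)
#define DEMO_FLAG_FORCE  (((demo_flags_t) { { .force  = 1 } }).raw)
#define DEMO_FLAG_COMPAT (((demo_flags_t) { { .compat = 1 } }).raw)

/* Return true if exactly one bit is set in v. */
static bool is_single_bit(unsigned int v)
{
        return v != 0 && (v & (v - 1)) == 0;
}
```

Each reconstructed mask should have exactly one bit set, and no two masks should be equal, whichever bit order the compiler chose.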

Bitfield Access from Assembly

With the same trick, you can also access your bitfield from assembly, for example if the bitfield is part of the Thread Control Block in your operating system kernel, and the low level context switch code needs to access one of the bits. The "int raw" can be used to statically convert a flag into the corresponding raw mask:

typedef unsigned int boolean_t;

typedef union {
	struct {
		boolean_t bit0:1;
		boolean_t bit1:1;
		int :29;	/* bits 2-30 unused, so bit31 really is bit 31 */
		boolean_t bit31:1;
	};
	int raw;
} bitfield_t;


int test()
{
	int param = -1;
	int result;

	__asm__ volatile (
		"test    %2, %1    \n"
		"xor     %0, %0    \n"
		"setcc   %0        \n"
		: "=r" (result)
		: "r" (param),
		  "i" (((bitfield_t) { { .bit31 = TRUE } }).raw)
	);
	return result;
}

The corresponding x86 assembly code looks like this:

	.text
	.align	4,0x90
	.globl	_test
_test:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$-1, %eax
	## InlineAsm Start
	test	$0x80000000, %eax
	xor	%eax, %eax
	setcc	%eax
	## InlineAsm End
	popl	%ebp
	ret

	.subsections_via_symbols

This works fine with LLVM, but unfortunately GCC (4.2.1) has problems detecting that the raw value is available at compile time, so the "i" has to be replaced with an "r": GCC will then pre-assign a register with the raw value instead of being able to use an immediate with the "test" instruction.

How to Not Do It

I have seen C++ code doing this:

enum {
	FLAG_USER,
	FLAG_ZERO,
	FLAG_FORCE,
	FLAG_COMPAT = 31
};

int
create_object(bitmask_t flags)
{
        bool is_compat = flags.is_set(FLAG_COMPAT);

        if (is_compat)
                flags -= FLAG_FORCE;

        if (flags.is_set(FLAG_FORCE)) {
                [...]
        }
        [...]
}

int
create_object_zero(bitmask_t flags)
{
	return create_object(flags + FLAG_ZERO);
}

void
caller()
{
        create_object(((bitmask_t)FLAG_FORCE) + FLAG_COMPAT);
}

This all looks quite weird. The constants are bit index values, and they are added and subtracted. The reason is C++ operator overloading:

class bitmask_t
{
    word_t      maskvalue;

public:
    [...]
    inline bitmask_t operator -= (int n)
        {
            maskvalue = maskvalue & ~(1UL << n);
            return (bitmask_t) maskvalue;
        }
    [...]
};

This is horrible. The code that uses this class makes no sense unless you read and understand the implementation of the class. And you have to be very careful: While it is possible to "add" a flag to an existing bitfield, you cannot just add two flags - it would do the arithmetic and add the two values.

Mapping the setting and clearing of bits onto the addition and subtraction operators is clearly wrong in the first place: Flags in a bitfield are equivalent to elements in a set. Setting a flag is equivalent to the "union" operation, which even in Mathematics has its own symbol instead of overloading the "+" operator.
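
Taking the set analogy literally, the operations can be given set names instead of arithmetic ones. The following is only a sketch in C with invented names (flagset_t, flagset_of, flagset_union and so on). Like the class above, it works with bit index values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one bit per possible set element. */
typedef uint32_t flagset_t;

enum {
        FLAG_USER,
        FLAG_ZERO,
        FLAG_FORCE,
        FLAG_COMPAT = 31
};

/* Singleton set containing only the flag with the given bit index (0..31). */
static inline flagset_t flagset_of(unsigned int index)
{
        assert(index < 32);
        return UINT32_C(1) << index;
}

/* All flags that are in a or in b. */
static inline flagset_t flagset_union(flagset_t a, flagset_t b)
{
        return a | b;
}

/* All flags that are in both a and b. */
static inline flagset_t flagset_intersection(flagset_t a, flagset_t b)
{
        return a & b;
}

/* All flags that are in a but not in b. */
static inline flagset_t flagset_difference(flagset_t a, flagset_t b)
{
        return a & ~b;
}

/* Return true if the flag with the given bit index is in the set. */
static inline bool flagset_contains(flagset_t set, unsigned int index)
{
        return (set & flagset_of(index)) != 0;
}
```

Unlike addition, the union of a flag with itself is still just that flag, so combining two flags cannot silently turn into a third one.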

Question

If you compile code that does something like "((bitfield_t) { { .bit31 = TRUE } }).raw" with GCC in C++ mode, it fails. Why?

33 thoughts on “Readable and Maintainable Bitfields in C”

  1. strik

    Regarding the bitfield:

    It is bad practise to define a bit-field entry with “int”; quoting N1124, footnote 105:

    “As specied in 6.7.2 above, if the actual type specier used is int or a typed
    then it is implementation-dened whether the bit-eld is signed or unsigned.”

    Thus, it may be that your comparison flags.compat == 1 is never true, as flags.compat is -1. (Of course, for flags, this is no problem. However, if you have some bit-field entry of “int abc : 4;”, you might be very surprised if abc is in the range -8..7 instead of 0..15.)

    Thus, better use “unsigned int compat : 1;”

    Furthermore, other than endianness issues, bit-fields are not very portable if you rely only on what C guarantees:

    “An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.”
    (N1124.pdf, §6.7.2.1 “Structure and union specifiers”, no. 10, p. 102)

    That is, very much is left unspecified here. In particular, a bit-field that does not fit in the storage unit where the current bit-field is located can either be placed crossing the storage unit (“byte”), or it may begin at the next one.

    For example: In your example above, .compat could be placed in a byte of its own, if the compiler likes it this way.

    Regarding using the union to access the raw value: While it works in practise (on most machines), there might be obscure machines where this does not work, as the C standard does not guarantee anything.

    “Annex J (informative)
    Portability issues
    [...]
    J.1 Unspecified behaviour
    [...]
    The value of a union member other than the last one stored into (6.2.6.1)”
    (N1124.pdf, J.1, p. 488)

    or

    “When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.”
    (N1124.pdf, §6.2.6.1, p. 50)

    So, to make a long story short: While you are right that it is possible to implement bit-fields with C bit-fields, it is not advisable if you must agree on an exact layout (for example, bits of some hardware register or to write files which are read by other programs). Note also that even one compiler might “change its opinion” with some newer version, thus, even saying “but I know my compiler” will not help you.

    Regarding your question: Does C++ even allow struct initialisation with the “.bit31 = TRUE” syntax? Last time I looked at it, it did not. To be more precise, even now, there are still many C compilers around that do not support this, as this was added with C99. Before C99, it was not supported in C, either.

  2. quackzilla

    If you’re going to advocate this kind of approach for bitfields, there’s a tiny gotcha which can make debugging a nightmare:

    typedef int boolean_t;

    should be

    typedef unsigned int boolean_t;

    You’ve taken other steps to mitigate sign debugging hell (using the high bit, TRUE is !FALSE, always use != FALSE rather than == TRUE comparisons, etc), but it can be a terrible slice of purgatory to debug if many of these steps are missing as they often are in others’ code.

  3. ZungBang

    Bit-fields are a tempting abstraction. The problem with bit-fields is that the normal use case for them is in a setting which tends to break the abstraction.

    Case in point: a former boss of mine had me debug machine exceptions caused by a piece of code he wrote that used bit-fields to represent and access hardware registers. He was quite in love with it, I swear.

    Turns out that the assembly code that was generated by the compiler, did byte-wide reads/writes on hardware that was designed for 32-bit wide accesses only. The code would’ve been perfectly fine for accessing normal memory, but not on our system.

    The fix was to use “boolean magic”, which worked nicely both on the actual system and in a simulation environment.

  4. Anonymous

    You really should be changing it to say NOT to use this method! It is completely unreliable and will most likely cause you problems when compiling on different compilers or different architectures, as was thoroughly explained by the earlier comments. You’re relying on unspecified behaviors in several different ways.

  5. Bobby

    “Bitfields in C always start at bit 0.” is not true. The spec says “The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.”

    As everybody else has been saying, relying on any details of how this code works is just asking for trouble. If you just want bitfields and don’t need to interact with code from other compilers, or hardware registers, and you don’t particularly care how it works behind the scenes, C bitfields are fine. For anything else, they aren’t suitable.

  6. lorenzo

    >> If you compile code that does something like “((bitfield_t) { { .bit31 = TRUE } }).raw” with GCC in C++ mode, it fails. Why?

    To build an anonymous struct (and array too) on the fly I use the syntax {field:value}.
    The code below compiles with g++ on Linux:

    #include <iostream>

    union flags_t {
        struct {
            unsigned int _user:1;
            unsigned int _zero:1;
            unsigned int _force:1;
            unsigned int _dummy:28;
            unsigned int _compat:1;
        } _bitfield;
        unsigned int _raw;
    };

    void print(const flags_t &flags) {
        std::cout << "flags._user: " << flags._bitfield._user << "\n"
                  << "flags._zero: " << flags._bitfield._zero << "\n"
                  << "flags._force: " << flags._bitfield._force << "\n"
                  << "flags._compat: " << flags._bitfield._compat << std::endl;
    }

    int main(int argc, char **argv) {
        print( (flags_t){{_user:1, _zero:0, _force:1, _dummy:0, _compat:0}} );

        return 0;
    }

  7. Pingback: popurls.com // popular today

  8. bitfieldhater

    Please change the title to:
    Non-Portable Bitfields in C

    If you ever think that someone may possibly want to use your code in a cross-platform manner or even interfacing to it from code written in a different language on the same platform, please refrain from using bitfields. Use an unsigned int with masks instead.

    Thank you.

  9. Matt H

    @bitfieldhater

    How would masks be cross platform? You still have to deal with endianness issues.

    Bitfields are great in my opinion. I find its easiest to determine how your compiler is aligning them, and then adjust the structure for other compilers using preprocessor defines (if you want to interop)

  10. Anonymous

    @Matt H

    You have to deal with endianness issues anyway if you’re doing cross-platform code. Bitfields or masks aren’t going to change that. However, using bitfields adds a lot of other issues on top of that, none of which you actually have any control over!

  11. strik

    Michael, as others have also pointed out: DON’T DO IT! This type of code is the best way to be non-portable!

    It might work on your specific platform – NOW. But every little change (even time of compilation) can probably break it.

  12. Felix

    I also have to agree with the other commenters – bitfields suck. They are tempting, yes, but next to all the problems with theoretically “undefined behavior”, there are real world problems. The largest problem is endianness. The right way to handle the wrong endianness (which is little endian, obviously) is to treat reading binary data as “decoding” and writing binary data as “encoding”. Like treating user data in a web application as “unsafe”, you should treat “binary data” as “binary” (or 8-bit numbers, if you want), but never as “32-bit numbers”.

    The right way to handle endianness is not by swapping data at whatever place where it’s “broken otherwise”, it’s by having a defined (and correct) function to put “bytes of data” into “registers”, or generally speaking: how to convert a stream of 8 bit values into a stream of 32bit (or 64bit) values. PowerPC for example does it right; there is no “bswap”-instruction, but instead there is a “load non-native endian word” and “store non-native endian word”. Endianness is an issue with interfacing memory spaces with different data widths; registers have 32 (or 64) bit data width, and memory can be considered as an 8-bit memory space, if you do bytewise access.

    If you start having a bit-addressed memory space (like a bitfield), then WHENEVER you interface to a word-addressed memory space (register) or a byte-addressed memory space (memory) you have to take endianness into account. And while swapping in byte-based domains is cheap (bswap ftw), swapping bits is (usually) not cheap. Sure, you can just redefine your bitfield to magically match whatever the native bit-to-word conversion does (usually by inverting the order), but would you ever re-define any (word) constants because you’re too cheap to do proper endianness correction on bytes? No. Why would you redefine your bitfields then?

  13. Pingback: Josh Haberman » Bit-fields in C99

  14. Graham

    u3_u5_u8_u12_DECODER {
    // see libs_apps/docs/
    }

    Supposing you had a C++ class,
    for use at COMPILE_TIME or RUN_TIME

    BIT_FIELD { // probably template // probably derived from NAMED_OBJECT

    uns nbits_gap_right;
    uns nbits_data;
    bool is_signed;

    // thats all you need, maybe expand “is_signed” to TYPE_SPEC_of_enum_bit_field
    // plus NAME + SPEC of this
    // plus WHEN you know this, CT_COMPILE_TIME, RT_RUNTIME, const ?, changed?

    enum byte_order_of_memory_and_code_used_to_handle_it
    byte_order = byte_order_hilo_by_definition_probably_on_lohi_cpu;

    static enum compiler_bit_order_code_and_metrics
    bit_order = bit_order_gcc_on_AMD64_using_masks_AND_bit_fields_mixed; // !

    // starts to get virtual here

    u32 GET_mask_1s_rhj();
    u32 GET_mask_1s_in_situ(); // 1 where data is

    u32 GET_mask_0s_rhj();
    u32 GET_mask_0s_in_situ(); // 0 where data is

    bool GET_is_within_byte_boundry(); // does_not_cross_byte_boundry
    bool know_gap_right_is_zero;
    bool know_no_need_to_mask_off_upper_bits;

    };

    class LIST_of_BIT_FIELD
    {
    BIT_FIELD & LIST[4]; // however many you want
    bool generate_code( generator & );
    };

    and some friends
    u32_hilo
    u32_lohi
    u32_cpu // probably typedef to one or the other, with C++ casting all over the place

    and some distant cousins: u32_cpu_lohi_holding_inverted_hilo.

    Plus of course some optimising bswap functions (HTON macros evaporate to swapb),
    and some failsafe compile-mode-use-masks-and-shifts,
    plus some unit tests,
    plus a community to maintain it for a range of CPUs, compilers, times-of-day,

    Plus, you also can run this data-driven C++ class,
    to prints the correct C/C++ for your machine/compiler/timeofday
    (or falls back on masks and swapb),
    as long as there is a non-cross-compiler available, at compile-time.

    Add to that, the attempt to access u32_hilo IN-SITU from a u32_cpu_lohi
    knowing that the sub-byte-values are easier, but the byte-boundry cross isnt,
    but never-the-less attempt to get a good engineering compromise

    Then what _YOU_ do with a family of types named

    u3_u5_u8_u16 // upper tray is blind to lower tray u16
    u3_u5_u8_u4_u12 // decoder finds common case of 4K pools
    u8_u8_u16 // decode u3_u5 as plain lookup[u8] // sparse: void * lookup[ decode[u8] ]
    u16_u16 // u16_upper_u16_lower
    u32_hilo // as_found_in_file_preferably_aligned

    Remember that decoding, will probably extract the values all-in-one-go,
    because you KNOW that you will decode the entire multi-step-address
    (which is an index not an offset, looking up the object in a tray_of_256_of_similar_type)

    (Q) If it is to be quick on all architectures, and all compiler-modes,
    what should the bit layout be: ?

    u3_u5_u8_u16
    u16_u8_u5_u3

    u3_u5_u8_u4_u12
    u12_u4_u8_u5_u3

    NB By using different names for each bitfield,
    it remains easier to prototype here, for a larger space,
    otherwise its:

    u3_A
    u5_B
    u8_C ….
    u16_Lower

    NB My storage layout allocates within the lower 16 bits (object per item)
    with multiple parallel worlds selected in the upper 16 bits.
    That upper layout is implemented as a (hidden) lower_tray_of_items
    to reuse code, but allow mixing trays_of_lower_u16 from different files

  15. kohlrak

    I know this is old, but, truly, the instability comes with the datatype itself. An int on an older system is 16bits, an int on a going out of style system is 32bits, and on a new system an int is 64bits. Probably best to revert to non-changing “types” such as byte, word, dword, qtword, etc.

  16. Pingback: dholm.com » Blog Archive » Tumblelog 090817

  17. woody

    I disagree strongly with all the bit field detractors.
    I do embedded firmware. Bit fields are critical. Most porting of embedded code is going to be to a similar platform, for example, various flavors of an 8051, using the same compiler.

    The PIC24 family uses the GNU compiler, and their entire method of accessing the bit fields in all the registers, depends on bit fields in unions.
    Every pic compiler does the same thing.

    I use Keil 8051 compiler in my work. I’ve never seen any problems when using the Keil compiler in the 8051 world, and have ported code back and forth across many 8051 platforms. Bit fields have been used extensively to unpack, and access registers and bits, and communication buffers.

    The moral of the story is: Bit fields are highly useful. Now when I go to port my 8051 code over to an arm, I will of course have to use whatever compiler is furnished, and adhere to the way that compiler implements things, but most of that can be hidden behind defines, just as it is in the PIC24 family.

  18. Sean Ellis

    Scott said: “A poster who dislikes bitfields is one who has never seriously programmed an embedded system.”

    I do serious embedded work. I dislike bitfields. I would love to use them – they would make life so much easier – but they’re all but unusable for serious work.

    The reasons for not using them have all been set out above. They’re a mess of undefinedness. Which end do they pack from? Which end of what (bytes, words, halfwords)? What exact set of operations is used for a bitfield set, and how does it interact with memory mapped I/O?

    Even if you only use a single toolchain, as woody apparently does, any of the behaviors you rely on can be broken with an upgrade, at any time, for any reason.

    This is a real pity, because they would remove about half the evil #defines in the embedded world. But even the preprocessor isn’t quite as evil as undefined behavior.

  19. t

    I have programmed using C for numerous processors, dsps, and microcontrollers and I have never found the need to use bitfields. I use masks and macros when I want to manipulate bits.

  20. Rod

    Can someone point me to an embedded processor vendor who uses bitfields in their sample code to access registers? Up till now, no chip maker code I know has. I’ve lost count of how many embedded architectures I’ve worked with.

    Maybe I should have read the C spec before today, but I’ve been embedded programming (mostly self taught) for 6 years now. Yesterday was the first time I had ever heard of bitfields in C.

  21. Jon

    Rod, Microchip uses bitfields in their processor header files, in their mcc18 compiler. For example, the STATUS register in a PIC18F26K20 is defined as so in their header file:
    extern near unsigned char STATUS;
    extern near struct {
    unsigned C:1;
    unsigned DC:1;
    unsigned Z:1;
    unsigned OV:1;
    unsigned N:1;
    } STATUSbits;

    This way, you can clear (for example) the C bit by:
    STATUSbits.C = 0;

    Good luck!

  22. Frederick Eek

    I have been doing extensive embedded programming (amongst others) for 20 years. I fail to understand why you would not use them. All hardware have bit fields in their registers. Most protocols contain information with bit fields.
    As for their undefinedness, exactly the same applies for “boolean magic”. Here you use defines to map bits to positions in unsigned shorts or ints. You are therefore fixing the flags to the endianness of your system.
    The endianness problem does not lie in your compiler/system, but how different systems interpret them (i.e. when communicating over a bus or network).
    I have been using bit fields in various systems (ported and sharing code) including 8086, .NET programming, 8051, Arm, TMS320, AVR, etc. I never have to modify flags or #defines. The worst I have ever had to do was to have two definitions for the same structure depending on the compiler used (the old 8051 Franklin compiler did not support 32 bit packed structures and the packed TIMESTAMP data type we used on the network was packed into 32 bits).
    The biggest advantage of bit fields is the fact that you do not continuously have to keep track of how flags and masks actually map to your memory. Once the structure is defined, you are completely abstracted from the memory representation whilst for boolean magic you have to remember the masks and sometimes even shifts at every point you use them.
    I think people that detract from bit fields have actually never really had to do anything but simple flag checking and setting.

  23. Eswar

    Can any body explain the memory size of a bit field variable? What is the advantage of using Bit fields particular to memory management point of view?

  24. tissit

    TI uses bitfields and so does Fujitsu. I have a feeling I’ve met others, but I’m not sure. I think AVR is the exception that doesn’t.

  25. Pingback: Adobe Interview Question for Software Engineer/Developer about Bit Magic « GeeksforGeeks

  26. Pingback: Glitch Kombat (Part 2) | Shanth's Blog

  27. chalapathi

    Hi ,
    I’m chalapathi , which is very helpful for me updating the technology.

    Thanks & Regards

    Chalapathi
