Readable and Maintainable Bitfields in C

Bitfields are very common in low level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this.

The straightforward way to deal with bitfields is to do the Boolean logic by hand:

Boolean Magic

#define FLAG_USER   (1U << 0)
#define FLAG_ZERO   (1U << 1)
#define FLAG_FORCE  (1U << 2)
/* bits 3-30 reserved */
#define FLAG_COMPAT (1U << 31)

int
create_object(unsigned int flags)
{
        int is_compat = (flags & FLAG_COMPAT) != 0;

        if (is_compat)
                flags &= ~FLAG_FORCE;

        if (flags & FLAG_FORCE) {
                [...]
        }
        [...]
}

int
create_object_zero(unsigned int flags)
{
	return create_object(flags | FLAG_ZERO);
}

void
caller()
{
        create_object(FLAG_FORCE | FLAG_COMPAT);
}

You can see code like this everywhere, so most C programmers can probably read and understand it quite easily. Unfortunately, this method is very error-prone. Mixing up "&" and "&&" and omitting the "~" when doing "&=" are common oversights, and since the compiler only sees plain integer types, it doesn't give you any kind of type safety either.
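To make the two oversights concrete, here is a minimal sketch. The helper names (has_force_*, clear_force_*) are made up for illustration and are not part of the example above. The buggy variants are valid C, so at best the compiler emits a warning:

```c
#define FLAG_USER   (1U << 0)
#define FLAG_ZERO   (1U << 1)
#define FLAG_FORCE  (1U << 2)

/* Intended: report whether FLAG_FORCE is set. */
static int
has_force_correct(unsigned int flags)
{
        return (flags & FLAG_FORCE) != 0;
}

/* Oversight: "&&" is the logical AND, so this is true for ANY nonzero flags. */
static int
has_force_buggy(unsigned int flags)
{
        return flags && FLAG_FORCE;
}

/* Intended: clear FLAG_FORCE and keep every other bit. */
static unsigned int
clear_force_correct(unsigned int flags)
{
        return flags & ~FLAG_FORCE;
}

/* Oversight: without the "~" this keeps ONLY FLAG_FORCE and clears the rest. */
static unsigned int
clear_force_buggy(unsigned int flags)
{
        return flags & FLAG_FORCE;
}
```

With only FLAG_USER set, has_force_buggy() still reports the force flag as present, and clear_force_buggy() applied to FLAG_USER | FLAG_FORCE throws away the user bit and keeps the force bit.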

Bitfields

Let us look at the same code using bitfields instead:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE (!FALSE)
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                unsigned int :28;       /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

int
create_object_zero(flags_t flags)
{
	flags.zero = TRUE;
	return create_object(flags);
}

void
caller()
{
        create_object((flags_t) { { .force = TRUE, .compat = TRUE } });
}

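Flags can just be used like any other variables. The compiler abstracts the Boolean logic away. The only downside is that the static initializer in caller() requires some less familiar syntax: a C99 compound literal with designated initializers, where the inner pair of braces addresses the anonymous struct inside the union.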
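Here is a small self-contained sketch of that syntax in use. It assumes a compiler that accepts anonymous structs (C11, or GCC/Clang as an extension); normalize() and default_flags() are hypothetical helpers, not part of the example above, and TRUE is spelled as plain 1 here:

```c
typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE  1

typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                unsigned int :28;       /* unused */
                boolean_t compat:1;
        };
        unsigned int raw;
} flags_t;

/* Flags are passed by value, so the caller's copy is not modified. */
static flags_t
normalize(flags_t flags)
{
        if (flags.compat)
                flags.force = FALSE;
        return flags;
}

/* A compound literal can be returned (or passed) without a named variable.
 * Fields that are not mentioned in the initializer start out as zero. */
static flags_t
default_flags(void)
{
        return (flags_t) { { .user = TRUE, .force = TRUE } };
}
```

Calling normalize((flags_t) { { .force = TRUE, .compat = TRUE } }) returns a value with compat still set and force cleared. None of this depends on where the compiler places the individual bits.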

Endianness

Bitfields in C always start at bit 0. On little-endian machines this is the least significant bit (LSB), so bit 0 has a weight of 2^0. Most compilers on big-endian systems inconveniently consider the most significant bit (MSB) to be bit 0, so bit 0 has a weight of 2^31, 2^63 etc., depending on the word size. The allocation order is ultimately the compiler's choice, so if your bitfield needs to be binary-compatible across machines with different endianness, you will need to define two versions of the bitfield.
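A sketch of what "two versions" could look like, plus a check that the compiler really did what we hoped. It assumes the GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__, a 32-bit unsigned int, and anonymous struct support. It also assumes that the compiler's allocation order follows the byte order, which is common but not guaranteed. flags_layout_ok() is a hypothetical helper:

```c
typedef unsigned int boolean_t;

typedef union {
        struct {
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
                /* Assumed MSB-first allocation: declare bit 31 first. */
                boolean_t compat:1;     /* bit 31 */
                unsigned int :28;       /* unused */
                boolean_t force:1;      /* bit 2 */
                boolean_t zero:1;       /* bit 1 */
                boolean_t user:1;       /* bit 0 */
#else
                /* Assumed LSB-first allocation: declare bit 0 first. */
                boolean_t user:1;       /* bit 0 */
                boolean_t zero:1;       /* bit 1 */
                boolean_t force:1;      /* bit 2 */
                unsigned int :28;       /* unused */
                boolean_t compat:1;     /* bit 31 */
#endif
        };
        unsigned int raw;
} flags_t;

/* Returns 1 if every field landed on the intended bit, 0 otherwise. */
static int
flags_layout_ok(void)
{
        flags_t f;

        f.raw = 0; f.user = 1;
        if (f.raw != 0x00000001u) return 0;
        f.raw = 0; f.zero = 1;
        if (f.raw != 0x00000002u) return 0;
        f.raw = 0; f.force = 1;
        if (f.raw != 0x00000004u) return 0;
        f.raw = 0; f.compat = 1;
        if (f.raw != 0x80000000u) return 0;
        return 1;
}
```

Running flags_layout_ok() once at startup or in a unit test turns a silent layout mismatch into a visible failure when the code is moved to another compiler or target.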

The Raw Bitfield

Did you notice the "int raw" in the union? It lets us conveniently (and type-safely) export the raw bit value without having to use a cast.

	printf("raw flags: 0x%x\n", flags.raw);

We can use this to reconstruct the FLAG_* constants in the original example, in case some code needs it:

#define FLAG_USER   (((flags_t) { { .user   = TRUE } }).raw)
#define FLAG_ZERO   (((flags_t) { { .zero   = TRUE } }).raw)
#define FLAG_FORCE  (((flags_t) { { .force  = TRUE } }).raw)
#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)

This code constructs a temporary instance of the bitfield, sets one bit, and converts it into a raw integer - all at compile time.
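As a sketch of how the reconstructed constants can bridge to older mask-based code: legacy_wants_force() and flags_from_raw() below are hypothetical, and "raw" is declared unsigned here so that bit 31 does not end up as a negative number. As far as I know, ISO C does not treat such an expression as an integer constant expression, so whether it can be used in a "case" label or a static initializer depends on the compiler; in ordinary expressions it is fine.

```c
typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE  1

typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                unsigned int :28;       /* unused */
                boolean_t compat:1;
        };
        unsigned int raw;
} flags_t;

#define FLAG_USER   (((flags_t) { { .user   = TRUE } }).raw)
#define FLAG_ZERO   (((flags_t) { { .zero   = TRUE } }).raw)
#define FLAG_FORCE  (((flags_t) { { .force  = TRUE } }).raw)
#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)

/* Hypothetical legacy function that still takes a raw mask. */
static int
legacy_wants_force(unsigned int raw)
{
        return (raw & FLAG_FORCE) != 0;
}

/* Wrap a raw mask coming from legacy code back into the bitfield type. */
static flags_t
flags_from_raw(unsigned int raw)
{
        flags_t f;

        f.raw = raw;
        return f;
}
```

Because the masks are derived from the bitfield itself, the two representations stay in sync wherever the compiler decides to put the bits.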

Bitfield Access from Assembly

With the same trick, you can also access your bitfield from assembly, for example if the bitfield is part of the Thread Control Block in your operating system kernel, and the low level context switch code needs to access one of the bits. The "int raw" can be used to statically convert a flag into the corresponding raw mask:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE (!FALSE)

typedef union {
	struct {
		boolean_t bit0:1;
		boolean_t bit1:1;
		unsigned int :29;	/* bits 2-30 unused */
		boolean_t bit31:1;
	};
	int raw;
} bitfield_t;

int test()
{
	int param = -1;
	int result;

	__asm__ volatile (
		"test    %2, %1    \n"
		"xor     %0, %0    \n"
		"setcc   %0        \n"
		: "=r" (result)
		: "r" (param),
		  "i" (((bitfield_t) { { .bit31 = TRUE } }).raw)
	);
	return result;
}

The corresponding x86 assembly code looks like this:

	.text
	.align	4,0x90
	.globl	_test
_test:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$-1, %eax
	## InlineAsm Start
	test	$0x80000000, %eax
	xor	%eax, %eax
	setcc	%eax
	## InlineAsm End
	popl	%ebp
	ret

	.subsections_via_symbols

This works fine with LLVM, but unfortunately GCC (4.2.1) has problems detecting that the raw value is available at compile time, so the "i" has to be replaced with an "r": GCC will then pre-assign a register with the raw value instead of being able to use an immediate with the "test" instruction.

How to Not Do It

I have seen C++ code doing this:

enum {
	FLAG_USER,
	FLAG_ZERO,
	FLAG_FORCE,
	FLAG_COMPAT = 31
};

int
create_object(bitmask_t flags)
{
        bool is_compat = flags.is_set(FLAG_COMPAT);

        if (is_compat)
                flags -= FLAG_FORCE;

        if (flags.is_set(FLAG_FORCE)) {
                [...]
        }
        [...]
}

int
create_object_zero(bitmask_t flags)
{
	return create_object(flags + FLAG_ZERO);
}

void
caller()
{
        create_object(((bitmask_t)FLAG_FORCE) + FLAG_COMPAT);
}

This all looks quite weird. The constants are bit index values, and they are added and subtracted. The reason is C++ operator overloading:

class bitmask_t
{
    word_t      maskvalue;

public:
    [...]
    inline bitmask_t operator -= (int n)
        {
            maskvalue = maskvalue & ~(1UL << n);
            return (bitmask_t) maskvalue;
        }
    [...]
};

This is horrible. The code that uses this class makes no sense unless you read and understand the implementation of the class. And you have to be very careful: While it is possible to "add" a flag to an existing bitfield, you cannot just add two flags - it would do the arithmetic and add the two values.

Mapping the setting and clearing of bits onto the addition and subtraction operators is clearly wrong in the first place: Flags in a bitfield are equivalent to elements in a set. Setting a flag is equivalent to the "union" operation, which even in Mathematics has its own symbol instead of overloading the "+" operator.
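If you want helper operations at all, naming them after the set operations they perform keeps the intent visible. A minimal sketch in plain C; the flagset_* names are my own invention and do not come from the code I saw:

```c
typedef unsigned int flagset_t;

#define FLAG_USER   (1U << 0)
#define FLAG_ZERO   (1U << 1)
#define FLAG_FORCE  (1U << 2)
#define FLAG_COMPAT (1U << 31)

/* Union: every flag that is in a or in b. */
static flagset_t
flagset_union(flagset_t a, flagset_t b)
{
        return a | b;
}

/* Intersection: only the flags that are in both a and b. */
static flagset_t
flagset_intersect(flagset_t a, flagset_t b)
{
        return a & b;
}

/* Difference: the flags of a that are not in b. */
static flagset_t
flagset_remove(flagset_t a, flagset_t b)
{
        return a & ~b;
}

/* Membership: nonzero if all flags in b are also in a. */
static int
flagset_contains(flagset_t a, flagset_t b)
{
        return (a & b) == b;
}
```

Unlike "+", the union is idempotent: flagset_union(FLAG_FORCE, FLAG_FORCE) is still FLAG_FORCE, whereas FLAG_FORCE + FLAG_FORCE would silently turn into a different bit. Removing a flag that is not set is also a no-op instead of a borrow.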

Question

If you compile code that does something like "((bitfield_t) { { .bit31 = TRUE } }).raw" with GCC in C++ mode, it fails. Why?


37 Responses to “Readable and Maintainable Bitfields in C”

  1. strik says:

    Regarding the bitfield:

    It is bad practise to define a bit-field entry with “int”; quoting N1124, footnote 105:

    “As specied in 6.7.2 above, if the actual type specier used is int or a typed
    then it is implementation-dened whether the bit-eld is signed or unsigned.”

    Thus, it may be that your comparison flags.compat == 1 is never true, as flags.compat is -1. (Of course, for flags, this is no problem. However, if you have some bit-field entry of “int abc : 4;”, you might be very surprised if abc is in the range -8..7 instead of 0..15.)

    Thus, better use “unsigned int compat : 1;”

    Furthermore, other than endianness issues, bit-fields are not very portable if you rely only on what C guarantees:

    “An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.”
    (N1124.pdf, §6.7.2.1 “Structure and union specifiers”, no. 10, p. 102)

    That is, very much is left unspecified here. In particular, a bit-field that does not fit in the storage unit where the current bit-field is located can either be placed so that it crosses the storage unit (“byte”) boundary, or it may begin at the next one.

    For example: In your example above, .compat could be placed in a byte of its own, if the compiler likes it this way.

    Regarding using the union to access the raw value: While it works in practise (on most machines), there might be obscure machines where this does not work, as the C standard does not guarantee anything.

    “Annex J (informative)
    Portability issues
    [...]
    J.1 Unspecified behaviour
    [...]
    The value of a union member other than the last one stored into (6.2.6.1)”
    (N1124.pdf, J.1, p. 488)

    or

    “When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.”
    (N1124.pdf, §6.2.6.1, p. 50)

    So, to make a long story short: While you are right that it is possible to implement bit-fields with C bit-fields, it is not advisable if you must agree on an exact layout (for example, bits of some hardware register or to write files which are read by other programs). Note also that even one compiler might “change its opinion” with some newer version, thus, even saying “but I know my compiler” will not help you.

    Regarding your question: Does C++ even allow struct initialisation with the “.bit31 = TRUE” syntax? Last time I looked at it, it did not. To be more precise, even now, there are still many C compilers around that do not support this, as this was added with C99. Before C99, it was not supported in C, either.

  2. quackzilla says:

    If you’re going to advocate this kind of approach for bitfields, there’s a tiny gotcha which can make debugging a nightmare:

    typedef int boolean_t;

    should be

    typedef unsigned int boolean_t;

    You’ve taken other steps to mitigate sign debugging hell (using the high bit, TRUE is !FALSE, always use != FALSE rather than == TRUE comparisons, etc), but it can be a terrible slice of purgatory to debug if many of these steps are missing, as they often are in others’ code.

  3. Myron A. Semack says:

    Doesn’t this approach run into issues depending on how the C compiler packs the structs?

  4. strik says:

    @Myron A. Semack:

    That’s exactly what I wanted to say with my rather lengthy comment. ;)

  5. ZungBang says:

    Bit-fields are a tempting abstraction. The problem with bit-fields is that the normal use case for them is in a setting which tends to break the abstraction.

    Case in point: a former boss of mine had me debug machine exceptions caused by a piece of code he wrote that used bit-fields to represent and access hardware registers. He was quite in love with it, I swear.

    Turns out that the assembly code that was generated by the compiler, did byte-wide reads/writes on hardware that was designed for 32-bit wide accesses only. The code would’ve been perfectly fine for accessing normal memory, but not on our system.

    The fix was to use “boolean magic”, which worked nicely both on the actual system and in a simulation environment.

  6. Michael Steil says:

    Thanks for all the input; I changed “int” to “unsigned int”.

  7. Anonymous says:

    You really should be changing it to say NOT to use this method! It is completely unreliable and will most likely cause you problems when compiling on different compilers or different architectures, as was thoroughly explained by the earlier comments. You’re relying on unspecified behaviors in several different ways.

  8. Bobby says:

    “Bitfields in C always start at bit 0.” is not true. The spec says “The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.”

    As everybody else has been saying, relying on any details of how this code works is just asking for trouble. If you just want bitfields and don’t need to interact with code from other compilers, or hardware registers, and you don’t particularly care how it works behind the scenes, C bitfields are fine. For anything else, they aren’t suitable.

  9. lorenzo says:

    >> If you compile code that does something like “((bitfield_t) { { .bit31 = TRUE } }).raw” with GCC in C++ mode, it fails. Why?

    To build an anonymous struct (or array) on the fly, I use the syntax {field:value}.
    The code below compiles with g++ on Linux:

    #include <iostream>

    union flags_t {
        struct {
            unsigned int _user:1;
            unsigned int _zero:1;
            unsigned int _force:1;
            unsigned int _dummy:28;
            unsigned int _compat:1;
        } _bitfield;
        unsigned int _raw;
    };

    void print(const flags_t &flags) {
        std::cout << "flags._user: " << flags._bitfield._user << "\n"
                  << "flags._zero: " << flags._bitfield._zero << "\n"
                  << "flags._force: " << flags._bitfield._force << "\n"
                  << "flags._compat: " << flags._bitfield._compat << std::endl;
    }

    int main(int argc, char **argv) {
        print( (flags_t){{_user:1, _zero:0, _force:1, _dummy:0, _compat:0}} );

        return 0;
    }

  10. [...] Torquemada (D-Spain), Pol Pot (D-Cambodia), Mao Tse-tung… Small is Beautiful A much better way of handling bitfields in C, more readable and more maintainable Does anyone else think the internet’s furor over Iran is largely the result of people who just [...]

  11. bitfieldhater says:

    Please change the title to:
    Non-Portable Bitfields in C

    If you ever think that someone may possibly want to use your code in a cross-platform manner or even interfacing to it from code written in a different language on the same platform, please refrain from using bitfields. Use an unsigned int with masks instead.

    Thank you.

  12. Matt H says:

    @bitfieldhater

    How would masks be cross platform? You still have to deal with endianness issues.

    Bitfields are great in my opinion. I find its easiest to determine how your compiler is aligning them, and then adjust the structure for other compilers using preprocessor defines (if you want to interop)

  13. Anonymous says:

    @Matt H

    You have to deal with endianness issues anyway if you’re doing cross-platform code. Bitfields or masks aren’t going to change that. However, using bitfields adds a lot of other issues on top of that, none of which you actually have any control over!

  14. strik says:

    Michael, as others have also pointed out: DON’T DO IT! This type of code is the best way to be non-portable!

    It might work on your specific platform – NOW. But every little change (even time of compilation) can probably break it.

  15. Anonymous says:

    That use of the union is undefined behavior

  16. Felix says:

    I also have to agree with the other commenters – bitfields suck. They are tempting, yes, but next to all the problems with theoretically “undefined behavior”, there are real world problems. The largest problem is endianness. The right way to handle the wrong endianness (which is little endian, obviously) is to treat reading binary data as “decoding” and writing binary data as “encoding”. Like treating user data in a web application as “unsafe”, you should treat “binary data” as “binary” (or 8-bit numbers, if you want), but never as “32-bit numbers”.

    The right way to handle endianness is not by swapping data at whatever place where it’s “broken otherwise”, it’s by having a defined (and correct) function to put “bytes of data” into “registers”, or generally speaking: how to convert a stream of 8-bit values into a stream of 32-bit (or 64-bit) values. PowerPC for example does it right; there is no “bswap” instruction, but instead there is a “load non-native endian word” and “store non-native endian word”. Endianness is an issue with interfacing memory spaces with different data widths; registers have 32 (or 64) bit data width, and memory can be considered as an 8-bit memory space, if you do bytewise access.

    If you start having a bit-addressed memory space (like a bitfield), then WHENEVER you interface to a word-addressed memory space (register) or a byte-addressed memory space (memory) you have to take endianness into account. And while swapping in byte-based domains is cheap (bswap ftw), swapping bits is (usually) not cheap. Sure, you can just redefine your bitfield to magically match whatever the native bit-to-word conversion does (usually by inverting the order), but would you ever re-define any (word) constants because you’re too cheap to do proper endianness correction on bytes? No. Why would you redefine your bitfields then?

  17. [...] I came upon some spirited discussion on reddit concerning a blog post that discussed the use of bit-fields in C. As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that [...]

  18. Graham says:

    u3_u5_u8_u12_DECODER {
    // see libs_apps/docs/
    }

    Supposing you had a C++ class,
    for use at COMPILE_TIME or RUN_TIME

    BIT_FIELD { // probably template // probably derived from NAMED_OBJECT

    uns nbits_gap_right;
    uns nbits_data;
    bool is_signed;

    // thats all you need, maybe expand “is_signed” to TYPE_SPEC_of_enum_bit_field
    // plus NAME + SPEC of this
    // plus WHEN you know this, CT_COMPILE_TIME, RT_RUNTIME, const ?, changed?

    enum byte_order_of_memory_and_code_used_to_handle_it
    byte_order = byte_order_hilo_by_definition_probably_on_lohi_cpu;

    static enum compiler_bit_order_code_and_metrics
    bit_order = bit_order_gcc_on_AMD64_using_masks_AND_bit_fields_mixed; // !

    // starts to get virtual here

    u32 GET_mask_1s_rhj();
    u32 GET_mask_1s_in_situ(); // 1 where data is

    u32 GET_mask_0s_rhj();
    u32 GET_mask_0s_in_situ(); // 0 where data is

    bool GET_is_within_byte_boundry(); // does_not_cross_byte_boundry
    bool know_gap_right_is_zero;
    bool know_no_need_to_mask_off_upper_bits;

    };

    class LIST_of_BIT_FIELD
    {
    BIT_FIELD & LIST[4]; // however many you want
    bool generate_code( generator & );
    };

    and some friends
    u32_hilo
    u32_lohi
    u32_cpu // probably typedef to one or the other, with C++ casting all over the place

    and some distant cousins: u32_cpu_lohi_holding_inverted_hilo.

    Plus of course some optimising bswap functions (HTON macros evaporate to swapb),
    and some failsafe compile-mode-use-masks-and-shifts,
    plus some unit tests,
    plus a community to maintain it for a range of CPUs, compilers, times-of-day,

    Plus, you also can run this data-driven C++ class,
    to prints the correct C/C++ for your machine/compiler/timeofday
    (or falls back on masks and swapb),
    as long as there is a non-cross-compiler available, at compile-time.

    Add to that, the attempt to access u32_hilo IN-SITU from a u32_cpu_lohi
    knowing that the sub-byte-values are easier, but the byte-boundry cross isnt,
    but never-the-less attempt to get a good engineering compromise

    Then what _YOU_ do with a family of types named

    u3_u5_u8_u16 // upper tray is blind to lower tray u16
    u3_u5_u8_u4_u12 // decoder finds common case of 4K pools
    u8_u8_u16 // decode u3_u5 as plain lookup[u8] // sparse: void * lookup[ decode[u8] ]
    u16_u16 // u16_upper_u16_lower
    u32_hilo // as_found_in_file_preferably_aligned

    Remember that decoding, will probably extract the values all-in-one-go,
    because you KNOW that you will decode the entire multi-step-address
    (which is an index not an offset, looking up the object in a tray_of_256_of_similar_type)

    (Q) If it is to be quick on all architectures, and all compiler-modes,
    what should the bit layout be: ?

    u3_u5_u8_u16
    u16_u8_u5_u3

    u3_u5_u8_u4_u12
    u12_u4_u8_u5_u3

    NB By using different names for each bitfield,
    it remains easier to prototype here, for a larger space,
    otherwise its:

    u3_A
    u5_B
    u8_C ….
    u16_Lower

    NB My storage layout allocates within the lower 16 bits (object per item)
    with multiple parallel worlds selected in the upper 16 bits.
    That upper layout is implemented as a (hidden) lower_tray_of_items
    to reuse code, but allow mixing trays_of_lower_u16 from different files

  19. kohlrak says:

    I know this is old, but, truly, the instability comes with the datatype itself. An int on an older system is 16 bits, an int on a going-out-of-style system is 32 bits, and on a new system an int is 64 bits. Probably best to revert to non-changing “types” such as byte, word, dword, qtword, etc.

  20. [...] Readable and Maintainable Bitfields in C, a good blog post from pagetable on how to work with bitfields in a readable way using C. [...]

  21. Rena says:

    struct {
    boolean_t bit0:1;
    boolean_t bit1:1;
    int :19;
    boolean_t bit31:1;
    };

    Your integers are 22 bits?

  22. woody says:

    I disagree strongly with all the bit field detractors.
    I do embedded firmware. Bit fields are critical. Most porting of embedded code is going to be to a similar platform, for example, various flavors of an 8051, using the same compiler.

    The PIC24 family uses the GNU compiler, and their entire method of accessing the bit fields in all the registers, depends on bit fields in unions.
    Every pic compiler does the same thing.

    I use Keil 8051 compiler in my work. I’ve never seen any problems when using the Keil compiler in the 8051 world, and have ported code back and forth across many 8051 platforms. Bit fields have been used extensively to unpack, and access registers and bits, and communication buffers.

    The moral of the story is: Bit fields are highly useful. Now when I go to port my 8051 code over to an ARM, I will of course have to use whatever compiler is furnished, and adhere to the way that compiler implements things, but most of that can be hidden behind defines, just as it is in the PIC24 family.

  23. Scott says:

    A poster who dislikes bitfields is one who has never seriously programmed an embedded system.

  24. Sean Ellis says:

    Scott said: “A poster who dislikes bitfields is one who has never seriously programmed an embedded system.”

    I do serious embedded work. I dislike bitfields. I would love to use them – they would make life so much easier – but they’re all but unusable for serious work.

    The reasons for not using them have all been set out above. They’re a mess of undefinedness. Which end do they pack from? Which end of what (bytes, words, halfwords)? What exact set of operations is used for a bitfield set, and how does it interact with memory mapped I/O?

    Even if you only use a single toolchain, as woody apparently does, any of the behaviors you rely on can be broken with an upgrade, at any time, for any reason.

    This is a real pity, because they would remove about half the evil #defines in the embedded world. But even the preprocessor isn’t quite as evil as undefined behavior.

  25. t says:

    I have programmed using C for numerous processors, dsps, and microcontrollers and I have never found the need to use bitfields. I use masks and macros when I want to manipulate bits.

  26. Rod says:

    Can someone point me to an embedded processor vendor who uses bitfields in their sample code to access registers? Up till now, no chip maker code I know has. I’ve lost count of how many embedded architectures I’ve worked with.

    Maybe I should have read the C spec before today, but I’ve been embedded programming (mostly self taught) for 6 years now. Yesterday was the first time I had ever heard of bitfields in C.

  27. Jon says:

    Rod, Microchip uses bitfields in their processor header files, in their mcc18 compiler. For example, the STATUS register in a PIC18F26K20 is defined as so in their header file:
    extern near unsigned char STATUS;
    extern near struct {
    unsigned C:1;
    unsigned DC:1;
    unsigned Z:1;
    unsigned OV:1;
    unsigned N:1;
    } STATUSbits;

    This way, you can clear (for example) the C bit by:
    STATUSbits.C = 0;

    Good luck!

  28. Frederick Eek says:

    I have been doing extensive embedded programming (amongst others) for 20 years. I fail to understand why you would not use them. All hardware have bit fields in their registers. Most protocols contain information with bit fields.
    As for their undefinedness, exactly the same applies for “boolean magic”. Here you use defines to map bits to positions in unsigned shorts or ints. You are therefore fixing the flags to the endianness of your system.
    The endianess problem does not lie in your compiler/system, but how different systems interpret them (i.e. when communicating over a bus or network).
    I have been using bit fields in various systems (ported and sharing code) including 8086, .NET programming, 8051, Arm, TMS320, AVR, etc. I never have to modify flags or #defines. The worst I have ever had to do was to have two definitions for the same structure depending on the compiler used (the old 8051 Franklin compiler did not support 32 bit packed structures and the packed TIMESTAMP data type we used on the network was packed into 32 bits).
    The biggest advantage of bit fields is the fact that you do not continuously have to keep track of how flags and masks actually map to your memory. Once the structure is defined, you are completely abstracted from the memory representation whilst for boolean magic you have to remember the masks and sometimes even shifts at every point you use them.
    I think people that detract from bit fields have actually never really had to do anything but simple flag checking and setting.

  29. Eswar says:

    Can any body explain the memory size of a bit field variable? What is the advantage of using Bit fields particular to memory management point of view?

  30. tissit says:

    TI uses bitfields and so does Fujitsu. I have a feeling I’ve met others, but I’m not sure. I think AVR is the exception that doesn’t.

  31. [...] then how will it be useful ? But now I got to know how significant this union is ! X_O Staring at this page for about 1 hour [...]

  32. chalapathi says:

    Hi ,
    I’m chalapathi , which is very helpful for me updating the technology.

    Thanks & Regards

    Chalapathi

  33. Kitty says:

    It is not my first time to pay a visit this web page, i am browsing this web site dailly
    and get fastidious information from here daily.

  34. Felix says:

    Your style is soo unique compared to other folks I’ve read stuff
    from. Thanks for posting when you’ve got the opportunity,
    Guess I’ll just bookmark this page.

  35. Everette says:

    Askking questions are really pleasant thing if you are not understanding anything fully, except this paragraph gives fastidious
    understanding even.

  36. shoes says:

    shoes…

    If some one needs to be updated with hottest technologies then he must be pay a quick visit this web siteReadable and Maintainable Bitfields in C ? pagetable.com and be up to date every day….
