Readable and Maintainable Bitfields in C

Bitfields are very common in low-level C programming. You can use them to store a data structure with many flags efficiently, or to pass a set of flags between functions. Let us look at the different ways of doing this.

The straightforward way to deal with bitfields is to do the Boolean logic by hand:

Boolean Magic

#define FLAG_USER   (1 << 0)
#define FLAG_ZERO   (1 << 1)
#define FLAG_FORCE  (1 << 2)
/* bits 3-30 reserved */
#define FLAG_COMPAT (1 << 31)

int
create_object(int flags)
{
        int is_compat = (flags & FLAG_COMPAT);

        if (is_compat)
                flags &= ~FLAG_FORCE;

        if (flags & FLAG_FORCE) {
                [...]
        }
        [...]
}

int
create_object_zero(int flags)
{
	return create_object(flags | FLAG_ZERO);
}

void
caller()
{
        create_object(FLAG_FORCE | FLAG_COMPAT);
}

You can see code like this everywhere, so most C programmers can probably read and understand it quite easily. Unfortunately, this method is also very error-prone: mixing up "&" and "&&", or omitting the "~" when doing "&=", are common oversights. And since the compiler only sees "int" types, this approach gives you no type safety either.
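
Both mistakes are easy to make and hard to spot. A hypothetical sketch (these lines are not from the example above):

        /* BUG: logical AND instead of bitwise AND -
           this is true for ANY nonzero flags value */
        if (flags && FLAG_COMPAT) {
                [...]
        }

        /* BUG: missing "~" - this clears every flag EXCEPT
           FLAG_FORCE instead of clearing FLAG_FORCE */
        flags &= FLAG_FORCE;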

Bitfields

Let us look at the same code using bitfields instead:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

int
create_object_zero(flags_t flags)
{
	flags.zero = TRUE;
	create_object(flags);
}

void
caller()
{
        create_object((flags_t) { { .force = TRUE, .compat = TRUE } });
}

Flags can just be used like any other variables. The compiler abstracts the Boolean logic away. The only downside is that the static initializer requires some advanced syntax: a compound literal containing designated initializers.

Endianness

Bitfields in C always start at bit 0. While this is the least significant bit (LSB) on Little Endian (bit 0 has a weight of 2^0), most compilers on Big Endian systems inconveniently consider the most significant bit (MSB) to be bit 0 (there, bit 0 has a weight of 2^31, 2^63 etc., depending on the word size). So in case your bitfield needs to be binary-compatible across machines with different endianness, you will need to define two versions of the bitfield.
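
As a minimal sketch of the two versions (the endianness-test macros used here are GCC/Clang predefines and vary between toolchains), the definitions could be guarded like this:

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
typedef union {
        struct {
                boolean_t compat:1;     /* the MSB is allocated first */
                int :28;                /* unused */
                boolean_t force:1;
                boolean_t zero:1;
                boolean_t user:1;
        };
        int raw;
} flags_t;
#else
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;
#endif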

The Raw Bitfield

Did you notice the "int raw" in the union? It lets us conveniently (and type-safely) export the raw bit value without having to use a cast.

	printf("raw flags: 0x%x\n", flags.raw);

We can use this to reconstruct the FLAG_* constants from the original example, in case some code needs them:

#define FLAG_USER   (((flags_t) { { .user   = TRUE } }).raw)
#define FLAG_ZERO   (((flags_t) { { .zero   = TRUE } }).raw)
#define FLAG_FORCE  (((flags_t) { { .force  = TRUE } }).raw)
#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)

This code constructs a temporary instance of the bitfield, sets one bit, and converts it into a raw integer - all at compile time.
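
For example, a hypothetical caller could hand the raw value, combined with one of the reconstructed constants, to an old-style API (legacy_set_flags() is a made-up function for illustration):

void
call_legacy(flags_t flags)
{
        /* legacy_set_flags() still takes a plain int of FLAG_* bits */
        legacy_set_flags(flags.raw | FLAG_ZERO);
}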

Bitfield Access from Assembly

With the same trick, you can also access your bitfield from assembly, for example if the bitfield is part of the Thread Control Block in your operating system kernel, and the low-level context switch code needs to access one of the bits. The "int raw" can be used to statically convert a flag into the corresponding raw mask:

typedef unsigned int boolean_t;

typedef union {
	struct {
		boolean_t bit0:1;
		boolean_t bit1:1;
		int :19;
		boolean_t bit31:1;
	};
	int raw;
} bitfield_t;


int test()
{
	int param = -1;
	int result;

	__asm__ volatile (
		"test    %2, %1    n"
		"xor     %0, %0    n"
		"setcc   %0        n"
		: "=r" (result)
		: "r" (param),
		  "i" (((bitfield_t) { { .bit31 = TRUE } }).raw)
	);
	return result;
}

The corresponding x86 assembly code looks like this:

	.text
	.align	4,0x90
	.globl	_test
_test:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$-1, %eax
	## InlineAsm Start
	test	$0x80000000, %eax
	xor	%eax, %eax
	setcc	%eax
	## InlineAsm End
	popl	%ebp
	ret

	.subsections_via_symbols

This works fine with LLVM, but unfortunately GCC (4.2.1) has problems detecting that the raw value is available at compile time, so the "i" has to be replaced with an "r": GCC will then pre-assign a register with the raw value instead of being able to use an immediate with the "test" instruction.

How to Not Do It

I have seen C++ code doing this:

enum {
	FLAG_USER,
	FLAG_ZERO,
	FLAG_FORCE,
	FLAG_COMPAT = 31
};

int
create_object(bitfield_t flags)
{
        bool is_compat = flags.is_set(FLAG_COMPAT);

        if (is_compat)
                flags -= FLAG_FORCE;

        if (flags.is_set(FLAG_FORCE)) {
                [...]
        }
        [...]
}

int
create_object_zero(bitfield_t flags)
{
	return create_object(flags + FLAG_ZERO);
}

void
caller()
{
        create_object(((bitfield_t)FLAG_FORCE) + FLAG_COMPAT);
}

This all looks quite weird. The constants are bit index values, and they are added and subtracted. The reason is C++ operator overloading:

class bitmask_t
{
    word_t      maskvalue;

public:
    [...]
    inline bitmask_t operator -= (int n)
        {
            maskvalue = maskvalue & ~(1UL << n);
            return (bitmask_t) maskvalue;
        }
    [...]
};

This is horrible. The code that uses this class makes no sense unless you read and understand the implementation of the class. And you have to be very careful: While it is possible to "add" a flag to an existing bitfield, you cannot just add two flags - it would do the arithmetic and add the two values.

Mapping the setting and clearing of bits onto the addition and subtraction operators is clearly wrong in the first place: flags in a bitfield are equivalent to elements in a set. Setting a flag is equivalent to the set "union" operation, which even in mathematics has its own symbol (∪) instead of overloading the "+" operator.

Question

If you compile code that does something like "((bitfield_t) { { .bit31 = TRUE } }).raw" with GCC in C++ mode, it fails. Why?

40 thoughts on “Readable and Maintainable Bitfields in C”

  1. Regarding the bitfield:

    It is bad practice to define a bit-field entry with “int”; quoting N1124, footnote 105:

    “As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
    then it is implementation-defined whether the bit-field is signed or unsigned.”

    Thus, it may be that your comparison flags.compat == 1 is never true, as flags.compat is -1. (Of course, for flags, this is no problem. However, if you have some bit-field entry of “int abc : 4;”, you might be very surprised if abc is in the range -8..7 instead of 0..15.)

    Thus, better use “unsigned int compat : 1;”
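
    A short sketch of that surprise, assuming an implementation that picks the signed interpretation:

    #include <stdio.h>

    void demo(void)
    {
            struct {
                    int abc:4;      /* signedness is implementation-defined */
            } s;

            s.abc = 15;             /* implementation-defined result here... */
            printf("%d\n", s.abc);  /* ...may print -1 instead of 15 */
    }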

    Furthermore, other than endianness issues, bit-fields are not very portable if you rely only on what C guarantees:

    “An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.”
    (N1124.pdf, §6.7.2.1 “Structure and union specifiers”, no. 10, p. 102)

    That is, much here is unspecified. In particular, a bit-field that does not fit in the storage unit where the current bit-field is located can either be placed crossing the storage unit (“byte”) boundary, or it may begin at the next unit.

    For example: In your example above, .compat could be placed in a byte of its own, if the compiler likes it this way.

    Regarding using the union to access the raw value: While it works in practice (on most machines), there might be obscure machines where this does not work, as the C standard does not guarantee anything.

    “Annex J (informative)
    Portability issues
    […]
    J.1 Unspecified behaviour
    […]
    The value of a union member other than the last one stored into (6.2.6.1)”
    (N1124.pdf, J.1, p. 488)

    or

    “When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.”
    (N1124.pdf, §6.2.6.1, p. 50)

    So, to make a long story short: While you are right that it is possible to implement bit-fields with C bit-fields, it is not advisable if you must agree on an exact layout (for example, bits of some hardware register, or to write files which are read by other programs). Note also that even one compiler might “change its opinion” with some newer version; thus, even saying “but I know my compiler” will not help you.

    Regarding your question: Does C++ even allow struct initialisation with the “.bit31 = TRUE” syntax? Last time I looked at it, it did not. To be more precise, even now there are still many C compilers around that do not support this, as it was added with C99. Before C99, it was not supported in C, either.

  2. If you’re going to advocate this kind of approach for bitfields, there’s a tiny gotcha which can make debugging a nightmare:

    typedef int boolean_t;

    should be

    typedef unsigned int boolean_t;

    You’ve taken other steps to mitigate sign debugging hell (using the high bit, TRUE is !FALSE, always using != FALSE rather than == TRUE comparisons, etc.), but it can be a terrible slice of purgatory to debug if many of these steps are missing, as they often are in others’ code.

  3. Bit-fields are a tempting abstraction. The problem with bit-fields is that the normal use case for them is in a setting which tends to break the abstraction.

    Case in point: a former boss of mine had me debug machine exceptions caused by a piece of code he wrote that used bit-fields to represent and access hardware registers. He was quite in love with it, I swear.

    Turns out that the assembly code generated by the compiler did byte-wide reads/writes on hardware that was designed for 32-bit wide accesses only. The code would’ve been perfectly fine for accessing normal memory, but not on our system.

    The fix was to use “boolean magic”, which worked nicely both on the actual system and in a simulation environment.

  4. You really should be changing it to say NOT to use this method! It is completely unreliable and will most likely cause you problems when compiling on different compilers or different architectures, as was thoroughly explained by the earlier comments. You’re relying on unspecified behaviors in several different ways.

  5. “Bitfields in C always start at bit 0.” is not true. The spec says “The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.”

    As everybody else has been saying, relying on any details of how this code works is just asking for trouble. If you just want bitfields and don’t need to interact with code from other compilers, or hardware registers, and you don’t particularly care how it works behind the scenes, C bitfields are fine. For anything else, they aren’t suitable.

  6. >> If you compile code that does something like “((bitfield_t) { { .bit31 = TRUE } }).raw” with GCC in C++ mode, it fails. Why?

    To build an anonymous struct (and array, too) on the fly, I use the syntax {field:value}.
    The code below compiles with g++ on Linux:

    #include <iostream>

    union flags_t {
        struct {
            unsigned int _user:1;
            unsigned int _zero:1;
            unsigned int _force:1;
            unsigned int _dummy:28;
            unsigned int _compat:1;
        } _bitfield;
        unsigned int _raw;
    };

    void print(const flags_t &flags) {
        std::cout << "flags._user: "   << flags._bitfield._user   << "\n"
                  << "flags._zero: "   << flags._bitfield._zero   << "\n"
                  << "flags._force: "  << flags._bitfield._force  << "\n"
                  << "flags._compat: " << flags._bitfield._compat << std::endl;
    }

    int main(int argc, char **argv) {
        print( (flags_t){{_user:1, _zero:0, _force:1, _dummy:0, _compat:0}} );

        return 0;
    }

  7. Please change the title to:
    Non-Portable Bitfields in C

    If you ever think that someone may possibly want to use your code in a cross-platform manner or even interfacing to it from code written in a different language on the same platform, please refrain from using bitfields. Use an unsigned int with masks instead.

    Thank you.

  8. @bitfieldhater

    How would masks be cross platform? You still have to deal with endianness issues.

    Bitfields are great in my opinion. I find it’s easiest to determine how your compiler is aligning them, and then adjust the structure for other compilers using preprocessor defines (if you want to interop).

  9. @Matt H

    You have to deal with endianness issues anyway if you’re doing cross-platform code. Bitfields or masks aren’t going to change that. However, using bitfields adds a lot of other issues on top of that, none of which you actually have any control over!

  10. Michael, as others have also pointed out: DON’T DO IT! This type of code is the best way to be non-portable!

    It might work on your specific platform – NOW. But every little change (even time of compilation) can probably break it.

  11. I also have to agree with the other commenters – bitfields suck. They are tempting, yes, but next to all the problems with theoretically “undefined behavior”, there are real world problems. The largest problem is endianness. The right way to handle the wrong endianness (which is little endian, obviously) is to treat reading binary data as “decoding” and writing binary data as “encoding”. Like treating user data in a web application as “unsafe”, you should treat “binary data” as “binary” (or 8-bit numbers, if you want), but never as “32-bit numbers”.

    The right way to handle endianness is not by swapping data at whatever place it’s “broken otherwise”; it’s by having a defined (and correct) function to put “bytes of data” into “registers”, or generally spoken: how to convert a stream of 8-bit values into a stream of 32-bit (or 64-bit) values. PowerPC, for example, does it right; there is no “bswap” instruction, but instead there is a “load non-native endian word” and a “store non-native endian word”. Endianness is an issue with interfacing memory spaces with different data widths; registers have 32 (or 64) bit data width, and memory can be considered as an 8-bit memory space, if you do bytewise access.

    If you start having a bit-addressed memory space (like a bitfield), then WHENEVER you interface to a word-addressed memory space (register) or a byte-addressed memory space (memory) you have to take endianness into account. And while swapping in byte-based domains is cheap (bswap ftw), swapping bits is (usually) not cheap. Sure, you can just redefine your bitfield to magically match whatever the native bit-to-word conversion does (usually by inverting the order), but would you ever re-define any (word) constants because you’re too cheap to do proper endianness correction on bytes? No. Why would you redefine your bitfields then?
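
    A sketch of that decode/encode discipline for little endian wire data (the function names are made up), independent of the host’s endianness:

    #include <stdint.h>

    static uint32_t decode_u32le(const uint8_t *b)
    {
            return (uint32_t)b[0]
                 | ((uint32_t)b[1] << 8)
                 | ((uint32_t)b[2] << 16)
                 | ((uint32_t)b[3] << 24);
    }

    static void encode_u32le(uint8_t *b, uint32_t v)
    {
            b[0] = (uint8_t)(v & 0xff);
            b[1] = (uint8_t)((v >> 8) & 0xff);
            b[2] = (uint8_t)((v >> 16) & 0xff);
            b[3] = (uint8_t)((v >> 24) & 0xff);
    }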

  12. u3_u5_u8_u12_DECODER {
    // see libs_apps/docs/
    }

    Supposing you had a C++ class,
    for use at COMPILE_TIME or RUN_TIME

    BIT_FIELD { // probably template // probably derived from NAMED_OBJECT

    uns nbits_gap_right;
    uns nbits_data;
    bool is_signed;

    // thats all you need, maybe expand “is_signed” to TYPE_SPEC_of_enum_bit_field
    // plus NAME + SPEC of this
    // plus WHEN you know this, CT_COMPILE_TIME, RT_RUNTIME, const ?, changed?

    enum byte_order_of_memory_and_code_used_to_handle_it
    byte_order = byte_order_hilo_by_definition_probably_on_lohi_cpu;

    static enum compiler_bit_order_code_and_metrics
    bit_order = bit_order_gcc_on_AMD64_using_masks_AND_bit_fields_mixed; // !

    // starts to get virtual here

    u32 GET_mask_1s_rhj();
    u32 GET_mask_1s_in_situ(); // 1 where data is

    u32 GET_mask_0s_rhj();
    u32 GET_mask_0s_in_situ(); // 0 where data is

    bool GET_is_within_byte_boundry(); // does_not_cross_byte_boundry
    bool know_gap_right_is_zero;
    bool know_no_need_to_mask_off_upper_bits;

    };

    class LIST_of_BIT_FIELD
    {
    BIT_FIELD & LIST[4]; // however many you want
    bool generate_code( generator & );
    };

    and some friends
    u32_hilo
    u32_lohi
    u32_cpu // probably typedef to one or the other, with C++ casting all over the place

    and some distant cousins: u32_cpu_lohi_holding_inverted_hilo.

    Plus of course some optimising bswap functions (HTON macros evaporate to swapb),
    and some failsafe compile-mode-use-masks-and-shifts,
    plus some unit tests,
    plus a community to maintain it for a range of CPUs, compilers, times-of-day,

    Plus, you can also run this data-driven C++ class
    to print the correct C/C++ for your machine/compiler/timeofday
    (or fall back on masks and swapb),
    as long as there is a non-cross-compiler available, at compile-time.

    Add to that the attempt to access u32_hilo IN-SITU from a u32_cpu_lohi,
    knowing that the sub-byte values are easier, but the byte-boundary cross isn't,
    but nevertheless attempt to get a good engineering compromise.

    Then what _YOU_ do with a family of types named

    u3_u5_u8_u16 // upper tray is blind to lower tray u16
    u3_u5_u8_u4_u12 // decoder finds common case of 4K pools
    u8_u8_u16 // decode u3_u5 as plain lookup[u8] // sparse: void * lookup[ decode[u8] ]
    u16_u16 // u16_upper_u16_lower
    u32_hilo // as_found_in_file_preferably_aligned

    Remember that decoding, will probably extract the values all-in-one-go,
    because you KNOW that you will decode the entire multi-step-address
    (which is an index not an offset, looking up the object in a tray_of_256_of_similar_type)

    (Q) If it is to be quick on all architectures, and all compiler-modes,
    what should the bit layout be: ?

    u3_u5_u8_u16
    u16_u8_u5_u3

    u3_u5_u8_u4_u12
    u12_u4_u8_u5_u3

    NB By using different names for each bitfield,
    it remains easier to prototype here, for a larger space,
    otherwise its:

    u3_A
    u5_B
    u8_C ….
    u16_Lower

    NB My storage layout allocates within the lower 16 bits (object per item)
    with multiple parallel worlds selected in the upper 16 bits.
    That upper layout is implemented as a (hidden) lower_tray_of_items
    to reuse code, but allow mixing trays_of_lower_u16 from different files

  13. I know this is old, but, truly, the instability comes with the data type itself. An int on an older system is 16 bits, an int on a going-out-of-style system is 32 bits, and on a new system an int is 64 bits. Probably best to revert to non-changing “types” such as byte, word, dword, qword, etc.
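
    C99’s <stdint.h> provides one portable spelling of such fixed-width names:

    #include <stdint.h>

    uint8_t  flags8;     /* always exactly 8 bits  */
    uint16_t flags16;    /* always exactly 16 bits */
    uint32_t flags32;    /* always exactly 32 bits */
    uint64_t flags64;    /* always exactly 64 bits */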

  14. struct {
    boolean_t bit0:1;
    boolean_t bit1:1;
    int :19;
    boolean_t bit31:1;
    };

    Your integers are 22 bits?

  15. I disagree strongly with all the bit field detractors.
    I do embedded firmware. Bit fields are critical. Most porting of embedded code is going to be to a similar platform, for example, various flavors of an 8051, using the same compiler.

    The PIC24 family uses the GNU compiler, and their entire method of accessing the bit fields in all the registers depends on bit fields in unions.
    Every PIC compiler does the same thing.

    I use the Keil 8051 compiler in my work. I’ve never seen any problems when using the Keil compiler in the 8051 world, and have ported code back and forth across many 8051 platforms. Bit fields have been used extensively to unpack and access registers, bits, and communication buffers.

    The moral of the story is: Bit fields are highly useful. Now when I go to port my 8051 code over to an ARM, I will of course have to use whatever compiler is furnished, and adhere to the way that compiler implements things, but most of that can be hidden behind defines, just as it is in the PIC24 family.

  16. Scott said: “A poster who dislikes bitfields is one who has never seriously programmed an embedded system.”

    I do serious embedded work. I dislike bitfields. I would love to use them – they would make life so much easier – but they’re all but unusable for serious work.

    The reasons for not using them have all been set out above. They’re a mess of undefinedness. Which end do they pack from? Which end of what (bytes, words, halfwords)? What exact set of operations is used for a bitfield set, and how does it interact with memory mapped I/O?

    Even if you only use a single toolchain, as woody apparently does, any of the behaviors you rely on can be broken with an upgrade, at any time, for any reason.

    This is a real pity, because they would remove about half the evil #defines in the embedded world. But even the preprocessor isn’t quite as evil as undefined behavior.

  17. I have programmed using C for numerous processors, dsps, and microcontrollers and I have never found the need to use bitfields. I use masks and macros when I want to manipulate bits.

  18. Can someone point me to an embedded processor vendor who uses bitfields in their sample code for accessing registers? Up until now, no chip maker code I know of has. I’ve lost count of how many embedded architectures I’ve worked with.

    Maybe I should have read the C spec before today, but I’ve been doing embedded programming (mostly self-taught) for 6 years now. Yesterday was the first time I had ever heard of bitfields in C.

  19. Rod, Microchip uses bitfields in their processor header files, in their mcc18 compiler. For example, the STATUS register in a PIC18F26K20 is defined as so in their header file:

    extern near unsigned char STATUS;
    extern near struct {
        unsigned C:1;
        unsigned DC:1;
        unsigned Z:1;
        unsigned OV:1;
        unsigned N:1;
    } STATUSbits;

    This way, you can clear (for example) the C bit by:
    STATUSbits.C = 0;

    Good luck!

  20. I have been doing extensive embedded programming (amongst others) for 20 years. I fail to understand why you would not use them. All hardware has bit fields in its registers. Most protocols contain information with bit fields.
    As for their undefinedness, exactly the same applies to “boolean magic”. Here you use defines to map bits to positions in unsigned shorts or ints. You are therefore fixing the flags to the endianness of your system.
    The endianness problem does not lie in your compiler/system, but in how different systems interpret the data (i.e. when communicating over a bus or network).
    I have been using bit fields in various systems (ported and sharing code) including 8086, .NET programming, 8051, Arm, TMS320, AVR, etc. I never have to modify flags or #defines. The worst I have ever had to do was to have two definitions for the same structure depending on the compiler used (the old 8051 Franklin compiler did not support 32 bit packed structures and the packed TIMESTAMP data type we used on the network was packed into 32 bits).
    The biggest advantage of bit fields is the fact that you do not continuously have to keep track of how flags and masks actually map to your memory. Once the structure is defined, you are completely abstracted from the memory representation whilst for boolean magic you have to remember the masks and sometimes even shifts at every point you use them.
    I think people that detract from bit fields have actually never really had to do anything but simple flag checking and setting.

  21. Can anybody explain the memory size of a bit-field variable? What is the advantage of using bit fields from a memory-management point of view?

  22. TI uses bitfields and so does Fujitsu. I have a feeling I’ve met others, but I’m not sure. I think AVR is the exception that doesn’t.

  24. I hate bitfields, but see no alternative for US Military Interface programming

    For the last 16 years, I have been fighting with endianness, bitfields and bitmasking. For better or worse, US military hardware typically has a Big Endian interface containing information with bit lengths from 1 to 32 bits. These interfaces can frequently have hundreds of individual data items in them.

    I have used bit masking in the past to bypass the problem that bitfields are typically allocated from LSB to MSB on little endian machines and in the reverse order on big endian machines; however, the problem with this approach is that of data reduction. In the military programming environment it is CRITICAL to record data traversing an interface and be able to reduce that data into a human-readable format at a later date. To do this with a program using bitmasks means lots of hand-coded diagnostic code.

    In my current programming environment, we have an automated data dictionary generation program which reads the .h files to produce a data dictionary which can be used to automatically reduce all defined fields in human readable format, e.g. if bit 7 is a bit named status defined as an enum of OK = 1 and FAULT = 0 and this is defined in a structure, the data parser will generate output such as status = OK.

    Bitmasking makes the above impossible; if you know how to do it, please tell me. So I am ‘forced’ to deal with the portability issue of bitfields. We have a number of ways to deal with this; for our case, where we have little endian programs on the application side, the approach is to simply define the bitfields in each word in reverse order from how they are defined in the Interface Design Document (IDD) for the big endian hardware, and do the required byte swapping. To prevent us having portability issues should we end up on a big endian platform (right), we keep representations of the structure with the fields both in the order defined in the IDD and in reverse order. The only thing left to do is swap the bytes in each word when going from big endian to little endian. An example of the dual definition is:

    // Set LITTLE_ENDIAN to match the type of machine the code will run on,
    // 0 for big endian. Alternatively, set LITTLE_ENDIAN in the makefile.
    #ifndef LITTLE_ENDIAN
    #define LITTLE_ENDIAN 1
    #endif

    #if LITTLE_ENDIAN
    struct ExampleStruct
    {
        unsigned b : 22;
        unsigned a : 10;
        unsigned c;
        unsigned e : 14;
        unsigned d : 18;
    };
    #else
    struct ExampleStruct
    {
        unsigned a : 10;
        unsigned b : 22;
        unsigned c;
        unsigned d : 18;
        unsigned e : 14;
    };
    #endif

    The above may seem very hard to maintain; however, a utility program can be used to convert the first definition into the second. Further, changes in military hardware interfaces are VERY rare and typically use previously unused bits.
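
    The byte swap itself can be a small helper; a sketch for 32-bit words (the helper name is made up, and a 32-bit unsigned int is assumed):

    static unsigned int swap32(unsigned int v)
    {
            return ((v & 0x000000ffu) << 24)
                 | ((v & 0x0000ff00u) <<  8)
                 | ((v & 0x00ff0000u) >>  8)
                 | ((v & 0xff000000u) >> 24);
    }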

  25. For those blessed embedded programmers who haven’t run into chip vendors that define their interfaces in terms of bitfields, Freescale is one of those vendors.

    For those of you who labor under the view that masks & shifts have just as many portability problems, I’m afraid you’re mistaken. When you deal with values that are the same width as your registers, the same mask and shift operations that work correctly on a big-endian machine will work correctly on a little-endian machine. This is because big- and little-endian issues are representation issues that are entirely absent once you have translated representation into value, and machine-level arithmetic operations only work on values.

    Now, there may be an issue of correctly translating documentation of a register on a big-endian architecture into mask and shift operations, but this can be alleviated by thinking correctly about the problem. You need to figure out which end of the diagram holds the bits that form the least-significant portion of the value; these are almost always the ones on the right regardless of endianness, but they usually have the highest numbers in a big-endian architecture’s diagrams. Now look at the first field in the least significant bits; this is the field with a shift of 0 and some number of bits, n_1, of width. The next field then has a shift of n_1 and a width of n_2. The third field has a shift of (n_1+n_2) and a width of n_3, and so on.

    You can create a mask of width n for all n less than the width of the register by shifting the value 1 left by n bits and then subtracting 1, i.e. mask(n) = (1 << n) - 1. So to get the mask for a field of width ‘n’ and shift ‘s’, you have field_mask(s,n) = mask(n) << s. To prepare a value for a field, mask it to the field width and shift it, i.e. field_value(s,n,v) = (mask(n) & v) << s. To clear a field in a value, apply the inverted mask: clear_field(s,n,v) = v & ~field_mask(s,n). To replace a field in a value, combine the two: replace_field(s,n,old,new) = clear_field(s,n,old) | field_value(s,n,new).

    If you define the above via macros and use constant values for the shift and width parameters, you'll get expressions that compilers can usually simplify fairly well (the mask calculations become constant expressions, for example) and things remain fairly readable. If you can't put calculations in macros due to style guidelines, then you can pre-calculate the masks and the remaining shift/mask operations become a quickly-learned idiom.
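
    Spelled out as C macros, a direct transcription of the formulas above might look like this:

    #define MASK(n)                    ((1UL << (n)) - 1UL)
    #define FIELD_MASK(s, n)           (MASK(n) << (s))
    #define FIELD_VALUE(s, n, v)       ((MASK(n) & (v)) << (s))
    #define CLEAR_FIELD(s, n, v)       ((v) & ~FIELD_MASK(s, n))
    #define REPLACE_FIELD(s, n, o, w)  (CLEAR_FIELD(s, n, o) | FIELD_VALUE(s, n, w))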

    It should not be too difficult to create a document in a machine-readable form that, for each register, has its shift, width, and an enumerated list of possible values and their human-readable names. Since the values are values and not representations, they are again not vulnerable to representation issues. You should be able to easily generate from this document a set of header files with enums and constant-macros for each field’s shift, width, and values, along with portable data parsing code. There’s absolutely no need to use bitfields.

    I agree that this is ugly and it’s ridiculous that we can’t do better with portable C code, but it’s unlikely to change, and portable code is better than the syntactic convenience of bitfield syntax.

  26. I’m going to have to disagree with quite a few folks here.

    Bitfields are essential in some areas of embedded programming. If you don’t know why, then you’re not doing the kind of work that I’m doing. When your goal is not portability, but instead provability, then it is far easier to use bitfields. In these cases, the compiler is a qualified tool. The compiler vendor will never change the documented behavior without ample warning and a switch to make it behave the way it used to. It takes decades for the defaults to change. One of the compilers I’m using today doesn’t support C99 unless you explicitly enable it on the command line. It is much more difficult for the static analysis tools to verify the correctness of masks and shifts than it is for the bit fields. If your static analysis tool isn’t happy, you have to verify the code manually. This can cost millions of dollars.

    I’ve been using bit fields in C since 1982 for device drivers, embedded run time executives, etc. I’ve also used masks and shifts, both explicitly and hidden in function style macros. It depends on the application. If you are confident of your toolchain and understand how it orders and packs bitfields, and are aware of the portability issues of using bitfields, I say go ahead and use them. The compiler turns bitfield accesses into masks and shifts (unless you have a machine with bitfield instructions, in which case you don’t want to use masks and shifts if you can help it). The compiler does a lot of work for you, which is better than the other way around. Note that the optimization in the compiler can make bit fields more efficient than most mask and shift code. I think the abstraction that bit fields provide make the code easier to read and understand. Most symbolic debuggers don’t back-translate #defines, so it’s also easier to debug C with bit fields.

    When I have software with bit fields that I need to port across machines or toolchains, I add or enable a bit of “porting check” code that declares objects of each bit field struct in a union, sets the individual fields of each type and compares the bit patterns to the desired pattern for that value in that bit field. If the desired bit pattern is not present, an “assert()” fails. If you’ve done this right (and you only have to do it once unless you change the template on which the bit fields are based), the porting problem is not hard to deal with. The undefined nature of unions of structs and basic types is a matter of the standard. No compiler I’ve ever used for embedded work has been ill-behaved, and I’ve been doing this since well before the ANSI/ISO standard was in committee.
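
    A minimal sketch of such a porting check, reusing the flags_t union from the article:

    #include <assert.h>

    void porting_check(void)
    {
            flags_t f = { .raw = 0 };

            f.compat = TRUE;
            /* fails if this compiler does not put .compat at bit 31 */
            assert((unsigned int)f.raw == 0x80000000u);
    }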

    It is too bad that this is implementation-defined behavior, but I know the committee politics that led to such things being left open — I’ve been on ANSI committees.

    • I have to agree 100% with Daniel. I’ve been doing embedded C since the 80’s and bitfields can be perfect.

      Meanwhile, as far as portability is concerned, I have to ask people here what is the size of an int? 8 bits, 16 bits, 32 bits or 64 bits? I’ve worked on machines whose C supported all of the above. That’s NOT even getting into endian-ness.

      And you are concerned about bitfields? Please, I think by this time I have figured out how to manipulate bits to send to the outside world.

  27. For those complaining about “portability” let us talk about int.

    Int is not portable. Whether it is endian-ness or size.

    If you are going to complain about bit-fields being non-portable, then you have to complain about endian-ness (and, yes Virginia, endian-ness DOES matter).

  28. IMHO, the way bitfields are presently handled in C makes no sense as a mandatory feature, since they are just about useless without some guarantees beyond what the Standard provides. On the other hand, they could be a very useful feature in *portable* code with one simple addition: a means of instructing the compiler to interpret a structure member as an alias for a range of bits in another member, thus allowing e.g.

    struct woozler {
        uint16_t CTRL;
        unsigned ENABLE = CTRL.0:1;
        unsigned PRESCALE = CTRL.4:4; // Four bits starting at #4
        unsigned DIVISOR = CTRL.8:8;
    };

    and maybe even something like:

    struct woozler {
        unsigned char b0, b1, b2, b3;
        unsigned long as_bigend32 = {b3.0:8, b2.0:8, b1.0:8, b0.0:8};
        unsigned long as_litend32 = {b0.0:8, b1.0:8, b2.0:8, b3.0:8};
    };

    with the latter assembling bigend32 by concatenating groups of 8 bits from each of b3..b0.

    Allowing code to specify where within an object the bits should be placed would mean that any bitfield declaration written in such fashion could only have one meaning on any platform that could handle it. The inability of some platforms to handle some possible declarations could be accommodated by making the range of declarations an implementation accepts a Quality of Implementation issue. Note that declarations like the above would allow platforms with weird “char” sizes to handle octet-based code by specifying that as_bigend32 takes the lower 8 bits from each of the constituent bytes. A compiler for such platforms may need to generate a bunch of ugly shifts in machine code, but that would be better than requiring programmers to assemble numbers using a bunch of ugly shifts in source code and hope that compilers can optimize them away on platforms where they’re not needed.

  29. Note that `#define FLAG_COMPAT (1 << 31)` is incorrect, as `1 << 31` has undefined behavior where int is 32 bits wide. You should use `1U << 31` instead.
