{"id":250,"date":"2009-06-23T00:01:46","date_gmt":"2009-06-23T08:01:46","guid":{"rendered":"http:\/\/www.pagetable.com\/?p=250"},"modified":"2009-06-23T00:01:46","modified_gmt":"2009-06-23T08:01:46","slug":"readable-bitfields-in-c","status":"publish","type":"post","link":"https:\/\/www.pagetable.com\/?p=250","title":{"rendered":"Readable and Maintainable Bitfields in C"},"content":{"rendered":"<p>Bitfields are very common in low level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this.<\/p>\n<p>The straightforward way to deal with bitfields is to do the Boolean logic by hand:<\/p>\n<h3>Boolean Magic<\/h3>\n<pre>\n#define FLAG_USER   (1 << 0)\n#define FLAG_ZERO   (1 << 1)\n#define FLAG_FORCE  (1 << 2)\n\/* bits 3-30 reserved *\/\n#define FLAG_COMPAT (1 << 31)\n\nint\ncreate_object(int flags)\n{\n        int is_compat = (flags &#038; FLAG_COMPAT);\n\n        if (is_compat)\n                flags &#038;= ~FLAGS_FORCE;\n\n        if (flags &#038; FLAG_FORCE) {\n                [...]\n        }\n        [...]\n}\n\nint\ncreate_object_zero(int flags)\n{\n\tcreate_object(flags | FLAGS_ZERO);\n}\n\nvoid\ncaller()\n{\n        create_object(FLAG_FORCE | FLAG_COMPAT);\n}\n<\/pre>\n<p>You can see code like this everywhere, so most C programmers can probably read and understand this quite easily. But unfortunately, this method is very error-prone. Mixing up \"&\" and \"&&\" and omitting the \"~\" when doing \"&=\" are common oversights, and since the compiler only sees \"int\" types, this also doesn't give you any kind of type-safety.<\/p>\n<h3>Bitfields<\/h3>\n<p>Let us look at the same code using bitfields instead:<\/p>\n<pre>\ntypedef unsigned int boolean_t;\n#define FALSE 0\n#define TRUE !FALSE\ntypedef union {\n        struct {\n                boolean_t user:1;\n                boolean_t zero:1;\n                boolean_t force:1;\n                int :28;                \/* unused *\/\n                boolean_t compat:1;     \/* bit 31 *\/\n        };\n        int raw;\n} flags_t;\n\nint\ncreate_object(flags_t flags)\n{\n        boolean_t is_compat = flags.compat;\n\n        if (is_compat)\n                flags.force = FALSE;\n\n        if (flags.force) {\n                [...]\n        }\n        [...]\n}\n\nint\ncreate_object_zero(flags_t flags)\n{\n\tflags.zero = TRUE;\n\tcreate_object(flags);\n}\n\nvoid\ncaller()\n{\n        create_object((flags_t) { { .force = TRUE, .compat = TRUE } });\n}\n<\/pre>\n<p>Flags can just be used like any variables. The compiler abstracts the Boolean logic away. The only downside is that the code with the static initializer requires some advanced syntax.<\/p>\n<h3>Endianness<\/h3>\n<p>Bitfields in C always start at bit 0. While this is the least significant bit (LSB) on Little Endian (bit 0 has a weight of 2^0), most compilers on Big Endian systems inconveniently consider the most significant bit (MSB) bit 0 (bit 0 has a weight of 2^31, 2^63 etc. depending on the word size), so in case your bitfield needs to be binary-compatible across machines with different endianness, you will need to define two versions of the bitfield.<\/p>\n<h3>The Raw Bitfield<\/h3>\n<p>Did you notice the \"int raw\" in the union? It lets us conveniently (and type-safely) export the raw bit value without having to use a cast.<\/p>\n<pre>\n\tprintf(\"raw flags: 0x%xn\", flags.raw);\n<\/pre>\n<p>We can use this to reconstruct the FLAG_* constants in the original example, in case some code needs it:<\/p>\n<pre>\n#define FLAG_USER   (((flags_t) { { .user   = TRUE } }).raw)\n#define FLAG_ZERO   (((flags_t) { { .zero   = TRUE } }).raw)\n#define FLAG_FORCE  (((flags_t) { { .force  = TRUE } }).raw)\n#define FLAG_COMPAT (((flags_t) { { .compat = TRUE } }).raw)\n<\/pre>\n<p>This code constructs a temporary instance of the bitfield, sets one bit, and converts it into a raw integer - all at compile time.<\/p>\n<h3>Bitfield Access from Assembly<\/h3>\n<p>With the same trick, you can also access your bitfield from assembly, for example if the bitfield is part of the Thread Control Block in your operating system kernel, and the low level context switch code needs to access one of the bits. The \"int raw\" can be used to statically convert a flag into the corresponding raw mask:<\/p>\n<pre>\ntypedef unsigned int boolean_t;\n\ntypedef union {\n\tstruct {\n\t\tboolean_t bit0:1;\n\t\tboolean_t bit1:1;\n\t\tint :19;\n\t\tboolean_t bit31:1;\n\t};\n\tint raw;\n} bitfield_t;\n\n\nint test()\n{\n\tint param = -1;\n\tint result;\n\n\t__asm__ volatile (\n\t\t\"test    %2, %1    n\"\n\t\t\"xor     %0, %0    n\"\n\t\t\"setcc   %0        n\"\n\t\t: \"=r\" (result)\n\t\t: \"r\" (param),\n\t\t  \"i\" (((bitfield_t) { { .bit31 = TRUE } }).raw)\n\t);\n\treturn result;\n}\n<\/pre>\n<p>The corresponding x86 assembly code looks like this:<\/p>\n<pre>\n\t.text\n\t.align\t4,0x90\n\t.globl\t_test\n_test:\n\tpushl\t%ebp\n\tmovl\t%esp, %ebp\n\tmovl\t$-1, %eax\n\t## InlineAsm Start\n\ttest\t$0x80000000, %eax\n\txor\t%eax, %eax\n\tsetcc\t%eax\n\t## InlineAsm End\n\tpopl\t%ebp\n\tret\n\n\t.subsections_via_symbols\n<\/pre>\n<p>This works fine with LLVM, but unfortunately GCC (4.2.1) has problems detecting that the raw value is available at compile time, so the \"i\" has to be replaced with an \"r\": GCC will then pre-assign a register with the raw value instead of being able to use an immediate with the \"test\" instruction.<\/p>\n<h3>How to <i>Not<\/i> Do It<\/h3>\n<p>I have seen C++ code doing this:<\/p>\n<pre>\nenum {\n\tFLAG_USER,\n\tFLAG_ZERO,\n\tFLAG_FORCE,\n\tFLAG_COMPAT = 31\n}\n\nint\ncreate_object(bitfield_t flags)\n{\n        bool is_compat = flags.is_set(FLAG_COMPAT);\n\n        if (is_compat)\n                flags -= FLAGS_FORCE;\n\n        if (flags.is_set(FLAG_FORCE)) {\n                [...]\n        }\n        [...]\n}\n\nint\ncreate_object_zero(int flags)\n{\n\tcreate_object(flags + FLAGS_ZERO);\n}\n\nvoid\ncaller()\n{\n        create_object(((bitfield_t)FLAG_FORCE) + FLAG_COMPAT);\n}\n<\/pre>\n<p>This all looks quite weird. The constants are bit index values, and they are added and subtracted. The reason is C++ operator overloading:<\/p>\n<pre>\nclass bitmask_t\n{\n    word_t      maskvalue;\n\npublic:\n    [...]\n    inline bitmask_t operator -= (int n)\n        {\n            maskvalue = maskvalue & ~(1UL << n);\n            return (bitmask_t) maskvalue;\n        }\n    [...]\n}\n<\/pre>\n<p>This is horrible. The code that uses this class makes no sense unless you read and understand the implementation of the class. And you have to be very careful: While it is possible to \"add\" a flag to an existing bitfield, you cannot just add two flags - it would do the arithmetic and add the two values.<\/p>\n<p>Mapping the setting and clearing of bits onto the addition and subtraction operators is clearly wrong in the first place: Flags in a bitfield are equivalent to elements in a set. Setting a flag is equivalent to the \"union\" operation, which even in Mathematics has its own symbol instead of overloading the \"+\" operator.<\/p>\n<h3>Question<\/h3>\n<p>If you compile code that does something like \"((bitfield_t) { { .bit31 = TRUE } }).raw\" with GCC in C++ mode, it fails. Why?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bitfields are very common in low level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this. The straightforward way to deal with bitfields is to do the Boolean &#8230; <a title=\"Readable and Maintainable Bitfields in C\" class=\"read-more\" href=\"https:\/\/www.pagetable.com\/?p=250\" aria-label=\"Read more about Readable and Maintainable Bitfields in C\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26,32],"tags":[],"class_list":["post-250","post","type-post","status-publish","format-standard","hentry","category-puzzle","category-tricks"],"_links":{"self":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=250"}],"version-history":[{"count":0,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/250\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=250"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}