The little bool of doom

2025-01-28

I must confess I have a bit of a soft spot for classic DOOM. Despite 31 long years, the game is still mighty fun to play yourself (admittedly, I rather suck at it) or to watch others play it (this one I'm better at); and with the source code being available, you can enjoy it on every modern platform – be it desktop, smartphone, digital camera, oscilloscope, or anything else you can imagine. As a result of this, through various circumstances, I came to maintain several DOOM-related packages in Fedora Linux.

Now, a few months before each new release, the Fedora Linux project performs a mass rebuild of all its packages. This has a few benefits, such as ensuring ABI compatibility, updating statically-linked dependencies, making use of new compiler optimisations/code hardening options, and so on. Either way, with Fedora Linux 42 slated to be released mid-April, the time for the Mass Rebuild has come – and, as it often happens, not all of my packages made it through. One of those that failed the rebuild was chocolate-doom.

Two times false does not make right

Well, all right. The first step to finding out what happened was to check the build logs.

gcc -DHAVE_CONFIG_H -I. -I../.. -I../../src -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -O2 -g -Wall -Wdeclaration-after-statement -Wredundant-decls -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/libpng16 -DWITH_GZFILEOP -I/usr/include/pipewire-0.3 -I/usr/include/spa-0.2 -D_REENTRANT -I/usr/lib64/pkgconfig/../../include/dbus-1.0 -I/usr/lib64/pkgconfig/../../lib64/dbus-1.0/include -I/usr/include/libinstpatch-2 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-6 -pthread -I/usr/include/opus -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -c -o deh_bexstr.o deh_bexstr.c In file included from ../../src/sha1.h:21, from ../../src/deh_defs.h:21, from deh_bexstr.c:22: ../../src/doomtype.h:113:5: error: cannot use keyword ‘false’ as enumeration constant 113 | false, | ^~~~~ ../../src/doomtype.h:113:5: note: ‘false’ is a keyword with ‘-std=c23’ onwards

Oh, so this was a compilation error. Having received a lot of flak over the years about cryptic error messages, GCC spent the last few years vastly improving on this front; and now, the error message I got, together with the note that followed it, made the issue clear. Inside the engine code, chocolate-doom declares its own boolean type:

#if defined(__cplusplus) || defined(__bool_true_false_are_defined)   typedef bool boolean;   #else   typedef enum { false, true } boolean; #endif

So, when treating the sources as C++, it uses the C++ bool type, whereas when treating the sources as C, it ships its own, custom type. This worked fine with older C standards, as C89 did not contain a boolean type at all, whereas the type introduced by C99 was named _Bool – though you could include the <stdbool.h> header, which declared bool, true and false macros to make things prettier. Come C23, _Bool is renamed to bool, and all three of bool, true and false are now proper keywords.

Ok, so it makes perfect sense that this custom boolean type clashes with the false and true keywords – but begs the question, why did it fail now, when it worked perfectly fine a few months ago? Why does it build as C23?

A fairly standard change

As I wrote a few paragraphs ago, one of the goals of the Mass Rebuild is to ensure that code inside the distro can still be built with a modern compiler. If you look at GCC's version history, you'll notice that for the last decade, the release schedule seems to follow a pattern of introducing a new major version once a year, somewhere between April and May – which fits the Fedora Linux release schedule quite nicely, and makes Fedora a great place to test GCC pre-releases on a large, varied codebase.

This time was no different, with GCC 15.0.1 landing in Fedora Rawhide (the "eternal alpha" development version) just a few hours before the Mass Rebuild began. Among this versions's changes, there is one relevant to my problem – the default C standard has been changed from -std=gnu17 to -std=gnu23. And sure enough, if you go back to the build log and carefully go over the compiler invocation, you'll notice that there's no -std= option to explicitly set the standard.

Ok, so what now? Thinking about it for a bit, there seemed to be three solutions to this problem:

  1. Explicitly set the C standard to C17 or older, so the code is built using the custom boolean type.

  2. Modify the code by changing the #ifdef, so the compiler uses the built-in bool type when in C23 mode.

  3. Rename the enum variants to False and True and modify all the usages accordingly.

Option 1) seemed like the easiest one, but it also felt a bit like kicking the can down the road – plus, it introduced the question of which standard to use. Option 2) was fairly straightforward to implement, and arguably, it made for more idiomatic code. Option 3) was probably the safest one, but also most tedious. After brief deliberation, I went with the second one.

--- a/src/doomtype.h +++ b/src/doomtype.h @@ -100,9 +100,9 @@   #include <inttypes.h>   -#if defined(__cplusplus) || defined(__bool_true_false_are_defined) +#if defined(__cplusplus) || defined(__bool_true_false_are_defined) || (__STDC_VERSION__ >= 202311L)   -// Use builtin bool type with C++. +// Use builtin bool type with C++ and C23.   typedef bool boolean;

The patch was easy to implement, and after applying it, the Fedora package built successfully. Satisfied with my quick and easy fix, I opened a pull request upstream.

The engine goes boom

My proposal prompted some debate between the maintainers, who eventually decided that the best course of action will be to declare the project as written using C99. One of them cooked up another pull request:

--- a/src/doomtype.h +++ b/src/doomtype.h @@ -99,12 +99,11 @@ // standard and defined to include stdint.h, so include this.   #include <inttypes.h> +#include <stdbool.h>   #if defined(__cplusplus) || defined(__bool_true_false_are_defined)   -// Use builtin bool type with C++. - -typedef bool boolean; +typedef int boolean;   #else

The added #include made perfect sense – with C99, <stdbool.h> is guaranteed to exist and it should provide definitions for bool, true and false. However, the change in the typedef was curious – it meant that despite the switch to C99, the code would still store boolean values as integers, instead using of the proper bool/_Bool type. This prompted another short discussion:

<turol>
Explicitly setting the C standard is fine but we shouldn't change the includes or boolean type.

<fabiangreffrath>
Chocolate Doom doesn't even start if we keep #typedef bool boolean.

<suve>
What do you mean by "doesn't even start"? Crashes right away? Including <stdbool.h> adds the #define bool _Bool macro, so keeping typedef bool boolean means you're using the C99 _Bool type.

<fabiangreffrath>
It fails with: R_InitSprites: Sprite TROO frame I has rotations and a rot=0 lump.

Huh. Apparently using _Bool for boolean values dooms the engine to exit with an error during startup. Interesting! Let's try debugging. From the error message, one can find the place in the code where the error occurs:

if (sprtemp[frame].rotate == false) I_Error ("R_InitSprites: Sprite %s frame %c has rotations " "and a rot=0 lump", spritename, 'A'+frame);

What exactly's happening here? I'll skip on pasting more code snippets and just give you a quick rundown:

All right. So the problem is this: when boolean is a custom enum type, the code works as expected and the error condition is false; but when using _Bool, the condition evaluates as true and exit-on-error is triggered. Sounds rather sketchy... Let's try poking at memory using gdb. I'll start with the working version.

$ gdb ./build/src/chocolate-doom--enum [...] (gdb) break src/doom/r_things.c:138 Breakpoint 1 at 0x44f822: file chocolate-doom/src/doom/r_things.c, line 138. (gdb) run [...] Thread 1 "chocolate-doom" hit Breakpoint 1, R_InstallSpriteLump (lump=1242, frame=0, rotation=1, flipped=false) at chocolate-doom/src/doom/r_things.c:138 138 if (sprtemp[frame].rotate == false) (gdb) print sprtemp[frame] $1 = {rotate = (true | unknown: 0xfffffffe), lump = {-1, -1, -1, -1, -1, -1, -1, -1}, flip = "\377\377\377\377\377\377\377\377"} (gdb) step 142 sprtemp[frame].rotate = true;

Okay, so our boolean value is filled with... uh, what? The notation was rather confusing for me at first, but it turns out that since the boolean type is an enum, gdb tries to be helpful by showing how the value can be constructed by bit-oring several legal values of the enum. (Except it doesn't really work in this case.) Either way, true is 1, and bit-oring that with the "unknown" value gives 0xffffffff – which suggests that the field is initialized by writing -1 to it at some earlier point. And indeed, looking through the backtrace and into the calling function, one can find a memset (sprtemp,-1, sizeof(sprtemp)); call.

Great, so at least that's one mystery solved. Back to our breakpoint, we have a simple comparison between 0xffffffff and 0x0, which evaluates as false. That makes sense. Let's take a look at the _Bool version, then.

$ gdb ./build/src/chocolate-doom--bool [...] (gdb) break src/doom/r_things.c:138 Breakpoint 1 at 0x44f822: file chocolate-doom/src/doom/r_things.c, line 138. (gdb) run [...] Thread 1 "chocolate-doom" hit Breakpoint 1, R_InstallSpriteLump (lump=1242, frame=0, rotation=1, flipped=false) at chocolate-doom/src/doom/r_things.c:138 138 if (sprtemp[frame].rotate == false) (gdb) print sprtemp[frame] $1 = {rotate = 255, lump = {-1, -1, -1, -1, -1, -1, -1, -1}, flip = "\377\377\377\377\377\377\377\377"}

The size of boolean is now smaller – down from 4 bytes to just 1 byte. Apart from that, we're in the exact same place as before – the field's been initialized to -1, and the program's about to check if 255 == 0.

(gdb) step 139 I_Error ("R_InitSprites: Sprite %s frame %c has rotations "

I, uh... I mean... w h a t ?

We need to go deeper

Yeah, so that was rather unexpected. Thinking that there may be some hidden interaction with another part of the code that I'm not seeing, I tried reproducing the issue in a smaller program.

#include <stdio.h> #include <string.h>   #ifdef DUPA #include <stdbool.h> typedef bool boolean; #else typedef enum { false, true } boolean; #endif   boolean some_var[30];   int main(void) { memset(some_var, -1, sizeof(some_var)); some_var[0] = false; some_var[1] = 500;   for(int i = 0; i <= 2; ++i) { if(some_var[i] == false) printf("some_var[%d] is false\n", i); if(some_var[i] == true) printf("some_var[%d] is true\n", i); printf("value of some_var[%d] is %d\n", i, some_var[i]); } return 0; }

Sure enough, the behaviour was consistent with what I saw inside the game:

$ gcc -o booltest ./booltest.c $ ./booltest some_var[0] is false value of some_var[0] is 0 some_var[1] is true value of some_var[1] is 1 value of some_var[2] is -1 $ gcc -DDUPA -o booltest ./booltest.c $ ./booltest some_var[0] is false value of some_var[0] is 0 some_var[1] is true value of some_var[1] is 1 some_var[2] is false some_var[2] is true value of some_var[2] is 255

Somehow, setting a _Bool value to 255 meant that it was true and false at the same time. I wasn't gonna get more answers by just juggling C code around, which meant that it was time to take a look at how things look at the assembly level. For this, I used the Godbolt compiler explorer.

A screenshot from the Goldbolt compiler explorer, showing the reproducer program compiled in two modes.

Ah-ha! The generated instructions were ever so slightly different. This would be great news, if it wasn't for me forgetting about one little detail: I have zero knowledge of x86 assembly.

Luckily for me, there's this nifty little thing called the Internet, where you can find answers to a lot of life's questions. After consulting some sources, I started translating the assembly instructions into plain Polish (for your convenience, though, I'll go with English). In all four scenarios, the code starts by loading some_value[i] into the eax register.

boolean is an enum, checking for true

  1. CMP eax, 1
    This one is straightforward – it compares the value of the register with 1. If the arguments are equal, the "zero flag" (ZF) is set; otherwise, the flag gets unset.

  2. JNE .L4
    This checks ZF and if it's unset, performs a jump, skipping over the printf() call.

Okay, so the behaviour is flipped – instead of "do stuff when variable equals 1", I got "skip over stuff when variable does not equal 1" – but that makes perfect sense in assembly. Overall, the code is exactly what I expected.

boolean is an enum, checking for false

  1. TEST eax, eax
    This calculates a bitwise AND of the register and itself. ZF is set if the result is zero and unset otherwise.

  2. JNE .L3
    If ZF is unset, this performs a jump and skips over the printf() call.

This time the assembly is slightly more funky, as instead of "compare with zero", I got "perform bitwise AND of the value and itself" – which I assume is a micro-optimisation of some sort. Still, the code effectively does what I'd expect it to – skipping the printf() on any non-zero value.

boolean is _Bool, checking for true

  1. TEST al, al
    This once again calculates a bitwise AND of the value and itself and sets ZF if the result is zero. al is an 8-bit register that corresponds to the 8 lowest bits of eax – this is done because the boolean values are now just 1 byte in size, instead of 4, like before.

  2. JE .L4
    Jump if ZF is set.

Now it's getting interesting! Looks like in this version, the logic is not "skip when variable does not equal 1", but rather "skip when variable equals 0".

boolean is _Bool, checking for false

  1. XOR eax, 1
    This performs a bitwise XOR of the value and 1. The first argument is re-used as the destination. Effectively, this will flip the lowest bit of the boolean value – changing 0 to 1, or 1 to 0.

  2. TEST al, al
    Once again, perform a bitwise AND and set ZF if the result is zero.

  3. JE .L3
    Jump if ZF is set.

Oh boy. So it flips the lowest bit, and then jumps if the post-flip value is zero. This means that the logic becomes "skip when variable equals 1".

Summing up

When boolean is an enum, the compiler does exactly what I expected – the == false condition checks if the value is equal to 0, and the == true condition checks if the value is equal to 1.

However, when boolean is actually _Bool, the == false check is transformed into != 1, and the == true check is transformed into != 0 – which makes perfect sense in the realm of boolean logic. But it also means that for a value of 255, hilarity ensures: since 255 is neither 0 nor 1, both conditions pass!

Do they really have a law for that?

Now that I knew what was happening, one last riddle to solve was why it was happening. I mean, sticking an invalid value into a type and getting a weird result isn't exactly unexpected, and since we're talking about C, I was fairly certain I've just ran into everyone's favourite aspect of the language – Undefined Behaviour. Compiling with UBSan quickly confirmed this hypothesis.

$ gcc -DDUPA -fsanitize=undefined -o booltest ./booltest.c $ ./booltest some_var[0] is false value of some_var[0] is 0 some_var[1] is true value of some_var[1] is 1 booltest.c:22:14: runtime error: load of value 255, which is not a valid value for type '_Bool' booltest.c:23:14: runtime error: load of value 255, which is not a valid value for type '_Bool' some_var[2] is true booltest.c:24:69: runtime error: load of value 255, which is not a valid value for type '_Bool' value of some_var[2] is 1

I could call it done at this point, but I was still rather curious as to which part of the standard was violated by the code above. I looked up a PDF version of the C99 standard and started skimming through. Unfortunately for me, Annex J, "Portability issues", which lists the many possible ways one can stumble into unspecified behaviour, undefined behaviour or implementation-defined behaviour, didn't explicitly say anything about invalid _Bool values.

After some more searching, I believe I've found my answer in section 6.2.6, "Representation of types". Paragraph 6.2.6.1.5 states the following:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

This fit my criteria – I had a _Bool object that did not represent a valid _Bool value, and it was being read as part of an expression that did not treat is as a char. So yeah, the code was very much in UB land; luckily, nasal demons spared my nostrils this time.

References

Share with friends

e-mail Google+ Hacker News LinkedIn Reddit Tumblr VKontakte Wykop Xing

Comments

Do you have some interesting thoughts to share? You can comment by sending an e-mail to blog-comments@svgames.pl.