The little bool of doom
2025-01-28I must confess I have a bit of a soft spot for classic DOOM. Despite 31 long years, the game is still mighty fun to play yourself (admittedly, I rather suck at it) or to watch others play it (this one I'm better at); and with the source code being available, you can enjoy it on every modern platform – be it desktop, smartphone, digital camera, oscilloscope, or anything else you can imagine. As a result of this, through various circumstances, I came to maintain several DOOM-related packages in Fedora Linux.
Now, a few months before each new release, the Fedora Linux project performs a mass rebuild of all its packages.
This has a few benefits, such as ensuring ABI compatibility, updating statically-linked dependencies,
making use of new compiler optimisations/code hardening options, and so on. Either way, with Fedora Linux 42
slated to be released mid-April, the time for the Mass Rebuild has come – and, as it often happens,
not all of my packages made it through. One of those that failed the rebuild was chocolate-doom
.
Two times false does not make right
Well, all right. The first step to finding out what happened was to check the build logs.
gcc -DHAVE_CONFIG_H -I. -I../.. -I../../src -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -O2 -g -Wall -Wdeclaration-after-statement -Wredundant-decls -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/include/SDL2 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/libpng16 -DWITH_GZFILEOP -I/usr/include/pipewire-0.3 -I/usr/include/spa-0.2 -D_REENTRANT -I/usr/lib64/pkgconfig/../../include/dbus-1.0 -I/usr/lib64/pkgconfig/../../lib64/dbus-1.0/include -I/usr/include/libinstpatch-2 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-6 -pthread -I/usr/include/opus -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -c -o deh_bexstr.o deh_bexstr.c
In file included from ../../src/sha1.h:21,
from ../../src/deh_defs.h:21,
from deh_bexstr.c:22:
../../src/doomtype.h:113:5: error: cannot use keyword ‘false’ as enumeration constant
113 | false,
| ^~~~~
../../src/doomtype.h:113:5: note: ‘false’ is a keyword with ‘-std=c23’ onwards
Oh, so this was a compilation error. Having received a lot of flak over the years about cryptic
error messages, GCC spent the last few years vastly improving on this front; and now, the error
message I got, together with the note that followed it, made the issue clear. Inside the engine
code, chocolate-doom
declares its own boolean type:
#if defined(__cplusplus) || defined(__bool_true_false_are_defined)
typedef bool boolean;
#else
typedef enum
{
false,
true
} boolean;
#endif
So, when treating the sources as C++, it uses the C++ bool
type, whereas when treating
the sources as C, it ships its own, custom type. This worked fine with older C standards,
as C89 did not contain a boolean type at all, whereas the type introduced by C99 was named _Bool
– though you could include the <stdbool.h>
header, which declared bool
, true
and false
macros to make things prettier. Come C23, _Bool
is renamed to bool
, and all three
of bool
, true
and false
are now proper keywords.
Ok, so it makes perfect sense that this custom boolean type clashes
with the false
and true
keywords – but begs the question, why did it fail
now, when it worked perfectly fine a few months ago? Why does it build as C23?
A fairly standard change
As I wrote a few paragraphs ago, one of the goals of the Mass Rebuild is to ensure that code inside the distro can still be built with a modern compiler. If you look at GCC's version history, you'll notice that for the last decade, the release schedule seems to follow a pattern of introducing a new major version once a year, somewhere between April and May – which fits the Fedora Linux release schedule quite nicely, and makes Fedora a great place to test GCC pre-releases on a large, varied codebase.
This time was no different, with GCC 15.0.1
landing in Fedora Rawhide (the "eternal alpha" development version) just a few hours before
the Mass Rebuild began. Among this versions's changes,
there is one relevant to my problem – the default C standard has been changed from -std=gnu17
to -std=gnu23
.
And sure enough, if you go back to the build log and carefully go over the compiler invocation,
you'll notice that there's no -std=
option to explicitly set the standard.
Ok, so what now? Thinking about it for a bit, there seemed to be three solutions to this problem:
-
Explicitly set the C standard to C17 or older, so the code is built using the custom
boolean
type. -
Modify the code by changing the
#ifdef
, so the compiler uses the built-inbool
type when in C23 mode. - Rename the enum variants to
False
andTrue
and modify all the usages accordingly.
Option 1) seemed like the easiest one, but it also felt a bit like kicking the can down the road – plus, it introduced the question of which standard to use. Option 2) was fairly straightforward to implement, and arguably, it made for more idiomatic code. Option 3) was probably the safest one, but also most tedious. After brief deliberation, I went with the second one.
--- a/src/doomtype.h
+++ b/src/doomtype.h
@@ -100,9 +100,9 @@
#include <inttypes.h>
-#if defined(__cplusplus) || defined(__bool_true_false_are_defined)
+#if defined(__cplusplus) || defined(__bool_true_false_are_defined) || (__STDC_VERSION__ >= 202311L)
-// Use builtin bool type with C++.
+// Use builtin bool type with C++ and C23.
typedef bool boolean;
The patch was easy to implement, and after applying it, the Fedora package built successfully. Satisfied with my quick and easy fix, I opened a pull request upstream.
The engine goes boom
My proposal prompted some debate between the maintainers, who eventually decided that the best course of action will be to declare the project as written using C99. One of them cooked up another pull request:
--- a/src/doomtype.h
+++ b/src/doomtype.h
@@ -99,12 +99,11 @@
// standard and defined to include stdint.h, so include this.
#include <inttypes.h>
+#include <stdbool.h>
#if defined(__cplusplus) || defined(__bool_true_false_are_defined)
-// Use builtin bool type with C++.
-
-typedef bool boolean;
+typedef int boolean;
#else
The added #include
made perfect sense – with C99, <stdbool.h>
is guaranteed to exist
and it should provide definitions for bool
, true
and false
. However, the change
in the typedef
was curious – it meant that despite the switch to C99, the code
would still store boolean values as integers, instead using of the proper bool
/_Bool
type.
This prompted another short discussion:
<turol>
Explicitly setting the C standard is fine but we shouldn't change the includes or boolean type.<fabiangreffrath>
Chocolate Doom doesn't even start if we keep#typedef bool boolean
.<suve>
What do you mean by "doesn't even start"? Crashes right away? Including<stdbool.h>
adds the#define bool _Bool
macro, so keepingtypedef bool boolean
means you're using the C99_Bool
type.<fabiangreffrath>
It fails with:R_InitSprites: Sprite TROO frame I has rotations and a rot=0 lump
.
Huh. Apparently using _Bool
for boolean values dooms the engine
to exit with an error during startup.
Interesting! Let's try debugging. From the error message, one can find the
place in the code
where the error occurs:
if (sprtemp[frame].rotate == false)
I_Error ("R_InitSprites: Sprite %s frame %c has rotations "
"and a rot=0 lump", spritename, 'A'+frame);
What exactly's happening here? I'll skip on pasting more code snippets and just give you a quick rundown:
-
The code's inside a function,
R_InstallSpriteLump()
. -
frame
is an argument to the function. While it's not marked asconst
, it is not modified inside the function. -
sprtemp
is a global variable, holding an array of 29spriteframe_t
structs. - The
.rotate
field inside said struct is ofboolean
type.
All right. So the problem is this: when boolean
is a custom enum type, the code
works as expected and the error condition is false; but when using _Bool
,
the condition evaluates as true and exit-on-error is triggered. Sounds rather sketchy...
Let's try poking at memory using gdb
. I'll start with the working version.
$ gdb ./build/src/chocolate-doom--enum
[...]
(gdb) break src/doom/r_things.c:138
Breakpoint 1 at 0x44f822: file chocolate-doom/src/doom/r_things.c, line 138.
(gdb) run
[...]
Thread 1 "chocolate-doom" hit Breakpoint 1, R_InstallSpriteLump (lump=1242, frame=0, rotation=1, flipped=false) at chocolate-doom/src/doom/r_things.c:138
138 if (sprtemp[frame].rotate == false)
(gdb) print sprtemp[frame]
$1 = {rotate = (true | unknown: 0xfffffffe), lump = {-1, -1, -1, -1, -1, -1, -1, -1}, flip = "\377\377\377\377\377\377\377\377"}
(gdb) step
142 sprtemp[frame].rotate = true;
Okay, so our boolean
value is filled with... uh, what? The notation was rather confusing for me at first,
but it turns out that since the boolean
type is an enum, gdb
tries to be helpful by showing how the value
can be constructed by bit-oring several legal values of the enum. (Except it doesn't really work in this case.)
Either way, true
is 1, and bit-oring that with the "unknown" value gives 0xffffffff
– which suggests
that the field is initialized by writing -1
to it at some earlier point. And indeed, looking
through the backtrace and into the calling function, one can find a
memset (sprtemp,-1, sizeof(sprtemp));
call.
Great, so at least that's one mystery solved. Back to our breakpoint,
we have a simple comparison between 0xffffffff
and 0x0
, which evaluates as false.
That makes sense. Let's take a look at the _Bool
version, then.
$ gdb ./build/src/chocolate-doom--bool
[...]
(gdb) break src/doom/r_things.c:138
Breakpoint 1 at 0x44f822: file chocolate-doom/src/doom/r_things.c, line 138.
(gdb) run
[...]
Thread 1 "chocolate-doom" hit Breakpoint 1, R_InstallSpriteLump (lump=1242, frame=0, rotation=1, flipped=false) at chocolate-doom/src/doom/r_things.c:138
138 if (sprtemp[frame].rotate == false)
(gdb) print sprtemp[frame]
$1 = {rotate = 255, lump = {-1, -1, -1, -1, -1, -1, -1, -1}, flip = "\377\377\377\377\377\377\377\377"}
The size of boolean
is now smaller – down from 4 bytes to just 1 byte.
Apart from that, we're in the exact same place as before – the field's been initialized to -1
,
and the program's about to check if 255 == 0
.
(gdb) step
139 I_Error ("R_InitSprites: Sprite %s frame %c has rotations "
I, uh... I mean... w h a t ?
We need to go deeper
Yeah, so that was rather unexpected. Thinking that there may be some hidden interaction with another part of the code that I'm not seeing, I tried reproducing the issue in a smaller program.
#include <stdio.h>
#include <string.h>
#ifdef DUPA
#include <stdbool.h>
typedef bool boolean;
#else
typedef enum {
false,
true
} boolean;
#endif
boolean some_var[30];
int main(void) {
memset(some_var, -1, sizeof(some_var));
some_var[0] = false;
some_var[1] = 500;
for(int i = 0; i <= 2; ++i) {
if(some_var[i] == false) printf("some_var[%d] is false\n", i);
if(some_var[i] == true) printf("some_var[%d] is true\n", i);
printf("value of some_var[%d] is %d\n", i, some_var[i]);
}
return 0;
}
Sure enough, the behaviour was consistent with what I saw inside the game:
$ gcc -o booltest ./booltest.c
$ ./booltest
some_var[0] is false
value of some_var[0] is 0
some_var[1] is true
value of some_var[1] is 1
value of some_var[2] is -1
$ gcc -DDUPA -o booltest ./booltest.c
$ ./booltest
some_var[0] is false
value of some_var[0] is 0
some_var[1] is true
value of some_var[1] is 1
some_var[2] is false
some_var[2] is true
value of some_var[2] is 255
Somehow, setting a _Bool
value to 255 meant that it was true and false at the same time.
I wasn't gonna get more answers by just juggling C code around, which meant that it was
time to take a look at how things look at the assembly level.
For this, I used the Godbolt compiler explorer.
Ah-ha! The generated instructions were ever so slightly different. This would be great news, if it wasn't for me forgetting about one little detail: I have zero knowledge of x86 assembly.
Luckily for me, there's this nifty little thing called the Internet, where you can find answers
to a lot of life's questions. After consulting some sources, I started translating the assembly
instructions into plain Polish (for your convenience, though, I'll go with English).
In all four scenarios, the code starts by loading some_value[i]
into the eax
register.
boolean is an enum
, checking for true
-
CMP eax, 1
This one is straightforward – it compares the value of the register with 1. If the arguments are equal, the "zero flag" (ZF) is set; otherwise, the flag gets unset. JNE .L4
This checks ZF and if it's unset, performs a jump, skipping over theprintf()
call.
Okay, so the behaviour is flipped – instead of "do stuff when variable equals 1", I got "skip over stuff when variable does not equal 1" – but that makes perfect sense in assembly. Overall, the code is exactly what I expected.
boolean is an enum
, checking for false
-
TEST eax, eax
This calculates a bitwise AND of the register and itself. ZF is set if the result is zero and unset otherwise. JNE .L3
If ZF is unset, this performs a jump and skips over theprintf()
call.
This time the assembly is slightly more funky, as instead of "compare with zero",
I got "perform bitwise AND of the value and itself" – which I assume is a micro-optimisation of some sort.
Still, the code effectively does what I'd expect it to – skipping the printf()
on any non-zero value.
boolean is _Bool
, checking for true
-
TEST al, al
This once again calculates a bitwise AND of the value and itself and sets ZF if the result is zero.al
is an 8-bit register that corresponds to the 8 lowest bits ofeax
– this is done because theboolean
values are now just 1 byte in size, instead of 4, like before. JE .L4
Jump if ZF is set.
Now it's getting interesting! Looks like in this version, the logic is not "skip when variable does not equal 1", but rather "skip when variable equals 0".
boolean is _Bool
, checking for false
-
XOR eax, 1
This performs a bitwise XOR of the value and 1. The first argument is re-used as the destination. Effectively, this will flip the lowest bit of the boolean value – changing 0 to 1, or 1 to 0. -
TEST al, al
Once again, perform a bitwise AND and set ZF if the result is zero. JE .L3
Jump if ZF is set.
Oh boy. So it flips the lowest bit, and then jumps if the post-flip value is zero. This means that the logic becomes "skip when variable equals 1".
Summing up
When boolean
is an enum
, the compiler does exactly what I expected – the == false
condition
checks if the value is equal to 0, and the == true
condition checks if the value is equal to 1.
However, when boolean
is actually _Bool
, the == false
check is transformed into != 1
,
and the == true
check is transformed into != 0
– which makes perfect sense in the realm
of boolean logic. But it also means that for a value of 255, hilarity ensures:
since 255 is neither 0 nor 1, both conditions pass!
Do they really have a law for that?
Now that I knew what was happening, one last riddle to solve was why it was happening. I mean, sticking an invalid value into a type and getting a weird result isn't exactly unexpected, and since we're talking about C, I was fairly certain I've just ran into everyone's favourite aspect of the language – Undefined Behaviour. Compiling with UBSan quickly confirmed this hypothesis.
$ gcc -DDUPA -fsanitize=undefined -o booltest ./booltest.c
$ ./booltest
some_var[0] is false
value of some_var[0] is 0
some_var[1] is true
value of some_var[1] is 1
booltest.c:22:14: runtime error: load of value 255, which is not a valid value for type '_Bool'
booltest.c:23:14: runtime error: load of value 255, which is not a valid value for type '_Bool'
some_var[2] is true
booltest.c:24:69: runtime error: load of value 255, which is not a valid value for type '_Bool'
value of some_var[2] is 1
I could call it done at this point, but I was still rather curious as to which part of the
standard was violated by the code above. I looked up a PDF version of the C99 standard and started
skimming through. Unfortunately for me, Annex J, "Portability issues", which lists the many
possible ways one can stumble into unspecified behaviour, undefined behaviour or implementation-defined
behaviour, didn't explicitly say anything about invalid _Bool
values.
After some more searching, I believe I've found my answer in section 6.2.6, "Representation of types". Paragraph 6.2.6.1.5 states the following:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.
This fit my criteria – I had a _Bool
object that did not represent a valid _Bool
value,
and it was being read as part of an expression that did not treat is as a char
.
So yeah, the code was very much in UB land; luckily, nasal demons spared my nostrils this time.
Comments
Do you have some interesting thoughts to share? You can comment by sending an e-mail to blog-comments@svgames.pl.