Last week I was trying to add some testing code to libsecp256k1 and I was pulling out my hair trying to get it to work.
No amount of printf was working to illuminate what I was doing wrong.
Finally, out of desperation, I thought I would do a quick check to see if there were any compiler bugs related to memcmp, and lo and behold, I found GCC bug #95189: memcmp being wrongly stripped like strcmp.
Honestly this was a pretty horrifying bug to read about.
Under some circumstances GCC 9 and 10 will cause memcmp to return an incorrect value when one of the inputs is a statically known array that contains NUL bytes.
As I rushed to recompile my computer system using GCC 8, I contemplated what the vast consequences of such a bug could be, and pondered how it was possible that computers could function at all.
However, over the week, with the help of my colleagues, I managed to get a better understanding of the scope of the bug:
- The bug can only convert non-zero values to zero values.
- The static array needs to have a NUL byte within the first 4 bytes.
- Most importantly, for the bug to trigger, the memcmp result must not immediately be compared to 0 for equality or inequality, or any equivalent test. A different code path is taken in the compiler in that case.
That explained why computers were still functioning.
I expect the vast majority of the uses of memcmp do an immediate test for equality with 0.
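To make those conditions concrete, here is a minimal sketch of the two shapes side by side. The names tag, tag_order and tag_matches are made up for illustration and are not taken from the bug report or from any of the affected projects. Whether a particular GCC build actually miscompiles the first function is not something this sketch claims. It only shows a constant array with a zero byte in its first 4 bytes, once with the memcmp result passed along as an integer and once with the result tested against 0 on the spot.

```c
#include <string.h>

/* Hypothetical 4-byte tag: a statically known array with a zero byte
   among its first 4 bytes. */
static const unsigned char tag[4] = { 0x01, 0x00, 0x02, 0x03 };

/* At-risk shape: the three-way memcmp result is returned as an int
   rather than being compared with 0 right away. A correct compiler
   returns a non-zero value whenever buf differs from tag. */
int tag_order(const unsigned char *buf)
{
    return memcmp(buf, tag, sizeof tag);
}

/* Common shape: the result is compared with 0 immediately, which is
   the case described above as taking a different compiler code path. */
int tag_matches(const unsigned char *buf)
{
    return memcmp(buf, tag, sizeof tag) == 0;
}
```

With a buffer holding 01 00 02 04, a correct compiler gives a non-zero tag_order and a tag_matches of 0. The miscompilation described above would show up as tag_order returning 0 for that same buffer.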
I still wondered though, how much code was being affected. My colleague Tim suggested that it would be possible to instrument GCC to emit a message when it was about to miscompile a program. Together we came up with a patch to GCC 9 and 10 that would print a debugging message. Once again, I recompiled my entire system, to see what GCC was miscompiling. This is what I found:
- https://github.com/unicode-org/icu/blob/4fb47b12a70737ee12326220e71c2d73c5ec658f/icu4c/source/common/uniset_props.cpp#L709
- https://github.com/xiph/flac/blob/ce6dd6b5732e319ef60716d9cc9af6a836a4011a/src/flac/decode.c#L1310
- https://github.com/torvalds/linux/blob/fb0155a09b0224a7147cb07a4ce6034c8d29667f/drivers/atm/zatm.c#L1172
- https://github.com/nss-dev/nss/blob/1f3746f5107535a47bb4e3969f561e1bd1314bab/gtests/pk11_gtest/pk11_chacha20poly1305_unittest.cc#L425
- https://github.com/GNOME/glib/blob/010569b3734f864fcf584f771915b78bd391eb5f/glib/tests/refstring.c#L70
- https://github.com/heimdal/heimdal/blob/7ae2dfd853c87f9cbecb6f399de4dad09bad4606/lib/gssapi/krb5/arcfour.c#L390, https://github.com/heimdal/heimdal/blob/7ae2dfd853c87f9cbecb6f399de4dad09bad4606/lib/gssapi/krb5/arcfour.c#L661, https://github.com/heimdal/heimdal/blob/7ae2dfd853c87f9cbecb6f399de4dad09bad4606/lib/gssapi/krb5/arcfour.c#L1279
- https://github.com/zeromq/libzmq/blob/22d218a182855f28038e865cb75bf5897ff0c786/tests/test_mock_pub_sub.cpp#L203
- https://github.com/pigoz/mplayer-svn/blob/8d651873a9eb193f5155ffb51ece206f187cf00f/sub/sub_cc.c#L391-L412
On my entire system I only found 10 lines of code that were miscompiled. Three of those lines are in tests. All of the lines could be rewritten as a comparison to 0. None of the lines looked that serious. I am not sure which one is worse: the reduced message integrity code(?) from some ARCFOUR implementation, or the something something from an ATM driver?
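As a rough illustration of what rewriting a line as a comparison to 0 can look like, here is a before and after sketch. It is not code from any of the projects linked above. The array expected and the functions verify_before and verify_after are hypothetical, and the sketch assumes callers only care whether the bytes match.

```c
#include <string.h>

/* Hypothetical expected checksum. The zero bytes near the start make
   it the kind of constant the bug is concerned with. */
static const unsigned char expected[8] = {
    0x5a, 0x00, 0x00, 0x17, 0x42, 0x00, 0x9c, 0x01
};

/* Before: the three-way result is stored and handed back as is.
   Callers treat any non-zero value as a mismatch. */
int verify_before(const unsigned char *cksum)
{
    int cmp = memcmp(cksum, expected, sizeof expected);
    return cmp;
}

/* After: compare with 0 in the same expression and keep only the
   boolean answer (1 means mismatch, 0 means match). */
int verify_after(const unsigned char *cksum)
{
    return memcmp(cksum, expected, sizeof expected) != 0;
}
```

The rewrite only works where the sign of the result is never used. A caller that sorts by the memcmp result would need a different fix.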
The mplayer miscompilation is the most mysterious.
The code surrounding that function all appears to immediately compare the memcmp result with 0.
And given that my debug message refused to point to exactly what line is being miscompiled in that function, I fear some set of optimizations has happened to allow this code to be miscompiled in some way.
With more hardware I could do a more thorough investigation of the consequences of this GCC bug. For now I am going to stick with GCC 8 until GCC 9 and 10 have new point releases.
Update: Thanks go to Marc ‘risson’ Schmitt, who had more hardware. Please check out his results.