When compilers get overzealous

We have seen that compilers are impressively smart. But smart is a neighbor of too smart, and that is a neighbor of dangerous! With the dramatic improvements in compiler optimizations, some dangerous edge cases have crept in, and they show us the dark side of compilers.

The first (and maybe biggest) of these problems is the compilers' treatment of C++'s undefined behavior (UB). The C and C++ standards define some program errors as UB and do not require compilers to diagnose or report them. The behavior stays undefined, which means it can be anything whatsoever; hence the old joke that with UB your program might format your hard drive and send you an email. Some common examples of undefined behavior are as follows (a short snippet illustrating them comes right after the list):

  • Null pointer dereference
  • Out-of-bounds read
  • Signed integer overflow
  • Modification of a scalar more than once within a single expression
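
As a minimal sketch (the function and variable names here are made up for illustration), each marked line in the following snippet triggers one of the behaviors from the list:

#include <climits>

void ub_examples()
{
    int* p = nullptr;
    int a = *p;               // null pointer dereference

    int arr[4] = {1, 2, 3, 4};
    int b = arr[4];           // out-of-bounds read

    int c = INT_MAX;
    c = c + 1;                // signed integer overflow

    int i = 0;
    i = ++i + i++;            // i modified twice with no sequencing in between
}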

Now, what does that have to do with dangerous optimizations? These are simple runtime errors, aren't they? Yes, in the old days UB amounted to hardware-specific behavior, that is, normally a crash at runtime. But recently, compilers have started to exploit their license to optimize to the fullest and to treat malformed programs as impossible programs, assuming that a program can never exhibit UB. Consequently, compilers started silently removing the offending code.
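
To see that impossibility assumption in action, consider the following sketch (a classic example; the function name is made up). Because signed integer overflow is UB, an optimizing compiler may assume it never happens and fold the whole comparison to false:

// A naive overflow check that relies on signed wrap-around behavior
bool will_overflow(int x)
{
    return x + 1 < x;   // signed overflow is UB, so the optimizer may assume
                        // it cannot occur and turn this into "return false"
}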

Now, think about the irony of that: a compiler doesn't have to diagnose UB, a rule that was meant to simplify the compiler's job, yet it detects the UB anyway and keeps that knowledge to itself. Compilers don't do this because they are mischievous; the impossibility assumption is simply used to simplify the optimization logic.

One well-known example of UB in action, recently seen on the Stack Overflow website, is this loop:

std::complex<int> delta;
std::complex<int> mc[4] = {0};
// after the last increment, delta = mc[di] reads mc[4]: out of bounds, UB!
for (int di = 0; di < 4; di++, delta = mc[di])
{
    printf("%i\n", di);
}

This loop should run four times, but runs endlessly instead. Here, the mc[di] access after di has been incremented to 4 is UB, so the whole loop test was removed by the GCC 8.2 compiler! Other compilers didn't spot it and compiled the loop normally. Another well-known story is GCC removing a check for a null pointer in Linux kernel code (check it out!).
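
That kernel bug followed a pattern roughly like the sketch below (the types and names here are made up): the pointer is dereferenced before it is checked, so the compiler is free to assume it cannot be null and to remove the check:

struct device { int id; };

int GetId(device* dev)
{
    int id = dev->id;      // dereference first: UB if dev is null, so the
                           // compiler may assume dev != nullptr from here on...
    if (dev == nullptr)    // ...and delete this "always false" check
    {
        return -1;
    }
    return id;
}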

But UB isn't the only surprise optimizers have up their sleeves. A second problem is that a compiler is entitled to remove any code that doesn't have a visible effect. A famous example of that is the case of the disappearing memset() function, as shown in the following code:

#include <cstring>  // for memset

void GetData(const char* RemoteAddr)
{
    char passwd[64];
    // AskForUserPassd() is assumed to be declared elsewhere
    if (AskForUserPassd(passwd, sizeof(passwd)))
    {
        // get data from remote ...
    }
    // erase the passwd from memory!
    memset(passwd, 0, sizeof(passwd));
}

Here, we were trying to zero out the password buffer to remove it from memory, to guard against side-channel attacks and the reading of memory in crash dumps. Unfortunately, for the optimizer this is nothing but a dead store, that is, a write to a memory location that nobody will ever read, and it can be optimized away! This is an open problem, and there are even special functions, such as SecureZeroMemory() on Windows, explicit_bzero() in FreeBSD, and memset_s() in the C11 standard library (optional Annex K), to be used in such cases. Their specifications directly disallow optimizing these stores away.
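
Where those functions are not available, a commonly used workaround is to write the zeros through a volatile pointer, because volatile accesses count as observable behavior that the optimizer is not supposed to remove. The following is a minimal sketch with a made-up helper name:

#include <cstddef>

// Sketch: zero a buffer through a volatile pointer so that the stores
// are treated as observable and are not removed as dead stores
void SecureZero(void* p, std::size_t n)
{
    volatile unsigned char* vp = static_cast<volatile unsigned char*>(p);
    while (n--)
    {
        *vp++ = 0;
    }
}

In the GetData() example above, calling SecureZero(passwd, sizeof(passwd)) instead of memset() would be the corresponding usage; still, prefer the platform-provided functions where they exist.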