The following article is highly inspired by the work of Will Dietz, Peng Li, John Regehr, and Vikram Adve: 'Understanding Integer Overflow in C/C++'.
The work has been sliced down to the core and amended with some notes.
Mathematically, n-bit two's complement arithmetic is congruent, modulo 2^n, to n-bit unsigned arithmetic for addition, subtraction, and the n least significant bits in multiplication. Both kinds of arithmetic "wrap around" at multiples of 2^n.
On modern processors n-bit signed and unsigned operations both have this well-defined behavior when an operation overflows: the result wraps around and condition code bits are set appropriately.
In contrast, integer overflows in C/C++ programs are subtle due to combination of complex and counter-intuitive rules in the language standards and undefined behaviors.
C an C++ have undefined semantics for signed overflow and shift past bitwidth: operations that are perfectly defined in other languages such as Java.
Wraparound operation with signed types have undefined behavior. Today's compiler may compile these overflows into correct code. But those overflows are time bombs because they remain latent until a compiler upgrade turns them into observable errors.
Inserted to implement a specific function.
Commomly found to implement cryptographic promitives, hash functions, pseudo random numbers generators, and to find the max value of a type.
/* Rotate right a 32-bit value by 4 bits */
u32rot = (u32 << (32 - 4) | u32 >> 4);
An overflow that is caused by a coding error (a bug).
uint8_t i;
for (i = 0; i < 255; i += 2) /* if i = 254 then i += 2 wraps back to 0 */
printf("%u\n", i); /* never ending loop */
An operation result evaluates to an expected value.
Example
uint8_t u8 = 0x80;
u8 >>= 1; /* Gives 0x40 */
or
UINT_MAX + 1; /* Gives 0 */
Note that well-defined doesn't mean "portable". Some well-defined values are implementation defined. Meaning that we can rely on a value but only within the context of a given compiler (or compiler version).
For example 0U - 1
is well-defined and evaluates to UINT_MAX
, but the
actual value of that constant is implementation defined.
Obviously is better to avoid rely on implementation defined behaviors.
According to the C99 standard, undefined behaviour is:
"Behavior, upon use of a non-portable or erroneous program construct or erroneous data, which this International Standard imposes no requirements"
Example
uint32_t u32 = 1;
u32 <<= 32; /* Undefined */
or
INT_MAX + 1; /* Undefined, commonly INT_MIN (overflow in sign bit) */
(char) INT_MAX; /* Undefined, commonly -1 */
Silent Breakage
When programs execute undefined operations, optimizing compilers may silently break them in non-obvious and not necessarily consistent ways.
int foo(int x) { return (x + 1) > x; }
int main(void)
{
printf("%d\n", (INT_MAX + 1) > INT_MAX);
printf("%d\n", foo(INT_MAX));
return 0;
}
Compiling without optimizations the results are consistently: 0 and 0.
Compiling with -O2
optimizations an inconsistent answer is produced: 0 and 1.
Time bombs
Code that works under today's compilers, but may break in future versions.
Bogus Predictability
Predictable behavior for some undefined operations only under some optimization levels.
Informal Dialects
Support for stronger semantics than are mandated by the standard.
Non-Standard Standards
Some kinds of overflow have changed meaning across different versions of
the standards. For example 1<<31
is implementation-defined in C89 and
C++98, while is explicitly undefined by C99 and C11 (assuming 32-bit ints).
In integer arithmetic operations every integer type is eventually promoted to int before performing the operation.
Checking for overflows in shift operations is pretty straightforward;
A << B
gives an overflow if (A & ~((1 << (bitsize(A) - B)) - 1)) != 0
For example: if B
is 2
and bitsize(A)
is 8
then
~((1 << (bitsize(A) - B)) - 1)
is equal to 11000000
(binary)
Similar check can be performed to detect an overflow on the right side.
int8_t i8 = 1; /* Hex values are unsigned, warning */
i8 >>= 8; /* No warnings due to type promotion */
int32_t i32 = 1;
i32 >>= 32; /* Waning: right shift count >= width of type */
For types with a size greather than the size of an int
, if the shift
count is greater that the width of the type, then the shift value is taken
modulo the size, in bits, of the destination type.
i32 = 2;
i32 >>= 32; /* gives 2 (same as >> 0)*/
i32 = 2;
i32 >>= (32 + 1); /* gives 1 (same as >>1) */
i32 = 2;
i32 >>= (32 + 2); /* gives 0 (same as >>2) */
The same happens with an int64_t
value, but modulo 64.
Note that there isn't, in any case, a binary digit rotation or rollover.
i32 = 1;
i32 >>= 2; /* gives 0 */
Exactly the same behavior happens in case of left shift.
To respect the arithmetic meaning of a right shift, that is division by 2, the right shift operator when applied to signed integer with the most significat bit set (negative values) the bit sign is extended.
i8 = -4; /* 11111100 (binary) */
i8 >>= 1; /* gives -2 = 11111110 (binary) */
Without sign extension the resul would be arithmetically wrong.
Addition or subtraction of two n-bit integers may require n+1 bits of precision.
(2^n - 1) + (2^n - 1) = 2^(n+1) - 2 = 2â‹…(2^n - 1)
Worst case example (binary)
1111 + 1111 = [1]1110
^carry
Precondition test
Given two signed integers i1 and i2, signed addition will wrap if and only if the following expression is true.
((i1 > 0) && (i2 > 0) && (i1 > (INT_MAX - i2))) ||
((i1 < 0) && (i2 < 0) && (i1 < (INT_MIN - i2)))
The above can be siplified in
((i2 > 0) && (i1 > (INT_MAX - i2))) || /* i1 > 0 is implicit */
((i2 < 0) && (i1 < (INT_MIN - i2))) /* i1 < 0 is implicit */
Unsigned test
This test can be done to detect overflows on addition of two unsigned integers
(u1 + u2) < u1
Multiplication of two n-bit integers may require 2n bit of precision.
(2^n - 1) â‹… (2^n - 1) = 2^(2n) - 2^(n+1) + 1 = 2^(n+1) â‹… (2^(n-1) - 1) + 1
1111 â‹… 1111 = [1110]0001
^carry
Precondition text
Given two signed integers one trivial test that can detect an overflow is (assuming i2 != 0)
x = i1 * i2;
if (i2 != 0 && x / i2 != i1) { overflow }
Separate hi/lo word
Given two 64 bit integers i1 and i2, in the worst case the result requires 128 bit. Such a huge width are not defined by standard C language. To detect overflow we can split the 64-bit integer in two 32-bit parts.
#define LO(x) ((x) >> 32)
#define HI(x) ((x) & ((1 << 32) - 1))
uint64_t x, s0, s1, s2, s3;
uint64_t h, l;
x = LO(i1) * LO(i2);
s0 = LO(x);
x = HI(i1) * LO(i2) + HI(x);
s1 = LO(x);
s2 = HI(x);
x = s1 + LO(i1) * HI(i2);
s1 = LO(x);
x = s2 + HI(i1) * HI(i2) + HI(x);
s2 = LO(x);
s3 = HI(x);
l = (s1 << 32) | s0; /* eventually truncated result */
h = (s3 << 32) | s2; /* carry */
CPU flag test
Most processor contain hardware support for detecting overflow. It is not possible to inspect processor flags in aportable ANSI C code.
Width extension postcondition
If an integer datatype with wider bitwidth than the values being operated on is available, overflow can be trivialy detected by converting i1 and i2 into the wider type before performing the operation and assign the result to a wider type temporary variable.
For addition the wider type must have at least n+1 bits. For multiplication it must have at least 2n bits.
Proudly self-hosted on a cheap Raspberry Pi