Integer Overflow Detection
Created:
2017-11-07
Updated:
2017-11-07
The following article is highly inspired by the work of Will Dietz, Peng Li, John Regehr, and Vikram Adve: ‘Understanding Integer Overflow in C/C++’.
The work has been sliced down to the core and amended with some notes.
Introduction
Mathematically, n-bit two’s complement arithmetic is congruent, modulo 2^n, to n-bit unsigned arithmetic for addition, subtraction, and the n least significant bits in multiplication. Both kinds of arithmetic “wrap around” at multiples of 2^n.
On modern processors n-bit signed and unsigned operations both have this well-defined behavior when an operation overflows: the result wraps around and condition code bits are set appropriately.
In contrast, integer overflows in C/C++ programs are subtle due to combination of complex and counter-intuitive rules in the language standards and undefined behaviors.
C an C++ have undefined semantics for signed overflow and shift past bit-width: operations that are perfectly defined in other languages such as Java.
Wraparound operation with signed types have undefined behavior. Today’s compiler may compile these overflows into correct code. But those overflows are time bombs because they remain latent until a compiler upgrade turns them into observable errors.
Taxonomy
Developer intentions categories
- Intentional
- Unintentional
Intentional
Inserted to implement a specific function.
Commonly found to implement cryptographic primitives, hash functions, pseudo random numbers generators, and to find the max value of a type.
// Rotate right a 32-bit value by 4 bits
u32rot = (u32 << (32 - 4) | u32 >> 4);
Unintentional
An overflow that is caused by a coding error (a bug).
uint8_t i;
for (i = 0; i < 255; i += 2) // if i = 254 then i += 2 wraps back to 0
printf("%u\n", i); // Never ending loop */
Behavior categories
- Well-defined
- Undefined
Well-defined behaviors
An operation result evaluates to an expected value.
Example
uint8_t u8 = 0x80;
u8 >>= 1; // Gives 0x40
or
UINT_MAX + 1; // Gives 0
Note that well-defined doesn’t mean “portable”. Some well-defined values are implementation defined. Meaning that we can rely on a value but only within the context of a given compiler (or compiler version).
For example 0U - 1
is well-defined and evaluates to UINT_MAX
, but the
actual value of that constant is implementation defined.
Obviously is better to avoid rely on implementation defined behaviors.
Undefined behaviors
According to the C99 standard, undefined behavior is:
“Behavior, upon use of a non-portable or erroneous program construct or erroneous data, which this International Standard imposes no requirements”
Example
uint32_t u32 = 1;
u32 = u32 << 32; // Undefined
or
INT_MAX + 1; // Undefined, commonly INT_MIN (overflow in sign bit)
(char) INT_MAX; // Undefined, commonly -1
Silent Breakage
When programs execute undefined operations, optimizing compilers may silently break them in non-obvious and not necessarily consistent ways.
int foo(int x) { return (x + 1) > x; }
int main(void)
{
printf("%d\n", (INT_MAX + 1) > INT_MAX);
printf("%d\n", foo(INT_MAX));
return 0;
}
Compiling without optimizations the results are consistently: 0 and 0.
Compiling with -O2
optimizations an inconsistent answer is produced: 0 and 1.
Time bombs
Code that works under today’s compilers, but may break in future versions.
Bogus Predictability
Predictable behavior for some undefined operations only under some optimization levels.
Informal Dialects
Support for stronger semantics than are mandated by the standard.
Non-Standard Standards
Some kinds of overflow have changed meaning across different versions of
the standards. For example 1<<31
is implementation-defined in C89 and
C++98, while is explicitly undefined by C99 and C11 (assuming 32-bit
integers).
Overflow Detection
In integer arithmetic operations every integer type is eventually promoted to int before performing the operation.
Shift
Checking for overflows in shift operations is pretty straightforward;
A << B
gives an overflow if (A & ~((1 << (bitsize(A) - B)) - 1)) != 0
For example: if B
is 2
and bitsize(A)
is 8
then
~((1 << (bitsize(A) - B)) - 1)
is equal to 11000000
(binary)
Similar check can be performed to detect an overflow on the right side.
Truncated shift count
int8_t i8 = 1; // Hex values are unsigned, warning
i8 = i8 >> 8; // No warnings due to type promotion
int32_t i32 = 1;
i32 = i32 >> 32; // Waning: right shift count >= width of type
For types with a size greater than the size of a int
, if the shift
count is greater that the width of the type, then the shift value is taken
modulo the size, in bits, of the destination type.
i32 = 2;
i32 = i32 >> 32; // gives 2 (same as >> 0)
i32 = 2;
i32 = i32 >> (32 + 1); // gives 1 (same as >>1)
i32 = 2;
i32 = i32 >> (32 + 2); // gives 0 (same as >>2)
The same happens with a int64_t
value, but modulo 64.
Note that there isn’t, in any case, a binary digit rotation or rollover.
i32 = 1;
i32 = i32 >> 2; // gives 0
Exactly the same behavior happens in case of left shift.
Sign extension
To respect the arithmetic meaning of a right shift, that is division by 2, the right shift operator when applied to signed integer with the most significant bit set (negative values) the bit sign is extended.
i8 = -4; // 11111100 (binary)
i8 = i8 >> 1; // gives -2 = 11111110 (binary)
Without sign extension the result would be arithmetically wrong.
Addition
Addition or subtraction of two n-bit integers may require n+1 bits of precision.
(2^n - 1) + (2^n - 1) = 2^(n+1) - 2 = 2⋅(2^n - 1)
Worst case example (binary)
1111 + 1111 = [1]1110
^carry
Precondition test
Given two signed integers i1 and i2, signed addition will wrap if and only if the following expression is true.
((i1 > 0) && (i2 > 0) && (i1 > (INT_MAX - i2))) ||
((i1 < 0) && (i2 < 0) && (i1 < (INT_MIN - i2)))
The above can be simplified in
((i2 > 0) && (i1 > (INT_MAX - i2))) || // i1 > 0 is implicit
((i2 < 0) && (i1 < (INT_MIN - i2))) // i1 < 0 is implicit
Unsigned test
This test can be done to detect overflows on addition of two unsigned integers
(u1 + u2) < u1
Multiplication
Multiplication of two n-bit integers may require 2n bit of precision.
(2^n - 1) ⋅ (2^n - 1) = 2^(2n) - 2^(n+1) + 1 = 2^(n+1) ⋅ (2^(n-1) - 1) + 1
1111 ⋅ 1111 = [1110]0001
^carry
Precondition text
Given two signed integers one trivial test that can detect an overflow is (assuming i2 != 0)
x = i1 * i2;
if (i2 != 0 && x / i2 != i1) { overflow }
Separate hi/lo word
Given two 64-bit integers i1 and i2, in the worst case the result requires 128 bit. Such a huge width are not defined by standard C language. To detect overflow we can split the 64-bit integer in two 32-bit parts.
#define LO(x) ((x) >> 32)
#define HI(x) ((x) & ((1 << 32) - 1))
uint64_t x, s0, s1, s2, s3;
uint64_t h, l;
x = LO(i1) * LO(i2);
s0 = LO(x);
x = HI(i1) * LO(i2) + HI(x);
s1 = LO(x);
s2 = HI(x);
x = s1 + LO(i1) * HI(i2);
s1 = LO(x);
x = s2 + HI(i1) * HI(i2) + HI(x);
s2 = LO(x);
s3 = HI(x);
l = (s1 << 32) | s0; // optionally truncated result
h = (s3 << 32) | s2; // carry
Generic detection
CPU flag test
Most processor contain hardware support for detecting overflow. It is not possible to inspect processor flags in a portable ANSI C code.
Width extension post-condition
If an integer datatype with wider bit-width than the values being operated on is available, overflow can be trivially detected by converting i1 and i2 into the wider type before performing the operation and assign the result to a wider type temporary variable.
For addition the wider type must have at least n+1 bits. For multiplication, it must have at least 2n bits.