Commit 1d9d5cbe authored by Nathan Bronson, committed by Facebook Github Bot

integer division with controlled rounding in Math.h

Summary:
C++'s integer division truncates toward zero, but it is fairly common
that callers actually want to round up.  Four rounding modes make
sense: toward negative infinity (floor), toward positive infinity
(ceil), toward zero (truncation), and away from zero.  Code that wants
ceil is commonly written as (a + b - 1) / b, which is not correct for
negative values (and can overflow in the addition).  This diff adds four
templated functions for performing integer division: divFloor, divCeil,
divTrunc, and divRoundAway.  They are not subject to unnecessary internal
underflow or overflow, and they work correctly across their entire
input domain.
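
To illustrate the problem with the hand-rolled idiom, the sketch below compares it against a remainder-based ceiling. Both `naiveCeil` and `refCeil` are hypothetical names introduced only for this example; they are not the functions added by this diff, and the sketch covers plain `int` only.

```cpp
#include <cassert>

// Hypothetical stand-in for the common hand-rolled idiom (not from folly).
constexpr int naiveCeil(int a, int b) {
  return (a + b - 1) / b;
}

// Hypothetical reference ceiling built on truncating division: bump the
// quotient by one when there is a remainder and the exact result is
// positive (a and b have the same sign).
constexpr int refCeil(int a, int b) {
  return a / b + ((a % b != 0 && ((a < 0) == (b < 0))) ? 1 : 0);
}

// Positive operands: both agree.
static_assert(naiveCeil(7, 2) == 4, "idiom works for positive operands");
static_assert(refCeil(7, 2) == 4, "reference agrees");

// Negative numerator, exact quotient: -8 / 2 is exactly -4.
static_assert(naiveCeil(-8, 2) == -3, "idiom is off by one here");
static_assert(refCeil(-8, 2) == -4, "reference is exact");

// Negative denominator: 7 / -2 is -3.5, so the ceiling is -3.
static_assert(naiveCeil(7, -2) == -2, "idiom is off by one here");
static_assert(refCeil(7, -2) == -3, "reference rounds toward +infinity");
```

For the fraction -7/2 the four modes give: floor -4, ceil -3, truncation -3, and away from zero -4.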

I did a bit of benchmarking across x86_64, arm64, and 32-bit ARM.
Only 32-bit ARM behaved differently.  That's not surprising since it
doesn't have an integer division instruction, and the function that
implements integer division doesn't produce the remainder for free.
On 32-bit ARM a branchful version that doesn't need the modulus is
used instead.
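
The two strategies can be sketched for plain `int` as follows. This is a simplified illustration, not the templated code from the diff: `ceilViaRemainder` and `ceilViaPrecondition` are hypothetical names that loosely mirror the remainder-based (branchless) and modulus-free (branchful) ceil helpers, and `agreeOnSmallRange` is only a sanity check added for the example.

```cpp
#include <cassert>

// Uses both the quotient and the remainder. Cheap where the division
// instruction yields the remainder as a by-product.
constexpr int ceilViaRemainder(int num, int denom) {
  return num / denom + ((num % denom != 0 && (num ^ denom) >= 0) ? 1 : 0);
}

// Avoids the modulus. When the exact result is positive, move the
// numerator one step toward zero before dividing, then add one.
constexpr int ceilViaPrecondition(int num, int denom) {
  return ((num ^ denom) < 0 || num == 0)
      ? num / denom
      : (num + (num > 0 ? -1 : 1)) / denom + 1;
}

// Sanity check: both formulations agree on a small range of inputs.
inline bool agreeOnSmallRange() {
  for (int n = -50; n <= 50; ++n) {
    for (int d = -50; d <= 50; ++d) {
      if (d == 0) {
        continue; // division by zero is undefined
      }
      if (ceilViaRemainder(n, d) != ceilViaPrecondition(n, d)) {
        return false;
      }
    }
  }
  return true;
}
```

The modulus-free form trades the `%` for extra comparisons, which is the trade-off the benchmark paragraph describes for 32-bit ARM.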

Reviewed By: yfeldblum

Differential Revision: D3806743

fbshipit-source-id: c14c56717e96f135321920e64acbfe9dcb1fe039
parent f6b5f78b
@@ -254,6 +254,7 @@ nobase_follyinclude_HEADERS = \
 MallctlHelper.h \
 Malloc.h \
 MapUtil.h \
+Math.h \
 Memory.h \
 MemoryMapping.h \
 MicroSpinLock.h \
......
/*
* Copyright 2016 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Some arithmetic functions that seem to pop up or get hand-rolled a lot.
* So far they are all focused on integer division.
*/
#pragma once
#include <stdint.h>
#include <limits>
#include <type_traits>
namespace folly {
namespace detail {
template <typename T>
inline constexpr T divFloorBranchless(T num, T denom) {
  // floor != trunc when the answer isn't exact and truncation went the
  // wrong way (truncation went toward positive infinity).  That happens
  // when the true answer is negative, which happens when num and denom
  // have different signs.  The following code compiles branch-free on
  // many platforms.
  return (num / denom) +
      ((num % denom) != 0 ? 1 : 0) *
      (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 0);
}

template <typename T>
inline constexpr T divFloorBranchful(T num, T denom) {
  // First case handles negative result by preconditioning numerator.
  // Preconditioning decreases the magnitude of the numerator, which is
  // itself sign-dependent.  Second case handles zero or positive rational
  // result, where trunc and floor are the same.
  return std::is_signed<T>::value && (num ^ denom) < 0 && num != 0
      ? (num + (num > 0 ? -1 : 1)) / denom - 1
      : num / denom;
}
template <typename T>
inline constexpr T divCeilBranchless(T num, T denom) {
  // ceil != trunc when the answer isn't exact (truncation occurred)
  // and truncation went away from positive infinity.  That happens when
  // the true answer is positive, which happens when num and denom have
  // the same sign.
  return (num / denom) +
      ((num % denom) != 0 ? 1 : 0) *
      (std::is_signed<T>::value && (num ^ denom) < 0 ? 0 : 1);
}

template <typename T>
inline constexpr T divCeilBranchful(T num, T denom) {
  // First case handles negative or zero rational result, where trunc and ceil
  // are the same.
  // Second case handles positive result by preconditioning numerator.
  // Preconditioning decreases the magnitude of the numerator, which is
  // itself sign-dependent.
  return (std::is_signed<T>::value && (num ^ denom) < 0) || num == 0
      ? num / denom
      : (num + (num > 0 ? -1 : 1)) / denom + 1;
}
template <typename T>
inline constexpr T divRoundAwayBranchless(T num, T denom) {
  // away != trunc whenever truncation actually occurred, which is when
  // there is a non-zero remainder.  If the unrounded result is negative
  // then fixup moves it toward negative infinity.  If the unrounded
  // result is positive then adjustment makes it larger.
  return (num / denom) +
      ((num % denom) != 0 ? 1 : 0) *
      (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 1);
}

template <typename T>
inline constexpr T divRoundAwayBranchful(T num, T denom) {
  // First case of second ternary operator handles negative rational
  // result, which is the same as divFloor.  Second case of second ternary
  // operator handles positive result, which is the same as divCeil.
  // Zero case is separated for simplicity.
  return num == 0 ? 0
                  : (num + (num > 0 ? -1 : 1)) / denom +
          (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 1);
}
template <typename N, typename D>
using IdivResultType = typename std::enable_if<
    std::is_integral<N>::value && std::is_integral<D>::value &&
        !std::is_same<N, bool>::value &&
        !std::is_same<D, bool>::value,
    decltype(N{1} / D{1})>::type;
} // namespace detail

// False on 32-bit ARM, where integer division does not also produce the
// remainder for free; the branchful implementations are used there.
#if defined(__arm__) && !FOLLY_A64
constexpr auto kIntegerDivisionGivesRemainder = false;
#else
constexpr auto kIntegerDivisionGivesRemainder = true;
#endif
/**
 * Returns num/denom, rounded toward negative infinity.  Put another way,
 * returns the largest integral value that is less than or equal to the
 * exact (not rounded) fraction num/denom.
 *
 * The matching remainder (num - divFloor(num, denom) * denom) can be
 * negative only if denom is negative, unlike in truncating division.
 * Note that for unsigned types this is the same as the normal integer
 * division operator.  divFloor is equivalent to Python's floor division
 * operator //.
 *
 * This function undergoes the same integer promotion rules as a
 * built-in operator, except that we don't allow bool -> int promotion.
 * This function is undefined if denom == 0.  It is also undefined if the
 * result type R (decltype(num / denom)) is a signed type, num is
 * std::numeric_limits<R>::min(), and denom is equal to -1 after
 * conversion to the result type.
 */
template <typename N, typename D>
inline constexpr detail::IdivResultType<N, D> divFloor(N num, D denom) {
  using R = decltype(num / denom);
  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
      ? detail::divFloorBranchless<R>(num, denom)
      : detail::divFloorBranchful<R>(num, denom);
}
/**
 * Returns num/denom, rounded toward positive infinity.  Put another way,
 * returns the smallest integral value that is greater than or equal to
 * the exact (not rounded) fraction num/denom.
 *
 * This function undergoes the same integer promotion rules as a
 * built-in operator, except that we don't allow bool -> int promotion.
 * This function is undefined if denom == 0.  It is also undefined if the
 * result type R (decltype(num / denom)) is a signed type, num is
 * std::numeric_limits<R>::min(), and denom is equal to -1 after
 * conversion to the result type.
 */
template <typename N, typename D>
inline constexpr detail::IdivResultType<N, D> divCeil(N num, D denom) {
  using R = decltype(num / denom);
  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
      ? detail::divCeilBranchless<R>(num, denom)
      : detail::divCeilBranchful<R>(num, denom);
}
/**
 * Returns num/denom, rounded toward zero.  If num and denom are non-zero
 * and have different signs (so the unrounded fraction num/denom is
 * negative), returns divCeil, otherwise returns divFloor.  If the result
 * type is unsigned then this is always equal to divFloor.
 *
 * Note that this is the same as the normal integer division operator,
 * at least since C99 and C++11 (before then the rounding for negative
 * results was implementation defined).  This function is here for
 * completeness and as a place to hang this comment.
 *
 * This function undergoes the same integer promotion rules as a
 * built-in operator, except that we don't allow bool -> int promotion.
 * This function is undefined if denom == 0.  It is also undefined if the
 * result type R (decltype(num / denom)) is a signed type, num is
 * std::numeric_limits<R>::min(), and denom is equal to -1 after
 * conversion to the result type.
 */
template <typename N, typename D>
inline constexpr detail::IdivResultType<N, D> divTrunc(N num, D denom) {
  return num / denom;
}
/**
 * Returns num/denom, rounded away from zero.  If num and denom are
 * non-zero and have different signs (so the unrounded fraction num/denom
 * is negative), returns divFloor, otherwise returns divCeil.  If the
 * result type is unsigned then this is always equal to divCeil.
 *
 * This function undergoes the same integer promotion rules as a
 * built-in operator, except that we don't allow bool -> int promotion.
 * This function is undefined if denom == 0.  It is also undefined if the
 * result type R (decltype(num / denom)) is a signed type, num is
 * std::numeric_limits<R>::min(), and denom is equal to -1 after
 * conversion to the result type.
 */
template <typename N, typename D>
inline constexpr detail::IdivResultType<N, D> divRoundAway(N num, D denom) {
  using R = decltype(num / denom);
  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
      ? detail::divRoundAwayBranchless<R>(num, denom)
      : detail::divRoundAwayBranchful<R>(num, denom);
}
} // namespace folly
@@ -12,6 +12,7 @@ TESTS= \
 conv_test \
 expected_test \
 range_test \
+math_test \
 bits_test \
 bit_iterator_test
@@ -135,6 +136,9 @@ expected_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la
 range_test_SOURCES = RangeTest.cpp
 range_test_LDADD = libfollytestmain.la
+math_test_SOURCES = MathTest.cpp
+math_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la
 bits_test_SOURCES = BitsTest.cpp
 bits_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la
......
/*
* Copyright 2016 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <folly/Math.h>
#include <algorithm>
#include <cassert>
#include <limits>
#include <type_traits>
#include <utility>
#include <vector>
#include <glog/logging.h>
#include <gtest/gtest.h>
#include <folly/Portability.h>
using namespace folly;
using namespace folly::detail;
namespace {
// Workaround for https://llvm.org/bugs/show_bug.cgi?id=16404,
// issues with __int128 multiplication and UBSAN
template <typename T>
T mul(T lhs, T rhs) {
  if (rhs < 0) {
    rhs = -rhs;
    lhs = -lhs;
  }
  T accum = 0;
  while (rhs != 0) {
    if ((rhs & 1) != 0) {
      accum += lhs;
    }
    lhs += lhs;
    rhs >>= 1;
  }
  return accum;
}
template <typename T, typename B>
T referenceDivFloor(T numer, T denom) {
  // rv = largest integral value <= numer / denom
  B n = numer;
  B d = denom;
  if (d < 0) {
    d = -d;
    n = -n;
  }
  B r = n / d;
  while (mul(r, d) > n) {
    --r;
  }
  while (mul(r + 1, d) <= n) {
    ++r;
  }
  T rv = static_cast<T>(r);
  assert(static_cast<B>(rv) == r);
  return rv;
}

template <typename T, typename B>
T referenceDivCeil(T numer, T denom) {
  // rv = smallest integral value >= numer / denom
  B n = numer;
  B d = denom;
  if (d < 0) {
    d = -d;
    n = -n;
  }
  B r = n / d;
  while (mul(r, d) < n) {
    ++r;
  }
  while (mul(r - 1, d) >= n) {
    --r;
  }
  T rv = static_cast<T>(r);
  assert(static_cast<B>(rv) == r);
  return rv;
}

template <typename T, typename B>
T referenceDivRoundAway(T numer, T denom) {
  if ((numer < 0) != (denom < 0)) {
    return referenceDivFloor<T, B>(numer, denom);
  } else {
    return referenceDivCeil<T, B>(numer, denom);
  }
}
template <typename T>
std::vector<T> cornerValues() {
  std::vector<T> rv;
  for (T i = 1; i < 24; ++i) {
    rv.push_back(i);
    rv.push_back(std::numeric_limits<T>::max() / i);
    rv.push_back(std::numeric_limits<T>::max() - i);
    rv.push_back(std::numeric_limits<T>::max() / 2 - i);
    if (std::is_signed<T>::value) {
      rv.push_back(-i);
      rv.push_back(std::numeric_limits<T>::min() / i);
      rv.push_back(std::numeric_limits<T>::min() + i);
      rv.push_back(std::numeric_limits<T>::min() / 2 + i);
    }
  }
  return rv;
}
template <typename A, typename B, typename C>
void runDivTests() {
  using T = decltype(static_cast<A>(1) / static_cast<B>(1));
  auto numers = cornerValues<A>();
  numers.push_back(0);
  auto denoms = cornerValues<B>();
  for (A n : numers) {
    for (B d : denoms) {
      if (std::is_signed<T>::value && n == std::numeric_limits<T>::min() &&
          d == static_cast<T>(-1)) {
        // n / d overflows in two's complement
        continue;
      }
      EXPECT_EQ(divCeil(n, d), (referenceDivCeil<T, C>(n, d))) << n << "/" << d;
      EXPECT_EQ(divFloor(n, d), (referenceDivFloor<T, C>(n, d))) << n << "/"
                                                                 << d;
      EXPECT_EQ(divTrunc(n, d), n / d) << n << "/" << d;
      EXPECT_EQ(divRoundAway(n, d), (referenceDivRoundAway<T, C>(n, d)))
          << n << "/" << d;
      T nn = n;
      T dd = d;
      EXPECT_EQ(divCeilBranchless(nn, dd), divCeilBranchful(nn, dd));
      EXPECT_EQ(divFloorBranchless(nn, dd), divFloorBranchful(nn, dd));
      EXPECT_EQ(divRoundAwayBranchless(nn, dd), divRoundAwayBranchful(nn, dd));
    }
  }
}
} // namespace
TEST(Bits, divTestInt8) {
  runDivTests<int8_t, int8_t, int64_t>();
  runDivTests<int8_t, uint8_t, int64_t>();
  runDivTests<int8_t, int16_t, int64_t>();
  runDivTests<int8_t, uint16_t, int64_t>();
  runDivTests<int8_t, int32_t, int64_t>();
  runDivTests<int8_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<int8_t, int64_t, __int128>();
  runDivTests<int8_t, uint64_t, __int128>();
#endif
}

TEST(Bits, divTestInt16) {
  runDivTests<int16_t, int8_t, int64_t>();
  runDivTests<int16_t, uint8_t, int64_t>();
  runDivTests<int16_t, int16_t, int64_t>();
  runDivTests<int16_t, uint16_t, int64_t>();
  runDivTests<int16_t, int32_t, int64_t>();
  runDivTests<int16_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<int16_t, int64_t, __int128>();
  runDivTests<int16_t, uint64_t, __int128>();
#endif
}

TEST(Bits, divTestInt32) {
  runDivTests<int32_t, int8_t, int64_t>();
  runDivTests<int32_t, uint8_t, int64_t>();
  runDivTests<int32_t, int16_t, int64_t>();
  runDivTests<int32_t, uint16_t, int64_t>();
  runDivTests<int32_t, int32_t, int64_t>();
  runDivTests<int32_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<int32_t, int64_t, __int128>();
  runDivTests<int32_t, uint64_t, __int128>();
#endif
}

#ifdef FOLLY_HAVE_INT128_T
TEST(Bits, divTestInt64) {
  runDivTests<int64_t, int8_t, __int128>();
  runDivTests<int64_t, uint8_t, __int128>();
  runDivTests<int64_t, int16_t, __int128>();
  runDivTests<int64_t, uint16_t, __int128>();
  runDivTests<int64_t, int32_t, __int128>();
  runDivTests<int64_t, uint32_t, __int128>();
  runDivTests<int64_t, int64_t, __int128>();
  runDivTests<int64_t, uint64_t, __int128>();
}
#endif
TEST(Bits, divTestUint8) {
  runDivTests<uint8_t, int8_t, int64_t>();
  runDivTests<uint8_t, uint8_t, int64_t>();
  runDivTests<uint8_t, int16_t, int64_t>();
  runDivTests<uint8_t, uint16_t, int64_t>();
  runDivTests<uint8_t, int32_t, int64_t>();
  runDivTests<uint8_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<uint8_t, int64_t, __int128>();
  runDivTests<uint8_t, uint64_t, __int128>();
#endif
}

TEST(Bits, divTestUint16) {
  runDivTests<uint16_t, int8_t, int64_t>();
  runDivTests<uint16_t, uint8_t, int64_t>();
  runDivTests<uint16_t, int16_t, int64_t>();
  runDivTests<uint16_t, uint16_t, int64_t>();
  runDivTests<uint16_t, int32_t, int64_t>();
  runDivTests<uint16_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<uint16_t, int64_t, __int128>();
  runDivTests<uint16_t, uint64_t, __int128>();
#endif
}

TEST(Bits, divTestUint32) {
  runDivTests<uint32_t, int8_t, int64_t>();
  runDivTests<uint32_t, uint8_t, int64_t>();
  runDivTests<uint32_t, int16_t, int64_t>();
  runDivTests<uint32_t, uint16_t, int64_t>();
  runDivTests<uint32_t, int32_t, int64_t>();
  runDivTests<uint32_t, uint32_t, int64_t>();
#ifdef FOLLY_HAVE_INT128_T
  runDivTests<uint32_t, int64_t, __int128>();
  runDivTests<uint32_t, uint64_t, __int128>();
#endif
}

#ifdef FOLLY_HAVE_INT128_T
TEST(Bits, divTestUint64) {
  runDivTests<uint64_t, int8_t, __int128>();
  runDivTests<uint64_t, uint8_t, __int128>();
  runDivTests<uint64_t, int16_t, __int128>();
  runDivTests<uint64_t, uint16_t, __int128>();
  runDivTests<uint64_t, int32_t, __int128>();
  runDivTests<uint64_t, uint32_t, __int128>();
  runDivTests<uint64_t, int64_t, __int128>();
  runDivTests<uint64_t, uint64_t, __int128>();
}
#endif