integer division with controlled rounding in Math.h

Summary: C++'s integer division truncates, but code often wants to round up instead. Four rounding modes make sense: toward negative infinity (floor), toward positive infinity (ceil), toward zero (truncation), and away from zero (?). Code that wants ceil commonly writes (a + b - 1) / b, which gives wrong results for negative values and can overflow.

This diff adds 4 templated functions for performing integer division: divFloor, divCeil, divTrunc, and divRoundAway. They are not subject to unnecessary internal underflow or overflow, and they work correctly across their entire input domain.

I did a bit of benchmarking across x86_64, arm64, and 32-bit ARM. Only 32-bit ARM was different. That's not surprising since it doesn't have an integer division instruction, and the function that implements integer division doesn't produce the remainder for free. On 32-bit ARM a branchful version that doesn't need the modulus is used.

Reviewed By: yfeldblum
Differential Revision: D3806743
fbshipit-source-id: c14c56717e96f135321920e64acbfe9dcb1fe039
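To make the problem with the hand-rolled idiom concrete, here is a minimal sketch. The names naiveCeil, remCeil, and checkOverflowCase are hypothetical and exist only for this illustration; remCeil is a simplified remainder-based ceiling, not the code added by this commit. The overflow case uses unsigned operands so that the wraparound is well defined.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: the hand-rolled "round up" idiom from the summary.
template <typename T>
constexpr T naiveCeil(T a, T b) {
  return (a + b - 1) / b;
}

// Hypothetical helper: remainder-based ceiling. Bump the truncated quotient
// only when the division was inexact and the operands have the same sign.
template <typename T>
constexpr T remCeil(T a, T b) {
  return a / b + ((a % b != 0 && ((a < 0) == (b < 0))) ? 1 : 0);
}

// Positive operands: both agree (7/2 = 3.5, ceil is 4).
static_assert(naiveCeil(7, 2) == 4, "");
static_assert(remCeil(7, 2) == 4, "");

// Negative numerator: -4/3 is about -1.33, so ceil is -1.
// The naive idiom computes (-4 + 3 - 1) / 3 = -2 / 3 = 0.
static_assert(naiveCeil(-4, 3) == 0, "");
static_assert(remCeil(-4, 3) == -1, "");

// Negative denominator: 7/-2 = -3.5, so ceil is -3.
// The naive idiom computes (7 - 2 - 1) / -2 = 4 / -2 = -2.
static_assert(naiveCeil(7, -2) == -2, "");
static_assert(remCeil(7, -2) == -3, "");

// Overflow: a + b - 1 wraps around before the division happens.
inline void checkOverflowCase() {
  const unsigned maxValue = std::numeric_limits<unsigned>::max();
  assert(naiveCeil(maxValue, 2u) == 0u);              // sum wrapped to 0
  assert(remCeil(maxValue, 2u) == maxValue / 2 + 1);  // no intermediate sum
}
```

With signed operands the same intermediate sum would be undefined behavior rather than a wrong answer, which is why the sketch sticks to unsigned for that case.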

Commit 1d9d5cbe authored Sep 07, 2016 by Nathan Bronson Committed by Facebook Github Bot 4 Sep 07, 2016
5 changed files
--- a/folly/Makefile.am
+++ b/folly/Makefile.am
@@ -254,6 +254,7 @@ nobase_follyinclude_HEADERS = \
 	MallctlHelper.h \
 	Malloc.h \
 	MapUtil.h \
+	Math.h \
 	Memory.h \
 	MemoryMapping.h \
 	MicroSpinLock.h \

--- a/folly/Math.h
+++ b/folly/Math.h
+/*
+ * Copyright 2016 Facebook, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * Some arithmetic functions that seem to pop up or get hand-rolled a lot.
+ * So far they are all focused on integer division.
+ */
+#pragma once
+#include <stdint.h>
+#include <limits>
+#include <type_traits>
+namespace folly {
+namespace detail {
+template <typename T>
+inline constexpr T divFloorBranchless(T num, T denom) {
+  // floor != trunc when the answer isn't exact and truncation went the
+  // wrong way (truncation went toward positive infinity).  That happens
+  // when the true answer is negative, which happens when num and denom
+  // have different signs.  The following code compiles branch-free on
+  // many platforms.
+  return (num / denom) +
+      ((num % denom) != 0 ? 1 : 0) *
+      (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 0);
+}
+template <typename T>
+inline constexpr T divFloorBranchful(T num, T denom) {
+  // First case handles negative result by preconditioning numerator.
+  // Preconditioning decreases the magnitude of the numerator, which is
+  // itself sign-dependent.  Second case handles zero or positive rational
+  // result, where trunc and floor are the same.
+  return std::is_signed<T>::value && (num ^ denom) < 0 && num != 0
+      ? (num + (num > 0 ? -1 : 1)) / denom - 1
+      : num / denom;
+}
+template <typename T>
+inline constexpr T divCeilBranchless(T num, T denom) {
+  // ceil != trunc when the answer isn't exact (truncation occurred)
+  // and truncation went away from positive infinity.  That happens when
+  // the true answer is positive, which happens when num and denom have
+  // the same sign.
+  return (num / denom) +
+      ((num % denom) != 0 ? 1 : 0) *
+      (std::is_signed<T>::value && (num ^ denom) < 0 ? 0 : 1);
+}
+template <typename T>
+inline constexpr T divCeilBranchful(T num, T denom) {
+  // First case handles negative or zero rational result, where trunc and ceil
+  // are the same.
+  // Second case handles positive result by preconditioning numerator.
+  // Preconditioning decreases the magnitude of the numerator, which is
+  // itself sign-dependent.
+  return (std::is_signed<T>::value && (num ^ denom) < 0) || num == 0
+      ? num / denom
+      : (num + (num > 0 ? -1 : 1)) / denom + 1;
+}
+template <typename T>
+inline constexpr T divRoundAwayBranchless(T num, T denom) {
+  // away != trunc whenever truncation actually occurred, which is when
+  // there is a non-zero remainder.  If the unrounded result is negative
+  // then fixup moves it toward negative infinity.  If the unrounded
+  // result is positive then adjustment makes it larger.
+  return (num / denom) +
+      ((num % denom) != 0 ? 1 : 0) *
+      (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 1);
+}
+template <typename T>
+inline constexpr T divRoundAwayBranchful(T num, T denom) {
+  // First case of second ternary operator handles negative rational
+  // result, which is the same as divFloor.  Second case of second ternary
+  // operator handles positive result, which is the same as divCeil.
+  // Zero case is separated for simplicity.
+  return num == 0 ? 0
+                  : (num + (num > 0 ? -1 : 1)) / denom +
+          (std::is_signed<T>::value && (num ^ denom) < 0 ? -1 : 1);
+}
+template <typename N, typename D>
+using IdivResultType = typename std::enable_if<
+    std::is_integral<N>::value && std::is_integral<D>::value &&
+        !std::is_same<N, bool>::value &&
+        !std::is_same<D, bool>::value,
+    decltype(N{1} / D{1})>::type;
+}
+#if defined(__arm__) && !FOLLY_A64
+constexpr auto kIntegerDivisionGivesRemainder = false;
+#else
+constexpr auto kIntegerDivisionGivesRemainder = true;
+#endif
+/**
+ * Returns num/denom, rounded toward negative infinity.  Put another way,
+ * returns the largest integral value that is less than or equal to the
+ * exact (not rounded) fraction num/denom.
+ *
+ * The matching remainder (num - divFloor(num, denom) * denom) can be
+ * negative only if denom is negative, unlike in truncating division.
+ * Note that for unsigned types this is the same as the normal integer
+ * division operator.  divFloor is equivalent to python's integral division
+ * operator //.
+ *
+ * This function undergoes the same integer promotion rules as a
+ * built-in operator, except that we don't allow bool -> int promotion.
+ * This function is undefined if denom == 0.  It is also undefined if the
+ * result type T is a signed type, num is std::numeric_limits<T>::min(),
+ * and denom is equal to -1 after conversion to the result type.
+ */
+template <typename N, typename D>
+inline constexpr detail::IdivResultType<N, D> divFloor(N num, D denom) {
+  using R = decltype(num / denom);
+  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
+      ? detail::divFloorBranchless<R>(num, denom)
+      : detail::divFloorBranchful<R>(num, denom);
+}
+/**
+ * Returns num/denom, rounded toward positive infinity.  Put another way,
+ * returns the smallest integral value that is greater than or equal to
+ * the exact (not rounded) fraction num/denom.
+ *
+ * This function undergoes the same integer promotion rules as a
+ * built-in operator, except that we don't allow bool -> int promotion.
+ * This function is undefined if denom == 0.  It is also undefined if the
+ * result type T is a signed type, num is std::numeric_limits<T>::min(),
+ * and denom is equal to -1 after conversion to the result type.
+ */
+template <typename N, typename D>
+inline constexpr detail::IdivResultType<N, D> divCeil(N num, D denom) {
+  using R = decltype(num / denom);
+  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
+      ? detail::divCeilBranchless<R>(num, denom)
+      : detail::divCeilBranchful<R>(num, denom);
+}
+/**
+ * Returns num/denom, rounded toward zero.  If num and denom are non-zero
+ * and have different signs (so the unrounded fraction num/denom is
+ * negative), returns divCeil, otherwise returns divFloor.  If T is an
+ * unsigned type then this is always equal to divFloor.
+ *
+ * Note that this is the same as the normal integer division operator,
+ * at least since C99 (before then the rounding for negative results was
+ * implementation defined).  This function is here for completeness and
+ * as a place to hang this comment.
+ *
+ * This function undergoes the same integer promotion rules as a
+ * built-in operator, except that we don't allow bool -> int promotion.
+ * This function is undefined if denom == 0.  It is also undefined if the
+ * result type T is a signed type, num is std::numeric_limits<T>::min(),
+ * and denom is equal to -1 after conversion to the result type.
+ */
+template <typename N, typename D>
+inline constexpr detail::IdivResultType<N, D> divTrunc(N num, D denom) {
+  return num / denom;
+}
+/**
+ * Returns num/denom, rounded away from zero.  If num and denom are
+ * non-zero and have different signs (so the unrounded fraction num/denom
+ * is negative), returns divFloor, otherwise returns divCeil.  If T is
+ * an unsigned type then this is always equal to divCeil.
+ *
+ * This function undergoes the same integer promotion rules as a
+ * built-in operator, except that we don't allow bool -> int promotion.
+ * This function is undefined if denom == 0.  It is also undefined if the
+ * result type T is a signed type, num is std::numeric_limits<T>::min(),
+ * and denom is equal to -1 after conversion to the result type.
+ */
+template <typename N, typename D>
+inline constexpr detail::IdivResultType<N, D> divRoundAway(N num, D denom) {
+  using R = decltype(num / denom);
+  return kIntegerDivisionGivesRemainder && std::is_signed<R>::value
+      ? detail::divRoundAwayBranchless<R>(num, denom)
+      : detail::divRoundAwayBranchful<R>(num, denom);
+}
+} // namespace folly
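
As a rough illustration of the header above, the following standalone sketch spells out the four rounding modes for plain int and compares a remainder-based floor with a modulus-free one. All names here (inexact, signsDiffer, truncSketch, floorSketch, ceilSketch, awaySketch, floorNoModulus, checkFloorVariantsAgree) are hypothetical; they follow the shape of the detail:: helpers but are not folly's API and skip the integer-promotion handling.

```cpp
#include <cassert>

// True when the truncating division dropped a non-zero remainder.
constexpr bool inexact(int n, int d) { return n % d != 0; }

// True when the sign bits differ. This is also true for 0 divided by a
// negative value, which is harmless because 0 always divides exactly.
constexpr bool signsDiffer(int n, int d) { return (n ^ d) < 0; }

constexpr int truncSketch(int n, int d) { return n / d; }
constexpr int floorSketch(int n, int d) {
  return n / d - ((inexact(n, d) && signsDiffer(n, d)) ? 1 : 0);
}
constexpr int ceilSketch(int n, int d) {
  return n / d + ((inexact(n, d) && !signsDiffer(n, d)) ? 1 : 0);
}
constexpr int awaySketch(int n, int d) {
  return n / d + (inexact(n, d) ? (signsDiffer(n, d) ? -1 : 1) : 0);
}

// Modulus-free floor: for a negative quotient, move the numerator one step
// toward zero, truncate, then subtract one.
constexpr int floorNoModulus(int n, int d) {
  return (signsDiffer(n, d) && n != 0) ? (n + (n > 0 ? -1 : 1)) / d - 1
                                       : n / d;
}

// -7/2 = -3.5: trunc and ceil give -3, floor and away give -4.
static_assert(truncSketch(-7, 2) == -3, "");
static_assert(floorSketch(-7, 2) == -4, "");
static_assert(ceilSketch(-7, 2) == -3, "");
static_assert(awaySketch(-7, 2) == -4, "");

// 7/2 = 3.5: trunc and floor give 3, ceil and away give 4.
static_assert(truncSketch(7, 2) == 3, "");
static_assert(floorSketch(7, 2) == 3, "");
static_assert(ceilSketch(7, 2) == 4, "");
static_assert(awaySketch(7, 2) == 4, "");

// Exact quotients are the same in every mode.
static_assert(floorSketch(6, -2) == -3 && awaySketch(6, -2) == -3, "");

// Runtime check that the two floor variants agree on a small range.
inline void checkFloorVariantsAgree() {
  for (int n = -50; n <= 50; ++n) {
    for (int d = -50; d <= 50; ++d) {
      if (d != 0) {
        assert(floorSketch(n, d) == floorNoModulus(n, d));
      }
    }
  }
}
```

The modulus-free form trades the remainder for a comparison and an extra add, which matches the motivation given in the summary for targets where the remainder is not produced for free.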
--- a/folly/test/Makefile.am
+++ b/folly/test/Makefile.am
@@ -12,6 +12,7 @@ TESTS= \
 	conv_test \
 	expected_test \
 	range_test \
+	math_test \
 	bits_test \
 	bit_iterator_test
@@ -135,6 +136,9 @@ expected_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la
 range_test_SOURCES = RangeTest.cpp
 range_test_LDADD = libfollytestmain.la
+math_test_SOURCES = MathTest.cpp
+math_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la
 bits_test_SOURCES = BitsTest.cpp
 bits_test_LDADD = libfollytestmain.la $(top_builddir)/libfollybenchmark.la

--- a/folly/test/MathBenchmark.cpp
+++ b/folly/test/MathBenchmark.cpp
--- a/folly/test/MathTest.cpp
+++ b/folly/test/MathTest.cpp
+/*
+ * Copyright 2016 Facebook, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <folly/Math.h>
+#include <algorithm>
+#include <cassert>
+#include <limits>
+#include <type_traits>
+#include <utility>
+#include <vector>
+#include <glog/logging.h>
+#include <gtest/gtest.h>
+#include <folly/Portability.h>
+using namespace folly;
+using namespace folly::detail;
+namespace {
+// Workaround for https://llvm.org/bugs/show_bug.cgi?id=16404,
+// issues with __int128 multiplication and UBSAN
+template <typename T>
+T mul(T lhs, T rhs) {
+  if (rhs < 0) {
+    rhs = -rhs;
+    lhs = -lhs;
+  }
+  T accum = 0;
+  while (rhs != 0) {
+    if ((rhs & 1) != 0) {
+      accum += lhs;
+    }
+    lhs += lhs;
+    rhs >>= 1;
+  }
+  return accum;
+}
+template <typename T, typename B>
+T referenceDivFloor(T numer, T denom) {
+  // rv = largest integral value <= numer / denom
+  B n = numer;
+  B d = denom;
+  if (d < 0) {
+    d = -d;
+    n = -n;
+  }
+  B r = n / d;
+  while (mul(r, d) > n) {
+    --r;
+  }
+  while (mul(r + 1, d) <= n) {
+    ++r;
+  }
+  T rv = static_cast<T>(r);
+  assert(static_cast<B>(rv) == r);
+  return rv;
+}
+template <typename T, typename B>
+T referenceDivCeil(T numer, T denom) {
+  // rv = smallest integral value >= numer / denom
+  B n = numer;
+  B d = denom;
+  if (d < 0) {
+    d = -d;
+    n = -n;
+  }
+  B r = n / d;
+  while (mul(r, d) < n) {
+    ++r;
+  }
+  while (mul(r - 1, d) >= n) {
+    --r;
+  }
+  T rv = static_cast<T>(r);
+  assert(static_cast<B>(rv) == r);
+  return rv;
+}
+template <typename T, typename B>
+T referenceDivRoundAway(T numer, T denom) {
+  if ((numer < 0) != (denom < 0)) {
+    return referenceDivFloor<T, B>(numer, denom);
+  } else {
+    return referenceDivCeil<T, B>(numer, denom);
+  }
+}
+template <typename T>
+std::vector<T> cornerValues() {
+  std::vector<T> rv;
+  for (T i = 1; i < 24; ++i) {
+    rv.push_back(i);
+    rv.push_back(std::numeric_limits<T>::max() / i);
+    rv.push_back(std::numeric_limits<T>::max() - i);
+    rv.push_back(std::numeric_limits<T>::max() / 2 - i);
+    if (std::is_signed<T>::value) {
+      rv.push_back(-i);
+      rv.push_back(std::numeric_limits<T>::min() / i);
+      rv.push_back(std::numeric_limits<T>::min() + i);
+      rv.push_back(std::numeric_limits<T>::min() / 2 + i);
+    }
+  }
+  return rv;
+}
+template <typename A, typename B, typename C>
+void runDivTests() {
+  using T = decltype(static_cast<A>(1) / static_cast<B>(1));
+  auto numers = cornerValues<A>();
+  numers.push_back(0);
+  auto denoms = cornerValues<B>();
+  for (A n : numers) {
+    for (B d : denoms) {
+      if (std::is_signed<T>::value && n == std::numeric_limits<T>::min() &&
+          d == static_cast<T>(-1)) {
+        // n / d overflows in two's complement
+        continue;
+      }
+      EXPECT_EQ(divCeil(n, d), (referenceDivCeil<T, C>(n, d))) << n << "/" << d;
+      EXPECT_EQ(divFloor(n, d), (referenceDivFloor<T, C>(n, d))) << n << "/"
+                                                                 << d;
+      EXPECT_EQ(divTrunc(n, d), n / d) << n << "/" << d;
+      EXPECT_EQ(divRoundAway(n, d), (referenceDivRoundAway<T, C>(n, d)))
+          << n << "/" << d;
+      T nn = n;
+      T dd = d;
+      EXPECT_EQ(divCeilBranchless(nn, dd), divCeilBranchful(nn, dd));
+      EXPECT_EQ(divFloorBranchless(nn, dd), divFloorBranchful(nn, dd));
+      EXPECT_EQ(divRoundAwayBranchless(nn, dd), divRoundAwayBranchful(nn, dd));
+    }
+  }
+}
+}
+TEST(Bits, divTestInt8) {
+  runDivTests<int8_t, int8_t, int64_t>();
+  runDivTests<int8_t, uint8_t, int64_t>();
+  runDivTests<int8_t, int16_t, int64_t>();
+  runDivTests<int8_t, uint16_t, int64_t>();
+  runDivTests<int8_t, int32_t, int64_t>();
+  runDivTests<int8_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<int8_t, int64_t, __int128>();
+  runDivTests<int8_t, uint64_t, __int128>();
+#endif
+}
+TEST(Bits, divTestInt16) {
+  runDivTests<int16_t, int8_t, int64_t>();
+  runDivTests<int16_t, uint8_t, int64_t>();
+  runDivTests<int16_t, int16_t, int64_t>();
+  runDivTests<int16_t, uint16_t, int64_t>();
+  runDivTests<int16_t, int32_t, int64_t>();
+  runDivTests<int16_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<int16_t, int64_t, __int128>();
+  runDivTests<int16_t, uint64_t, __int128>();
+#endif
+}
+TEST(Bits, divTestInt32) {
+  runDivTests<int32_t, int8_t, int64_t>();
+  runDivTests<int32_t, uint8_t, int64_t>();
+  runDivTests<int32_t, int16_t, int64_t>();
+  runDivTests<int32_t, uint16_t, int64_t>();
+  runDivTests<int32_t, int32_t, int64_t>();
+  runDivTests<int32_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<int32_t, int64_t, __int128>();
+  runDivTests<int32_t, uint64_t, __int128>();
+#endif
+}
+#ifdef FOLLY_HAVE_INT128_T
+TEST(Bits, divTestInt64) {
+  runDivTests<int64_t, int8_t, __int128>();
+  runDivTests<int64_t, uint8_t, __int128>();
+  runDivTests<int64_t, int16_t, __int128>();
+  runDivTests<int64_t, uint16_t, __int128>();
+  runDivTests<int64_t, int32_t, __int128>();
+  runDivTests<int64_t, uint32_t, __int128>();
+  runDivTests<int64_t, int64_t, __int128>();
+  runDivTests<int64_t, uint64_t, __int128>();
+}
+#endif
+TEST(Bits, divTestUint8) {
+  runDivTests<uint8_t, int8_t, int64_t>();
+  runDivTests<uint8_t, uint8_t, int64_t>();
+  runDivTests<uint8_t, int16_t, int64_t>();
+  runDivTests<uint8_t, uint16_t, int64_t>();
+  runDivTests<uint8_t, int32_t, int64_t>();
+  runDivTests<uint8_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<uint8_t, int64_t, __int128>();
+  runDivTests<uint8_t, uint64_t, __int128>();
+#endif
+}
+TEST(Bits, divTestUint16) {
+  runDivTests<uint16_t, int8_t, int64_t>();
+  runDivTests<uint16_t, uint8_t, int64_t>();
+  runDivTests<uint16_t, int16_t, int64_t>();
+  runDivTests<uint16_t, uint16_t, int64_t>();
+  runDivTests<uint16_t, int32_t, int64_t>();
+  runDivTests<uint16_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<uint16_t, int64_t, __int128>();
+  runDivTests<uint16_t, uint64_t, __int128>();
+#endif
+}
+TEST(Bits, divTestUint32) {
+  runDivTests<uint32_t, int8_t, int64_t>();
+  runDivTests<uint32_t, uint8_t, int64_t>();
+  runDivTests<uint32_t, int16_t, int64_t>();
+  runDivTests<uint32_t, uint16_t, int64_t>();
+  runDivTests<uint32_t, int32_t, int64_t>();
+  runDivTests<uint32_t, uint32_t, int64_t>();
+#ifdef FOLLY_HAVE_INT128_T
+  runDivTests<uint32_t, int64_t, __int128>();
+  runDivTests<uint32_t, uint64_t, __int128>();
+#endif
+}
+#ifdef FOLLY_HAVE_INT128_T
+TEST(Bits, divTestUint64) {
+  runDivTests<uint64_t, int8_t, __int128>();
+  runDivTests<uint64_t, uint8_t, __int128>();
+  runDivTests<uint64_t, int16_t, __int128>();
+  runDivTests<uint64_t, uint16_t, __int128>();
+  runDivTests<uint64_t, int32_t, __int128>();
+  runDivTests<uint64_t, uint32_t, __int128>();
+  runDivTests<uint64_t, int64_t, __int128>();
+  runDivTests<uint64_t, uint64_t, __int128>();
+}
+#endif
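
The reference functions in the test compute the answer in a wider type and then nudge the truncated quotient until it satisfies the definition. A reduced sketch of that idea for int32_t, checked against a hypothetical remainder-based floor, might look like the following. The names floorSketch, referenceFloor, and checkCorners are assumptions made for this illustration, and the corner-value list is much smaller than the one the test generates.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical remainder-based floor for int32_t (not folly's code).
constexpr int32_t floorSketch(int32_t n, int32_t d) {
  return n / d - ((n % d != 0 && ((n ^ d) < 0)) ? 1 : 0);
}

// Reference computed in int64_t so that negation and multiplication cannot
// overflow. After normalizing to a positive denominator, r is adjusted
// until it is the largest value with r * d <= n.
inline int32_t referenceFloor(int32_t numer, int32_t denom) {
  int64_t n = numer;
  int64_t d = denom;
  if (d < 0) {
    d = -d;
    n = -n;
  }
  int64_t r = n / d;
  while (r * d > n) {
    --r;
  }
  while ((r + 1) * d <= n) {
    ++r;
  }
  return static_cast<int32_t>(r);
}

// Compare the two on a handful of corner values.
inline void checkCorners() {
  const int32_t lo = std::numeric_limits<int32_t>::min();
  const int32_t hi = std::numeric_limits<int32_t>::max();
  const int32_t values[] = {
      lo, lo + 1, lo / 2, -3, -2, -1, 1, 2, 3, hi / 2, hi - 1, hi};
  for (int32_t n : values) {
    for (int32_t d : values) {
      if (n == lo && d == -1) {
        continue; // n / d overflows in two's complement
      }
      assert(floorSketch(n, d) == referenceFloor(n, d));
    }
  }
}
```

Because the reference only relies on comparison and multiplication in the wider type, it stays independent of the remainder trick it is checking.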