Commit ffa5b314 authored by Michael Bolin, committed by Facebook Github Bot

Clarify detail about small-string optimization in FBString.

Summary:
I was reading about the various approaches to small-string
optimization: https://shaharmike.com/cpp/std-string/.
I noticed that Clang (libc++) can store up to 22 bytes inline but FBString claims it
can do 23, so I read through the code to figure out where the extra byte came
from.

Specifically, I wasn't sure how it could have space to store the size as well
as ensure the buffer was null-terminated. After playing with the code for a bit
(this was further complicated because running the code under ASAN changes the
behavior, which I didn't realize before I started this exploration), I saw
that we don't store the size but `maxSmallSize - size`: when `size == 23` the
last byte is 0 and doubles as the null terminator.
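
The trick can be sketched in a few lines. This is a hypothetical illustration, not the folly implementation: the `SmallString` type, its member names, and the fixed 24-byte buffer are assumptions, and it only models the little-endian small-string case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch, not folly code. Assumes a 24-byte buffer (three
// 64-bit words) and a little-endian layout in which the last byte holds
// kMaxSmallSize - size and the two category bits are both zero.
struct SmallString {
  static constexpr std::size_t kStorageSize = 24;
  static constexpr std::size_t kMaxSmallSize = kStorageSize - 1; // 23

  char storage[kStorageSize];

  SmallString() { setSize(0); }

  void assign(const char* s, std::size_t n) {
    assert(n <= kMaxSmallSize);
    std::memcpy(storage, s, n);
    setSize(n);
  }

  // Recover the size by subtracting the stored value from kMaxSmallSize.
  std::size_t size() const {
    return kMaxSmallSize - static_cast<unsigned char>(storage[kMaxSmallSize]);
  }

  // The buffer is always null-terminated, so it can be returned directly.
  const char* c_str() const { return storage; }

 private:
  void setSize(std::size_t n) {
    // Store the remaining capacity in the last byte...
    storage[kMaxSmallSize] = static_cast<char>(kMaxSmallSize - n);
    // ...then terminate. When n == 23 both writes hit the same byte with 0.
    storage[n] = '\0';
  }
};
```

With this layout, assigning a 23-character string leaves `storage[23] == 0`, which is simultaneously "zero bytes of capacity remaining" and the terminator that `c_str()` relies on.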

This updates the docs to hopefully save someone else this same exploration.

(Note: this ignores all push blocking failures!)

Reviewed By: ot

Differential Revision: D10258831

fbshipit-source-id: bfc0dd7ae55518af4173625bd719cfd4778180cc
parent 453735cb
@@ -303,15 +303,25 @@ class fbstring_core_model {
  * allocated right before the character array.
  *
  * The discriminator between these three strategies sits in two
- * bits of the rightmost char of the storage. If neither is set, then the
- * string is small (and its length sits in the lower-order bits on
- * little-endian or the high-order bits on big-endian of that
- * rightmost character). If the MSb is set, the string is medium width.
- * If the second MSb is set, then the string is large. On little-endian,
- * these 2 bits are the 2 MSbs of MediumLarge::capacity_, while on
- * big-endian, these 2 bits are the 2 LSbs. This keeps both little-endian
- * and big-endian fbstring_core equivalent with merely different ops used
- * to extract capacity/category.
+ * bits of the rightmost char of the storage:
+ * - If neither is set, then the string is small. Its length is represented by
+ *   the lower-order bits on little-endian or the high-order bits on big-endian
+ *   of that rightmost character. The value of these six bits is
+ *   `maxSmallSize - size`, so this quantity must be subtracted from
+ *   `maxSmallSize` to compute the `size` of the string (see `smallSize()`).
+ *   This scheme ensures that when `size == `maxSmallSize`, the last byte in the
+ *   storage is \0. This way, storage will be a null-terminated sequence of
+ *   bytes, even if all 23 bytes of data are used on a 64-bit architecture.
+ *   This enables `c_str()` and `data()` to simply return a pointer to the
+ *   storage.
+ *
+ * - If the MSb is set, the string is medium width.
+ *
+ * - If the second MSb is set, then the string is large. On little-endian,
+ *   these 2 bits are the 2 MSbs of MediumLarge::capacity_, while on
+ *   big-endian, these 2 bits are the 2 LSbs. This keeps both little-endian
+ *   and big-endian fbstring_core equivalent with merely different ops used
+ *   to extract capacity/category.
  */
 template <class Char>
 class fbstring_core {
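
As a rough illustration of the discriminator described in the comment above, the following sketch decodes the last storage byte on a little-endian machine. The names (`Category`, `categoryOf`, `smallSizeOf`) and the constants are hypothetical and chosen only to mirror the comment: the two most significant bits select the strategy and the remaining six bits hold `maxSmallSize - size`. A big-endian build would use the two least significant bits instead, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, little-endian layout assumed; not folly's code.
enum class Category : std::uint8_t { Small, Medium, Large };

constexpr std::uint8_t kCategoryMask = 0xC0; // two most significant bits
constexpr std::uint8_t kMediumBit = 0x80;    // MSb set: medium
constexpr std::uint8_t kLargeBit = 0x40;     // second MSb set: large
constexpr std::uint8_t kSizeMask = 0x3F;     // remaining six bits
constexpr std::size_t kSmallCapacity = 23;   // maxSmallSize on 64-bit

// Classify a string from the last byte of its storage.
inline Category categoryOf(std::uint8_t lastByte) {
  if ((lastByte & kCategoryMask) == 0) {
    return Category::Small;
  }
  return (lastByte & kMediumBit) ? Category::Medium : Category::Large;
}

// For a small string, the six low bits store kSmallCapacity - size.
inline std::size_t smallSizeOf(std::uint8_t lastByte) {
  assert(categoryOf(lastByte) == Category::Small);
  return kSmallCapacity - (lastByte & kSizeMask);
}
```

A last byte of `0x00` therefore means a small string holding all 23 characters, while `0x17` (decimal 23) means an empty small string.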