Commit 9b15aded authored by Brandon Schlinker's avatar Brandon Schlinker Committed by Facebook GitHub Bot

Fix EOR bug, always pass timestamp flags

Summary:
Socket timestamps (ACK / TX) and EoR tracking currently break for `AsyncSSLSocket` if SSL renegotiation occurs while a timestamped write / EoR write is in progress.

- If EoR tracking is enabled, the EoR flag and any timestamp flags are not included until `AsyncSSLSocket` writes a buffer containing the final byte to the socket. This is to avoid these flags from being set on a partial write of the passed in buffer.
- If a write occurs while an SSL renegotiation is in progress, OpenSSL will return `SSL_ERROR_WANT_WRITE`. When this happens, we need to call write again, passing back in the same buffer.
- The current logic for deciding whether to include the EoR and timestamping flags (`eorAwareSSLWrite`) adds the number of bytes pending to the value returned by `AsyncSSLSocket::getRawBytesWritten` to determine the minimum byte offset when the flags should be added.
  - However, when a write fails due to SSL renegotiation, `getRawBytesWritten` may include some of the bytes that were passed in the last call, despite how they have not actually been written to the transport yet. This is because `getRawBytesWritten` is calculated based on the BIO chain length.
  - As a result, the current logic for calculating the offset on which to add the flags overshoots -- it returns an offset that will never be written. This causes the flags to not be added, and timestamps to timeout.
- This results in one of two things:
  - Timestamp timeouts, where the timestamps are never received
  - If a subsequent write is timestamped, the timestamps from that write may be used instead. This will cause the timestamps to be inflated, and leads to higher TX / ACK times at upper percentiles.

Fix is as follows:
- Change the logic that determines whether the EoR is included in the buffer to no longer rely on `getRawBytesWritten`. In addition, simplify logic so that it is no longer a separate function and easier to make sense of.
- Even if EoR tracking is enabled, always write timestamp flags (TX, ACK, etc.) on every write. This reduces the amount of coordination required between different components. The socket error message handler will end up with more cases of timestamps being delivered for earlier bytes than the last body byte, but they already need to deal with that today due to partial writes.

I considered just outright removing support for EoR tracking (EoR was previously used for timestamping) but decided against this as there's still some value in setting the EoR flag for debugging; see notes in code.

Reviewed By: yfeldblum

Differential Revision: D21969420

fbshipit-source-id: db8e7e5fbd70d627f88f2c43199387f5112b5f9e
parent 7f1bda25
......@@ -390,12 +390,7 @@ std::string AsyncSSLSocket::getApplicationProtocol() const noexcept {
}
void AsyncSSLSocket::setEorTracking(bool track) {
if (isEorTrackingEnabled() != track) {
AsyncSocket::setEorTracking(track);
appEorByteNo_ = 0;
appEorByteWriteFlags_ = {};
minEorRawByteNo_ = 0;
}
AsyncSocket::setEorTracking(track);
}
size_t AsyncSSLSocket::getRawBytesWritten() const {
......@@ -1008,7 +1003,7 @@ bool AsyncSSLSocket::willBlock(
int* sslErrorOut,
unsigned long* errErrorOut) noexcept {
*errErrorOut = 0;
int error = *sslErrorOut = SSL_get_error(ssl_.get(), ret);
int error = *sslErrorOut = sslGetErrorImpl(ssl_.get(), ret);
if (error == SSL_ERROR_WANT_READ) {
// Register for read event if not already.
updateEventRegistration(EventHandler::READ, EventHandler::WRITE);
......@@ -1408,7 +1403,7 @@ AsyncSSLSocket::performRead(void** buf, size_t* buflen, size_t* offset) {
std::make_unique<SSLException>(SSLError::CLIENT_RENEGOTIATION));
}
if (bytes <= 0) {
int error = SSL_get_error(ssl_.get(), bytes);
int error = sslGetErrorImpl(ssl_.get(), bytes);
if (error == SSL_ERROR_WANT_READ) {
// The caller will register for read event if not already.
if (errno == EWOULDBLOCK || errno == EAGAIN) {
......@@ -1603,22 +1598,61 @@ AsyncSocket::WriteResult AsyncSSLSocket::performWrite(
}
}
// cork the current write if the original flags included CORK or if there
// are remaining iovec to write
corkCurrentWrite_ =
isSet(flags, WriteFlags::CORK) || (i + buffersStolen + 1 < count);
// track the EoR if:
// (1) there are write flags that require EoR tracking (EOR / TIMESTAMP_TX)
// (2) if the buffer includes the EOR byte
appEorByteWriteFlags_ = flags & kEorRelevantWriteFlags;
bool trackEor = appEorByteWriteFlags_ != folly::WriteFlags::NONE &&
(i + buffersStolen + 1 == count);
bytes = eorAwareSSLWrite(ssl_, sslWriteBuf, int(len), trackEor);
// From here, the write flow is as follows:
// - sslWriteImpl calls SSL_write, which encrypts the passed buffer.
// - SSL_write calls AsyncSSLSocket::bioWrite with the encrypted buffer.
// - AsyncSSLSocket::bioWrite calls AsyncSocket::sendSocketMessage(...).
//
// When sendSocketMessage calls sendMsg, WriteFlags are transformed into
// ancillary data and/or sendMsg flags. If WriteFlag::EOR is in flags and
// trackEor_ is set, then we should ensure that MSG_EOR is only passed to
// sendmsg when the final byte of the orginally passed in buffer is being
// written. Since the buffer originally passed to performWrite may be split
// up and written over multiple calls to sendmsg, we have to take care to
// unset the EOR flag if it was included in the WriteFlags passed in and
// we're writing a buffer that does _not_ contain the final byte of the
// orignally passed buffer.
//
// We handle EOR as follows:
// - We set currWriteFlags_ to the passed in WriteFlags.
// - If sslWriteBuf does NOT contain the last byte of the passed in iovec,
// then we set currBytesToFinalByte_ to folly::none. In bioWrite, we
// unset WriteFlags::EOR if it is set in currWriteFlags_.
// - If sslWriteBuf DOES contain the last byte of the passed in iovec,
// then we set bytesToFinalByte_ to int(len). In bioWrite, if the length
// of the passed in buffer >= currBytesToFinalByte_, then we leave the
// flags in currWriteFlags_ alone.
//
// What about timestamp flags?
// - We don't do any special handling for timestamping flags.
// - This may mean that more timestamps than necessary get generated, but
// that's OK; you already have to deal with that for timestamping due to
// the possibility of partial writes.
// - MSG_EOR used to be used for timestamping, but hasn't been for years.
//
// Finally, why even care about MSG_EOR, if not for timestamping?
// - If set, it is marked in the corresponding tcp_skb_cb; this can be
// useful when debugging.
// - The kernel uses it to decide whether socket buffers can be collapsed
// together (see tcp_skb_can_collapse_to).
currWriteFlags_ = flags;
uint32_t iovecWrittenToSslWriteBuf = i + buffersStolen + 1;
CHECK_LE(iovecWrittenToSslWriteBuf, count);
if (iovecWrittenToSslWriteBuf == count) { // last byte is in sslWriteBuf
currBytesToFinalByte_ = len; // length of current buffer
} else { // there are still remaining buffers / iovec to write
currBytesToFinalByte_ = folly::none;
currWriteFlags_ |= WriteFlags::CORK;
}
bytes = sslWriteImpl(ssl_.get(), sslWriteBuf, int(len));
if (bytes <= 0) {
int error = SSL_get_error(ssl_.get(), int(bytes));
int error = sslGetErrorImpl(ssl_.get(), int(bytes));
if (error == SSL_ERROR_WANT_WRITE) {
// The entire buffer needs to be passed in again, so *partialWritten
// is set to the original offset where we started for this call to
// performWrite(); see SSL_ERROR_WANT_WRITE documentation for details.
//
// The caller will register for write event if not already.
*partialWritten = uint32_t(offset);
return WriteResult(totalWritten);
......@@ -1651,42 +1685,6 @@ AsyncSocket::WriteResult AsyncSSLSocket::performWrite(
return WriteResult(totalWritten);
}
int AsyncSSLSocket::eorAwareSSLWrite(
const ssl::SSLUniquePtr& ssl,
const void* buf,
int n,
bool eor) {
if (eor && isEorTrackingEnabled()) {
if (appEorByteNo_) {
// cannot track for more than one app byte EOR
CHECK(appEorByteNo_ == appBytesWritten_ + n);
} else {
appEorByteNo_ = appBytesWritten_ + n;
}
// 1. It is fine to keep updating minEorRawByteNo_.
// 2. It is _min_ in the sense that SSL record will add some overhead.
minEorRawByteNo_ = getRawBytesWritten() + n;
}
n = sslWriteImpl(ssl.get(), buf, n);
if (n > 0) {
appBytesWritten_ += n;
if (appEorByteNo_) {
if (getRawBytesWritten() >= minEorRawByteNo_) {
minEorRawByteNo_ = 0;
}
if (appBytesWritten_ == appEorByteNo_) {
appEorByteNo_ = 0;
appEorByteWriteFlags_ = {};
} else {
CHECK(appBytesWritten_ < appEorByteNo_);
}
}
}
return n;
}
void AsyncSSLSocket::sslInfoCallback(const SSL* ssl, int where, int ret) {
AsyncSSLSocket* sslSocket = AsyncSSLSocket::getFromSSL(ssl);
if (sslSocket->handshakeComplete_ && (where & SSL_CB_HANDSHAKE_START)) {
......@@ -1709,46 +1707,42 @@ void AsyncSSLSocket::sslInfoCallback(const SSL* ssl, int where, int ret) {
}
int AsyncSSLSocket::bioWrite(BIO* b, const char* in, int inl) {
// get pointer to AsyncSSLSocket from BioAppData
auto appData = OpenSSLUtils::getBioAppData(b);
CHECK(appData);
AsyncSSLSocket* sslSock = reinterpret_cast<AsyncSSLSocket*>(appData);
CHECK(sslSock);
// if EOR is tracked, correct if needed
WriteFlags flags = sslSock->currWriteFlags_;
if (sslSock->trackEor_ &&
(!sslSock->currBytesToFinalByte_.has_value() ||
*(sslSock->currBytesToFinalByte_) > (size_t)inl)) {
// unset EOR if set, since we're not writing the last byte yet
flags = unSet(flags, folly::WriteFlags::EOR);
}
struct msghdr msg;
struct iovec iov;
AsyncSSLSocket* tsslSock;
iov.iov_base = const_cast<char*>(in);
iov.iov_len = size_t(inl);
memset(&msg, 0, sizeof(msg));
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
auto appData = OpenSSLUtils::getBioAppData(b);
CHECK(appData);
tsslSock = reinterpret_cast<AsyncSSLSocket*>(appData);
CHECK(tsslSock);
WriteFlags flags = WriteFlags::NONE;
if (tsslSock->isEorTrackingEnabled() && tsslSock->minEorRawByteNo_ &&
tsslSock->minEorRawByteNo_ <= BIO_number_written(b) + inl) {
flags |= tsslSock->appEorByteWriteFlags_;
}
if (tsslSock->corkCurrentWrite_) {
flags |= WriteFlags::CORK;
}
int msg_flags = tsslSock->getSendMsgParamsCB()->getFlags(
flags, false /*zeroCopyEnabled*/);
int msg_flags =
sslSock->getSendMsgParamsCB()->getFlags(flags, false /*zeroCopyEnabled*/);
msg.msg_controllen =
tsslSock->getSendMsgParamsCB()->getAncillaryDataSize(flags);
sslSock->getSendMsgParamsCB()->getAncillaryDataSize(flags);
CHECK_GE(
AsyncSocket::SendMsgParamsCallback::maxAncillaryDataSize,
msg.msg_controllen);
if (msg.msg_controllen != 0) {
msg.msg_control = reinterpret_cast<char*>(alloca(msg.msg_controllen));
tsslSock->getSendMsgParamsCB()->getAncillaryData(flags, msg.msg_control);
sslSock->getSendMsgParamsCB()->getAncillaryData(flags, msg.msg_control);
}
auto result =
tsslSock->sendSocketMessage(OpenSSLUtils::getBioFd(b), &msg, msg_flags);
sslSock->sendSocketMessage(OpenSSLUtils::getBioFd(b), &msg, msg_flags);
BIO_clear_retry_flags(b);
if (!result.exception && result.writeReturn <= 0) {
if (OpenSSLUtils::getBioShouldRetryWrite(int(result.writeReturn))) {
......
......@@ -862,11 +862,16 @@ class AsyncSSLSocket : public AsyncSocket {
uint32_t* countWritten,
uint32_t* partialWritten);
// This virtual wrapper around SSL_write exists solely for testing/mockability
// Virtual wrapper around SSL_write, solely for testing/mockability
virtual int sslWriteImpl(SSL* ssl, const void* buf, int n) {
return SSL_write(ssl, buf, n);
}
// Virtual wrapper around SSL_get_error, solely for testing/mockability
virtual int sslGetErrorImpl(const SSL* s, int ret_code) {
return SSL_get_error(s, ret_code);
}
/**
* Apply verification options passed to sslConn/sslAccept or those set
* in the underlying SSLContext object.
......@@ -884,21 +889,6 @@ class AsyncSSLSocket : public AsyncSocket {
*/
bool setupSSLBio();
/**
* A SSL_write wrapper that understand EOR
*
* @param ssl: SSL pointer
* @param buf: Buffer to be written
* @param n: Number of bytes to be written
* @param eor: Does the last byte (buf[n-1]) have the app-last-byte?
* @return: The number of app bytes successfully written to the socket
*/
int eorAwareSSLWrite(
const ssl::SSLUniquePtr& ssl,
const void* buf,
int n,
bool eor);
// Inherit error handling methods from AsyncSocket, plus the following.
void failHandshake(const char* fn, const AsyncSocketException& ex);
......@@ -931,32 +921,17 @@ class AsyncSSLSocket : public AsyncSocket {
Timeout handshakeTimeout_;
Timeout connectionTimeout_;
// The app byte num that we are tracking for EOR.
//
// Only one app EOR byte can be tracked.
// See appEorByteWriteFlags_ for details.
size_t appEorByteNo_{0};
// The WriteFlags to pass for the app byte num that is tracked for EOR.
//
// When openssl is about to send appEorByteNo_, these flags will be passed to
// the application via the getAncillaryData callback. The application can then
// generate a control message containing socket timestamping flags or other
// commands that will be included when the corresponding buffer is passed to
// the kernel via sendmsg().
//
// See AsyncSSLSocket::bioWrite (which overrides OpenSSL biowrite).
WriteFlags appEorByteWriteFlags_{};
// WriteFlags last passed to performWrite
WriteFlags currWriteFlags_{};
// Number of bytes to write before final byte
// See AsyncSSLSocket::performWrite for details
folly::Optional<size_t> currBytesToFinalByte_;
// Try to avoid calling SSL_write() for buffers smaller than this.
// It doesn't take effect when it is 0.
size_t minWriteSize_{1500};
// When openssl is about to sendmsg() across the minEorRawBytesNo_,
// it will trigger logic to include an application defined control message.
//
// See appEorByteWriteFlags_ for details.
size_t minEorRawByteNo_{0};
#if FOLLY_OPENSSL_HAS_SNI
std::shared_ptr<folly::SSLContext> handshakeCtx_;
std::string tlsextHostname_;
......
......@@ -1169,7 +1169,7 @@ class AsyncSocket : public AsyncTransport {
* @param msg Message to send
* @param msg_flags Flags to pass to sendmsg
*/
AsyncSocket::WriteResult
virtual AsyncSocket::WriteResult
sendSocketMessage(NetworkSocket fd, struct msghdr* msg, int msg_flags);
virtual ssize_t
......
......@@ -120,19 +120,6 @@ constexpr bool isSet(WriteFlags a, WriteFlags b) {
return (a & b) == b;
}
/**
* Write flags that are specifically for the final write call of a buffer.
*
* In some cases, buffers passed to send may be coalesced or split by the socket
* write handling logic. For instance, a buffer passed to AsyncSSLSocket may be
* split across multiple TLS records (and therefore multiple calls to write).
*
* When a buffer is split up, these flags will only be applied for the final
* call to write for that buffer.
*/
constexpr WriteFlags kEorRelevantWriteFlags =
WriteFlags::EOR | WriteFlags::TIMESTAMP_TX;
class AsyncReader {
public:
class ReadCallback {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment