ByteEvent (socket timestamp) foundation
Summary:
Adds support for write and socket timestamps by introducing `ByteEvent`, which can be delivered to observers.

`AsyncTransport::WriteFlags` has long had timestamping-related flags, such as `TIMESTAMP_TX`, but the code required to act on these flags only existed in proxygen. This diff generalizes the approach so that it works for other use cases of `AsyncSocket`. This diff is long, but much of it is unit tests designed to prevent regressions, given the trickiness of socket timestamps and `ByteEvent`.

**Each `ByteEvent` contains:**
- Type (WRITE, SCHED, TX, ACK)
- Byte stream offset that the timestamp is for (relative to the raw byte stream, which means after SSL in the case of `AsyncSSLSocket`)
- `steady_clock` timestamp recorded by `AsyncSocket` when generating the `ByteEvent`
- For SCHED, TX, and ACK events, if available, hardware and software (kernel) timestamps

**How `ByteEvent` are used:**
- Support is enabled when an observer is attached with the `byteEvents` config flag set. If the socket does not support timestamps, the observer is notified through the `byteEventsUnavailable` callback. Otherwise, `byteEventsEnabled` is called.
- When the application writes to a socket with `ByteEvent` support enabled and a relevant `WriteFlag`, SCHED/TX/ACK `ByteEvent` are requested from the kernel, and WRITE `ByteEvent` are generated by the socket for the *last byte* in the write.
  - If the entire write buffer cannot be written at once, then additional `ByteEvent` will also be generated for the last byte in each write.
  - This means that if the application wants to timestamp a specific byte, it must break up the write buffer before handing it to `AsyncSocket` such that the byte to timestamp is the last byte in the write buffer.
- When socket timestamps arrive from the kernel via the socket error queue, they are transformed into `ByteEvent` and passed to observers.

**Caveats:**
1. Socket timestamps received from the kernel contain the byte's offset in the write stream. This counter is a `uint32_t`, and thus rolls over every ~4GB. When transforming a raw timestamp into a `ByteEvent`, we correct for this and transform the raw offset into an offset relative to the raw byte offset stored by `AsyncSocket` (returned via `getRawBytesWritten()`).
2. At the moment, a read callback must be installed to receive socket timestamps due to epoll's behavior. I will correct this with a patch to epoll; see https://github.com/libevent/libevent/issues/1038#issuecomment-703315425 for details.
3. If a msghdr's ancillary data contains a timestamping flag (such as `SOF_TIMESTAMPING_TX_SOFTWARE`), the kernel will enqueue a socket error message containing the byte offset of the write (`SO_EE_ORIGIN_TIMESTAMPING`) even if timestamping has not been enabled by an associated call to `setsockopt`. This creates a problem:
   1. If an application were to use a timestamp `WriteFlags` such as `TIMESTAMP_TX` without enabling timestamping, and if `AsyncSocket` transformed such `WriteFlags` to ancillary data by default, it could create a situation where epoll continues to return `EV_READ` (due to items in the socket error queue), but `AsyncSocket` would not fetch anything from the socket error queue.
   2. To prevent this scenario, `WriteFlags` related to timestamping are not translated into msghdr ancillary data unless timestamping is enabled. This required adding a boolean to `getAncillaryData` and `getAncillaryDataSize`.

Differential Revision: D24094832

fbshipit-source-id: e3bec730ddd1fc1696023d8c982ae02ab9b5fb7d
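
Illustration for caveat 1: the rollover correction can be sketched roughly as below. This is not the code in this diff. `unwrapKernelOffset` is a hypothetical helper, and the sketch assumes that the timestamped byte lies at or before the value returned by `getRawBytesWritten()` and less than 4GB behind it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of folly): widen the kernel's 32-bit byte
// offset into a 64-bit offset using the socket's own 64-bit count of raw
// bytes written. Assumes the timestamped byte is at or before
// rawBytesWritten and less than 2^32 bytes behind it.
std::uint64_t unwrapKernelOffset(
    std::uint32_t kernelOffset, std::uint64_t rawBytesWritten) {
  constexpr std::uint64_t kWrap = 1ULL << 32;

  // Take the upper 32 bits from the socket's counter and the lower 32 bits
  // from the kernel's counter.
  std::uint64_t candidate = (rawBytesWritten & ~(kWrap - 1)) | kernelOffset;

  // If the candidate lies ahead of what the socket has written, the kernel
  // offset was recorded before the most recent rollover, so step back by
  // one full 32-bit period.
  if (candidate > rawBytesWritten && candidate >= kWrap) {
    candidate -= kWrap;
  }
  return candidate;
}
```

For example, under these assumptions, with `(1ULL << 32) + 10` raw bytes written, a kernel offset of `5` maps to `(1ULL << 32) + 5`, while a kernel offset of `0xFFFFFFF0` maps to `0xFFFFFFF0`, since that byte was written before the counter rolled over.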