|
dispenso 1.6.0
A library for task parallelism
|
A lock-free multi-producer multi-consumer ring buffer with fixed capacity. More...
#include <mpmc_ring_buffer.h>
Public Member Functions | |
| MpmcRingBuffer () | |
| Constructs an empty ring buffer. | |
| MpmcRingBuffer (const MpmcRingBuffer &)=delete | |
| Ring buffers are not copyable. | |
| MpmcRingBuffer (MpmcRingBuffer &&)=delete | |
| Ring buffers are not movable. | |
| ~MpmcRingBuffer () | |
| Destroys the ring buffer. | |
| bool | try_push (T &&item) |
| Attempts to push an element into the buffer by moving. | |
| template<typename U = T, std::enable_if_t< std::is_nothrow_copy_constructible< U >::value, int > = 0> | |
| bool | try_push (const T &item) |
| Attempts to push an element into the buffer by copying. | |
| template<typename... Args, std::enable_if_t< std::is_nothrow_constructible< T, Args... >::value, int > = 0> | |
| bool | try_emplace (Args &&... args) |
| Attempts to construct an element in-place in the buffer. | |
| bool | try_pop (T &item) |
| Attempts to pop an element from the buffer. | |
| OpResult< T > | try_pop () |
| Attempts to pop an element from the buffer, returning an optional. | |
| bool | try_pop_into (T *storage) |
| Attempts to pop an element into uninitialized storage. | |
| size_type | try_push_batch (T *items, size_type count) |
| Attempts to push multiple elements into the buffer. | |
| bool | empty () const |
| Checks if the buffer is empty. | |
| bool | full () const |
| Checks if the buffer is full. | |
| size_type | size () const |
| Returns the approximate number of elements in the buffer. | |
Static Public Member Functions | |
| static constexpr size_type | capacity () noexcept |
| Returns the maximum number of elements the buffer can hold. | |
A lock-free multi-producer multi-consumer ring buffer with fixed capacity.
This class implements a bounded, lock-free ring buffer that supports concurrent access from multiple producer threads and multiple consumer threads simultaneously. It uses the Vyukov bounded MPMC queue algorithm with per-slot sequence numbers.
Each slot contains a sequence number that tracks its state:
seq == pos: the slot is available for writing (empty)seq == pos + 1: the slot contains data and is available for readingUnlike std::array-based implementations, this buffer does NOT require the element type to be default-constructible. Elements are constructed in-place when pushed and destroyed when popped.
| T | The type of elements stored in the buffer. Must be move-constructible. |
| Capacity | The number of elements the buffer can hold. Must be at least 2. Defaults to 16, which provides a good balance of capacity and cache footprint for per-thread work queues. |
| RoundUpToPowerOfTwo | If true (default), rounds up the internal buffer size to the next power of two for faster index wrap-around using bitwise AND instead of modulo. This may result in actual capacity being larger than requested. Set to false to use exactly the requested capacity. |
This class is fully thread-safe:
try_push(), try_emplace(), or try_push_batch() concurrentlytry_pop() concurrentlyempty(), full(), and size() may be called from any thread, but provide only a snapshot that may be immediately staleThe implementation uses:
memory_order_relaxed for loading head/tail in the fast pathmemory_order_acquire for reading per-slot sequence numbers (ensures data visibility)memory_order_release for writing per-slot sequence numbers (publishes data)compare_exchange_strong on head/tail to claim a positionhead_ and tail_ are monotonically increasing 64-bit counters — never reset, and never wrapping in any realistic runtime (2^64 push/pop operations). The slot index is counter & mask, but the CAS always targets the full counter, never the wrapped index. Because a monotonic counter can never present the same value twice, the classic ABA problem cannot arise on head_/tail_: a successful CAS proves that no other producer (resp. consumer) advanced the counter between our relaxed load and the CAS, so the position we just claimed is exclusively ours. A full lap around the ring advances the counter by capacity, so it cannot alias an earlier value.
The per-slot sequence numbers carry the other half of the Vyukov invariant and make each (slot, lap) pair distinguishable: a slot accepts a write at position pos only when seq == pos, and yields a read only when seq == pos + 1. A producer publishes with seq = pos + 1; a consumer releases with seq = pos + capacity (exactly the next counter value that maps to this slot). So a stale observer can never mistake one lap's slot state for another's, and the value written into seq is itself monotonic per slot — there is no recycled sentinel to confuse a CAS.
Deviation from canonical Vyukov: the single-element operations are fail-fast — one compare_exchange_strong, no retry loop. The canonical algorithm reloads the counter and loops on CAS failure. Here, contention (a peer claimed the same position first) simply returns false/empty even when the buffer is not actually full/empty. This is a deliberate spurious-failure trade-off for bounded fail-fast scheduling (callers retry or fall back to another tier; see fork_join_scheduling.md). It never corrupts state, double-uses a slot, or loses an element — the monotonic-counter + per-slot-sequence invariants above hold regardless of how the single CAS resolves. (try_push_batch claims a contiguous run [tail, tail+available) with one CAS; the same monotonic-counter argument makes that whole reserved run exclusively the caller's.)
Each slot is padded up to a multiple of the cache line size to avoid false sharing between adjacent slots when multiple threads operate on neighboring elements. The element buffer is placed before the sequence number so that any alignment padding T requires does not land between seq and data – such interior padding could otherwise push a slot that would have fit in a single cache line over the boundary:
The per-slot size therefore depends on sizeof(T) (and alignof(T)), not just the cache line size. A slot fits in a single cache line only when sizeof(T) + sizeof(std::atomic<size_t>) does – roughly sizeof(T) <= 56 for a 64-byte cache line (or <= 120 on platforms with a 128-byte cache line, such as Apple ARM). Larger elements round each slot up to the next cache- line multiple; e.g. a 64-byte element yields 128-byte slots.
With a small T, 16 slots x 64 bytes = 1 KB per ring – compact enough for per-thread allocation even at high thread counts (256 threads = 256 KB total). For larger elements, scale this figure by the actual per-slot size (sizeof(Slot)).
Definition at line 171 of file mpmc_ring_buffer.h.
|
inline |
Constructs an empty ring buffer.
Initializes all slot sequence numbers to their position index, indicating that all slots are available for writing.
Definition at line 202 of file mpmc_ring_buffer.h.
|
inline |
Destroys the ring buffer.
All elements remaining in the buffer are destroyed. Ensure no threads are accessing the buffer when it is destroyed.
Definition at line 236 of file mpmc_ring_buffer.h.
|
inlinestaticconstexprnoexcept |
Returns the maximum number of elements the buffer can hold.
When RoundUpToPowerOfTwo is true (default), this may be larger than the requested Capacity template parameter.
Definition at line 584 of file mpmc_ring_buffer.h.
|
inline |
Checks if the buffer is empty.
Definition at line 541 of file mpmc_ring_buffer.h.
|
inline |
Checks if the buffer is full.
Definition at line 555 of file mpmc_ring_buffer.h.
|
inline |
Returns the approximate number of elements in the buffer.
Definition at line 570 of file mpmc_ring_buffer.h.
|
inline |
Attempts to construct an element in-place in the buffer.
If the buffer has space, constructs an element directly in the buffer storage using the provided arguments, avoiding any copy or move operations.
| Args | The types of arguments to forward to T's constructor. |
| args | The arguments to forward to the element constructor. |
Definition at line 330 of file mpmc_ring_buffer.h.
|
inline |
Attempts to pop an element from the buffer, returning an optional.
If the buffer has elements, moves the front element into an OpResult and returns it. If the buffer is empty, returns an empty OpResult.
Definition at line 413 of file mpmc_ring_buffer.h.
|
inline |
Attempts to pop an element from the buffer.
If the buffer has elements, moves the front element into the output parameter and returns true. If the buffer is empty or another thread is contending for the same slot, returns false immediately.
| [out] | item | The location to move the popped element to. |
Definition at line 359 of file mpmc_ring_buffer.h.
|
inline |
Attempts to pop an element into uninitialized storage.
Similar to try_pop(T&), but uses placement new to move-construct the element into the provided storage. This is useful when T is not default-constructible.
| [out] | storage | Pointer to uninitialized storage where the element will be move-constructed. Must have proper alignment for T. |
Definition at line 448 of file mpmc_ring_buffer.h.
|
inline |
Attempts to push an element into the buffer by copying.
| item | The element to push (will be copied). |
Definition at line 294 of file mpmc_ring_buffer.h.
|
inline |
Attempts to push an element into the buffer by moving.
If the buffer has space, the element is moved into the buffer and the function returns true. If the buffer is full or another thread is contending for the same slot, returns false immediately.
| item | The element to push (will be moved from on success). |
Definition at line 272 of file mpmc_ring_buffer.h.
|
inline |
Attempts to push multiple elements into the buffer.
Pushes up to count elements from the array into the buffer in a single atomic tail reservation. Returns the number of elements actually pushed, which may be less than count if the buffer doesn't have enough space or if there is contention from other producers.
Validates each slot's sequence number to determine how many consecutive slots are available, then reserves them with a single CAS on the tail.
This design enables natural overflow handling:
| items | Pointer to the array of items to push. Items are moved from. |
| count | Number of items to attempt to push. |
Definition at line 496 of file mpmc_ring_buffer.h.