dispenso 1.6.0
A library for task parallelism
Loading...
Searching...
No Matches
dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo > Class Template Reference

A lock-free multi-producer multi-consumer ring buffer with fixed capacity. More...

#include <mpmc_ring_buffer.h>

Public Member Functions

 MpmcRingBuffer ()
 Constructs an empty ring buffer.
 
 MpmcRingBuffer (const MpmcRingBuffer &)=delete
 Ring buffers are not copyable.
 
 MpmcRingBuffer (MpmcRingBuffer &&)=delete
 Ring buffers are not movable.
 
 ~MpmcRingBuffer ()
 Destroys the ring buffer.
 
bool try_push (T &&item)
 Attempts to push an element into the buffer by moving.
 
template<typename U = T, std::enable_if_t< std::is_nothrow_copy_constructible< U >::value, int > = 0>
bool try_push (const T &item)
 Attempts to push an element into the buffer by copying.
 
template<typename... Args, std::enable_if_t< std::is_nothrow_constructible< T, Args... >::value, int > = 0>
bool try_emplace (Args &&... args)
 Attempts to construct an element in-place in the buffer.
 
bool try_pop (T &item)
 Attempts to pop an element from the buffer.
 
OpResult< T > try_pop ()
 Attempts to pop an element from the buffer, returning an optional.
 
bool try_pop_into (T *storage)
 Attempts to pop an element into uninitialized storage.
 
size_type try_push_batch (T *items, size_type count)
 Attempts to push multiple elements into the buffer.
 
bool empty () const
 Checks if the buffer is empty.
 
bool full () const
 Checks if the buffer is full.
 
size_type size () const
 Returns the approximate number of elements in the buffer.
 

Static Public Member Functions

static constexpr size_type capacity () noexcept
 Returns the maximum number of elements the buffer can hold.
 

Detailed Description

template<typename T, size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
class dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >

A lock-free multi-producer multi-consumer ring buffer with fixed capacity.

This class implements a bounded, lock-free ring buffer that supports concurrent access from multiple producer threads and multiple consumer threads simultaneously. It uses the Vyukov bounded MPMC queue algorithm with per-slot sequence numbers.

Each slot contains a sequence number that tracks its state:

  • When seq == pos: the slot is available for writing (empty)
  • When seq == pos + 1: the slot contains data and is available for reading
  • Other values: the slot is being written to or read from by another thread

Unlike std::array-based implementations, this buffer does NOT require the element type to be default-constructible. Elements are constructed in-place when pushed and destroyed when popped.

Template Parameters
TThe type of elements stored in the buffer. Must be move-constructible.
CapacityThe number of elements the buffer can hold. Must be at least 2. Defaults to 16, which provides a good balance of capacity and cache footprint for per-thread work queues.
RoundUpToPowerOfTwoIf true (default), rounds up the internal buffer size to the next power of two for faster index wrap-around using bitwise AND instead of modulo. This may result in actual capacity being larger than requested. Set to false to use exactly the requested capacity.

Thread Safety

This class is fully thread-safe:

Memory Ordering

The implementation uses:

  • memory_order_relaxed for loading head/tail in the fast path
  • memory_order_acquire for reading per-slot sequence numbers (ensures data visibility)
  • memory_order_release for writing per-slot sequence numbers (publishes data)
  • compare_exchange_strong on head/tail to claim a position

Correctness & ABA-freedom

head_ and tail_ are monotonically increasing 64-bit counters — never reset, and never wrapping in any realistic runtime (2^64 push/pop operations). The slot index is counter & mask, but the CAS always targets the full counter, never the wrapped index. Because a monotonic counter can never present the same value twice, the classic ABA problem cannot arise on head_/tail_: a successful CAS proves that no other producer (resp. consumer) advanced the counter between our relaxed load and the CAS, so the position we just claimed is exclusively ours. A full lap around the ring advances the counter by capacity, so it cannot alias an earlier value.

The per-slot sequence numbers carry the other half of the Vyukov invariant and make each (slot, lap) pair distinguishable: a slot accepts a write at position pos only when seq == pos, and yields a read only when seq == pos + 1. A producer publishes with seq = pos + 1; a consumer releases with seq = pos + capacity (exactly the next counter value that maps to this slot). So a stale observer can never mistake one lap's slot state for another's, and the value written into seq is itself monotonic per slot — there is no recycled sentinel to confuse a CAS.

Deviation from canonical Vyukov: the single-element operations are fail-fast — one compare_exchange_strong, no retry loop. The canonical algorithm reloads the counter and loops on CAS failure. Here, contention (a peer claimed the same position first) simply returns false/empty even when the buffer is not actually full/empty. This is a deliberate spurious-failure trade-off for bounded fail-fast scheduling (callers retry or fall back to another tier; see fork_join_scheduling.md). It never corrupts state, double-uses a slot, or loses an element — the monotonic-counter + per-slot-sequence invariants above hold regardless of how the single CAS resolves. (try_push_batch claims a contiguous run [tail, tail+available) with one CAS; the same monotonic-counter argument makes that whole reserved run exclusively the caller's.)

Memory Layout

Each slot is padded up to a multiple of the cache line size to avoid false sharing between adjacent slots when multiple threads operate on neighboring elements. The element buffer is placed before the sequence number so that any alignment padding T requires does not land between seq and data – such interior padding could otherwise push a slot that would have fit in a single cache line over the boundary:

Slot 0: [data (T) | seq (atomic) | padding to cache-line multiple]
Slot 1: [data (T) | seq (atomic) | padding to cache-line multiple]
...

The per-slot size therefore depends on sizeof(T) (and alignof(T)), not just the cache line size. A slot fits in a single cache line only when sizeof(T) + sizeof(std::atomic<size_t>) does – roughly sizeof(T) <= 56 for a 64-byte cache line (or <= 120 on platforms with a 128-byte cache line, such as Apple ARM). Larger elements round each slot up to the next cache- line multiple; e.g. a 64-byte element yields 128-byte slots.

With a small T, 16 slots x 64 bytes = 1 KB per ring – compact enough for per-thread allocation even at high thread counts (256 threads = 256 KB total). For larger elements, scale this figure by the actual per-slot size (sizeof(Slot)).

Performance Characteristics

  • Push: O(1), fail-fast when full (no blocking, no retry loop)
  • Pop: O(1), fail-fast when empty (no blocking, no retry loop)
  • Bulk push: O(K) with single CAS for K items
  • Memory: sizeof(Slot) * capacity + 2 cache lines for head/tail, where sizeof(Slot) is sizeof(T) plus the sequence number rounded up to a cache-line multiple (one cache line for small T, more for large T)

Example Usage

// Producer thread(s)
if (ring.try_push(42)) {
// Success
}
// Consumer thread(s)
int value;
if (ring.try_pop(value)) {
// Use value
}
// Bulk push (returns number actually pushed)
std::array<int, 4> items = {1, 2, 3, 4};
size_t pushed = ring.try_push_batch(items.data(), items.size());
// pushed <= 4; caller handles overflow for remaining items
A lock-free multi-producer multi-consumer ring buffer with fixed capacity.
size_type try_push_batch(T *items, size_type count)
Attempts to push multiple elements into the buffer.
bool try_pop(T &item)
Attempts to pop an element from the buffer.
bool try_push(T &&item)
Attempts to push an element into the buffer by moving.

Definition at line 171 of file mpmc_ring_buffer.h.

Constructor & Destructor Documentation

◆ MpmcRingBuffer()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::MpmcRingBuffer ( )
inline

Constructs an empty ring buffer.

Initializes all slot sequence numbers to their position index, indicating that all slots are available for writing.

Note
Construction is not thread-safe. Ensure the buffer is fully constructed before any thread accesses it.

Definition at line 202 of file mpmc_ring_buffer.h.

◆ ~MpmcRingBuffer()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::~MpmcRingBuffer ( )
inline

Destroys the ring buffer.

All elements remaining in the buffer are destroyed. Ensure no threads are accessing the buffer when it is destroyed.

Note
Destruction is not thread-safe.

Definition at line 236 of file mpmc_ring_buffer.h.

Member Function Documentation

◆ capacity()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
static constexpr size_type dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::capacity ( )
inlinestaticconstexprnoexcept

Returns the maximum number of elements the buffer can hold.

When RoundUpToPowerOfTwo is true (default), this may be larger than the requested Capacity template parameter.

Returns
The actual capacity of the buffer.

Definition at line 584 of file mpmc_ring_buffer.h.

◆ empty()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::empty ( ) const
inline

Checks if the buffer is empty.

Returns
true if the buffer appears to contain no elements.
Note
This is a snapshot that may be immediately stale. Safe to call from any thread, but the result is only a hint.

Definition at line 541 of file mpmc_ring_buffer.h.

◆ full()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::full ( ) const
inline

Checks if the buffer is full.

Returns
true if the buffer appears to have no space for additional elements.
Note
This is a snapshot that may be immediately stale. Safe to call from any thread, but the result is only a hint.

Definition at line 555 of file mpmc_ring_buffer.h.

◆ size()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
size_type dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::size ( ) const
inline

Returns the approximate number of elements in the buffer.

Returns
An estimate of the number of elements currently in the buffer.
Note
This is a snapshot that may be immediately stale. Safe to call from any thread, but the result is only a hint. The value may momentarily exceed capacity() under concurrent modification.

Definition at line 570 of file mpmc_ring_buffer.h.

◆ try_emplace()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
template<typename... Args, std::enable_if_t< std::is_nothrow_constructible< T, Args... >::value, int > = 0>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_emplace ( Args &&... args)
inline

Attempts to construct an element in-place in the buffer.

If the buffer has space, constructs an element directly in the buffer storage using the provided arguments, avoiding any copy or move operations.

Template Parameters
ArgsThe types of arguments to forward to T's constructor.
Parameters
argsThe arguments to forward to the element constructor.
Returns
true if the element was successfully emplaced, false if the buffer was full or contended.
Note
This operation is lock-free and fail-fast (no retry loop).

Example

if (buffer.try_emplace(42, "hello")) {
// Element constructed in-place
}
bool try_emplace(Args &&... args)
Attempts to construct an element in-place in the buffer.

Definition at line 330 of file mpmc_ring_buffer.h.

◆ try_pop() [1/2]

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
OpResult< T > dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_pop ( )
inline

Attempts to pop an element from the buffer, returning an optional.

If the buffer has elements, moves the front element into an OpResult and returns it. If the buffer is empty, returns an empty OpResult.

Returns
An OpResult containing the popped element, or an empty OpResult if the buffer was empty or contended.
Note
This operation is lock-free and fail-fast (no retry loop).

Example

buffer.try_push(std::string("hello"));
if (auto result = buffer.try_pop()) {
std::cout << result.value() << std::endl;
}

Definition at line 413 of file mpmc_ring_buffer.h.

◆ try_pop() [2/2]

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_pop ( T & item)
inline

Attempts to pop an element from the buffer.

If the buffer has elements, moves the front element into the output parameter and returns true. If the buffer is empty or another thread is contending for the same slot, returns false immediately.

Parameters
[out]itemThe location to move the popped element to.
Returns
true if an element was successfully popped, false if the buffer was empty or contended.
Note
This operation is lock-free and fail-fast (no retry loop).

Example

buffer.try_push(42);
int value;
if (buffer.try_pop(value)) {
assert(value == 42);
}

Definition at line 359 of file mpmc_ring_buffer.h.

◆ try_pop_into()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_pop_into ( T * storage)
inline

Attempts to pop an element into uninitialized storage.

Similar to try_pop(T&), but uses placement new to move-construct the element into the provided storage. This is useful when T is not default-constructible.

Parameters
[out]storagePointer to uninitialized storage where the element will be move-constructed. Must have proper alignment for T.
Returns
true if an element was successfully popped, false if the buffer was empty or contended.
Note
The caller is responsible for eventually destroying the constructed object.
This operation is lock-free and fail-fast (no retry loop).

Definition at line 448 of file mpmc_ring_buffer.h.

◆ try_push() [1/2]

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
template<typename U = T, std::enable_if_t< std::is_nothrow_copy_constructible< U >::value, int > = 0>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_push ( const T & item)
inline

Attempts to push an element into the buffer by copying.

Parameters
itemThe element to push (will be copied).
Returns
true if the element was successfully pushed, false if the buffer was full or contended.
Note
This operation is lock-free and fail-fast (no retry loop).
Prefer try_push(T&&) when the source element is no longer needed.

Definition at line 294 of file mpmc_ring_buffer.h.

◆ try_push() [2/2]

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
bool dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_push ( T && item)
inline

Attempts to push an element into the buffer by moving.

If the buffer has space, the element is moved into the buffer and the function returns true. If the buffer is full or another thread is contending for the same slot, returns false immediately.

Parameters
itemThe element to push (will be moved from on success).
Returns
true if the element was successfully pushed, false if the buffer was full or contended.
Note
This operation is lock-free and fail-fast (no retry loop).

Example

std::string msg = "hello";
if (buffer.try_push(std::move(msg))) {
// msg is now empty (moved from)
}

Definition at line 272 of file mpmc_ring_buffer.h.

◆ try_push_batch()

template<typename T , size_t Capacity = 16, bool RoundUpToPowerOfTwo = true>
size_type dispenso::MpmcRingBuffer< T, Capacity, RoundUpToPowerOfTwo >::try_push_batch ( T * items,
size_type count )
inline

Attempts to push multiple elements into the buffer.

Pushes up to count elements from the array into the buffer in a single atomic tail reservation. Returns the number of elements actually pushed, which may be less than count if the buffer doesn't have enough space or if there is contention from other producers.

Validates each slot's sequence number to determine how many consecutive slots are available, then reserves them with a single CAS on the tail.

This design enables natural overflow handling:

size_t pushed = ring.try_push_batch(items, count);
if (pushed < count) {
// Push remainder to central queue or other fallback
centralQueue.push(items + pushed, count - pushed);
}
Parameters
itemsPointer to the array of items to push. Items are moved from.
countNumber of items to attempt to push.
Returns
The number of items actually pushed (0 to count).
Note
After a successful push, each slot's sequence number is published independently, so consumers can start popping slot 0 before slot count-1 is written.

Definition at line 496 of file mpmc_ring_buffer.h.


The documentation for this class was generated from the following file: