dispenso 1.6.0
A library for task parallelism
Classes
class dispenso::DistributedRWLock< N >
A high-throughput distributed reader/writer lock for read-mostly workloads.
DistributedRWLock provides the same interface as std::shared_mutex and is fully compatible with std::shared_lock, std::unique_lock, and std::lock_guard.
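Because the interface matches std::shared_mutex, code written against the standard lock wrappers can stay generic over the lock type. The following is a minimal sketch of that idea, not code from dispenso: `GuardedTable`, `Table`, and `smoke_check` are hypothetical names, and std::shared_mutex is used as a stand-in so the example compiles without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Generic over any lock type that exposes the std::shared_mutex interface
// (lock/unlock and lock_shared/unlock_shared are all this sketch needs).
template <typename SharedLock>
class GuardedTable {
 public:
  // Hot path: readers take shared ownership via std::shared_lock.
  int get(std::size_t i) const {
    std::shared_lock<SharedLock> guard(lock_);
    return i < data_.size() ? data_[i] : -1;
  }

  std::size_t size() const {
    std::shared_lock<SharedLock> guard(lock_);
    return data_.size();
  }

  // Rare path: a writer takes exclusive ownership via std::unique_lock.
  void resize(std::size_t n, int fill) {
    std::unique_lock<SharedLock> guard(lock_);
    data_.assign(n, fill);
  }

  // std::lock_guard works the same way for exclusive access.
  void set(std::size_t i, int value) {
    std::lock_guard<SharedLock> guard(lock_);
    if (i < data_.size()) data_[i] = value;
  }

 private:
  mutable SharedLock lock_;
  std::vector<int> data_;
};

// Stand-in instantiation (assumption: with dispenso on the include path,
// dispenso::DistributedRWLock<16> could be substituted for std::shared_mutex).
using Table = GuardedTable<std::shared_mutex>;

// Small single-threaded smoke check; returns the value read back.
inline int smoke_check() {
  Table t;
  t.resize(4, 7);
  t.set(2, 9);
  assert(t.size() == 4);
  return t.get(2);
}
```

Keeping the lock type a template parameter makes it cheap to benchmark std::shared_mutex, dispenso::RWLock, and DistributedRWLock<N> against the same workload before committing to one.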
Reads are near-contention-free: each thread hashes to one of N cache-line-aligned sub-locks, so concurrent readers rarely touch the same cache line. Writes are O(N), because a writer must lock all N sub-locks in sequence, and the lock is a spin lock (no OS backoff), so write contention scales badly.
The template parameter N controls the read-side fan-out vs. write cost trade-off. Writer cost is O(N), so N effectively encodes "how rare are writes": the larger N, the lower the write:read ratio must be for this lock to win. N must be a power of 2.
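To make the read/write asymmetry concrete, here is an illustrative sketch of the general technique described above. It is not dispenso's implementation: `SketchDistributedRWLock`, `Slot`, and `sketch_demo` are hypothetical names, the 64-byte cache line is an assumption, and the mask-based indexing is only one plausible reason for the power-of-2 requirement.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Illustrative only; NOT dispenso's implementation.
template <std::size_t N>
class SketchDistributedRWLock {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of 2");

 public:
  // Reader: spin on ONE sub-lock chosen by hashing the calling thread's id.
  void lock_shared() {
    std::atomic<int>& s = slots_[index()].state;
    int cur = s.load(std::memory_order_relaxed);
    for (;;) {
      if (cur >= 0 && s.compare_exchange_weak(cur, cur + 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
        return;
      }
      if (cur < 0) cur = s.load(std::memory_order_relaxed);  // writer holds it
    }
  }

  // Must run on the same thread as the matching lock_shared().
  void unlock_shared() {
    slots_[index()].state.fetch_sub(1, std::memory_order_release);
  }

  // Writer: acquire every sub-lock in a fixed order, hence O(N) cost.
  void lock() {
    for (Slot& slot : slots_) {
      int expected = 0;
      while (!slot.state.compare_exchange_weak(expected, -1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed)) {
        expected = 0;  // pure spin, no OS backoff
      }
    }
  }

  void unlock() {
    for (Slot& slot : slots_) slot.state.store(0, std::memory_order_release);
  }

 private:
  // Assumed 64-byte cache line; one sub-lock per line avoids false sharing.
  struct alignas(64) Slot {
    std::atomic<int> state{0};  // -1: writer, 0: free, >0: reader count
  };

  static std::size_t index() {
    // A power-of-2 N lets the hash be reduced with a mask, not a modulo.
    return std::hash<std::thread::id>{}(std::this_thread::get_id()) & (N - 1);
  }

  Slot slots_[N];
};

// Four writer threads increment a shared counter under the exclusive lock.
inline int sketch_demo() {
  using Lock = SketchDistributedRWLock<8>;
  Lock lock;
  int value = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < 4; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < 1000; ++i) {
        std::unique_lock<Lock> w(lock);
        ++value;
      }
    });
  }
  for (std::thread& th : threads) th.join();
  std::shared_lock<Lock> r(lock);
  return value;
}
```

In this sketch a reader performs one atomic operation on one cache line, while a writer performs N of them, which is why a larger N only pays off when writes are correspondingly rarer.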
Shared/reader-writer locks pay extra bookkeeping per reader; that cost only amortizes when reads vastly outnumber writes. If writes are anywhere near reads (≥10% of ops), prefer std::mutex (or a basic spin lock for very fast critical sections): both this lock and dispenso::RWLock spin without OS backoff and lose badly to a single-owner mutex once writers contend. std::shared_mutex isn't the right fallback there either: its per-reader bookkeeping costs more than just locking exclusively when writes are common.
Within the read-mostly regime:
| Workload | Recommended primitive |
|---|---|
| 1-2 contending threads | dispenso::RWLock |
| 4+ contending threads, very fast critical sections | DistributedRWLock<N> |
| Slow critical sections (allocation, IO, sleep) | std::shared_mutex |
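The guidance above can be encoded as a small decision helper. This is only a sketch of one reading of the text: `Workload` and `recommendPrimitive` are hypothetical names, the order in which the rules are checked is an assumption (the table rows are not mutually exclusive), and the 3-thread case is reported as unspecified because the table does not cover it.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a workload; not a dispenso type.
struct Workload {
  double writeFraction;      // writes / (reads + writes), in [0, 1]
  int contendingThreads;     // threads that actually contend on the lock
  bool slowCriticalSection;  // allocation, IO, sleep, ...
};

inline std::string recommendPrimitive(const Workload& w) {
  assert(w.writeFraction >= 0.0 && w.writeFraction <= 1.0);

  // Outside the read-mostly regime, shared locks are the wrong tool.
  if (w.writeFraction >= 0.10) return "std::mutex";

  // Assumed precedence: slow critical sections win over thread count.
  if (w.slowCriticalSection) return "std::shared_mutex";

  if (w.contendingThreads <= 2) return "dispenso::RWLock";
  if (w.contendingThreads >= 4) return "dispenso::DistributedRWLock<N>";

  // The table gives no row for exactly 3 contending threads.
  return "unspecified";
}
```

A helper like this is mainly useful as a starting point for benchmarking; the thresholds come from the prose and should be validated against the actual workload.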
Choosing N: the default N=16 covers most read-mostly cases. Drop to N=8 if you have only a handful of contending readers, or if writes are slightly less rare — the smaller writer cost is worth more than the extra reader fan-out at low concurrency. N=128 is almost never optimal: reader parallelism saturates well before writer cost does, so the larger N just makes writers worse.
Catastrophic miss case: at 32 threads with ~50% writes, RWLock is ~10× slower than std::shared_mutex on Linux and DistributedRWLock<16> is ~35× slower on Windows. A plain std::mutex would beat either — shared locks are simply the wrong tool when writes are common.
Ideal use case: a data structure resized only on rare events (e.g. a thread pool's per-thread ring array, which only changes on pool resize).
Definition in file distributed_rw_lock.h.