This guide walks through the core features of dispenso with working examples. Each section includes a complete, compilable example that you can build and run.
Installation
See the README for installation instructions. Dispenso requires C++14 and CMake 3.12+.
To build the examples:
mkdir build && cd build
cmake .. -DDISPENSO_BUILD_EXAMPLES=ON
make
Basic Concepts
Thread Pools
At the heart of dispenso is the ThreadPool. A thread pool manages a set of worker threads that execute tasks. You can use the global thread pool or create your own:
Note: globalThreadPool() defaults to std::thread::hardware_concurrency() - 1 worker threads, since the calling thread typically participates in computation. Use dispenso::resizeGlobalThreadPool(n) to change it.
Task Sets
A TaskSet groups related tasks and provides a way to wait for their completion:
dispenso::TaskSet taskSet(dispenso::globalThreadPool());
taskSet.schedule([]() { });
taskSet.schedule([]() { });
taskSet.wait();
Your First Parallel Loop
The simplest way to parallelize work is with parallel_for. It distributes loop iterations across available threads.
Simple per-element parallel loop:
dispenso::parallel_for(0, kArraySize, [&](size_t i) { output[i] = std::sqrt(input[i]); });
Reduction with per-thread state:
std::vector<double> partialSums;
dispenso::parallel_for(
partialSums,
[]() { return 0.0; },
size_t{0},
kArraySize,
[&](double& localSum, size_t start, size_t end) {
for (size_t i = start; i < end; ++i) {
localSum += input[i];
}
});
double totalSum = 0.0;
for (double partial : partialSums) {
totalSum += partial;
}
See full example.
Key points:
- Use the simple form for independent per-element work
- Use chunked ranges when you want to control work distribution
- Per-thread state enables efficient reductions
- Options let you control parallelism and chunking strategy
Parallel Iteration with for_each
When you have a container rather than an index range, use for_each:
Parallel for_each on a vector:
std::vector<double> values = {1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0};
dispenso::for_each(values.begin(), values.end(), [](double& val) { val = std::sqrt(val); });
for_each_n with explicit count:
std::vector<int> partial = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
dispenso::for_each_n(partial.begin(), 5, [](int& n) { n += 100; });
See full example.
Key points:
- Works with any iterator type (including non-random-access iterators)
- for_each_n takes an explicit count
- Pass a TaskSet for external synchronization control
Working with Tasks
For more complex task patterns, use TaskSet and ConcurrentTaskSet directly:
Basic TaskSet:
dispenso::TaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> counter(0);
for (int i = 0; i < 10; ++i) {
taskSet.schedule([&counter, i]() { counter.fetch_add(i, std::memory_order_relaxed); });
}
taskSet.wait();
ConcurrentTaskSet with nested scheduling:
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> total(0);
for (int i = 0; i < 5; ++i) {
taskSet.schedule([&taskSet, &total, i]() {
for (int j = 0; j < 2; ++j) {
taskSet.schedule(
[&total, i, j]() { total.fetch_add(i * 10 + j, std::memory_order_relaxed); });
}
});
}
taskSet.wait();
See full example.
Key points:
- TaskSet is for single-threaded scheduling
- ConcurrentTaskSet allows scheduling from multiple threads
- Both support cancellation for cooperative early termination
- The destructor waits for all tasks to complete
Futures for Async Results
When you need return values from async operations, use Future:
Basic async and get:
auto future = dispenso::async([]() {
  int result = 0;
  for (int i = 1; i <= 100; ++i) {
    result += i;
  }
  return result;
});
int result = future.get();
Chaining with then():
auto future = dispenso::async([]() { return 16.0; })
                  .then([](auto&& prev) { return std::sqrt(prev.get()); })
                  .then([](auto&& prev) { return prev.get() * 2.0; });
double result = future.get(); // sqrt(16.0) * 2.0 == 8.0
when_all for multiple futures:
auto f1 = dispenso::async([]() { return 1; });
auto f2 = dispenso::async([]() { return 2; });
auto f3 = dispenso::async([]() { return 3; });
auto allFutures = dispenso::when_all(f1, f2, f3);
auto tuple = allFutures.get();
int sum = std::get<0>(tuple).get() + std::get<1>(tuple).get() + std::get<2>(tuple).get();
See full example.
Key points:
- async() launches work and returns a Future
- then() chains dependent computations
- when_all() waits for multiple futures
- make_ready_future() creates an already-completed future
Task Graphs
For complex dependency patterns, build a task graph:
Diamond dependency pattern:
setAllNodesIncomplete(graph);
executor(taskSet, graph);
See full example.
Key points:
- Use dependsOn() to specify prerequisites
- Multiple executors available: single-thread, parallel_for, ConcurrentTaskSet
- Graphs can be re-executed after calling setAllNodesIncomplete()
- Subgraphs help organize large graphs
Pipelines
For streaming data through stages, use pipelines:
3-stage pipeline (generator -> transform -> sink):
std::vector<int> results;
int counter = 0;
dispenso::pipeline(
[&counter]() -> dispenso::OpResult<int> {
if (counter >= 10) {
return {};
}
return counter++;
},
[](int value) { return value * value; },
[&results](int value) { results.push_back(value); });
See full example.
Key points:
- Generator stage produces values (returns OpResult<T> or std::optional<T>)
- Transform stages process values (can filter by returning empty result)
- Sink stage consumes final values
- Use stage() with a limit for parallel stages
Thread-Safe Containers
ConcurrentVector
A vector that supports concurrent push_back and growth:
Concurrent push_back from multiple threads:
dispenso::ConcurrentVector<int> vec;
dispenso::parallel_for(0, 1000, [&vec](size_t i) { vec.push_back(static_cast<int>(i)); });
Iterator stability during concurrent modification:
dispenso::ConcurrentVector<int> vec;
vec.push_back(1);
auto it = vec.begin();
int& firstElement = *it;
dispenso::parallel_for(0, 100, [&vec](size_t i) { vec.push_back(static_cast<int>(i + 100)); });
assert(*it == 1);
assert(firstElement == 1);
See full example.
Key points:
- Iterators and references remain stable during growth
- Use grow_by() for efficient batch insertion
- Reserve capacity upfront when size is known
- Not all operations are concurrent-safe (see docs)
Synchronization Primitives
Latch
A one-shot barrier for thread synchronization:
count_down + wait pattern:
constexpr int kNumWorkers = 3;
dispenso::Latch workComplete(kNumWorkers);
dispenso::TaskSet taskSet(dispenso::globalThreadPool());
std::vector<int> results(kNumWorkers, 0);
for (int i = 0; i < kNumWorkers; ++i) {
  taskSet.schedule([&workComplete, &results, i]() {
    results[static_cast<size_t>(i)] = (i + 1) * 10;
    workComplete.count_down();
  });
}
workComplete.wait();
See full example.
Key points:
- arrive_and_wait() decrements and blocks
- count_down() decrements without blocking
- wait() blocks without decrementing
- Cannot be reset (one-shot)
Resource Pooling
Manage expensive-to-create resources with ResourcePool:
Basic buffer pool with RAII:
// Buffer is the example's resource type: an expensive-to-create object
// with a process() method (see the full example for its definition).
dispenso::ResourcePool<Buffer> bufferPool(4, []() { return Buffer(); });
dispenso::parallel_for(0, 100, [&bufferPool](size_t i) {
  auto resource = bufferPool.acquire();
  resource.get().process(static_cast<int>(i));
});
See full example.
Key points:
- Resources automatically return to pool when RAII wrapper destructs
- acquire() blocks if no resources available
- Good for database connections, buffers, etc.
- Can be used to limit concurrency
Next Steps
- Browse the API Reference for complete documentation
- Check out the tests for more usage examples
- See the benchmarks for performance testing patterns