dispenso 1.5.1
A library for task parallelism
Getting Started

This guide walks through the core features of dispenso with working examples. Each section includes a complete, compilable example that you can build and run.

Installation

See the README for installation instructions. Dispenso requires C++14 and CMake 3.12+.

To build the examples:

mkdir build && cd build
cmake .. -DDISPENSO_BUILD_EXAMPLES=ON
make

Basic Concepts

Thread Pools

At the heart of dispenso is the ThreadPool. A thread pool manages a set of worker threads that execute tasks. You can use the global thread pool or create your own:

// Use the global thread pool (recommended for most cases)
dispenso::ThreadPool& pool = dispenso::globalThreadPool();
// Or create a custom pool with a specific number of threads
dispenso::ThreadPool myPool(4); // 4 worker threads

Note: globalThreadPool() defaults to std::thread::hardware_concurrency() - 1 worker threads, since the calling thread typically participates in computation. Use dispenso::resizeGlobalThreadPool(n) to change it.

Task Sets

A TaskSet groups related tasks and provides a way to wait for their completion:

dispenso::TaskSet taskSet(dispenso::globalThreadPool());
taskSet.schedule([]() { /* task 1 */ });
taskSet.schedule([]() { /* task 2 */ });
taskSet.wait(); // Block until all tasks complete

Your First Parallel Loop

The simplest way to parallelize work is with parallel_for. It distributes loop iterations across available threads.

Simple per-element parallel loop:

// Process each element independently in parallel
dispenso::parallel_for(0, kArraySize, [&](size_t i) { output[i] = std::sqrt(input[i]); });

Reduction with per-thread state:

std::vector<double> partialSums;
dispenso::parallel_for(
    partialSums,
    []() { return 0.0; }, // State initializer
    size_t{0},
    kArraySize,
    [&](double& localSum, size_t start, size_t end) {
      for (size_t i = start; i < end; ++i) {
        localSum += input[i];
      }
    });
// Combine partial sums
double totalSum = 0.0;
for (double partial : partialSums) {
  totalSum += partial;
}

See full example.

Key points:

  • Use the simple form for independent per-element work
  • Use chunked ranges when you want to control work distribution
  • Per-thread state enables efficient reductions
  • Options let you control parallelism and chunking strategy

Parallel Iteration with for_each

When you have a container rather than an index range, use for_each:

Parallel for_each on a vector:

std::vector<double> values = {1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0};
// Apply square root to each element in parallel
dispenso::for_each(values.begin(), values.end(), [](double& val) { val = std::sqrt(val); });

for_each_n with explicit count:

std::vector<int> partial = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
// Only process first 5 elements
dispenso::for_each_n(partial.begin(), 5, [](int& n) { n += 100; });

See full example.

Key points:

  • Works with any iterator type (including non-random-access iterators)
  • for_each_n takes an explicit count
  • Pass a TaskSet for external synchronization control

Working with Tasks

For more complex task patterns, use TaskSet and ConcurrentTaskSet directly:

Basic TaskSet:

dispenso::TaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> counter(0);
for (int i = 0; i < 10; ++i) {
  taskSet.schedule([&counter, i]() { counter.fetch_add(i, std::memory_order_relaxed); });
}
taskSet.wait();

ConcurrentTaskSet with nested scheduling:

dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> total(0);
for (int i = 0; i < 5; ++i) {
  taskSet.schedule([&taskSet, &total, i]() {
    // Each task schedules two sub-tasks
    for (int j = 0; j < 2; ++j) {
      taskSet.schedule(
          [&total, i, j]() { total.fetch_add(i * 10 + j, std::memory_order_relaxed); });
    }
  });
}
taskSet.wait();

See full example.

Key points:

  • TaskSet is for single-threaded scheduling
  • ConcurrentTaskSet allows scheduling from multiple threads
  • Both support cancellation for cooperative early termination
  • The destructor waits for all tasks to complete

Futures for Async Results

When you need return values from async operations, use Future:

Basic async and get:

dispenso::Future<int> future = dispenso::async([]() {
  int result = 0;
  for (int i = 1; i <= 100; ++i) {
    result += i;
  }
  return result;
});
int result = future.get(); // Blocks until ready

Chaining with then():

dispenso::Future<double> chainedFuture =
    dispenso::async([]() { return 16.0; })
        .then([](dispenso::Future<double>&& prev) { return std::sqrt(prev.get()); })
        .then([](dispenso::Future<double>&& prev) { return prev.get() * 2.0; });

when_all for multiple futures:

dispenso::Future<int> f1 = dispenso::async([]() { return 10; });
dispenso::Future<int> f2 = dispenso::async([]() { return 20; });
dispenso::Future<int> f3 = dispenso::async([]() { return 30; });
auto allFutures = dispenso::when_all(std::move(f1), std::move(f2), std::move(f3));
auto tuple = allFutures.get();
int sum = std::get<0>(tuple).get() + std::get<1>(tuple).get() + std::get<2>(tuple).get();

See full example.

Key points:

  • async() launches work and returns a Future
  • then() chains dependent computations
  • when_all() waits for multiple futures
  • make_ready_future() creates an already-completed future

Task Graphs

For complex dependency patterns, build a task graph:

Diamond dependency pattern:

#include <dispenso/graph.h>
#include <dispenso/graph_executor.h>

dispenso::Graph graph;
float r[4]; // Results written by the nodes

//   A
//  / \
// B   C
//  \ /
//   D
dispenso::Node& A = graph.addNode([&]() { r[0] = 1.0f; });
dispenso::Node& B = graph.addNode([&]() { r[1] = r[0] * 2.0f; });
dispenso::Node& C = graph.addNode([&]() { r[2] = r[0] + 5.0f; });
dispenso::Node& D = graph.addNode([&]() { r[3] = r[1] + r[2]; });
B.dependsOn(A);
C.dependsOn(A);
D.dependsOn(B, C);

dispenso::ConcurrentTaskSetExecutor executor;
setAllNodesIncomplete(graph);
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
executor(taskSet, graph);
taskSet.wait(); // All four nodes have run in dependency order

See full example.

Key points:

  • Use dependsOn() to specify prerequisites
  • Multiple executors available: single-thread, parallel_for, ConcurrentTaskSet
  • Graphs can be re-executed after calling setAllNodesIncomplete()
  • Subgraphs help organize large graphs

Pipelines

For streaming data through stages, use pipelines:

3-stage pipeline (generator -> transform -> sink):

std::vector<int> results;
int counter = 0;
dispenso::pipeline(
    // Stage 1: Generator - produces values
    [&counter]() -> dispenso::OpResult<int> {
      if (counter >= 10) {
        return {}; // Empty result signals end of input
      }
      return counter++;
    },
    // Stage 2: Transform - squares the value
    [](int value) { return value * value; },
    // Stage 3: Sink - collects results
    [&results](int value) { results.push_back(value); });

See full example.

Key points:

  • Generator stage produces values (returns OpResult<T> or std::optional<T>)
  • Transform stages process values (can filter by returning empty result)
  • Sink stage consumes final values
  • Use stage() with a limit for parallel stages

Thread-Safe Containers

ConcurrentVector

A vector that supports concurrent push_back and growth:

Concurrent push_back from multiple threads:

dispenso::ConcurrentVector<int> vec;
dispenso::parallel_for(0, 1000, [&vec](size_t i) { vec.push_back(static_cast<int>(i)); });

Iterator stability during concurrent modification:

dispenso::ConcurrentVector<int> vec;
vec.push_back(1);
vec.push_back(2);
vec.push_back(3);
auto it = vec.begin();
int& firstElement = *it;
// Push more elements concurrently
dispenso::parallel_for(0, 100, [&vec](size_t i) { vec.push_back(static_cast<int>(i + 100)); });
// The original iterator and reference are still valid
assert(*it == 1);
assert(firstElement == 1);

See full example.

Key points:

  • Iterators and references remain stable during growth
  • Use grow_by() for efficient batch insertion
  • Reserve capacity upfront when size is known
  • Not all operations are concurrent-safe (see docs)

Synchronization Primitives

Latch

A one-shot barrier for thread synchronization:

count_down + wait pattern:

#include <dispenso/latch.h>

constexpr int kNumWorkers = 3;
dispenso::Latch workComplete(kNumWorkers);
std::vector<int> results(kNumWorkers, 0);
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
for (int i = 0; i < kNumWorkers; ++i) {
  taskSet.schedule([&workComplete, &results, i]() {
    results[static_cast<size_t>(i)] = (i + 1) * 10;
    workComplete.count_down(); // Signal work is done (non-blocking)
  });
}
workComplete.wait(); // Main thread waits for all workers

See full example.

Key points:

  • arrive_and_wait() decrements and blocks
  • count_down() decrements without blocking
  • wait() blocks without decrementing
  • Cannot be reset (one-shot)

Resource Pooling

Manage expensive-to-create resources with ResourcePool:

Basic buffer pool with RAII:

// Create a pool of 4 buffers
dispenso::ResourcePool<Buffer> bufferPool(4, []() { return Buffer(); });
dispenso::parallel_for(0, 100, [&bufferPool](size_t i) {
  // Acquire a resource from the pool (blocks if none available)
  auto resource = bufferPool.acquire();
  // Use the resource
  resource.get().process(static_cast<int>(i));
  // Resource automatically returned to pool when 'resource' goes out of scope
});

See full example.

Key points:

  • Resources automatically return to pool when RAII wrapper destructs
  • acquire() blocks if no resources available
  • Good for database connections, buffers, etc.
  • Can be used to limit concurrency

Next Steps

  • Browse the API Reference for complete documentation
  • Check out the tests for more usage examples
  • See the benchmarks for performance testing patterns