dispenso 1.6.0
A library for task parallelism
#include <parallel_for.h>
Public Attributes
| Type | Member = default value |
|------|------------------------|
| uint32_t | maxThreads = std::numeric_limits<int32_t>::max() |
| bool | wait = true |
| ParForChunking | defaultChunking = ParForChunking::kStatic |
| uint32_t | minItemsPerChunk = 1 |
| uint32_t | granularity = 1 |
| bool | reuseExistingState = false |
A set of options to control parallel_for.
Definition at line 93 of file parallel_for.h.
| ParForChunking dispenso::ParForOptions::defaultChunking = ParForChunking::kStatic |
Specify whether default chunking should be static or auto (dynamic load balancing). This setting applies only to the parallel_for overloads that take index parameters, not to those that take a ChunkedRange.
Definition at line 114 of file parallel_for.h.
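Dynamic load balancing is commonly implemented by having workers repeatedly claim the next chunk from a shared atomic counter, so faster workers end up processing more chunks. The sketch below illustrates that generic technique using only the standard library. `dynamicParallelFor` is a hypothetical name, and this is not dispenso's auto scheduler, which may balance load differently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical model of dynamic load balancing (not dispenso's implementation):
// each worker claims the next chunk of `chunkSize` indices from a shared atomic
// counter until the range is exhausted, so no fixed assignment is made up front.
template <typename F>
void dynamicParallelFor(std::size_t begin, std::size_t end, std::size_t chunkSize,
                        unsigned numWorkers, F&& body) {
  assert(chunkSize > 0 && numWorkers > 0 && begin <= end);
  std::atomic<std::size_t> next{begin};
  auto worker = [&] {
    for (;;) {
      std::size_t chunkBegin = next.fetch_add(chunkSize);
      if (chunkBegin >= end) break;  // nothing left to claim
      std::size_t chunkEnd = std::min(end, chunkBegin + chunkSize);
      body(chunkBegin, chunkEnd);
    }
  };
  std::vector<std::thread> threads;
  for (unsigned t = 1; t < numWorkers; ++t) threads.emplace_back(worker);
  worker();  // the calling thread participates as one of the workers
  for (auto& th : threads) th.join();
}
```

Because chunks are claimed on demand, a worker that hits expensive iterations simply claims fewer chunks, which is the advantage over a fixed static split when per-item cost is uneven.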
| uint32_t dispenso::ParForOptions::granularity = 1 |
Specify a chunk-size granularity contract. When granularity > 1, every chunk passed to the user's lambda during the parallel portion is guaranteed to have (end - begin) be a multiple of granularity. If the total range size is not a multiple of granularity, the sub-granularity remainder (the "tail") is executed serially on the calling thread after the parallel portion completes; that single tail invocation is the only call whose (end - begin) may not be a multiple of granularity.
Useful for SIMD inner loops (e.g. granularity = 4 for doubles in 256-bit AVX registers, or 8 for floats), block algorithms, or any inner loop with a fixed unroll factor where partial blocks add overhead.
granularity is a contract on chunk boundaries, not on chunk size: chunks may still vary widely (with granularity = 8, e.g. 8, 16, 24, or 8000 items), but they will never be 9, 17, or 8001.
Ignored (treated as 1) when an explicit chunk size is provided to ChunkedRange; in that case the user is already specifying exact chunk granularity.
Definition at line 148 of file parallel_for.h.
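A minimal sketch of the granularity contract, assuming the parallel portion is divided into a fixed number of chunks. `GranularSplit` and `splitWithGranularity` are hypothetical names for illustration; dispenso's real chunk sizing may differ. Every parallel chunk is a whole number of granularity-sized blocks, and only the serial tail can have an arbitrary length.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the granularity contract (not dispenso's code).
struct GranularSplit {
  std::vector<std::pair<std::size_t, std::size_t>> chunks;  // parallel chunks [b, e)
  std::pair<std::size_t, std::size_t> tail;                 // serial remainder, may be empty
};

GranularSplit splitWithGranularity(std::size_t begin, std::size_t end,
                                   std::size_t granularity, std::size_t numChunks) {
  assert(granularity > 0 && numChunks > 0 && begin <= end);
  GranularSplit out;
  std::size_t blocks = (end - begin) / granularity;  // whole granularity blocks
  std::size_t alignedEnd = begin + blocks * granularity;
  // Spread whole blocks across chunks; the first `extra` chunks get one more block.
  std::size_t perChunk = blocks / numChunks;
  std::size_t extra = blocks % numChunks;
  std::size_t cur = begin;
  for (std::size_t c = 0; c < numChunks && cur < alignedEnd; ++c) {
    std::size_t nBlocks = perChunk + (c < extra ? 1 : 0);
    std::size_t next = cur + nBlocks * granularity;
    out.chunks.emplace_back(cur, next);
    cur = next;
  }
  out.tail = {alignedEnd, end};  // sub-granularity remainder, run serially
  return out;
}
```

For 100 items with granularity = 8 and three chunks, this yields parallel chunks [0, 32), [32, 64), and [64, 96), and a serial tail [96, 100).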
| uint32_t dispenso::ParForOptions::maxThreads = std::numeric_limits<int32_t>::max() |
The maximum number of threads to use. Set this below the number of threads in the TaskSet's thread pool to limit the degree of concurrency. Setting maxThreads to zero or one results in serial operation.
Definition at line 99 of file parallel_for.h.
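The effect of a thread cap can be modeled with plain `std::thread`. This sketch assumes the effective worker count is min(maxThreads, poolThreads) and that 0 or 1 means the calling thread runs the whole range; `cappedParallelFor` and `poolThreads` are hypothetical names, and the exact way dispenso counts the calling thread against the pool is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical model of a maxThreads cap (not dispenso's implementation).
// Returns the number of workers used so the cap can be observed.
template <typename F>
std::uint32_t cappedParallelFor(std::size_t begin, std::size_t end, std::uint32_t maxThreads,
                                std::uint32_t poolThreads, F&& body) {
  assert(begin <= end);
  std::uint32_t workers = maxThreads <= 1 ? 1u : std::min(maxThreads, poolThreads);
  workers = std::max<std::uint32_t>(workers, 1);
  if (workers == 1) {  // serial operation on the calling thread
    body(begin, end);
    return 1;
  }
  std::size_t n = end - begin;
  std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
  std::vector<std::thread> threads;
  for (std::uint32_t w = 0; w < workers; ++w) {
    std::size_t b = begin + std::min(n, w * chunk);
    std::size_t e = begin + std::min(n, (w + 1) * chunk);
    if (b < e) threads.emplace_back([&body, b, e] { body(b, e); });
  }
  for (auto& t : threads) t.join();
  return workers;
}
```

In this model, passing the default value std::numeric_limits<int32_t>::max() leaves the pool size as the effective limit.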
| uint32_t dispenso::ParForOptions::minItemsPerChunk = 1 |
Specify a minimum number of items per chunk for static chunking or auto (dynamic) load balancing. Workloads that are cheap per item should use a higher minItemsPerChunk so that scheduling overhead is amortized over more work. Ignored if an explicit chunk size is provided to ChunkedRange.
For kStatic, this is a hard floor on chunk size (only the last chunk may be smaller). For kAdaptive, it is a SOFT HINT: the owner's pop step and the stealer's split point both respect minItemsPerChunk, but after a steal the victim may be left with a sub-minItemsPerChunk remainder. The remainder is still processed by the original owner, so up to numWorkers sub-minItemsPerChunk chunks may appear in adversarial steal patterns. If you need a hard granularity guarantee, use the granularity option or pass an explicit chunk size to makeChunkedRange.
Definition at line 129 of file parallel_for.h.
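The hard floor described for kStatic can be illustrated with a simple partitioner: take an even per-worker share, raise it to minItemsPerChunk if it is smaller, and cut the range into chunks of that size so only the last chunk can fall short. `staticChunks` is a hypothetical helper; dispenso's actual static partitioning may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical static chunking with a hard minimum chunk size (not dispenso's
// code): every chunk holds at least minItemsPerChunk items except possibly the last.
std::vector<std::pair<std::size_t, std::size_t>> staticChunks(std::size_t begin, std::size_t end,
                                                              std::size_t numWorkers,
                                                              std::size_t minItemsPerChunk) {
  assert(begin <= end && numWorkers > 0 && minItemsPerChunk > 0);
  std::size_t n = end - begin;
  // Even per-worker share (rounded up), raised to the minimum if too small.
  std::size_t chunkSize = std::max((n + numWorkers - 1) / numWorkers, minItemsPerChunk);
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  for (std::size_t b = begin; b < end; b += chunkSize) {
    chunks.emplace_back(b, std::min(end, b + chunkSize));
  }
  return chunks;
}
```

With 100 items, 8 workers, and minItemsPerChunk = 40, this produces only three chunks (40, 40, and 20 items), so five workers receive no work: a higher minimum trades parallelism for lower per-chunk overhead.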
| bool dispenso::ParForOptions::reuseExistingState = false |
When false and a state container is supplied to parallel_for, the container is re-created from scratch on each call. When true, existing state is reused as much as possible: new state is created only if the call requires more than the container already holds.
Definition at line 155 of file parallel_for.h.
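A minimal model of the two behaviors, assuming the state container is a `std::vector` of per-worker states. `prepareStates` and its `initial` parameter are hypothetical and not dispenso API; the sketch only shows the rebuild-versus-reuse rule described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of reuseExistingState (names are illustrative only).
// false: discard everything and rebuild; true: keep existing entries and
// create new ones only when more are needed than the container holds.
template <typename State>
void prepareStates(std::vector<State>& states, std::size_t needed,
                   bool reuseExistingState, const State& initial) {
  if (!reuseExistingState) {
    states.clear();  // re-create from scratch
  }
  if (states.size() < needed) {
    states.resize(needed, initial);  // create only the missing states
  }
}
```

In this model, reused states keep whatever they already hold, which is what makes reuse attractive for states that own buffers, but it also means per-call accumulators must be reset by the caller if stale values would be wrong.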
| bool dispenso::ParForOptions::wait = true |
Specify whether parallel_for returns only after all of its work is complete. If parallel_for is called without a TaskSet, it always waits, regardless of this setting.
Definition at line 108 of file parallel_for.h.
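The wait semantics can be modeled with standard threads, assuming a `std::vector<std::thread>` stands in for a TaskSet. `launchWork` and `pending` are hypothetical names, not dispenso API: if no stand-in TaskSet is supplied, or wait is true, the function joins before returning; otherwise the launched work is handed back for the caller to join later.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model of the wait option (not dispenso's implementation).
// `pending` plays the role of a TaskSet the caller can wait on later.
void launchWork(const std::vector<std::function<void()>>& jobs, bool wait,
                std::vector<std::thread>* pending) {
  std::vector<std::thread> threads;
  for (const auto& job : jobs) threads.emplace_back(job);
  if (pending == nullptr || wait) {
    for (auto& t : threads) t.join();  // all work is complete on return
    return;
  }
  for (auto& t : threads) pending->push_back(std::move(t));  // caller joins later
}
```

With wait = false, results must not be read until the caller has waited on the pending work; in this model that means joining every thread in `pending`.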