dispenso 1.6.0
A library for task parallelism
#include <cmath>
#include <limits>
#include <dispenso/cpu_set.h>
#include <dispenso/small_buffer_allocator.h>
#include <dispenso/task_set.h>
#include "detail/can_invoke.h"
#include "detail/par_for_stripe.h"
#include "detail/per_thread_info.h"
#include "detail/par_for_dynamic.h"
#include "detail/par_for_static.h"
Classes | |
| struct | dispenso::ParForOptions |
Enumerations | |
| enum class | dispenso::ParForChunking { } |
Functions | |
| template<typename IntegerA , typename IntegerB > | |
| ChunkedRange< std::common_type_t< IntegerA, IntegerB > > | dispenso::makeChunkedRange (IntegerA start, IntegerB end, ParForChunking chunking=ParForChunking::kStatic) |
| template<typename IntegerA , typename IntegerB , typename IntegerC > | |
| ChunkedRange< std::common_type_t< IntegerA, IntegerB > > | dispenso::makeChunkedRange (IntegerA start, IntegerB end, IntegerC chunkSize) |
| template<typename TaskSetT , typename IntegerT , typename F , typename StateContainer , typename StateGen > | |
| void | dispenso::parallel_for (TaskSetT &taskSet, StateContainer &states, const StateGen &defaultState, const ChunkedRange< IntegerT > &range, F &&f, ParForOptions options={}) |
| template<typename TaskSetT , typename IntegerT , typename F > | |
| void | dispenso::parallel_for (TaskSetT &taskSet, const ChunkedRange< IntegerT > &range, F &&f, ParForOptions options={}) |
| template<typename IntegerT , typename F > | |
| void | dispenso::parallel_for (const ChunkedRange< IntegerT > &range, F &&f, ParForOptions options={}) |
| template<typename F , typename IntegerT , typename StateContainer , typename StateGen > | |
| void | dispenso::parallel_for (StateContainer &states, const StateGen &defaultState, const ChunkedRange< IntegerT > &range, F &&f, ParForOptions options={}) |
| template<typename TaskSetT , typename IntegerA , typename IntegerB , typename F , std::enable_if_t< std::is_integral< IntegerA >::value, bool > = true, std::enable_if_t< std::is_integral< IntegerB >::value, bool > = true, std::enable_if_t< detail::CanInvoke< F(IntegerA)>::value, bool > = true> | |
| void | dispenso::parallel_for (TaskSetT &taskSet, IntegerA start, IntegerB end, F &&f, ParForOptions options={}) |
| template<typename IntegerA , typename IntegerB , typename F , std::enable_if_t< std::is_integral< IntegerA >::value, bool > = true, std::enable_if_t< std::is_integral< IntegerB >::value, bool > = true> | |
| void | dispenso::parallel_for (IntegerA start, IntegerB end, F &&f, ParForOptions options={}) |
| template<typename TaskSetT , typename IntegerA , typename IntegerB , typename F , typename StateContainer , typename StateGen , std::enable_if_t< std::is_integral< IntegerA >::value, bool > = true, std::enable_if_t< std::is_integral< IntegerB >::value, bool > = true, std::enable_if_t< detail::CanInvoke< F(typename StateContainer::reference, IntegerA)>::value, bool > = true> | |
| void | dispenso::parallel_for (TaskSetT &taskSet, StateContainer &states, const StateGen &defaultState, IntegerA start, IntegerB end, F &&f, ParForOptions options={}) |
| template<typename IntegerA , typename IntegerB , typename F , typename StateContainer , typename StateGen , std::enable_if_t< std::is_integral< IntegerA >::value, bool > = true, std::enable_if_t< std::is_integral< IntegerB >::value, bool > = true> | |
| void | dispenso::parallel_for (StateContainer &states, const StateGen &defaultState, IntegerA start, IntegerB end, F &&f, ParForOptions options={}) |
Functions for performing parallel for loops.
Definition in file parallel_for.h.
| enum class dispenso::ParForChunking | strong |
Chunking strategy. If the cost of each loop iteration is roughly constant, kStatic load balancing is typically preferred. Additionally, when a non-waiting parallel_for call is combined with other parallel_for calls or with other task submissions to a TaskSet, some dynamic load balancing is introduced automatically, so kStatic can still be the better choice there. If the cost per iteration varies widely, so that some ranges may be much cheaper than others, select kAdaptive (or its alias kAuto).
kAdaptive partitions the iteration space into P contiguous stripes (one per worker), each consumed front-to-back by its owner via fetch_add on a per-stripe atomic cursor. When a worker's own stripe is exhausted, it claims chunks from peers' stripes the same way, preferring same-L3 cache-group victims first to keep stolen data warm in the shared L3 (CCD on AMD, tile/SNC on Intel). Chunk size is fixed per call (auto-derived from range size / worker count).
kAuto is kept as a compatibility alias for kAdaptive. New code should prefer kAdaptive; kAuto will be deprecated in a future release and removed in 2.0.
Definition at line 83 of file parallel_for.h.
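The kAdaptive description above can be made concrete with a small standard-library sketch. This is not dispenso's implementation: `stripedFor` and `stripedSum` are hypothetical names, the chunk size is passed in rather than auto-derived, and idle workers visit peers in simple ring order instead of preferring victims in the same L3 cache group. It only illustrates the per-stripe atomic cursor advanced with fetch_add, used both by the owner and by workers that have run out of their own work.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of dispenso. Calls f(begin, end) on chunks of
// [start, end), using one contiguous stripe per worker.
template <typename F>
void stripedFor(std::size_t start, std::size_t end, std::size_t workers,
                std::size_t chunk, F f) {
  assert(workers > 0 && chunk > 0 && start <= end);

  struct Stripe {
    std::atomic<std::size_t> cursor{0};  // next unclaimed index in this stripe
    std::size_t end = 0;                 // one past the last index of the stripe
  };
  std::vector<Stripe> stripes(workers);

  // Partition [start, end) into `workers` contiguous stripes.
  const std::size_t perStripe = (end - start + workers - 1) / workers;
  for (std::size_t w = 0; w < workers; ++w) {
    const std::size_t b = std::min(end, start + w * perStripe);
    stripes[w].cursor.store(b, std::memory_order_relaxed);
    stripes[w].end = std::min(end, b + perStripe);
  }

  // Claim fixed-size chunks front-to-back until the stripe is exhausted.
  auto drain = [&](Stripe& s) {
    for (;;) {
      const std::size_t b = s.cursor.fetch_add(chunk, std::memory_order_relaxed);
      if (b >= s.end) return;
      f(b, std::min(s.end, b + chunk));
    }
  };

  auto work = [&](std::size_t self) {
    drain(stripes[self]);  // own stripe first
    // Assumption: plain ring order; no cache-topology awareness in this sketch.
    for (std::size_t k = 1; k < workers; ++k) {
      drain(stripes[(self + k) % workers]);
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t w = 1; w < workers; ++w) threads.emplace_back(work, w);
  work(0);  // the calling thread acts as worker 0
  for (auto& t : threads) t.join();
}

// Sums the indices 0..n-1; every index must be visited exactly once.
std::size_t stripedSum(std::size_t n, std::size_t workers, std::size_t chunk) {
  std::atomic<std::size_t> sum{0};
  stripedFor(0, n, workers, chunk, [&](std::size_t b, std::size_t e) {
    std::size_t local = 0;
    for (std::size_t i = b; i < e; ++i) local += i;
    sum.fetch_add(local, std::memory_order_relaxed);
  });
  return sum.load();
}
```

Because every chunk is claimed through a single fetch_add on its stripe's cursor, no index is handed out twice, regardless of whether the owner or another worker claims it.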
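| ChunkedRange< std::common_type_t< IntegerA, IntegerB > > dispenso::makeChunkedRange (IntegerA start, IntegerB end, IntegerC chunkSize) | inline |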
Create a ChunkedRange with a specific chunk size.
| start | The start of the range. |
| end | The end of the range. |
| chunkSize | The chunk size. |
Definition at line 283 of file parallel_for.h.
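As a rough illustration of what a fixed chunk size implies for the functor, the sketch below lists the [begin, end) pairs a range would be cut into. It assumes that each call to f receives at most chunkSize consecutive indices; `splitIntoChunks` is a hypothetical helper written for this note, not a dispenso function, and it says nothing about which thread runs which chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of dispenso. Returns the half-open
// [begin, end) pairs obtained by cutting [start, end) into pieces of at most
// chunkSize indices each.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoChunks(
    std::size_t start, std::size_t end, std::size_t chunkSize) {
  assert(chunkSize > 0);
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  std::size_t b = start;
  while (b < end) {
    // The last chunk may be shorter than chunkSize.
    const std::size_t e = (end - b > chunkSize) ? b + chunkSize : end;
    chunks.emplace_back(b, e);
    b = e;
  }
  return chunks;
}
```

For example, `splitIntoChunks(0, 10, 4)` yields the three pairs (0, 4), (4, 8), and (8, 10).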
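| ChunkedRange< std::common_type_t< IntegerA, IntegerB > > dispenso::makeChunkedRange (IntegerA start, IntegerB end, ParForChunking chunking=ParForChunking::kStatic) | inline |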
Create a ChunkedRange with specified chunking strategy.
| start | The start of the range. |
| end | The end of the range. |
| chunking | The strategy to use for chunking. |
Definition at line 267 of file parallel_for.h.
| void dispenso::parallel_for | ( | const ChunkedRange< IntegerT > & | range, |
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel on the global thread pool, and wait until complete.
| range | The range defining the loop extents as well as chunking strategy. |
| f | The functor to execute in parallel. Must have a signature like void(size_t begin, size_t end). |
| options | See ParForOptions for details. options.wait will always be reset to true. |
Definition at line 725 of file parallel_for.h.
| void dispenso::parallel_for | ( | IntegerA | start, |
| IntegerB | end, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel on the global thread pool and block on loop completion.
| start | The start of the loop extents. |
| end | The end of the loop extents. |
| f | The functor to execute in parallel. Must have a signature like void(size_t index) or void(size_t begin, size_t end). |
| options | See ParForOptions for details. options.wait will always be reset to true. |
Definition at line 847 of file parallel_for.h.
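The two accepted functor shapes, void(size_t index) and void(size_t begin, size_t end), can be told apart at compile time. The template signatures listed above use detail::CanInvoke for this; the serial sketch below uses the standard std::is_invocable_v (C++17) as a stand-in. `serialFor`, `sumByIndex`, and `sumByRange` are hypothetical names, and no parallelism is involved.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper, not part of dispenso. Accepts either functor shape and
// runs the loop serially.
template <typename F>
void serialFor(std::size_t start, std::size_t end, F&& f) {
  if constexpr (std::is_invocable_v<F&, std::size_t, std::size_t>) {
    // Range form: the functor iterates over [begin, end) itself.
    f(start, end);
  } else {
    static_assert(std::is_invocable_v<F&, std::size_t>,
                  "f must accept (index) or (begin, end)");
    // Index form: the loop calls the functor once per index.
    for (std::size_t i = start; i < end; ++i) f(i);
  }
}

// Sums 0..n-1 with a per-index functor.
std::size_t sumByIndex(std::size_t n) {
  std::size_t sum = 0;
  serialFor(0, n, [&](std::size_t i) { sum += i; });
  return sum;
}

// Sums 0..n-1 with a range functor.
std::size_t sumByRange(std::size_t n) {
  std::size_t sum = 0;
  serialFor(0, n, [&](std::size_t b, std::size_t e) {
    for (std::size_t i = b; i < e; ++i) sum += i;
  });
  return sum;
}
```

The range form lets the functor hoist setup work out of the inner loop, which is one reason to prefer it when per-index overhead matters.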
| void dispenso::parallel_for | ( | StateContainer & | states, |
| const StateGen & | defaultState, | ||
| const ChunkedRange< IntegerT > & | range, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel on the global thread pool and block until loop completion.
| states | A container of State (actual type of State TBD by user). The container will be resized to hold a State object per executing thread. Container must provide emplace_back() and must be forward-iterable. Examples include std::vector, std::deque, and std::list. These are the states passed into f, and states must remain a valid object until work is completed. |
| defaultState | A functor with signature State(). It will be called to initialize the objects for states. |
| range | The range defining the loop extents as well as chunking strategy. |
| f | The functor to execute in parallel. Must have a signature like void(State &s, size_t begin, size_t end). |
| options | See ParForOptions for details. options.wait will always be reset to true. |
Definition at line 749 of file parallel_for.h.
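The roles of states, defaultState, and f described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not dispenso's implementation: `statefulFor` and `sumWithStates` are hypothetical names, the worker count is passed explicitly, the range is split evenly, and the container is assumed to be empty on entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper, not part of dispenso. Gives each worker its own State
// and calls f(State&, begin, end) on that worker's share of [start, end).
template <typename StateContainer, typename StateGen, typename F>
void statefulFor(StateContainer& states, const StateGen& defaultState,
                 std::size_t start, std::size_t end, std::size_t workers, F f) {
  assert(workers > 0 && start <= end);
  assert(states.begin() == states.end());  // assumption: empty on entry

  // Create every State before any thread starts, so that later emplace_back
  // calls cannot invalidate references already handed to a worker.
  for (std::size_t w = 0; w < workers; ++w) states.emplace_back(defaultState());

  const std::size_t per = (end - start + workers - 1) / workers;
  std::vector<std::thread> threads;
  auto it = states.begin();
  for (std::size_t w = 0; w < workers; ++w, ++it) {
    const std::size_t b = std::min(end, start + w * per);
    const std::size_t e = std::min(end, b + per);
    threads.emplace_back([&f, &s = *it, b, e] { f(s, b, e); });
  }
  // states must stay alive until every worker has finished.
  for (auto& t : threads) t.join();
}

// Sums 0..n-1 using one partial sum per worker, combined after the loop.
std::size_t sumWithStates(std::size_t n, std::size_t workers) {
  std::deque<std::size_t> partial;
  statefulFor(
      partial, [] { return std::size_t{0}; }, 0, n, workers,
      [](std::size_t& acc, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) acc += i;
      });
  return std::accumulate(partial.begin(), partial.end(), std::size_t{0});
}
```

Since each worker writes only to its own State, the loop body needs no locks or atomics; the partial results are combined once, after all work has completed.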
| void dispenso::parallel_for | ( | StateContainer & | states, |
| const StateGen & | defaultState, | ||
| IntegerA | start, | ||
| IntegerB | end, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel on the global thread pool and block until loop completion.
| states | A container of State (actual type of State TBD by user). The container will be resized to hold a State object per executing thread. Container must provide emplace_back() and must be forward-iterable. Examples include std::vector, std::deque, and std::list. These are the states passed into f, and states must remain a valid object until work is completed. |
| defaultState | A functor with signature State(). It will be called to initialize the objects for states. |
| start | The start of the loop extents. |
| end | The end of the loop extents. |
| f | The functor to execute in parallel. Must have a signature like void(State &s, size_t index) or void(State &s, size_t begin, size_t end). |
| options | See ParForOptions for details. options.wait will always be reset to true. |
Definition at line 989 of file parallel_for.h.
| void dispenso::parallel_for | ( | TaskSetT & | taskSet, |
| const ChunkedRange< IntegerT > & | range, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel.
| taskSet | The task set to schedule the loop on. |
| range | The range defining the loop extents as well as chunking strategy. |
| f | The functor to execute in parallel. Must have a signature like void(size_t begin, size_t end). |
| options | See ParForOptions for details. |
Definition at line 699 of file parallel_for.h.
| void dispenso::parallel_for | ( | TaskSetT & | taskSet, |
| IntegerA | start, | ||
| IntegerB | end, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel.
| taskSet | The task set to schedule the loop on. |
| start | The start of the loop extents. |
| end | The end of the loop extents. |
| f | The functor to execute in parallel. Must have a signature like void(size_t index) or void(size_t begin, size_t end). |
| options | See ParForOptions for details. |
This is an overloaded function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Definition at line 783 of file parallel_for.h.
| void dispenso::parallel_for | ( | TaskSetT & | taskSet, |
| StateContainer & | states, | ||
| const StateGen & | defaultState, | ||
| const ChunkedRange< IntegerT > & | range, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel.
| taskSet | The task set to schedule the loop on. |
| states | A container of State (actual type of State TBD by user). The container will be resized to hold a State object per executing thread. Container must provide emplace_back() and must be forward-iterable. Examples include std::vector, std::deque, and std::list. These are the states passed into f, and states must remain a valid object until work is completed. When options.wait is false, "until work is completed" extends beyond the return of this function: the caller must ensure states outlives all scheduled work (e.g. by calling taskSet.wait() before destroying it). |
| defaultState | A functor with signature State(). It will be called to initialize the objects for states. |
| range | The range defining the loop extents as well as chunking strategy. |
| f | The functor to execute in parallel. Must have a signature like void(State &s, size_t begin, size_t end). |
| options | See ParForOptions for details. |
Definition at line 555 of file parallel_for.h.
| void dispenso::parallel_for | ( | TaskSetT & | taskSet, |
| StateContainer & | states, | ||
| const StateGen & | defaultState, | ||
| IntegerA | start, | ||
| IntegerB | end, | ||
| F && | f, | ||
| ParForOptions | options = {} ) |
Execute loop over the range in parallel.
| taskSet | The task set to schedule the loop on. |
| states | A container of State (actual type of State TBD by user). The container will be resized to hold a State object per executing thread. Container must provide emplace_back() and must be forward-iterable. Examples include std::vector, std::deque, and std::list. These are the states passed into f, and states must remain a valid object until work is completed. |
| defaultState | A functor with signature State(). It will be called to initialize the objects for states. |
| start | The start of the loop extents. |
| end | The end of the loop extents. |
| f | The functor to execute in parallel. Must have a signature like void(State &s, size_t index) or void(State &s, size_t begin, size_t end). |
| options | See ParForOptions for details. |
This is an overloaded function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Definition at line 894 of file parallel_for.h.