dispenso 1.4.1
A library for task parallelism
#include <dispenso/detail/math.h>
#include <dispenso/detail/op_result.h>
#include <dispenso/platform.h>
Typedefs
| Declaration | Description |
| --- | --- |
| template<typename T> using dispenso::AlignedDeleter = detail::AlignedFreeDeleter<T> | Deleter for smart pointers that use aligned memory allocation. |
| template<typename T> using dispenso::AlignedBuffer = detail::AlignedBuffer<T> | Buffer with proper alignment for type T. |
| template<typename T> using dispenso::AlignedAtomic = detail::AlignedAtomic<T> | Cache-line aligned atomic pointer. |
| using dispenso::StaticChunking = detail::StaticChunking | Information for statically chunking a range across threads. |
Functions
| Return type | Function | Description |
| --- | --- | --- |
| void * | dispenso::alignedMalloc (size_t bytes, size_t alignment) | Allocate memory with a specified alignment. |
| void * | dispenso::alignedMalloc (size_t bytes) | Allocate memory aligned to cache line size. |
| void | dispenso::alignedFree (void *ptr) | Free memory allocated by alignedMalloc. |
| constexpr uintptr_t | dispenso::alignToCacheLine (uintptr_t val) | Align a value up to the next cache line boundary. |
| void | dispenso::cpuRelax () | CPU relaxation hint for spin loops. |
| constexpr uint64_t | dispenso::nextPow2 (uint64_t v) | Round up to the next power of 2. |
| constexpr uint32_t | dispenso::log2const (uint64_t v) | Compute log base 2 of a value (compile-time). |
| uint32_t | dispenso::log2 (uint64_t v) | Compute log base 2 of a value (runtime). |
| StaticChunking | dispenso::staticChunkSize (ssize_t items, ssize_t chunks) | Compute optimal static chunking for load balancing. |
A collection of utility functions and types for memory alignment, bit manipulation, and performance optimization.
kCacheLineSize, used by several utilities here, is defined in <dispenso/platform.h>, which this header includes.
Definition in file util.h.
| using dispenso::AlignedAtomic = detail::AlignedAtomic<T> |
Cache-line aligned atomic pointer.
An atomic pointer aligned to a cache line boundary to avoid false sharing. Inherits from std::atomic<T*>.
| T | The pointed-to type |
Example:
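A minimal standalone sketch of the idea, written without dispenso so that it compiles on its own. `PaddedAtomicPtr`, `TwoHeads`, and `gapBetweenHeads` are hypothetical names, and the 64-byte line size is an assumption standing in for kCacheLineSize.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (dispenso takes this from kCacheLineSize).
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical stand-in for an aligned atomic pointer: it derives from
// std::atomic<T*> and forces the object onto its own cache line.
template <typename T>
struct alignas(kAssumedCacheLine) PaddedAtomicPtr : std::atomic<T*> {
  PaddedAtomicPtr() noexcept : std::atomic<T*>(nullptr) {}
};

struct Node {
  int value;
};

// Two pointers that different threads might hammer independently.
struct TwoHeads {
  PaddedAtomicPtr<Node> producerHead;
  PaddedAtomicPtr<Node> consumerHead;
};

// Returns the distance in bytes between the two members.
inline std::size_t gapBetweenHeads() {
  TwoHeads h;
  auto a = reinterpret_cast<std::uintptr_t>(&h.producerHead);
  auto b = reinterpret_cast<std::uintptr_t>(&h.consumerHead);
  assert(a % kAssumedCacheLine == 0);  // first member starts on a line boundary
  return static_cast<std::size_t>(b - a);
}
```

Because each member occupies a full line, a write to one pointer does not invalidate the line that holds the other.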
| using dispenso::AlignedBuffer = detail::AlignedBuffer<T> |
Buffer with proper alignment for type T.
Provides uninitialized storage with proper alignment for type T. Useful for manual object lifetime management or placement new scenarios.
| T | The type to provide storage for |
Example:
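A minimal standalone sketch of uninitialized, correctly aligned storage combined with placement new. `RawStorage`, `Widget`, and `demoPlacementNew` are hypothetical names used only for illustration; the member layout of dispenso's own buffer type is not assumed here.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in: raw bytes sized and aligned for one T.
// No T is constructed when a RawStorage<T> is created.
template <typename T>
struct RawStorage {
  alignas(T) unsigned char bytes[sizeof(T)];
};

// A type that counts live instances so the lifetime is observable.
struct Widget {
  static int live;
  int id;
  explicit Widget(int i) : id(i) { ++live; }
  ~Widget() { --live; }
};
int Widget::live = 0;

inline int demoPlacementNew() {
  RawStorage<Widget> storage;                 // storage only, no Widget yet
  assert(Widget::live == 0);
  Widget* w = new (storage.bytes) Widget(7);  // lifetime begins here
  assert(Widget::live == 1);
  int id = w->id;
  w->~Widget();                               // lifetime ends here, by hand
  return id;
}
```

The caller owns the object's lifetime: the storage never runs a constructor or destructor on its own.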
| using dispenso::AlignedDeleter = detail::AlignedFreeDeleter<T> |
Deleter for smart pointers that use aligned memory allocation.
This deleter calls the destructor and frees memory allocated with alignedMalloc. It can be used with std::unique_ptr and std::shared_ptr.
| T | The type being deleted |
Example:
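A minimal standalone sketch of a deleter that runs the destructor and then releases aligned memory, paired with std::unique_ptr. It uses C++17 aligned `operator new`/`operator delete` in place of alignedMalloc/alignedFree so that it builds without dispenso; `DestroyAndFree`, `Counter`, `makeAlignedCounter`, and the 64-byte alignment are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Assumption: objects are aligned to 64 bytes.
constexpr std::size_t kAssumedAlignment = 64;

// Hypothetical deleter: destroy the object, then free its aligned block.
template <typename T>
struct DestroyAndFree {
  void operator()(T* p) const {
    p->~T();
    ::operator delete(p, std::align_val_t(kAssumedAlignment));
  }
};

struct Counter {
  static int destroyed;
  long value;
  explicit Counter(long v) : value(v) {}
  ~Counter() { ++destroyed; }
};
int Counter::destroyed = 0;

using AlignedCounterPtr = std::unique_ptr<Counter, DestroyAndFree<Counter>>;

inline AlignedCounterPtr makeAlignedCounter(long v) {
  // Allocate raw aligned bytes, then construct the object in place.
  void* raw = ::operator new(sizeof(Counter), std::align_val_t(kAssumedAlignment));
  assert(reinterpret_cast<std::uintptr_t>(raw) % kAssumedAlignment == 0);
  return AlignedCounterPtr(new (raw) Counter(v));
}
```

A plain `delete` would be wrong here because the memory did not come from a matching `new` expression, which is why the smart pointer needs a custom deleter.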
| using dispenso::StaticChunking = detail::StaticChunking |
Information for statically chunking a range across threads.
When dividing work into static chunks, using a simple chunk size plus remainder can lead to poor load balancing. This struct provides the optimal chunking strategy where some tasks get ceil(items/chunks) work and others get floor(items/chunks).
| void dispenso::alignedFree (void *ptr) | inline |
Free memory allocated by alignedMalloc.
| void * dispenso::alignedMalloc (size_t bytes) | inline |
Allocate memory aligned to cache line size.
This is a convenience overload that aligns to kCacheLineSize (typically 64 bytes), which helps avoid false sharing in concurrent data structures.
| bytes | The number of bytes to allocate |
| void * dispenso::alignedMalloc (size_t bytes, size_t alignment) | inline |
Allocate memory with a specified alignment.
This function allocates memory aligned to the specified boundary. The alignment must be a power of 2 and at least sizeof(uintptr_t).
| bytes | The number of bytes to allocate |
| alignment | The alignment requirement in bytes (must be power of 2) |
Example:
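A standalone sketch of one common way to obtain aligned memory on top of plain malloc: over-allocate, round the address up, and stash the original pointer just below the aligned block. This is a swapped-in technique for illustration and is not claimed to be how dispenso implements alignedMalloc. `sketchAlignedMalloc`, `sketchAlignedFree`, and the 64-byte default are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 64-byte cache lines for the convenience overload.
constexpr std::size_t kAssumedLineBytes = 64;

inline bool isPow2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

inline void* sketchAlignedMalloc(std::size_t bytes, std::size_t alignment) {
  // Same preconditions as stated above: power of 2, at least sizeof(uintptr_t).
  assert(isPow2(alignment) && alignment >= sizeof(std::uintptr_t));
  // Room for the payload, worst-case padding, and one stashed pointer.
  void* raw = std::malloc(bytes + alignment + sizeof(void*));
  if (raw == nullptr) return nullptr;
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
  std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  std::uintptr_t aligned = (start + mask) & ~mask;
  // Remember what malloc returned so the block can be freed later.
  reinterpret_cast<void**>(aligned)[-1] = raw;
  return reinterpret_cast<void*>(aligned);
}

// Convenience overload that aligns to the assumed cache line size.
inline void* sketchAlignedMalloc(std::size_t bytes) {
  return sketchAlignedMalloc(bytes, kAssumedLineBytes);
}

inline void sketchAlignedFree(void* ptr) {
  if (ptr != nullptr) std::free(static_cast<void**>(ptr)[-1]);
}
```

Memory from this sketch must be released with `sketchAlignedFree`, never with `free`, which mirrors the pairing of alignedMalloc with alignedFree.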
| constexpr uintptr_t dispenso::alignToCacheLine (uintptr_t val) | inline constexpr |
Align a value up to the next cache line boundary.
Rounds up the input value to the next multiple of kCacheLineSize. Useful for manual memory layout to avoid false sharing.
| val | The value to align |
Example:
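A standalone sketch of the usual mask-based round-up. `alignUpToLine` is a hypothetical name and the 64-byte line size is an assumption standing in for kCacheLineSize. The sketch also assumes that a value already on a boundary is returned unchanged.

```cpp
#include <cstdint>

// Assumption: 64-byte cache lines (must be a power of 2 for the mask trick).
constexpr std::uintptr_t kAssumedLineSize = 64;

// Add (line - 1), then clear the low bits to land on a multiple of the line.
constexpr std::uintptr_t alignUpToLine(std::uintptr_t val) {
  return (val + kAssumedLineSize - 1) & ~(kAssumedLineSize - 1);
}

static_assert(alignUpToLine(0) == 0, "zero is already aligned");
static_assert(alignUpToLine(1) == 64, "rounds up to the first boundary");
static_assert(alignUpToLine(64) == 64, "aligned values are unchanged");
static_assert(alignUpToLine(65) == 128, "rounds up to the next boundary");
```

In a manual layout, applying this to a running byte offset before placing each per-thread record keeps the records on separate lines.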
| void dispenso::cpuRelax () | inline |
CPU relaxation hint for spin loops.
Emits a platform-specific instruction (PAUSE on x86, YIELD on ARM) to improve spin loop performance and reduce power consumption. Use this in busy-wait loops to be friendlier to hyper-threading and the CPU pipeline.
Example:
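A standalone sketch of a spin lock whose wait loop issues a relaxation hint. `relaxHint`, `SpinLock`, and `countWithLock` are hypothetical; `relaxHint` approximates the behavior described above with `_mm_pause` on x86-64, a `yield` instruction on AArch64, and `std::this_thread::yield` elsewhere, which is an assumption and not dispenso's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

// Hypothetical stand-in for a CPU relaxation hint.
inline void relaxHint() {
#if defined(__x86_64__) || defined(_M_X64)
  _mm_pause();                      // PAUSE instruction
#elif defined(__aarch64__)
  __asm__ __volatile__("yield");    // YIELD instruction
#else
  std::this_thread::yield();        // portable fallback, much heavier
#endif
}

class SpinLock {
 public:
  void lock() {
    // Busy-wait, hinting to the core on every failed attempt.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      relaxHint();
    }
  }
  bool tryLock() { return !flag_.test_and_set(std::memory_order_acquire); }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

inline int countWithLock(int iterations) {
  SpinLock lock;
  int counter = 0;
  for (int i = 0; i < iterations; ++i) {
    lock.lock();
    ++counter;
    lock.unlock();
  }
  assert(lock.tryLock());  // the lock is free once the loop has finished
  lock.unlock();
  return counter;
}
```

The hint belongs inside the retry loop only; the uncontended path acquires the lock without ever executing it.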
| uint32_t dispenso::log2 (uint64_t v) | inline |
Compute log base 2 of a value (runtime).
Computes floor(log2(v)) using platform-specific intrinsics for optimal performance. On x86/x64, uses __builtin_clzll or __lzcnt64. Falls back to constexpr version on other platforms.
| v | Input value (must be > 0) |
Example:
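A standalone sketch of floor(log2(v)) built on a count-leading-zeros intrinsic, with a portable shift loop as the fallback. `floorLog2` is a hypothetical name; the sketch only covers the GCC/Clang builtin and is not dispenso's implementation.

```cpp
#include <cassert>
#include <cstdint>

inline std::uint32_t floorLog2(std::uint64_t v) {
  assert(v > 0);  // log2(0) is undefined, and so is clz(0)
#if defined(__GNUC__) || defined(__clang__)
  // Index of the highest set bit = 63 - number of leading zero bits.
  return 63u - static_cast<std::uint32_t>(__builtin_clzll(v));
#else
  // Portable fallback: count how many times v can be halved.
  std::uint32_t result = 0;
  while (v >>= 1) {
    ++result;
  }
  return result;
#endif
}
```

For example, `floorLog2(1024)` and `floorLog2(1025)` both yield 10, since the result is rounded down.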
| constexpr uint32_t dispenso::log2const (uint64_t v) | inline constexpr |
Compute log base 2 of a value (compile-time).
Computes floor(log2(v)) at compile time. Useful for template metaprogramming and constexpr contexts.
| v | Input value (must be > 0) |
Example:
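A standalone sketch of a compile-time floor(log2(v)) and of using the result as a constant expression. `floorLog2Const` and `kLevels` are hypothetical names; the recursive form is chosen so that it also works under C++11 constexpr rules.

```cpp
#include <cstddef>
#include <cstdint>

// Halve v until it reaches 1, counting the steps.
constexpr std::uint32_t floorLog2Const(std::uint64_t v) {
  return v <= 1 ? 0u : 1u + floorLog2Const(v >> 1);
}

static_assert(floorLog2Const(1) == 0, "2^0 == 1");
static_assert(floorLog2Const(64) == 6, "2^6 == 64");
static_assert(floorLog2Const(100) == 6, "result is rounded down");

// The value can size arrays or feed templates, e.g. one slot per
// power-of-two level from 1 up to 256.
constexpr std::size_t kLevels = floorLog2Const(256) + 1;
static_assert(kLevels == 9, "levels 2^0 through 2^8");
```

Note that the sketch maps an input of 0 to 0, whereas the documented precondition is v > 0.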
| constexpr uint64_t dispenso::nextPow2 (uint64_t v) | constexpr |
Round up to the next power of 2.
Computes the smallest power of 2 that is greater than or equal to the input value.
| v | Input value |
Example:
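A standalone sketch of the classic bit-smearing round-up for 64-bit values. `roundUpPow2` is a hypothetical name, and the sketch needs C++14 or later because the constexpr function mutates its argument.

```cpp
#include <cstdint>

constexpr std::uint64_t roundUpPow2(std::uint64_t v) {
  --v;             // so that exact powers of 2 map to themselves
  v |= v >> 1;     // copy the highest set bit into every lower position
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;    // all-ones below the top bit, plus one, is a power of 2
}

static_assert(roundUpPow2(1) == 1, "1 is 2^0");
static_assert(roundUpPow2(5) == 8, "next power above 5");
static_assert(roundUpPow2(64) == 64, "powers of 2 are unchanged");
static_assert(roundUpPow2(65) == 128, "next power above 64");
```

In this sketch an input of 0 wraps around and yields 0, and inputs above 2^63 also wrap to 0, so callers that care should check those cases themselves.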
| StaticChunking dispenso::staticChunkSize (ssize_t items, ssize_t chunks) | inline |
Compute optimal static chunking for load balancing.
Divides items into chunks such that the work is distributed as evenly as possible. Returns chunking info where some tasks get ceil(items/chunks) and others get floor(items/chunks).
| items | Total number of items to process |
| chunks | Number of chunks to divide into (must be > 0) |
Example:
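A standalone sketch of the ceil/floor split described above. `ChunkPlan`, `planChunks`, and `chunkSizes` are hypothetical; in particular, the field names are not taken from dispenso's StaticChunking and are only one way to encode the result.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: the first numCeil chunks get ceilSize items,
// the remaining chunks get ceilSize - 1.
struct ChunkPlan {
  std::int64_t ceilSize;
  std::int64_t numCeil;
};

inline ChunkPlan planChunks(std::int64_t items, std::int64_t chunks) {
  assert(chunks > 0 && items >= 0);
  ChunkPlan plan;
  plan.ceilSize = (items + chunks - 1) / chunks;  // ceil(items / chunks)
  // If every chunk took ceilSize we would overshoot by this many items,
  // so that many chunks must take one item fewer.
  std::int64_t overshoot = plan.ceilSize * chunks - items;
  plan.numCeil = chunks - overshoot;
  return plan;
}

// Expands a plan into the per-chunk sizes, for inspection.
inline std::vector<std::int64_t> chunkSizes(std::int64_t items, std::int64_t chunks) {
  ChunkPlan plan = planChunks(items, chunks);
  std::vector<std::int64_t> sizes;
  for (std::int64_t i = 0; i < chunks; ++i) {
    sizes.push_back(i < plan.numCeil ? plan.ceilSize : plan.ceilSize - 1);
  }
  return sizes;
}
```

With 10 items and 4 chunks this gives sizes 3, 3, 2, 2. A fixed chunk size of 3 would instead give 3, 3, 3, 1, leaving the last task with a third of the work of the others.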