dispenso 1.6.0
A library for task parallelism
A set of CPU IDs for affinity manipulation and topology queries.
#include <cpu_set.h>
Public Member Functions | |
| DISPENSO_DLL_ACCESS | CpuSet () |
| Constructs an empty CPU set. | |
| DISPENSO_DLL_ACCESS void | clear () |
| Removes all CPUs from the set. | |
| DISPENSO_DLL_ACCESS void | add (int32_t hardwareThread) |
| Adds a single CPU to the set. | |
| DISPENSO_DLL_ACCESS void | addRange (int32_t start, int32_t end) |
| Adds a range of CPUs to the set. | |
| DISPENSO_DLL_ACCESS void | remove (int32_t hardwareThread) |
| Removes a single CPU from the set. | |
| DISPENSO_DLL_ACCESS void | removeRange (int32_t start, int32_t end) |
| Removes a range of CPUs from the set. | |
| DISPENSO_DLL_ACCESS bool | contains (int32_t hardwareThread) const |
| Tests whether a CPU is in the set. | |
| DISPENSO_DLL_ACCESS int32_t | count () const |
| Returns the number of CPUs in the set. | |
| DISPENSO_DLL_ACCESS bool | bindCurrentThread () const |
| Binds the calling thread to the CPUs in this set. | |
Static Public Member Functions | |
| static DISPENSO_DLL_ACCESS int32_t | totalNumaNodes () |
| Returns the total number of NUMA nodes detected. | |
| static DISPENSO_DLL_ACCESS int32_t | currentHardwareThread () |
| Returns the CPU ID of the calling thread's current core. | |
| static int32_t | currentHardwareThreadApprox () |
| Approximate CPU ID for the calling thread, refreshed periodically. | |
| static DISPENSO_DLL_ACCESS const CpuSet & | node (int32_t numaNode) |
| Returns the CPU set for a specific NUMA node. | |
| static DISPENSO_DLL_ACCESS const CpuSet & | all () |
| Returns a CPU set containing all online CPUs. | |
| static DISPENSO_DLL_ACCESS int32_t | availableCount () |
| Returns the number of hardware threads available to this process. | |
| static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & | l2CacheGroups () |
| Returns the L2 cache sharing groups. | |
| static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & | l3CacheGroups () |
| Returns the L3 cache sharing groups. | |
| static DISPENSO_DLL_ACCESS std::vector< ThreadGroup > | buildThreadGroups (int32_t maxGroupSize=kDefaultMaxGroupSize) |
| Builds scheduling thread groups from cache topology. | |
A set of CPU IDs for affinity manipulation and topology queries.
CpuSet provides:
CpuSet instances are NOT thread-safe. Static query methods (totalNumaNodes(), node(), all(), etc.) are safe to call from any thread after static initialization.
A CpuSet represents CPU IDs in [0, 1024). On Linux/FreeBSD the backing store is a fixed cpu_set_t (CPU_SETSIZE, typically 1024); on Windows/macOS it is a portable 1024-bit array. add(), addRange(), and contains() silently ignore CPU IDs at or beyond that bound (never undefined behavior), so the library degrades gracefully on machines with 1024 or more logical CPUs, or with sparse IDs that high: such CPUs are simply not bound or grouped. CpuSets are allocated only at startup (topology singletons, per-ThreadGroup masks), so raising the limit later (a dynamic cpu_set_t via CPU_ALLOC on Linux, a wider bitset elsewhere) would be cheap. That change is tracked in the roadmap rather than done now, since no current hardware approaches the limit.
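The bounds behavior described above can be illustrated with a small stand-alone model. This is a hypothetical sketch built on std::bitset, not dispenso's implementation: `ModelCpuSet` and `kMaxCpus` are names invented here. It assumes half-open [start, end) ranges, as documented for addRange() and removeRange(), and that out-of-range IDs are dropped silently.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the documented semantics; not dispenso's CpuSet.
constexpr int32_t kMaxCpus = 1024;  // bound quoted in the text: IDs in [0, 1024)

class ModelCpuSet {
 public:
  // IDs outside [0, kMaxCpus) are ignored rather than causing undefined behavior.
  void add(int32_t cpu) {
    if (inRange(cpu)) bits_.set(static_cast<std::size_t>(cpu));
  }
  void remove(int32_t cpu) {
    if (inRange(cpu)) bits_.reset(static_cast<std::size_t>(cpu));
  }
  // Half-open range [start, end), clamped to the representable IDs.
  void addRange(int32_t start, int32_t end) {
    const int32_t first = std::max<int32_t>(start, 0);
    const int32_t last = std::min<int32_t>(end, kMaxCpus);
    for (int32_t cpu = first; cpu < last; ++cpu) add(cpu);
  }
  void removeRange(int32_t start, int32_t end) {
    const int32_t first = std::max<int32_t>(start, 0);
    const int32_t last = std::min<int32_t>(end, kMaxCpus);
    for (int32_t cpu = first; cpu < last; ++cpu) remove(cpu);
  }
  bool contains(int32_t cpu) const {
    return inRange(cpu) && bits_.test(static_cast<std::size_t>(cpu));
  }
  int32_t count() const { return static_cast<int32_t>(bits_.count()); }
  void clear() { bits_.reset(); }

 private:
  static bool inRange(int32_t cpu) { return cpu >= 0 && cpu < kMaxCpus; }
  std::bitset<kMaxCpus> bits_;
};
```

In this model, `addRange(1020, 2000)` adds only IDs 1020 through 1023, and `add(5000)` leaves the set unchanged.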
| DISPENSO_DLL_ACCESS void dispenso::CpuSet::add | ( | int32_t | hardwareThread | ) |
Adds a single CPU to the set.
| hardwareThread | The CPU ID to add. |
| DISPENSO_DLL_ACCESS void dispenso::CpuSet::addRange | ( | int32_t | start, |
| int32_t | end ) |
Adds a range of CPUs to the set.
| start | First CPU ID (inclusive). |
| end | One past the last CPU ID (exclusive). |
| static DISPENSO_DLL_ACCESS const CpuSet & dispenso::CpuSet::all | ( | ) |
Returns a CPU set containing all online CPUs.
On Linux, derived from the union of all NUMA node sets (which correctly handles non-contiguous CPU ID ranges). Falls back to [0, hardware_concurrency()) on unsupported platforms.
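The union-with-fallback idea can be sketched without the library. The code below is an illustration under stated assumptions, not dispenso's code: `CpuMask`, `unionOfNodes`, `contiguousFallback`, and `allOnline` are hypothetical names, and per-node masks are passed in by the caller rather than detected.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using CpuMask = std::bitset<1024>;  // hypothetical stand-in for a CPU set

// Union of per-node masks; works for non-contiguous CPU ID ranges.
inline CpuMask unionOfNodes(const std::vector<CpuMask>& nodes) {
  CpuMask all;
  for (const CpuMask& node : nodes) all |= node;
  return all;
}

// Fallback when no topology is known: [0, hardware_concurrency()).
inline CpuMask contiguousFallback() {
  CpuMask all;
  const std::size_t n = std::thread::hardware_concurrency();  // may be 0 if unknown
  for (std::size_t cpu = 0; cpu < n && cpu < all.size(); ++cpu) all.set(cpu);
  return all;
}

// Assumption of this sketch: an empty node list means "topology unavailable".
inline CpuMask allOnline(const std::vector<CpuMask>& nodes) {
  return nodes.empty() ? contiguousFallback() : unionOfNodes(nodes);
}
```

With node 0 holding CPUs {0, 1, 8} and node 1 holding {2, 3, 9}, the union contains six CPUs and correctly omits IDs 4 through 7.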
| static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::availableCount | ( | ) |
Returns the number of hardware threads available to this process.
On Linux, queries the process CPU affinity mask (respects taskset/cgroup). On Windows single-group systems, queries the process affinity mask. On Windows multi-group systems, sums active processors per group (does not reflect process-level restrictions). On other platforms, falls back to std::thread::hardware_concurrency().
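A minimal sketch of the Linux path and the generic fallback follows. It is not dispenso's implementation: `availableHardwareThreads` is a hypothetical name, the Windows branches are omitted, and the sketch uses `sched_getaffinity` with pid 0, which reports the mask of the calling thread.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs the caller is allowed to run on.
inline int32_t availableHardwareThreads() {
#if defined(__linux__)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  // pid 0 selects the calling thread; the mask reflects taskset and cpuset limits.
  // The call fails if the kernel mask does not fit in a fixed cpu_set_t.
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    return static_cast<int32_t>(CPU_COUNT(&mask));
  }
#endif
  // Generic fallback; the standard allows this to return 0 when unknown.
  return static_cast<int32_t>(std::thread::hardware_concurrency());
}
```

Running a program under `taskset -c 0,1` should make this sketch report 2 on Linux, while the fallback path ignores such restrictions.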
| DISPENSO_DLL_ACCESS bool dispenso::CpuSet::bindCurrentThread | ( | ) | const |
Binds the calling thread to the CPUs in this set.
On Linux/FreeBSD, calls pthread_setaffinity_np. On unsupported platforms (macOS, Windows without GROUP_AFFINITY support), returns false.
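The binding step can be sketched on Linux as shown below. Note the swap: the text says dispenso calls pthread_setaffinity_np, whereas this sketch uses `sched_setaffinity` with pid 0, which on Linux also applies to the calling thread. The helper names `allowedCpus` and `bindCallingThread` are hypothetical, and other platforms simply report failure.

```cpp
#include <cassert>
#include <vector>
#if defined(__linux__)
#include <sched.h>
#endif

// CPUs the calling thread may currently run on (empty where unsupported).
inline std::vector<int> allowedCpus() {
  std::vector<int> cpus;
#if defined(__linux__)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
      if (CPU_ISSET(cpu, &mask)) cpus.push_back(cpu);
    }
  }
#endif
  return cpus;
}

// Restricts the calling thread to `cpus`.
// Returns false on failure (including an empty set) or where unsupported.
inline bool bindCallingThread(const std::vector<int>& cpus) {
#if defined(__linux__)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  for (int cpu : cpus) {
    if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &mask);
  }
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
#else
  (void)cpus;
  return false;
#endif
}
```

Callers should check the boolean result, since binding can fail or be unsupported; rebinding to the list returned by `allowedCpus()` is a harmless way to exercise the call.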
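| static DISPENSO_DLL_ACCESS std::vector< ThreadGroup > dispenso::CpuSet::buildThreadGroups | ( | int32_t | maxGroupSize = kDefaultMaxGroupSize | ) |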
Builds scheduling thread groups from cache topology.
Groups are built bottom-up:
… maxGroupSize CPUs
On systems without cache topology detection (currently macOS and Windows), falls back to contiguous chunking of all online CPUs.
| maxGroupSize | Maximum CPUs per group. Defaults to DISPENSO_MAX_GROUP_SIZE. Clamped to at least the largest L2 atom size (e.g. 8 on Power10 with SMT8) to avoid splitting SMT siblings. |
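The contiguous-chunking fallback can be sketched as follows. This is an assumption-laden illustration, not dispenso's algorithm: `chunkContiguous` is a hypothetical name, groups are plain vectors rather than ThreadGroup objects, and the sketch places any remainder in a final, smaller group, which the source does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits the online CPU list into consecutive groups of at most maxGroupSize.
inline std::vector<std::vector<int32_t>> chunkContiguous(
    const std::vector<int32_t>& onlineCpus, int32_t maxGroupSize) {
  assert(maxGroupSize > 0);
  std::vector<std::vector<int32_t>> groups;
  const std::size_t step = static_cast<std::size_t>(maxGroupSize);
  for (std::size_t begin = 0; begin < onlineCpus.size(); begin += step) {
    const std::size_t end = std::min(begin + step, onlineCpus.size());
    groups.emplace_back(onlineCpus.begin() + static_cast<std::ptrdiff_t>(begin),
                        onlineCpus.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return groups;
}
```

For ten CPUs and a maximum group size of 4, this yields groups of 4, 4, and 2 CPUs.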
| DISPENSO_DLL_ACCESS bool dispenso::CpuSet::contains | ( | int32_t | hardwareThread | ) | const |
Tests whether a CPU is in the set.
| hardwareThread | The CPU ID to test. |
| static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::currentHardwareThread | ( | ) |
Returns the CPU ID of the calling thread's current core.
On Linux, uses the vDSO-accelerated getcpu(). On unsupported platforms, returns -1.
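A minimal sketch of this contract, assuming glibc's `sched_getcpu()` as the Linux entry point (the source names only getcpu(), so the exact call dispenso makes is an assumption here). `currentCpuOrMinusOne` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__linux__)
#include <sched.h>
#endif

// Returns the CPU the calling thread is running on right now, or -1 if unknown.
inline int32_t currentCpuOrMinusOne() {
#if defined(__linux__)
  return static_cast<int32_t>(sched_getcpu());  // itself returns -1 on error
#else
  return -1;  // unsupported platform
#endif
}
```

The result is only a snapshot: the OS may migrate the thread to another CPU immediately after the call returns.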
| static int32_t dispenso::CpuSet::currentHardwareThreadApprox | ( | ) | inline |
Approximate CPU ID for the calling thread, refreshed periodically.
Caches the result of currentHardwareThread() in a thread-local and re-queries every kRefreshPeriod calls. Designed for hot paths that want locality hints without paying the ~15ns vDSO cost per call: the cached path is a single TLS read + increment + branch (~1-2ns).
Trade-off: between refreshes the returned CPU is stale if the OS has migrated the thread. With kRefreshPeriod=32 and a typical OS time-slice of 1-10ms, staleness is bounded to a small fraction of a slice.
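The caching scheme described above can be sketched with two thread-local variables. This is an illustration, not dispenso's code: `slowCurrentCpu` is a hypothetical stand-in for the expensive query that returns a fixed fake ID and counts its own calls, and the sketch counts down rather than up. `kRefreshPeriod` takes the value 32 quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

constexpr int32_t kRefreshPeriod = 32;  // value quoted in the text

// Hypothetical stand-in for the expensive query. It counts its invocations
// per thread so that the effect of caching is observable.
inline int32_t& slowQueryCalls() {
  thread_local int32_t calls = 0;
  return calls;
}
inline int32_t slowCurrentCpu() {
  ++slowQueryCalls();
  return 7;  // fixed fake CPU ID for the sketch
}

// Cached path: a thread-local read, a counter update, and a branch.
inline int32_t approxCurrentCpu() {
  thread_local int32_t cachedCpu = -1;
  thread_local int32_t callsUntilRefresh = 0;
  if (callsUntilRefresh == 0) {
    cachedCpu = slowCurrentCpu();  // refresh once every kRefreshPeriod calls
    callsUntilRefresh = kRefreshPeriod;
  }
  --callsUntilRefresh;
  return cachedCpu;
}
```

In this sketch, 64 consecutive calls on one thread trigger only two slow queries (on the 1st and 33rd call); the 65th call triggers the third.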
| static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & dispenso::CpuSet::l2CacheGroups | ( | ) |
Returns the L2 cache sharing groups.
Each group contains the CPU IDs that share an L2 cache instance. On x86 with SMT, each group is typically an SMT sibling pair. On Power (SMT8), each group is 8 threads.
The groups are sorted by their first CPU ID.
| static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & dispenso::CpuSet::l3CacheGroups | ( | ) |
Returns the L3 cache sharing groups.
Each group contains the CPU IDs that share an L3 cache instance. On AMD EPYC, each group corresponds to a CCD (Core Complex Die). On Intel, each group corresponds to a tile or SNC cluster.
The groups are sorted by their first CPU ID.
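The source does not say how cache sharing is detected. As one plausible illustration, Linux exposes a per-CPU `shared_cpu_list` file under `/sys/devices/system/cpu/cpuN/cache/indexM/` in the kernel "cpulist" format (for example `0-3,8`). The sketch below assumes that format and well-formed input, takes the strings as arguments instead of reading sysfs, and uses plain vectors in place of CacheGroup. `parseCpuList` and `cacheGroups` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parses a well-formed kernel cpulist such as "0-3,8,10-11".
inline std::vector<int32_t> parseCpuList(const std::string& text) {
  std::vector<int32_t> cpus;
  std::istringstream in(text);
  std::string token;
  while (std::getline(in, token, ',')) {
    // Skip empty or whitespace-only tokens (e.g. a lone trailing newline).
    if (token.find_first_of("0123456789") == std::string::npos) continue;
    const std::size_t dash = token.find('-');
    const int32_t first = std::stoi(token.substr(0, dash));
    const int32_t last =
        dash == std::string::npos ? first : std::stoi(token.substr(dash + 1));
    for (int32_t cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
  }
  return cpus;
}

// Collapses per-CPU sharing lists into unique groups, sorted by first CPU ID.
inline std::vector<std::vector<int32_t>> cacheGroups(
    const std::vector<std::string>& sharedListPerCpu) {
  std::map<int32_t, std::vector<int32_t>> byFirstCpu;  // std::map keeps keys ordered
  for (const std::string& list : sharedListPerCpu) {
    std::vector<int32_t> group = parseCpuList(list);
    if (!group.empty()) byFirstCpu[group.front()] = group;
  }
  std::vector<std::vector<int32_t>> groups;
  for (const auto& entry : byFirstCpu) groups.push_back(entry.second);
  return groups;
}
```

Because every CPU in a sharing group reports the same list, keying on the first CPU ID both removes duplicates and produces the sorted order described above.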
| static DISPENSO_DLL_ACCESS const CpuSet & dispenso::CpuSet::node | ( | int32_t | numaNode | ) |
Returns the CPU set for a specific NUMA node.
| numaNode | The NUMA node index (0 to totalNumaNodes()-1). |
| DISPENSO_DLL_ACCESS void dispenso::CpuSet::remove | ( | int32_t | hardwareThread | ) |
Removes a single CPU from the set.
| hardwareThread | The CPU ID to remove. |
| DISPENSO_DLL_ACCESS void dispenso::CpuSet::removeRange | ( | int32_t | start, |
| int32_t | end ) |
Removes a range of CPUs from the set.
| start | First CPU ID (inclusive). |
| end | One past the last CPU ID (exclusive). |
| static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::totalNumaNodes | ( | ) |
Returns the total number of NUMA nodes detected.
Always returns at least 1 (single-node fallback when detection fails or the platform has no NUMA).