dispenso 1.6.0
A library for task parallelism
dispenso::CpuSet Class Reference

A set of CPU IDs for affinity manipulation and topology queries.

#include <cpu_set.h>

Public Member Functions

DISPENSO_DLL_ACCESS CpuSet ()
 Constructs an empty CPU set.
 
DISPENSO_DLL_ACCESS void clear ()
 Removes all CPUs from the set.
 
DISPENSO_DLL_ACCESS void add (int32_t hardwareThread)
 Adds a single CPU to the set.
 
DISPENSO_DLL_ACCESS void addRange (int32_t start, int32_t end)
 Adds a range of CPUs to the set.
 
DISPENSO_DLL_ACCESS void remove (int32_t hardwareThread)
 Removes a single CPU from the set.
 
DISPENSO_DLL_ACCESS void removeRange (int32_t start, int32_t end)
 Removes a range of CPUs from the set.
 
DISPENSO_DLL_ACCESS bool contains (int32_t hardwareThread) const
 Tests whether a CPU is in the set.
 
DISPENSO_DLL_ACCESS int32_t count () const
 Returns the number of CPUs in the set.
 
DISPENSO_DLL_ACCESS bool bindCurrentThread () const
 Binds the calling thread to the CPUs in this set.
 

Static Public Member Functions

static DISPENSO_DLL_ACCESS int32_t totalNumaNodes ()
 Returns the total number of NUMA nodes detected.
 
static DISPENSO_DLL_ACCESS int32_t currentHardwareThread ()
 Returns the CPU ID of the calling thread's current core.
 
static int32_t currentHardwareThreadApprox ()
 Approximate CPU ID for the calling thread, refreshed periodically.
 
static DISPENSO_DLL_ACCESS const CpuSet & node (int32_t numaNode)
 Returns the CPU set for a specific NUMA node.
 
static DISPENSO_DLL_ACCESS const CpuSet & all ()
 Returns a CPU set containing all online CPUs.
 
static DISPENSO_DLL_ACCESS int32_t availableCount ()
 Returns the number of hardware threads available to this process.
 
static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & l2CacheGroups ()
 Returns the L2 cache sharing groups.
 
static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & l3CacheGroups ()
 Returns the L3 cache sharing groups.
 
static DISPENSO_DLL_ACCESS std::vector< ThreadGroup > buildThreadGroups (int32_t maxGroupSize=kDefaultMaxGroupSize)
 Builds scheduling thread groups from cache topology.
 

Detailed Description

A set of CPU IDs for affinity manipulation and topology queries.

CpuSet provides:

  • Set manipulation: add, remove, query individual CPUs or ranges
  • Thread binding: pin the calling thread to the CPUs in the set
  • Topology queries: NUMA node enumeration, cache sharing groups

Thread Safety

CpuSet instances are NOT thread-safe. Static query methods (totalNumaNodes(), node(), all(), etc.) are safe to call from any thread after static initialization.

Capacity

A CpuSet represents CPU IDs in [0, 1024). On Linux/FreeBSD the backing store is a fixed cpu_set_t (CPU_SETSIZE, typically 1024); on Windows/macOS it is a portable 1024-bit array. add(), addRange(), and contains() silently ignore CPU IDs at or beyond that bound (never undefined behavior), so the library degrades gracefully on machines with more than 1024 logical CPUs or with sparse IDs at or above 1024: such CPUs are simply not bound or grouped. CpuSets are allocated only at startup (topology singletons, per-ThreadGroup masks), so raising the limit later (a dynamic cpu_set_t via CPU_ALLOC on Linux, a wider bitset elsewhere) would be cheap. That change is tracked in the roadmap rather than done now, since no current hardware approaches the limit.
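
The bounds behavior described above can be illustrated with a small stand-alone sketch. This is not the dispenso implementation: `BoundedCpuSet` and `kCapacity` are hypothetical names, the backing store is a plain `std::bitset`, and dropping out-of-range IDs one by one inside a range is an assumption about how "silently ignored" plays out.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity constant mirroring the [0, 1024) bound in the text.
constexpr int32_t kCapacity = 1024;

// Hypothetical stand-in for a fixed-capacity CPU set.
class BoundedCpuSet {
 public:
  // Out-of-range IDs are dropped without error.
  void add(int32_t cpu) {
    if (inRange(cpu)) bits_.set(static_cast<std::size_t>(cpu));
  }

  // Half-open range [start, end), clamped to the representable IDs.
  void addRange(int32_t start, int32_t end) {
    const int32_t first = std::max<int32_t>(start, 0);
    const int32_t last = std::min<int32_t>(end, kCapacity);
    for (int32_t cpu = first; cpu < last; ++cpu) add(cpu);
  }

  void remove(int32_t cpu) {
    if (inRange(cpu)) bits_.reset(static_cast<std::size_t>(cpu));
  }

  // An out-of-range ID is never a member.
  bool contains(int32_t cpu) const {
    return inRange(cpu) && bits_.test(static_cast<std::size_t>(cpu));
  }

  int32_t count() const { return static_cast<int32_t>(bits_.count()); }

 private:
  static bool inRange(int32_t cpu) { return cpu >= 0 && cpu < kCapacity; }
  std::bitset<1024> bits_;
};
```

With this sketch, `addRange(1020, 1030)` adds only IDs 1020 through 1023, and `add(2000)` is a no-op rather than an error.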

Example

// Pin current thread to CPUs 0-7
dispenso::CpuSet set;
set.addRange(0, 8);
set.bindCurrentThread();

// Query NUMA topology
int nodes = dispenso::CpuSet::totalNumaNodes();
for (int i = 0; i < nodes; ++i) {
  const auto& nodeSet = dispenso::CpuSet::node(i);
  // nodeSet contains CPUs on NUMA node i
}

// Query L3 cache groups for thread group assignment
const auto& l3Groups = dispenso::CpuSet::l3CacheGroups();
// Each group contains CPUs sharing an L3 cache (CCD on AMD, tile on Intel)

Definition at line 139 of file cpu_set.h.

Member Function Documentation

◆ add()

DISPENSO_DLL_ACCESS void dispenso::CpuSet::add ( int32_t hardwareThread)

Adds a single CPU to the set.

Parameters
hardwareThread    The CPU ID to add.

◆ addRange()

DISPENSO_DLL_ACCESS void dispenso::CpuSet::addRange ( int32_t start,
int32_t end )

Adds a range of CPUs to the set.

Parameters
start    First CPU ID (inclusive).
end      One past the last CPU ID (exclusive).

◆ all()

static DISPENSO_DLL_ACCESS const CpuSet & dispenso::CpuSet::all ( )
static

Returns a CPU set containing all online CPUs.

On Linux, derived from the union of all NUMA node sets (which correctly handles non-contiguous CPU ID ranges). Falls back to [0, hardware_concurrency()) on unsupported platforms.
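
A minimal sketch of why the union of per-node sets matters when CPU IDs are not contiguous. The names `Mask`, `maskOf`, `unionOfNodes`, and `contiguousFallback` are hypothetical, and the node contents used in the note below are invented for illustration, not read from a real machine.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

using Mask = std::bitset<1024>;  // hypothetical stand-in for a CPU set

// Builds a mask from a list of CPU IDs, ignoring IDs outside [0, 1024).
Mask maskOf(const std::vector<int>& cpus) {
  Mask mask;
  for (int cpu : cpus) {
    if (cpu >= 0 && static_cast<std::size_t>(cpu) < mask.size()) {
      mask.set(static_cast<std::size_t>(cpu));
    }
  }
  return mask;
}

// Union of all per-node masks: correct even when IDs have gaps.
Mask unionOfNodes(const std::vector<Mask>& nodeMasks) {
  Mask all;
  for (const Mask& node : nodeMasks) all |= node;
  return all;
}

// Fallback guess: assumes the online CPUs are exactly [0, n).
Mask contiguousFallback(std::size_t n) {
  Mask mask;
  for (std::size_t cpu = 0; cpu < n && cpu < mask.size(); ++cpu) mask.set(cpu);
  return mask;
}
```

For two nodes holding CPUs {0-3, 64-67} and {8-11}, the union contains CPU 64 and excludes CPU 4, whereas the contiguous guess for 12 CPUs gets both of those wrong.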

◆ availableCount()

static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::availableCount ( )
static

Returns the number of hardware threads available to this process.

On Linux, queries the process CPU affinity mask (respects taskset/cgroup). On Windows single-group systems, queries the process affinity mask. On Windows multi-group systems, sums active processors per group (does not reflect process-level restrictions). On other platforms, falls back to std::thread::hardware_concurrency().
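
The Linux path and the generic fallback can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `availableHardwareThreads` is a hypothetical name, the Windows branches are omitted, and the sketch simply falls through to `std::thread::hardware_concurrency()` if the affinity query fails.

```cpp
#include <cstdint>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of hardware threads this process may run on.
int32_t availableHardwareThreads() {
#if defined(__linux__)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  // pid 0 means "the caller"; the mask reflects taskset/cpuset restrictions.
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    return static_cast<int32_t>(CPU_COUNT(&mask));
  }
#endif
  // Portable fallback; the standard allows this to return 0 when unknown.
  return static_cast<int32_t>(std::thread::hardware_concurrency());
}
```

Running the same binary under `taskset -c 0-3` on Linux should make this sketch report 4, while the fallback path typically reports the machine-wide count.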

◆ bindCurrentThread()

DISPENSO_DLL_ACCESS bool dispenso::CpuSet::bindCurrentThread ( ) const

Binds the calling thread to the CPUs in this set.

On Linux/FreeBSD, calls pthread_setaffinity_np. On unsupported platforms (macOS, Windows without GROUP_AFFINITY support), returns false.

Returns
true if binding succeeded, false on failure or unsupported platform.
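
A Linux-only sketch of the binding step, with the same "false on failure or unsupported platform" contract. Two assumptions are worth stating: `bindCallingThread` is a hypothetical name, and the sketch calls `sched_setaffinity` with pid 0 (which on Linux targets the calling thread) instead of the `pthread_setaffinity_np` call named above, so that it links without extra flags on older glibc. FreeBSD is not covered here.

```cpp
#include <vector>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: restrict the calling thread to the given CPU IDs.
bool bindCallingThread(const std::vector<int>& cpus) {
#if defined(__linux__)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  for (int cpu : cpus) {
    // Skip IDs the fixed-size mask cannot represent.
    if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &mask);
  }
  // An empty mask can never be a valid affinity; report failure up front.
  if (CPU_COUNT(&mask) == 0) return false;
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
#else
  (void)cpus;
  return false;  // unsupported platform in this sketch
#endif
}
```

Callers should check the return value: binding can fail even on Linux, for example when none of the requested CPUs is allowed for the process.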

◆ buildThreadGroups()

static DISPENSO_DLL_ACCESS std::vector< ThreadGroup > dispenso::CpuSet::buildThreadGroups ( int32_t maxGroupSize = kDefaultMaxGroupSize)
static

Builds scheduling thread groups from cache topology.

Groups are built bottom-up:

  1. L2 cache groups are the atoms (SMT siblings are never split)
  2. L2 atoms are packed into groups within L3 boundaries
  3. Each group has at most maxGroupSize CPUs
  4. Groups are L3-coherent (never cross L3 cache boundaries)

On systems without cache topology detection (macOS, Windows currently), falls back to contiguous chunking of all online CPUs.

Parameters
maxGroupSize    Maximum CPUs per group. Defaults to DISPENSO_MAX_GROUP_SIZE. Clamped to at least the largest L2 atom size (e.g. 8 on Power10 with SMT8) to avoid splitting SMT siblings.
Returns
Vector of ThreadGroups, sorted by first CPU ID in each group.

Example

// On a 192-thread dual-socket EPYC (12 CCDs x 16 threads):
auto groups = dispenso::CpuSet::buildThreadGroups();
// groups.size() == 12, each with 16 CPUs matching a CCD
// Custom group size for finer-grained waking:
groups = dispenso::CpuSet::buildThreadGroups(8);
// groups.size() == 24, each with 8 CPUs (4 cores)
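
The bottom-up packing rules for buildThreadGroups() can be sketched with plain vectors. This is one greedy interpretation, not the library's algorithm: `CpuList`, `packThreadGroups`, and `makeBlocks` are hypothetical names, every list is assumed to be sorted ascending, and the synthetic topology in the note below uses consecutive CPU IDs, which real machines often do not.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using CpuList = std::vector<int>;  // hypothetical stand-in for a cache/thread group

// Packs L2 atoms into groups of at most maxGroupSize CPUs, never crossing an
// L3 boundary and never splitting an atom.
std::vector<CpuList> packThreadGroups(const std::vector<CpuList>& l2Atoms,
                                      const std::vector<CpuList>& l3Groups,
                                      std::size_t maxGroupSize) {
  // Clamp so that the largest atom always fits in one group.
  for (const CpuList& atom : l2Atoms) {
    maxGroupSize = std::max(maxGroupSize, atom.size());
  }
  std::vector<CpuList> result;
  for (const CpuList& l3 : l3Groups) {
    CpuList current;
    for (const CpuList& atom : l2Atoms) {
      // Only consider atoms whose first CPU belongs to this L3 domain.
      if (atom.empty() ||
          std::find(l3.begin(), l3.end(), atom.front()) == l3.end()) {
        continue;
      }
      if (!current.empty() && current.size() + atom.size() > maxGroupSize) {
        result.push_back(current);  // group is full: start a new one
        current.clear();
      }
      current.insert(current.end(), atom.begin(), atom.end());
    }
    if (!current.empty()) result.push_back(current);  // flush at L3 boundary
  }
  std::sort(result.begin(), result.end(),
            [](const CpuList& a, const CpuList& b) { return a.front() < b.front(); });
  return result;
}

// Contiguous chunking of [0, totalCpus), used here to fake a topology.
std::vector<CpuList> makeBlocks(int totalCpus, int blockSize) {
  std::vector<CpuList> blocks;
  for (int first = 0; first < totalCpus; first += blockSize) {
    CpuList block;
    for (int cpu = first; cpu < std::min(first + blockSize, totalCpus); ++cpu) {
      block.push_back(cpu);
    }
    blocks.push_back(block);
  }
  return blocks;
}
```

With `makeBlocks(192, 2)` as L2 atoms and `makeBlocks(192, 16)` as L3 groups, a limit of 16 yields 12 groups and a limit of 8 yields 24, matching the sizes quoted for the 192-thread example. A limit of 1 is clamped to the atom size of 2 and yields 96 groups.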

◆ contains()

DISPENSO_DLL_ACCESS bool dispenso::CpuSet::contains ( int32_t hardwareThread) const

Tests whether a CPU is in the set.

Parameters
hardwareThread    The CPU ID to test.
Returns
true if the CPU is in the set.

◆ currentHardwareThread()

static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::currentHardwareThread ( )
static

Returns the CPU ID of the calling thread's current core.

On Linux, uses the vDSO-accelerated getcpu(). On unsupported platforms, returns -1.

Note
The result is instantaneous and may be stale by the time it is used (the OS may migrate the thread). Useful as a hint for scheduling decisions, not as a hard guarantee.

◆ currentHardwareThreadApprox()

static int32_t dispenso::CpuSet::currentHardwareThreadApprox ( )
inline static

Approximate CPU ID for the calling thread, refreshed periodically.

Caches the result of currentHardwareThread() in a thread-local and re-queries every kRefreshPeriod calls. Designed for hot paths that want locality hints without paying the ~15ns vDSO cost per call: the cached path is a single TLS read + increment + branch (~1-2ns).

Trade-off: between refreshes the returned CPU is stale if the OS has migrated the thread. With kRefreshPeriod=32 and a typical OS time-slice of 1-10ms, staleness is bounded to a small fraction of a slice.

Returns
A possibly-stale CPU ID, or -1 if the platform doesn't support CPU queries.

Definition at line 234 of file cpu_set.h.
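
The caching pattern behind currentHardwareThreadApprox() can be sketched in portable C++. This is an illustration, not the code at the definition site: `queryCpuExpensive`, `g_simulatedCpu`, `g_expensiveQueries`, and `currentCpuApprox` are hypothetical stand-ins so the sketch runs on any platform, and it counts down to the next refresh where the library may count up.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real (slow) CPU query.
static int32_t g_simulatedCpu = 3;  // CPU the "OS" currently reports
static int g_expensiveQueries = 0;  // how often the slow path ran

int32_t queryCpuExpensive() {
  ++g_expensiveQueries;
  return g_simulatedCpu;
}

// Refresh period taken from the text above.
constexpr uint32_t kRefreshPeriod = 32;

// Returns a possibly stale CPU ID; re-queries once every kRefreshPeriod calls.
int32_t currentCpuApprox() {
  thread_local int32_t cached = -1;
  thread_local uint32_t callsUntilRefresh = 0;
  if (callsUntilRefresh == 0) {
    cached = queryCpuExpensive();  // slow path
    callsUntilRefresh = kRefreshPeriod;
  }
  --callsUntilRefresh;  // fast path: one thread-local update and a branch
  return cached;
}
```

In this sketch the first call and every 32nd call after it take the slow path. If the simulated CPU changes in between, callers keep seeing the old value until the next refresh, which is the staleness trade-off described above.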

◆ l2CacheGroups()

static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & dispenso::CpuSet::l2CacheGroups ( )
static

Returns the L2 cache sharing groups.

Each group contains the CPU IDs that share an L2 cache instance. On x86 with SMT, each group is typically an SMT sibling pair. On Power (SMT8), each group is 8 threads.

The groups are sorted by their first CPU ID.

Returns
A reference to the cached vector of L2 groups. Empty if detection is not supported on the current platform.
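
On Linux, one plausible source for cache sharing groups is sysfs, where files such as /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list hold a range list like "0-1" or "0-3,8" (the `level` file in the same directory identifies the cache level). Whether dispenso reads these files is an assumption; the sketch below only shows how such a list could be parsed, and `parseCpuList` is a hypothetical helper that expects well-formed input.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expands "0-3,8,10-11" into {0,1,2,3,8,10,11}.
std::vector<int> parseCpuList(const std::string& text) {
  std::vector<int> cpus;
  std::stringstream stream(text);
  std::string token;
  while (std::getline(stream, token, ',')) {
    // Skip empty tokens and stray whitespace such as a trailing newline.
    if (token.find_first_of("0123456789") == std::string::npos) continue;
    const std::size_t dash = token.find('-');
    const int first = std::stoi(token.substr(0, dash));
    const int last =
        (dash == std::string::npos) ? first : std::stoi(token.substr(dash + 1));
    for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
  }
  return cpus;
}
```

Grouping CPUs whose parsed lists are identical, then sorting the groups by first CPU ID, would give a result shaped like the vector described above.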

◆ l3CacheGroups()

static DISPENSO_DLL_ACCESS const std::vector< CacheGroup > & dispenso::CpuSet::l3CacheGroups ( )
static

Returns the L3 cache sharing groups.

Each group contains the CPU IDs that share an L3 cache instance. On AMD EPYC, each group corresponds to a CCD (Core Complex Die). On Intel, each group corresponds to a tile or SNC cluster.

The groups are sorted by their first CPU ID.

Returns
A reference to the cached vector of L3 groups. Empty if detection is not supported on the current platform.

◆ node()

static DISPENSO_DLL_ACCESS const CpuSet & dispenso::CpuSet::node ( int32_t numaNode)
static

Returns the CPU set for a specific NUMA node.

Parameters
numaNode    The NUMA node index (0 to totalNumaNodes()-1).
Returns
A reference to the cached CpuSet for that node.

◆ remove()

DISPENSO_DLL_ACCESS void dispenso::CpuSet::remove ( int32_t hardwareThread)

Removes a single CPU from the set.

Parameters
hardwareThread    The CPU ID to remove.

◆ removeRange()

DISPENSO_DLL_ACCESS void dispenso::CpuSet::removeRange ( int32_t start,
int32_t end )

Removes a range of CPUs from the set.

Parameters
start    First CPU ID (inclusive).
end      One past the last CPU ID (exclusive).

◆ totalNumaNodes()

static DISPENSO_DLL_ACCESS int32_t dispenso::CpuSet::totalNumaNodes ( )
static

Returns the total number of NUMA nodes detected.

Always returns at least 1 (single-node fallback when detection fails or the platform has no NUMA).


The documentation for this class was generated from the following file: