Runtime Stats

Runtime stats collect per-query Velox runtime events for offline query analysis. They provide insight into operator-level query execution internals, such as how much time a query operator spent on disk spilling. The stats are organized as free-form key-value pairs for easy extension. The key is the event name, and the value is a RuntimeCounter, which stores and aggregates the occurrences of a particular event during operator execution. RuntimeCounter has three unit types: kNone records an event count, kNanos records event time in nanoseconds, and kBytes records memory or storage size in bytes. For each event, it records the number of occurrences and the min, max, and sum of the event values. The stats are stored in the OperatorStats structure. The query system can aggregate the operator-level stats collected from each driver by pipeline and task for analysis.
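The key-value organization and the count/min/max/sum aggregation described above can be sketched as follows. This is a minimal illustration, not Velox's actual code: the names StatUnit, StatMetric, StatsMap, addRuntimeStat and mergeStats are hypothetical stand-ins chosen for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the three unit types named in the text.
enum class StatUnit { kNone, kNanos, kBytes };

// Hypothetical aggregate kept per event name: count, sum, min and max.
struct StatMetric {
  StatUnit unit = StatUnit::kNone;
  int64_t count = 0;
  int64_t sum = 0;
  int64_t min = std::numeric_limits<int64_t>::max();
  int64_t max = std::numeric_limits<int64_t>::min();

  // Records one occurrence of the event with the given value.
  void addValue(int64_t value) {
    ++count;
    sum += value;
    min = std::min(min, value);
    max = std::max(max, value);
  }

  // Folds another aggregate of the same event into this one.
  void merge(const StatMetric& other) {
    count += other.count;
    sum += other.sum;
    min = std::min(min, other.min);
    max = std::max(max, other.max);
  }
};

// Free-form map from event name to its aggregate.
using StatsMap = std::unordered_map<std::string, StatMetric>;

// Records one event occurrence under 'name', creating the entry on first use.
void addRuntimeStat(
    StatsMap& stats, const std::string& name, StatUnit unit, int64_t value) {
  StatMetric& metric = stats[name];
  metric.unit = unit;
  metric.addValue(value);
}

// Merges per-driver stats into a pipeline-level or task-level map.
void mergeStats(StatsMap& into, const StatsMap& from) {
  for (const auto& entry : from) {
    StatMetric& metric = into[entry.first];
    metric.unit = entry.second.unit;
    metric.merge(entry.second);
  }
}
```

In this sketch, a driver would call addRuntimeStat each time an event occurs, and the query system would call mergeStats once per driver to build the task-level view. Because new event names need no schema change, adding a stat is just a matter of recording under a new key.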

Memory Arbitration

These stats are reported by all operators.

| Stats | Unit | Description |
|---|---|---|
| memoryReclaimCount | | The number of times memory arbitration reclaimed memory from a spillable operator. This stat applies only to spillable operators. |
| memoryReclaimWallNanos | nanos | The time an operator spent executing memory reclaim during memory arbitration. It covers time spent on disk spilling or file writes. This stat applies only to spillable operators. |
| reclaimedMemoryBytes | bytes | The number of bytes reclaimed from an operator during memory arbitration. This stat applies only to spillable operators. |
| globalArbitrationCount | | The number of times a request for more memory hit the arbitrator’s capacity limit and initiated a global arbitration attempt, in which memory is reclaimed from viable candidates chosen among all running queries based on a criterion. |
| localArbitrationCount | | The number of times a request for more memory hit the query memory limit and initiated a local arbitration attempt, in which memory is reclaimed from the requestor itself. |
| localArbitrationQueueWallNanos | nanos | The time an operator spent waiting in the local arbitration queue. |
| localArbitrationLockWaitWallNanos | nanos | The time an operator spent waiting to acquire the local arbitration lock. |
| globalArbitrationLockWaitWallNanos | nanos | The time an operator spent waiting to acquire the global arbitration lock. |
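As an example of offline analysis, the three reclaim stats can be combined into per-reclaim averages and a reclaim throughput. The sketch below assumes the caller has already extracted the summed values of memoryReclaimCount, memoryReclaimWallNanos and reclaimedMemoryBytes for one operator; ReclaimSummary and summarizeReclaim are hypothetical names introduced for this example.

```cpp
#include <cstdint>

// Hypothetical summary derived from the summed reclaim stats of one operator.
struct ReclaimSummary {
  double avgReclaimMillis;   // average wall time per reclaim, in milliseconds
  double avgReclaimedBytes;  // average bytes reclaimed per reclaim
  double bytesPerSecond;     // reclaimed bytes per second of reclaim time
};

ReclaimSummary summarizeReclaim(
    int64_t reclaimCount, int64_t reclaimWallNanos, int64_t reclaimedBytes) {
  ReclaimSummary summary{0.0, 0.0, 0.0};
  if (reclaimCount > 0) {
    summary.avgReclaimMillis = reclaimWallNanos / 1e6 / reclaimCount;
    summary.avgReclaimedBytes =
        static_cast<double>(reclaimedBytes) / reclaimCount;
  }
  if (reclaimWallNanos > 0) {
    summary.bytesPerSecond = reclaimedBytes * 1e9 / reclaimWallNanos;
  }
  return summary;
}
```

A low bytesPerSecond together with a high memoryReclaimCount may point at an operator that is reclaimed often but frees little memory each time; that reading is an interpretation, not something the stats state directly.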

HashBuild, HashAggregation

These stats are reported only by HashBuild and HashAggregation operators.

| Stats | Unit | Description |
|---|---|---|
| hashtable.capacity | | Number of slots across all buckets in the hash table. |
| hashtable.numRehashes | | Number of rehash() calls. |
| hashtable.numDistinct | | Number of distinct keys in the hash table. |
| hashtable.numTombstones | | Number of tombstone slots in the hash table. |
| hashtable.buildWallNanos | nanos | Time spent building the hash table from rows collected by all the hash build operators. This stat is reported only by the HashBuild operator. |
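The capacity, distinct-key and tombstone counts can be turned into occupancy ratios during analysis. The following sketch assumes that dividing by hashtable.capacity gives a meaningful load factor; HashTableOccupancy and hashTableOccupancy are hypothetical names used only here.

```cpp
#include <cstdint>

// Hypothetical occupancy ratios derived from the hashtable.* stats.
struct HashTableOccupancy {
  double loadFactor;      // numDistinct / capacity
  double tombstoneRatio;  // numTombstones / capacity
};

HashTableOccupancy hashTableOccupancy(
    int64_t capacity, int64_t numDistinct, int64_t numTombstones) {
  HashTableOccupancy occupancy{0.0, 0.0};
  if (capacity > 0) {
    occupancy.loadFactor = static_cast<double>(numDistinct) / capacity;
    occupancy.tombstoneRatio = static_cast<double>(numTombstones) / capacity;
  }
  return occupancy;
}
```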

TableWriter

These stats are reported only by the TableWriter operator.

| Stats | Unit | Description |
|---|---|---|
| earlyFlushedRawBytes | bytes | Number of bytes prematurely flushed from file writers because of memory reclaiming. |

Spilling

These stats are reported by operators that support spilling.

| Stats | Unit | Description |
|---|---|---|
| spillNotSupported | | The number of spillable operators that do not support spilling because of a spill limitation. For instance, a window operator does not support spilling if there is no partitioning. |
| spillFillWallNanos | nanos | The time spent filling rows for spilling. |
| spillSortWallNanos | nanos | The time spent sorting rows for spilling. |
| spillExtractVectorWallNanos | nanos | The time spent extracting a Vector from a RowContainer for spilling. |
| spillSerializationWallNanos | nanos | The time spent serializing rows for spilling. |
| spillFlushWallNanos | nanos | The time spent copying out serialized rows for disk write. If compression is enabled, this includes the compression time. |
| spillWrites | | The number of spill writer flushes, equivalent to the number of write calls to the underlying filesystem. |
| spillWriteWallNanos | nanos | The time spent writing spilled rows to disk. |
| spillRuns | | The number of times that spilling runs on an operator. |
| exceededMaxSpillLevel | | The number of times that an operator exceeds the max spill limit. |
| spillReadBytes | bytes | The number of bytes read from spilled files. |
| spillReads | | The number of spill reader reads, equivalent to the number of read calls to the underlying filesystem. |
| spillReadWallNanos | nanos | The time spent reading data from spilled files. |
| spillDeserializationWallNanos | nanos | The time spent deserializing rows read from spilled files. |
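To see where spill time goes on the write path, the wall-time stats can be compared as shares of their combined total. The sketch below assumes the six write-side stages do not overlap in time, which this page does not state, so the shares are only indicative. StatSums and spillWriteBreakdown are hypothetical names; the input is a map from stat name to its summed value.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical input: stat name mapped to its summed value in nanoseconds.
using StatSums = std::unordered_map<std::string, int64_t>;

// Returns each write-side spill stage with its share of the combined time.
// Stages missing from 'sums' count as zero.
std::vector<std::pair<std::string, double>> spillWriteBreakdown(
    const StatSums& sums) {
  static const char* const kStages[] = {
      "spillFillWallNanos",
      "spillSortWallNanos",
      "spillExtractVectorWallNanos",
      "spillSerializationWallNanos",
      "spillFlushWallNanos",
      "spillWriteWallNanos"};

  int64_t total = 0;
  std::vector<std::pair<std::string, int64_t>> found;
  for (const char* stage : kStages) {
    auto it = sums.find(stage);
    const int64_t nanos = (it == sums.end()) ? 0 : it->second;
    found.emplace_back(stage, nanos);
    total += nanos;
  }

  std::vector<std::pair<std::string, double>> shares;
  for (const auto& entry : found) {
    const double share =
        total > 0 ? static_cast<double>(entry.second) / total : 0.0;
    shares.emplace_back(entry.first, share);
  }
  return shares;
}
```

The same approach could be applied to the read path with spillReadWallNanos and spillDeserializationWallNanos.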

Shuffle

These stats are reported by shuffle operators.

| Stats | Unit | Description |
|---|---|---|
| shuffleSerdeKind | | Indicates the vector serde kind used by an operator for shuffle: 1 for Presto, 2 for CompactRow, 3 for UnsafeRow. It is currently reported by the Exchange, MergeExchange, and PartitionedOutput operators. |
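When reading collected stats, the numeric shuffleSerdeKind value can be decoded back to a name using the mapping given above. shuffleSerdeKindName is a hypothetical helper written for this example; values outside 1 to 3 are reported as unknown rather than guessed.

```cpp
#include <cstdint>
#include <string>

// Maps the shuffleSerdeKind stat value to the serde name listed in this page.
std::string shuffleSerdeKindName(int64_t value) {
  switch (value) {
    case 1:
      return "Presto";
    case 2:
      return "CompactRow";
    case 3:
      return "UnsafeRow";
    default:
      return "Unknown(" + std::to_string(value) + ")";
  }
}
```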