Configuration properties

Memory Management

max_partial_aggregation_memory

  • Type: integer

  • Default value: 16MB

Maximum amount of memory in bytes for partial aggregation results. Increasing this value can result in less network transfer and lower CPU utilization by allowing more groups to be kept locally before being flushed, at the cost of additional memory usage.

max_extended_partial_aggregation_memory

  • Type: integer

  • Default value: 16MB

Maximum amount of memory in bytes for partial aggregation results if cardinality reduction is below partial_aggregation_reduction_ratio_threshold. Every time partial aggregate results size reaches max_partial_aggregation_memory bytes, the results are flushed. If cardinality reduction is below partial_aggregation_reduction_ratio_threshold, i.e. number of result rows / number of input rows > partial_aggregation_reduction_ratio_threshold, memory limit for partial aggregation is automatically doubled up to max_extended_partial_aggregation_memory. This adaptation is disabled by default, since the value of max_extended_partial_aggregation_memory equals the value of max_partial_aggregation_memory. Specify higher value for max_extended_partial_aggregation_memory to enable.

partial_aggregation_reduction_ratio_threshold

  • Type: double

  • Default value: 0.5

Cardinality reduction threshold for partial aggregation to enable adaptive memory limit increase up to max_extended_partial_aggregation_memory. Valid values are between 0 and 1. If partial aggregation results reach max_partial_aggregation_memory limit and number of result rows / number of input rows > partial_aggregation_reduction_ratio_threshold the limit is automatically doubled up to max_extended_partial_aggregation_memory.

Spilling

spill_enabled

  • Type: boolean

  • Default value: false

Spill memory to disk to avoid exceeding memory limits for the query.

spiller-spill-path

  • Type: string

  • Default value: /tmp

Directory where spilled content is written.

aggregation_spill_enabled

  • Type: boolean

  • Default value: false

When spill_enabled is true, determines whether to spill memory to disk for aggregations to avoid exceeding memory limits for the query.

join_spill_enabled

  • Type: boolean

  • Default value: false

When spill_enabled is true, determines whether to spill memory to disk for hash joins to avoid exceeding memory limits for the query.

order_by_spill_enabled

  • Type: boolean

  • Default value: false

When spill_enabled is true, determines whether to spill memory to disk for order by to avoid exceeding memory limits for the query.

aggregation_spill_memory_threshold

  • Type: integer

  • Default value: 0

Maximum amount of memory in bytes that a final aggregation can use before spilling. 0 means unlimited.

join_spill_memory_threshold

  • Type: integer

  • Default value: 0

Maximum amount of memory in bytes that a hash join build side can use before spilling. 0 means unlimited.

order_by_spill_memory_threshold

  • Type: integer

  • Default value: 0

Maximum amount of memory in bytes that an order by can use before spilling. 0 means unlimited.

spillable-reservation-growth-pct

  • Type: integer

  • Default value: 25

The spillable memory reservation growth percentage of the current memory reservation size. Suppose a growth percentage of N and the current memory reservation size of M, the next memory reservation size will be M * (1 + N / 100). After growing the memory reservation K times, the memory reservation size will be M * (1 + N / 100) ^ K. Hence the memory reservation grows along a series of powers of (1 + N / 100). If the memory reservation fails, it starts spilling.

max-spill-level

  • Type: integer

  • Default value: 4

The maximum allowed spilling level with zero being the initial spilling level. Applies to hash join build spilling which might use recursive spilling when the build table is very large. -1 means unlimited. In this case an extremely large query might run out of spilling partition bits. The max spill level can be used to prevent a query from using too much io and cpu resources.

max-spill-file-size

  • Type: integer

  • Default value: 0

The maximum allowed spill file size. Zero means unlimited.

min-spill-run-size

  • Type: integer

  • Default value: 256MB

The minimum spill run size (bytes) limit used to select partitions for spilling. The spiller tries to spill a previously spilled partitions if its data size exceeds this limit, otherwise it spills the partition with most data. If the limit is zero, then the spiller always spills a previously spilled partition if it has any data. This is to avoid spill from a partition with a small amount of data which might result in generating too many small spilled files.

Hive Connector

max_partitions_per_writers

  • Type: integer

  • Default value: 100

Maximum number of partitions per writer.

insert_existing_partitions_behavior

  • Type: string

  • Allowed values: OVERWRITE, ERROR

  • Default value: ERROR

The behavior on insert existing partitions. This property only derives the update mode field of the table writer operator output. OVERWRITE sets the update mode to indicate overwriting a partition if exists. ERROR sets the update mode to indicate error throwing if writing to an existing partition.

Spark-specific configuration

spark.legacy-size-of-null

  • Type: bool

  • Default value: true

If false, size function returns null for null input.