Configuration properties

Generic Configuration

Expression Evaluation Configuration

Property Name	Type	Default Value	Description
expression.eval_simplified	boolean	false	Whether to use the simplified expression evaluation path.
expression.track_cpu_usage	boolean	false	Whether to track CPU usage for individual expressions (supported by call and cast expressions). Can be expensive when processing small batches, e.g. < 10K rows.
legacy_cast	bool	false	Enables legacy CAST semantics if set to true. CAST(timestamp AS varchar) uses ‘T’ as separator between date and time (instead of a space), and the year part is not padded.
cast_match_struct_by_name	bool	false	If true, casting between ROW types matches fields by name instead of by position.
expression.max_array_size_in_reduce	integer	100000	The `Reduce` function throws an error if it encounters an array whose size is greater than this.
expression.max_compiled_regexes	integer	100	Controls maximum number of compiled regular expression patterns per batch.
debug_disable_expression_with_peeling	bool	false	Disable optimization in expression evaluation to peel common dictionary layer from inputs. Should only be used for debugging.
debug_disable_common_sub_expressions	bool	false	Disable optimization in expression evaluation to re-use cached results for common sub-expressions. Should only be used for debugging.
debug_disable_expression_with_memoization	bool	false	Disable optimization in expression evaluation to re-use cached results between subsequent input batches that are dictionary encoded and have the same alphabet (underlying flat vector). Should only be used for debugging.
debug_disable_expression_with_lazy_inputs	bool	false	Disable optimization in expression evaluation to delay loading of lazy inputs unless required. Should only be used for debugging.
debug_lambda_function_evaluation_batch_size	integer	10000	Some lambda functions over arrays and maps are evaluated in batches of the underlying elements that comprise the arrays/maps. This keeps the batch size manageable, since array vectors can have thousands of elements each and can hit scaling limits because implementations typically expect BaseVectors to have a couple of thousand entries. This property lets us tune those batch sizes. Setting it to zero means unlimited batch size.
debug_bing_tile_children_max_zoom_shift	integer	5	The UDF bing_tile_children generates the children of a Bing tile based on a specified target zoom level. The number of children produced is determined by the difference between the target zoom level and the zoom level of the input tile. This configuration limits the number of children by capping the maximum zoom level difference, with a default value set to 5. This cap is necessary to prevent excessively large array outputs, which can exceed the size limits of the elements vector in the Velox array vector.
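
To illustrate why the zoom shift cap matters, the sketch below counts the children produced for a given zoom difference. It assumes the usual Bing tile quadtree, where each additional zoom level splits a tile into four. The function name `children_count` is hypothetical and not part of Velox, and raising an error above the cap is an assumption about how the limit is enforced.

```python
def children_count(tile_zoom: int, target_zoom: int, max_zoom_shift: int = 5) -> int:
    """Count the children of a tile at target_zoom, enforcing a zoom shift cap.

    Hypothetical helper: assumes each zoom level quadruples the tile count,
    and assumes a shift above the cap is rejected with an error.
    """
    shift = target_zoom - tile_zoom
    if shift < 0:
        raise ValueError("target zoom must not be below the tile zoom")
    if shift > max_zoom_shift:
        raise ValueError(f"zoom shift {shift} exceeds cap {max_zoom_shift}")
    return 4 ** shift
```

With the default cap of 5, a single input tile yields at most 4 ** 5 = 1024 children under this assumption.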

Memory Management

Property Name	Type	Default Value	Description
max_partial_aggregation_memory	integer	16MB	Maximum amount of memory in bytes for partial aggregation results. Increasing this value can result in less network transfer and lower CPU utilization by allowing more groups to be kept locally before being flushed, at the cost of additional memory usage.
max_extended_partial_aggregation_memory	integer	16MB	Maximum amount of memory in bytes for partial aggregation results if cardinality reduction is below partial_aggregation_reduction_ratio_threshold. Every time partial aggregate results size reaches max_partial_aggregation_memory bytes, the results are flushed. If cardinality reduction is below partial_aggregation_reduction_ratio_threshold, i.e. number of result rows / number of input rows > partial_aggregation_reduction_ratio_threshold, memory limit for partial aggregation is automatically doubled up to max_extended_partial_aggregation_memory. This adaptation is disabled by default, since the value of max_extended_partial_aggregation_memory equals the value of max_partial_aggregation_memory. Specify higher value for max_extended_partial_aggregation_memory to enable.
query_memory_reclaimer_priority	integer	2147483647	Priority of the query in the memory pool reclaimer. Lower value means higher priority. This is used in global arbitration victim selection.
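
The adaptive limit described for max_extended_partial_aggregation_memory can be sketched as follows. This is only an illustration of the rule as worded in the table, not the Velox implementation; the function name and parameters are hypothetical.

```python
def next_partial_agg_limit(
    current_limit: int,
    max_extended: int,
    input_rows: int,
    output_rows: int,
    reduction_ratio_threshold: float,
) -> int:
    """Return the memory limit to use after a flush of partial aggregation results.

    Hypothetical sketch: the limit doubles, capped at max_extended, when
    output_rows / input_rows exceeds the threshold (the condition as stated
    in the property description).
    """
    if input_rows == 0:
        return current_limit
    if output_rows / input_rows > reduction_ratio_threshold:
        return min(current_limit * 2, max_extended)
    return current_limit
```

When both properties keep the same value, `min(current_limit * 2, max_extended)` returns the current limit, which matches the note that the adaptation is disabled by default.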

Spilling

Property Name	Type	Default Value	Description
spill_enabled	boolean	false	Spill memory to disk to avoid exceeding memory limits for the query.
aggregation_spill_enabled	boolean	true	When spill_enabled is true, determines whether HashAggregation operator can spill to disk under memory pressure.
join_spill_enabled	boolean	true	When spill_enabled is true, determines whether HashBuild and HashProbe operators can spill to disk under memory pressure.
local_merge_enabled	boolean	false	When spill_enabled is true, determines whether LocalMerge operators can spill to disk to cap memory usage.
mixed_grouped_mode_hash_join_spill_enabled	boolean	false	When both spill_enabled and join_spill_enabled are true, determines if HashProbe and HashBuild are able to spill under mixed grouped execution mode.
order_by_spill_enabled	boolean	true	When spill_enabled is true, determines whether OrderBy operator can spill to disk under memory pressure.
window_spill_enabled	boolean	true	When spill_enabled is true, determines whether Window operator can spill to disk under memory pressure.
row_number_spill_enabled	boolean	true	When spill_enabled is true, determines whether RowNumber operator can spill to disk under memory pressure.
topn_row_number_spill_enabled	boolean	true	When spill_enabled is true, determines whether TopNRowNumber operator can spill to disk under memory pressure.
writer_spill_enabled	boolean	true	When spill_enabled is true, determines whether TableWriter operator can flush the buffered data to disk under memory pressure.
aggregation_spill_memory_threshold	integer	0	Maximum amount of memory in bytes that a final aggregation can use before spilling. 0 means unlimited.
join_spill_memory_threshold	integer	0	Maximum amount of memory in bytes that a hash join build side can use before spilling. 0 means unlimited.
order_by_spill_memory_threshold	integer	0	Maximum amount of memory in bytes that an order by can use before spilling. 0 means unlimited.
writer_flush_threshold_bytes	integer	96MB	Minimum memory footprint size required to reclaim memory from a file writer by flushing its buffered data to disk.
min_spillable_reservation_pct	integer	5	The minimal available spillable memory reservation in percentage of the current memory usage. Suppose the current memory usage size of M, available memory reservation size of N and min reservation percentage of P, if M * P / 100 > N, then spiller operator needs to grow the memory reservation with percentage of ‘spillable_reservation_growth_pct’ (see below). This ensures we have sufficient amount of memory reservation to process the large input outlier.
spillable_reservation_growth_pct	integer	10	The spillable memory reservation growth percentage of the current memory usage. Suppose a growth percentage of N and the current memory usage size of M, the next memory reservation size will be M * (1 + N / 100). After growing the memory reservation K times, the memory reservation size will be M * (1 + N / 100) ^ K. Hence the memory reservation grows along a series of powers of (1 + N / 100). If the memory reservation fails, it starts spilling.
max_spill_level	integer	1	The maximum allowed spilling level with zero being the initial spilling level. Applies to hash join build spilling which might use recursive spilling when the build table is very large. -1 means unlimited. In this case an extremely large query might run out of spilling partition bits. The max spill level can be used to prevent a query from using too much IO and CPU resources.
max_spill_run_rows	integer	12582912	The max number of rows to fill and spill for each spill run. This is used to cap the memory used for spilling. If it is zero, then there is no limit and spilling might run out of memory. Based on offline test results, the default value is set to 12 million rows, which uses `~128MB` of memory to fill a spill run. The relation between spill rows and memory usage is as follows: 12 million rows: `128 MB`, 30 million rows: `256 MB`, 60 million rows: `512 MB`.
max_spill_file_size	integer	0	The maximum allowed spill file size. Zero means unlimited.
max_spill_bytes	integer	107374182400	The max spill bytes limit set for each query. This is used to cap the storage used for spilling. If it is zero, then there is no limit and spilling might exhaust the storage or take too long to run. The default value is set to 100 GB.
spill_write_buffer_size	integer	4MB	The maximum size in bytes to buffer the serialized spill data before writing it to disk for IO efficiency. If set to zero, buffering is disabled.
spill_read_buffer_size	integer	1MB	The buffer size in bytes to read from one spilled file. If the underlying filesystem supports async read, we do read-ahead with double buffering, which doubles the buffer used to read from each spill file.
min_spill_run_size	integer	256MB	The minimum spill run size (bytes) limit used to select partitions for spilling. The spiller tries to spill a previously spilled partition if its data size exceeds this limit; otherwise it spills the partition with the most data. If the limit is zero, then the spiller always spills a previously spilled partition if it has any data. This avoids spilling from a partition with a small amount of data, which might generate too many small spilled files.
spill_compression_codec	string	none	Specifies the compression algorithm used to compress the spilled data before writing it to disk, trading CPU for IO efficiency. The supported compression codecs are: zlib, snappy, lzo, zstd, lz4 and gzip. none means no compression.
spill_prefixsort_enabled	bool	false	Enable the prefix sort or fallback to timsort in spill. The prefix sort is faster than std::sort but requires the memory to build normalized prefix keys, which might have potential risk of running out of server memory.
spiller_start_partition_bit	integer	29	The start partition bit which is used with spiller_num_partition_bits together to calculate the spilling partition number.
spiller_num_partition_bits	integer	3	The number of bits (N) used to calculate the spilling partition number for hash join and RowNumber: 2 ^ N. At the moment the maximum value is 3, meaning we only support up to 8-way spill partitioning.
testing.spill_pct	integer	0	Percentage of aggregation or join input batches that will be forced to spill for testing. 0 means no extra spilling.
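
The arithmetic behind min_spillable_reservation_pct, spillable_reservation_growth_pct and spiller_num_partition_bits can be worked through with a small sketch. The helper names below are hypothetical and only restate the formulas from the table; they are not Velox APIs.

```python
def needs_reservation_growth(usage: int, available: int, min_pct: int = 5) -> bool:
    """True if M * P / 100 > N, i.e. the available reservation is too small."""
    return usage * min_pct / 100 > available


def grown_reservation(usage: int, growth_pct: int = 10, times: int = 1) -> float:
    """Reservation size after growing `times` times: M * (1 + N / 100) ^ K."""
    return usage * (1 + growth_pct / 100) ** times


def spill_partition_count(num_partition_bits: int = 3) -> int:
    """Number of spill partitions for N partition bits: 2 ^ N."""
    return 2 ** num_partition_bits
```

For example, with 1000 bytes in use and the default 5 percent minimum, a reservation of 40 available bytes is below the required 50 bytes, so the operator would try to grow its reservation before falling back to spilling.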

Aggregation

Property Name	Type	Default Value	Description
abandon_partial_aggregation_min_rows	integer	100,000	Number of input rows to receive before starting to check whether to abandon partial aggregation.
abandon_partial_aggregation_min_pct	integer	80	Abandons partial aggregation if number of groups equals or exceeds this percentage of the number of input rows.
streaming_aggregation_min_output_batch_rows	integer	0	In streaming aggregation, wait until we have enough number of output rows to produce a batch of size specified by this. If set to 0, then Operator::outputBatchRows will be used as the min output batch rows.
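
The two abandon_partial_aggregation_* properties combine into a single check, sketched below. The function name is hypothetical, and treating the row threshold as inclusive is an assumption; the table only says the check starts after that many input rows.

```python
def should_abandon_partial_aggregation(
    input_rows: int,
    num_groups: int,
    min_rows: int = 100_000,
    min_pct: int = 80,
) -> bool:
    """Decide whether partial aggregation is reducing cardinality too little.

    Hypothetical sketch: no decision is made until min_rows input rows have
    been seen (assumed inclusive); after that, abandon when the number of
    groups is at least min_pct percent of the input rows.
    """
    if input_rows < min_rows:
        return False
    return num_groups * 100 >= min_pct * input_rows
```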

Table Scan

Property Name	Type	Default Value	Description
max_split_preload_per_driver	integer	2	Maximum number of splits to preload per driver. Set to 0 to disable preloading.
table_scan_scaled_processing_enabled	bool	false	If true, enables the scaled table scan processing. For each table scan plan node, a scan controller is used to control the number of running scan threads based on the query memory usage. It keeps increasing the number of running threads until the query memory usage exceeds the threshold defined by ‘table_scan_scale_up_memory_usage_ratio’.
table_scan_scale_up_memory_usage_ratio	double	0.5	The query memory usage ratio used by the scan controller to decide if it can increase the number of running scan threads. While the query memory usage is below this ratio, the scan controller scales up scan processing by increasing the number of running scan threads, and it stops once the usage exceeds this ratio. The value is in the range of [0, 1]. This only applies if ‘table_scan_scaled_processing_enabled’ is true.

Table Writer

Property Name	Type	Default Value	Description
task_writer_count	integer	1	The number of parallel table writer threads per task.
task_partitioned_writer_count	integer	task_writer_count	The number of parallel table writer threads per task for partitioned table writes. If not set, use ‘task_writer_count’ as default.
scaled_writer_rebalance_max_memory_usage_ratio	double	0.7	The max ratio of a query’s used memory to its max capacity. The scale writer exchange stops scaling writer processing if the query’s current memory usage exceeds this ratio. The value is in the range of (0, 1].
scaled_writer_max_partitions_per_writer	integer	128	The max number of logical table partitions that can be assigned to a single table writer thread. The logical table partition is used by local exchange writer for writer scaling, and multiple physical table partitions can be mapped to the same logical table partition based on the hash value of calculated partitioned ids.
scaled_writer_min_partition_processed_bytes_rebalance_threshold	integer	128MB	Minimum amount of data processed by a logical table partition to trigger writer scaling if it is detected as overloaded by the scale writer exchange.
scaled_writer_min_processed_bytes_rebalance_threshold	integer	256MB	Minimum amount of data processed by all the logical table partitions to trigger skewed partition rebalancing by scale writer exchange.

Hive Connector

Hive Connector config is initialized on Velox runtime startup and is shared among queries as the default config. Each query can override the config by setting the corresponding query session properties, as is done in Prestissimo.

Configuration Property Name	Session Property Name	Type	Default Value	Description
hive.max-partitions-per-writers		integer	100	Maximum number of (bucketed) partitions per a single table writer instance.
hive.max-bucket-count	hive.max_bucket_count	integer	100000	Maximum number of buckets that a table writer is allowed to write to.
insert-existing-partitions-behavior	insert_existing_partitions_behavior	string	ERROR	Allowed values: `OVERWRITE`, `ERROR`. The behavior on insert existing partitions. This property only derives the update mode field of the table writer operator output. `OVERWRITE` sets the update mode to indicate overwriting a partition if exists. `ERROR` sets the update mode to indicate error throwing if writing to an existing partition.
hive.immutable-partitions		bool	false	True if appending data to an existing unpartitioned table is allowed. Currently this configuration does not support appending to existing partitions.
file-column-names-read-as-lower-case		bool	false	If true, source file column names are read as lower case. The planner should guarantee that the input column names and filters are also lower case to achieve case-insensitive reads.
partition_path_as_lower_case		bool	true	If true, the partition directory will be converted to lowercase when executing a table write operation.
allow-null-partition-keys	allow_null_partition_keys	bool	true	Determines whether null values for partition keys are allowed or not. If not, fails with “Partition key must not be null” error message when writing data with null partition key. Null check for partitioning key should be used only when partitions are generated dynamically during query execution. For queries that write to fixed partitions, this check should happen much earlier before the Velox execution even starts.
ignore_missing_files		bool	false	If true, splits that refer to missing files don’t generate errors and are processed as empty splits.
max-coalesced-bytes		integer	128MB	Maximum size in bytes to coalesce requests to be fetched in a single request.
max-coalesced-distance		integer	512KB	Maximum distance in capacity units between chunks to be fetched that may be coalesced into a single request.
load-quantum	load-quantum	integer	8MB	Define the size of each coalesce load request. E.g. in Parquet scan, if it’s bigger than rowgroup size then the whole row group can be fetched together. Otherwise, the row group will be fetched column chunk by column chunk
num-cached-file-handles		integer	20000	Maximum number of entries in the file handle cache. The value must be non-negative. Zero value indicates infinite cache capacity.
file-handle-cache-enabled		bool	true	Enables caching of file handles if true. Disables caching if false. File handle cache should be disabled if files are not immutable, i.e. file content may change while file path stays the same.
sort-writer-max-output-rows	sort_writer_max_output_rows	integer	1024	Maximum number of rows for sort writer in one batch of output. This is to limit the memory usage of sort writer.
sort-writer-max-output-bytes	sort_writer_max_output_bytes	string	10MB	Maximum bytes for sort writer in one batch of output. This is to limit the memory usage of sort writer.
file-preload-threshold		integer	8MB	Usually Velox fetches the metadata first and then fetches the rest of the file. But if the file is very small, Velox can fetch the whole file directly to avoid multiple IO requests. The parameter controls the threshold below which the whole file is fetched.
footer-estimated-size		integer	1MB	Defines the estimated footer size in ORC and Parquet format. The footer data includes version, schema, and metadata for every column, which may or may not need to be fetched later. The parameter controls how much is read each time the footer is fetched. A bigger value can decrease the number of IO requests but may fetch more unneeded metadata.
cache.no_retention	cache.no_retention	bool	false	If true, evicts a query’s scanned data from the in-memory cache right after access and also skips staging to the SSD cache. This helps prevent cache pollution from one-time table scans by large batch queries when they run alongside interactive queries that have high data locality.
hive.reader.stats_based_filter_reorder_disabaled	hive.reader.stats_based_filter_reorder_disabaled	bool	false	If true, disable the stats based filter reordering during the read processing, and the filter execution order is totally determined by the filter type. Otherwise, the file reader will dynamically adjust the filter execution order based on the past filter execution stats.
hive.reader.timestamp-partition-value-as-local-time	hive.reader.timestamp_partition_value_as_local_time	bool	true	Reads timestamp partition value as local time if true. Otherwise, reads as UTC.

ORC File Format Configuration

Configuration Property Name	Session Property Name	Type	Default Value	Description
hive.orc.writer.stripe-max-size	orc_optimized_writer_max_stripe_size	string	64M	Maximum stripe size in orc writer.
hive.orc.writer.dictionary-max-memory	orc_optimized_writer_max_dictionary_memory	string	16M	Maximum dictionary memory that can be used in orc writer.
hive.orc.writer.integer-dictionary-encoding-enabled	orc_optimized_writer_integer_dictionary_encoding_enabled	bool	true	Whether or not dictionary encoding of integer types should be used by the ORC writer.
hive.orc.writer.string-dictionary-encoding-enabled	orc_optimized_writer_string_dictionary_encoding_enabled	bool	true	Whether or not dictionary encoding of string types should be used by the ORC writer.
hive.orc.writer.linear-stripe-size-heuristics	orc_writer_linear_stripe_size_heuristics	bool	true	Enables historical based stripe size estimation after compression.
hive.orc.writer.min-compression-size	orc_writer_min_compression_size	integer	1024	Minimal number of items in an encoded stream.
hive.orc.writer.compression-level	orc_optimized_writer_compression_level	tinyint	3 for ZSTD and 4 for ZLIB	The compression level to use with ZLIB and ZSTD.

Parquet File Format Configuration

Configuration Property Name	Session Property Name	Type	Default Value	Description
hive.parquet.writer.enable-dictionary	hive.parquet.writer.enable_dictionary	bool	true	Whether to enable dictionary encoding when writing into Parquet through the Arrow bridge.
hive.parquet.writer.dictionary-page-size-limit	hive.parquet.writer.dictionary_page_size_limit	string	1MB	Dictionary Page size used when writing into Parquet through Arrow bridge. This setting is applicable only when dictionary encoding is enabled.
hive.parquet.writer.timestamp-unit	hive.parquet.writer.timestamp_unit	tinyint	9	Timestamp unit used when writing timestamps into Parquet through Arrow bridge. Valid values are 3 (millisecond), 6 (microsecond), and 9 (nanosecond).
hive.parquet.writer.datapage-version	hive.parquet.writer.datapage_version	string	V1	Data Page version used when writing into Parquet through Arrow bridge. Valid values are “V1” and “V2”.
hive.parquet.writer.page-size	hive.parquet.writer.page_size	string	1MB	Data Page size used when writing into Parquet through Arrow bridge.
hive.parquet.writer.batch-size	hive.parquet.writer.batch_size	integer	1024	Batch size used when writing into Parquet through Arrow bridge.
hive.parquet.writer.created-by		string	parquet-cpp-velox version 0.0.0	Created-by value used when writing to Parquet.

Amazon S3 Configuration

Property Name	Type	Default Value	Description
hive.s3.use-instance-credentials	bool	false	Use the EC2 metadata service to retrieve API credentials. This works with IAM roles in EC2.
hive.s3.aws-access-key	string		Default AWS access key to use.
hive.s3.aws-secret-key	string		Default AWS secret key to use.
hive.s3.endpoint	string		The S3 storage endpoint server. This can be used to connect to an S3-compatible storage system instead of AWS.
hive.s3.endpoint.region	string	us-east-1	The S3 storage endpoint server region. Default is set by the AWS SDK. If not configured, region will be attempted to be parsed from the hive.s3.endpoint value.
hive.s3.path-style-access	bool	false	Use path-style access for all requests to the S3-compatible storage. This is for S3-compatible storage that doesn’t support virtual-hosted-style access.
hive.s3.ssl.enabled	bool	true	Use HTTPS to communicate with the S3 API.
hive.s3.log-level	string	FATAL	Allowed values: “OFF”, “FATAL”, “ERROR”, “WARN”, “INFO”, “DEBUG”, “TRACE”. Granularity of logging generated by the AWS C++ SDK library.
hive.s3.log-location	string	“”	Specifies the path where the log files are created. Generated log files start with “aws_sdk_” and use the default AWS S3 logger. Example: setting “/tmp” results in files “/tmp/aws_sdk_*”.
hive.s3.payload-signing-policy	string	Never	Allowed values: “Always”, “RequestDependent”, “Never”. When set to “Always”, the payload checksum is included in the signature calculation. When set to “RequestDependent”, the payload checksum is included based on the value returned by “AmazonWebServiceRequest::SignBody()”.
hive.s3.iam-role	string		IAM role to assume.
hive.s3.iam-role-session-name	string	velox-session	Session name associated with the IAM role.
hive.s3.use-proxy-from-env	bool	false	Utilize the configuration of the environment variables http_proxy, https_proxy, and no_proxy for use with the S3 API.
hive.s3.connect-timeout	string		Socket connect timeout.
hive.s3.socket-timeout	string		Socket read timeout.
hive.s3.max-connections	integer		Maximum concurrent TCP connections for a single http client.
hive.s3.max-attempts	integer		Maximum attempts for connections to a single http client. Works together with retry-mode. By default, it’s 3 for standard/adaptive mode and 10 for legacy mode.
hive.s3.retry-mode	string		Allowed values: “standard”, “adaptive”, “legacy”. By default it’s empty, S3 client will be created with RetryStrategy. Legacy mode only enables throttled retry for transient errors. Standard mode is built on top of legacy mode and has throttled retry enabled for throttling errors apart from transient errors. Adaptive retry mode dynamically limits the rate of AWS requests to maximize success rate.
hive.s3.aws-credentials-provider	string		A custom credential provider, if specified, will be used to create the client in favor of other authentication mechanisms. The provider must be registered using “registerAWSCredentialsProvider” before it can be used.

Bucket Level Configuration

All “hive.s3.*” config (except “hive.s3.log-level”) can be set on a per-bucket basis. The bucket-specific option is set by replacing the “hive.s3.” prefix on a config with “hive.s3.bucket.BUCKETNAME.”, where BUCKETNAME is the name of the bucket. For example, the endpoint for a bucket named “velox” can be specified by the config “hive.s3.bucket.velox.endpoint”. When connecting to a bucket, all options explicitly set for that bucket override the base “hive.s3.” values. These semantics are similar to those of the Apache Hadoop-Aws module.
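
The lookup order described above can be sketched as a small resolver. This is only an illustration of the override rule, with the config held in a plain dictionary; `resolve_s3_option` is a hypothetical helper, not a Velox function.

```python
def resolve_s3_option(config: dict, bucket: str, option: str, default=None):
    """Resolve an S3 option for a bucket, preferring the bucket-level key.

    Hypothetical sketch: `option` is the part after "hive.s3.", for example
    "endpoint". The log level is excluded from per-bucket overrides.
    """
    bucket_key = f"hive.s3.bucket.{bucket}.{option}"
    base_key = f"hive.s3.{option}"
    if option != "log-level" and bucket_key in config:
        return config[bucket_key]
    return config.get(base_key, default)


# Example config: a base endpoint plus an override for the bucket "velox".
example_config = {
    "hive.s3.endpoint": "s3.example.com",
    "hive.s3.bucket.velox.endpoint": "velox.example.com",
}
```

Here `resolve_s3_option(example_config, "velox", "endpoint")` picks the bucket-specific value, while any other bucket falls back to the base “hive.s3.endpoint” value. The endpoint host names are placeholders.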

Google Cloud Storage Configuration

Property Name	Type	Description
hive.gcs.endpoint	string	The GCS storage URI.
hive.gcs.json-key-file-path	string	The GCS service account configuration JSON key file.
hive.gcs.max-retry-count	integer	The GCS maximum retry counter of transient errors.
hive.gcs.max-retry-time	string	The GCS maximum time allowed to retry transient errors.

Azure Blob Storage Configuration

Property Name	Type	Default Value	Description
fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net	string	SharedKey	Specifies the authentication mechanism to use for Azure storage accounts. Allowed values: “SharedKey”, “OAuth”, “SAS”. “SharedKey”: Uses the storage account name and key for authentication. “OAuth”: Utilizes OAuth tokens for secure authentication. “SAS”: Employs Shared Access Signatures for granular access control.
fs.azure.account.key.<storage-account>.dfs.core.windows.net	string		The credentials to access the specific Azure Blob Storage account. Replace <storage-account> with the name of your Azure Storage account. This property aligns with how Spark configures Azure account key credentials for accessing Azure storage. By setting this property multiple times with different storage account names, you can access multiple Azure storage accounts.
fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net	string		Specifies a fixed SAS (Shared Access Signature) token for accessing Azure storage. This token provides scoped and time-limited access to specific resources. Use this property when a pre-generated SAS token is used for authentication.
fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net	string		Specifies the client ID of the Azure AD application used for OAuth 2.0 authentication. This client ID is required when using OAuth as the authentication type.
fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net	string		Specifies the client secret of the Azure AD application used for OAuth 2.0 authentication. This secret is required in conjunction with the client ID to authenticate the application.
fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net	string		Specifies the OAuth 2.0 token endpoint URL for the Azure AD application. This endpoint is used to acquire access tokens for authenticating with Azure storage. The URL follows the format: https://login.microsoftonline.com/<tenant-id>/oauth2/token.
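
Because every Azure property name embeds the storage account, it can help to generate the keys rather than type them by hand. The sketch below builds the OAuth-related properties from the table for one account. `azure_oauth_properties` is a hypothetical helper, and all argument values are placeholders supplied by the caller.

```python
def azure_oauth_properties(
    storage_account: str, tenant_id: str, client_id: str, client_secret: str
) -> dict:
    """Build the OAuth property map for one Azure storage account.

    Hypothetical helper: the key names and the token endpoint format come
    from the table above; the values are caller-supplied placeholders.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}": (
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
        ),
    }
```

Calling the helper once per storage account and merging the resulting maps gives a configuration that covers several accounts.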

Presto-specific Configuration

Property Name	Type	Default Value	Description
presto.array_agg.ignore_nulls	bool	false	If true, `array_agg` function ignores null inputs.

Spark-specific Configuration

Property Name	Type	Default Value	Description
spark.legacy_size_of_null	bool	true	If false, `size` function returns null for null input.
spark.bloom_filter.expected_num_items	integer	1000000	The default number of expected items for the bloom filter in `bloom_filter_agg()` function.
spark.bloom_filter.num_bits	integer	8388608	The default number of bits to use for the bloom filter in `bloom_filter_agg()` function.
spark.bloom_filter.max_num_bits	integer	4194304	The maximum number of bits to use for the bloom filter in `bloom_filter_agg()` function, the value of this config can not exceed the default value.
spark.partition_id	integer		The current task’s Spark partition ID. It’s set by the query engine (Spark) prior to task execution.
spark.legacy_date_formatter	bool	false	If true, Simple Date Format is used for time formatting and parsing. Joda date formatter is used by default. Joda date formatter performs strict checking of its input and uses different pattern string. For example, the 2015-07-22 10:00:00 timestamp cannot be parsed if pattern is yyyy-MM-dd because the parser does not consume whole input. Another example is that the ‘W’ pattern, which means week in month, is not supported. For more differences, see #10354.
spark.legacy_statistical_aggregate	bool	false	If true, Spark statistical aggregation functions including skewness, kurtosis, stddev, stddev_samp, variance, var_samp, covar_samp and corr will return NaN instead of NULL when dividing by zero during expression evaluation.
spark.json_ignore_null_fields	bool	true	If true, ignore null fields when generating JSON string. If false, null fields are included with a null value.

Tracing

Property Name	Type	Default Value	Description
query_trace_enabled	bool	false	If true, enable query tracing.
query_trace_dir	string		The root directory to store the tracing data and metadata for a query.
query_trace_node_id	string		The plan node id whose input data will be traced. If it is empty, then we only trace the query metadata, which includes the query plan, configs, etc.
query_trace_task_reg_exp	string		The regexp of traced task id. We only enable trace on a task if its id matches.
query_trace_max_bytes	integer	0	The max trace bytes limit. Tracing is disabled if zero.
query_trace_dry_run	boolean	false	If true, we only collect the input trace for a given operator but without the actual execution. This is used for crash debugging.
