November 2023 Update¶
Core Library¶
Add spilling support for aggregations over distinct or sorted inputs. #7305, #7526
Add support to lazily create the spill directory. #7660
Add config
merge_exchange.max_buffer_size
to limit the total memory used by exchange clients. #7410Add configs
sort_writer_max_output_rows
andsort_writer_max_output_bytes
to limit memory usage of sort writer. #7339Add termination time to TaskStats. This is the time when the downstream workers finish consuming results. Clients such as Prestissimo can use this metric to clean up tasks. #7479
Add Presto Type Parser based on Flex and Bison. This can be used by a Presto verifier to parse types in the response from the Presto server #7568
Add support for named row fields in type signature and binding. Example:
row(foo bigint)
in a signature only binds to inputs whose type is row with a single BIGINT field named foo. #7523Add support to shrink cache if clients such as Prestissimo detect high memory usage on a worker. #7547, #7645
Fix distinct aggregations with global grouping sets on empty input. Instead of empty results, for global grouping sets, the expected result is a row per global grouping set with the groupId as the key value. #7353
Fix incorrect runtime stats reporting when memory arbitration is triggered. #7394
Fix
Timestamp::toMillis()
to overflow only if the final result overflows. #7506
Presto Functions¶
Add
cosine_similarity()
scalar function.Add support for INTERVAL DAY TO SECOND type input to
plus()
,minus()
,multiply()
functions.Add support for combination of TIMESTAMP, INTERVAL DAY TO SECOND type inputs to
plus()
,minus()
functions.Add support for INTERVAL DAY TO SECOND, DOUBLE input arguments to
divide()
function.Add support to allow non-constant IN list in IN Presto predicate. #7497
Register
array_frequency()
function for all primitive types.Fix bitwise shift functions to accept shift value 0.
Fix url_extract_* functions to return null on malformed inputs and support absolute URIs.
Fix
from_utf8()
handling of invalid UTF-8 codepoint. #7442Fix
entropy()
aggregate function to return 0.0 on null inputs.Fix
array_sort()
function from producing invalid dictionary vectors. #7800Fix
lead()
,lag()
window functions to return null when the offset is null. #7254Fix DECIMAL to VARCHAR cast by adding trailing zeros when the value is 0. #7588
Spark Functions¶
Add
month()
,quarter()
,unscaled_value()
,regex_replace()
scalar functions.Add
make_decimal()
,decimal_round()
special form functions.Add support for DECIMAL compare with arguments of different precision and scale. #6207
Add support for complex type inputs to
map()
function.Fix
dayofmonth()
anddayofyear()
to allow only DATE type as input and return an INTEGER type.Fix
map()
function from throwing an exception when used inside an if or switch statement. #7727
Hive Connector¶
Add DirectBufferedInput: a selective BufferedInput without caching. #7217
Add support for reading UNSIGNED INTEGER types in Parquet format. #6728
Add spill support for DWRF sort writer. #7326
Add
file_handle_cache_enabled
Hive Config to enable or disable caching file handles.Add documentation for
num_cached_file_handles
configuration property.Add support for DECIMAL and VARCHAR types in BenchmarkParquetReader. #6275
Arrow¶
Add support to export constant vector as Arrow REE array. #7327, #7398
Add support for TIMESTAMP type in Arrow bridge. #7435
Fix Arrow bridge to ensure the null_count is always set and add support for null constants. #7411
Performance and Correctness¶
Add PrestoQueryRunner that can be used to verify test results against Presto. #7628
Add support for plans with TableScan in Join Fuzzer. #7571
Add support for custom input generators in Aggregation Fuzzer. #7594
Add support for aggregations over sorted inputs in AggregationFuzzer #7620
Add support for custom result verifiers in AggregationFuzzer. #7674
Add custom verifiers for
approx_percentile()
andapprox_distinct()
in AggregationFuzzer. #7654Optimize map subscript by caching input keys in a hash map. #7191
Optimize FlatVector<StringView>::copy() slow path using a DecodedVector and pre-allocated the string buffer. #7357
Optimize element_at for maps with complex type keys by sorting the keys and using binary search. #7365
Optimize
concat()
by adding a fast path for primitive values. #7393Optimize
json_parse()
function exception handling by switching to simdjson. #7658Optimize add_items for VARCHAR type by avoiding a deep copy. #7395
Optimize remaining filter by lazily evaluating multi-referenced fields. #7433
Optimize
TopN::addInput()
by deferring copying of the non-key columns. #7172Optimize by sorting the inputs once when multiple aggregations share sorting keys and orders. #7452
Optimize Exchange operator by allowing merging of small batches of data into larger vectors. #7404
Build¶
Credits¶
Alex Hornby, Amit Dutta, Andrii Rosa, Austin Dickey Bikramjeet Vig, Cheng Huang, Chengcheng Jin, Christopher Ponce de Leon, Daniel Munoz, Deepak Majeti, Ge Gao, Genevieve (Genna) Helsel, Harvey Hunt, Jake Jung, Jia, Jia Ke, Jialiang Tan, Jimmy Lu, John Elliott, Karteekmurthys, Ke, Kevin Wilfong, Krishna Pai, Laith Sakka, Masha Basmanova, Orri Erling, PHILO-HE, Patrick Sullivan, Pedro Eugenio Rocha Pedreira, Pramod, Richard Barnes, Schierbeck, Cody, Sergey Pershin, Wei He, Zhenyuan Zhao, aditi-pandit, curt, duanmeng, joey.ljy, lingbin, rui-mo, usurai, vibhatha, wypb, xiaoxmeng, xumingming, yangchuan, yaqi-zhao, yingsu00, yiweiHeOSS, youxiduo, zhli, 高阳阳