November 2023 Update¶

Core Library¶

Add spilling support for aggregations over distinct or sorted inputs. #7305, #7526
Add support to lazily create the spill directory. #7660
Add config merge_exchange.max_buffer_size to limit the total memory used by exchange clients. #7410
Add configs sort_writer_max_output_rows and sort_writer_max_output_bytes to limit memory usage of sort writer. #7339
Add termination time to TaskStats. This is the time when the downstream workers finish consuming results. Clients such as Prestissimo can use this metric to clean up tasks. #7479
Add Presto Type Parser based on Flex and Bison. This can be used by a Presto verifier to parse types in the response from the Presto server #7568
Add support for named row fields in type signature and binding. Example: row(foo bigint) in a signature only binds to inputs whose type is row with a single BIGINT field named foo. #7523
Add support to shrink cache if clients such as Prestissimo detect high memory usage on a worker. #7547, #7645
Fix distinct aggregations with global grouping sets on empty input. Instead of empty results, for global grouping sets, the expected result is a row per global grouping set with the groupId as the key value. #7353
Fix incorrect runtime stats reporting when memory arbitration is triggered. #7394
Fix Timestamp::toMillis() to overflow only if the final result overflows. #7506

Presto Functions¶

Add cosine_similarity() scalar function.
Add support for INTERVAL DAY TO SECOND type input to plus(), minus(), multiply() functions.
Add support for combination of TIMESTAMP, INTERVAL DAY TO SECOND type inputs to plus(), minus() functions.
Add support for INTERVAL DAY TO SECOND, DOUBLE input arguments to divide() function.
Add support to allow non-constant IN list in IN Presto predicate. #7497
Register array_frequency() function for all primitive types.
Fix bitwise shift functions to accept shift value 0.
Fix url_extract_* functions to return null on malformed inputs and support absolute URIs.
Fix from_utf8() handling of invalid UTF-8 codepoint. #7442
Fix entropy() aggregate function to return 0.0 on null inputs.
Fix array_sort() function from producing invalid dictionary vectors. #7800
Fix lead(), lag() window functions to return null when the offset is null. #7254
Fix DECIMAL to VARCHAR cast by adding trailing zeros when the value is 0. #7588

Spark Functions¶

Add month(), quarter(), unscaled_value(), regex_replace() scalar functions.
Add make_decimal(), decimal_round() special form functions.
Add support for DECIMAL compare with arguments of different precision and scale. #6207
Add support for complex type inputs to map() function.
Fix dayofmonth() and dayofyear() to allow only DATE type as input and return an INTEGER type.
Fix map() function from throwing an exception when used inside an if or switch statement. #7727

Hive Connector¶

Add DirectBufferedInput: a selective BufferedInput without caching. #7217
Add support for reading UNSIGNED INTEGER types in Parquet format. #6728
Add spill support for DWRF sort writer. #7326
Add file_handle_cache_enabled Hive Config to enable or disable caching file handles.
Add documentation for num_cached_file_handles configuration property.
Add support for DECIMAL and VARCHAR types in BenchmarkParquetReader. #6275

Arrow¶

Add support to export constant vector as Arrow REE array. #7327, #7398
Add support for TIMESTAMP type in Arrow bridge. #7435
Fix Arrow bridge to ensure the null_count is always set and add support for null constants. #7411

Performance and Correctness¶

Add PrestoQueryRunner that can be used to verify test results against Presto. #7628
Add support for plans with TableScan in Join Fuzzer. #7571
Add support for custom input generators in Aggregation Fuzzer. #7594
Add support for aggregations over sorted inputs in AggregationFuzzer #7620
Add support for custom result verifiers in AggregationFuzzer. #7674
Add custom verifiers for approx_percentile() and approx_distinct() in AggregationFuzzer. #7654
Optimize map subscript by caching input keys in a hash map. #7191
Optimize FlatVector<StringView>::copy() slow path using a DecodedVector and pre-allocated the string buffer. #7357
Optimize element_at for maps with complex type keys by sorting the keys and using binary search. #7365
Optimize concat() by adding a fast path for primitive values. #7393
Optimize json_parse() function exception handling by switching to simdjson. #7658
Optimize add_items for VARCHAR type by avoiding a deep copy. #7395
Optimize remaining filter by lazily evaluating multi-referenced fields. #7433
Optimize TopN::addInput() by deferring copying of the non-key columns. #7172
Optimize by sorting the inputs once when multiple aggregations share sorting keys and orders. #7452
Optimize Exchange operator by allowing merging of small batches of data into larger vectors. #7404

Build¶

Add DuckDB version 0.8.1 as an external dependency and remove DuckDB amalgamation. #6725
Add libcpr a lightweight http client. #7385
Upgrade Arrow dependency to 14.0.1 from 13.0.0.

Credits¶

Alex Hornby, Amit Dutta, Andrii Rosa, Austin Dickey Bikramjeet Vig, Cheng Huang, Chengcheng Jin, Christopher Ponce de Leon, Daniel Munoz, Deepak Majeti, Ge Gao, Genevieve (Genna) Helsel, Harvey Hunt, Jake Jung, Jia, Jia Ke, Jialiang Tan, Jimmy Lu, John Elliott, Karteekmurthys, Ke, Kevin Wilfong, Krishna Pai, Laith Sakka, Masha Basmanova, Orri Erling, PHILO-HE, Patrick Sullivan, Pedro Eugenio Rocha Pedreira, Pramod, Richard Barnes, Schierbeck, Cody, Sergey Pershin, Wei He, Zhenyuan Zhao, aditi-pandit, curt, duanmeng, joey.ljy, lingbin, rui-mo, usurai, vibhatha, wypb, xiaoxmeng, xumingming, yangchuan, yaqi-zhao, yingsu00, yiweiHeOSS, youxiduo, zhli, 高阳阳

November 2023 Update¶

Core Library¶

Presto Functions¶

Spark Functions¶

Hive Connector¶

Arrow¶

Performance and Correctness¶

Build¶

Credits¶

Table of Contents

Previous topic

Next topic

This Page