December 2023 Update¶
Documentation¶
Add documentation for Runtime Metrics.
Core Library¶
Add support for
k range frames
inWindow
operator.Add support for aggregations over sorted inputs to
StreamingAggregation
.Add support for TTL in AsyncDataCache and SsdCache. #6412
Add support for TypeSignature Parser using Flex and Bison. This is used to parse function signatures.
Add support for spilling during the output processing stage of the
OrderBy
operator.Add support for metrics related to memory arbitration and spilling. #7940, #8025
Add config
max_spill_bytes
to bound the storage used for spilling. The default value is set to 100GB. If it is set to zero, then there is no limit.Add
Status
class that can be used to carry the success or error state of an operation. This is similar to arrow::Status.Add Expand operator.
Add config
max_arbitrary_buffer_size
to set the maximum size in bytes for a task’s buffered output when the output is distributed randomly among consumers. The producer drivers are blocked when the buffer size exceeds this config.Fix reclaiming memory from hash build operators in grouped execution mode. #8178
Fix non-termination of hash join in certain conditions. #7925, #8012.
Fix non-termination of the distinct aggregation in certain conditions. #7968.
Fix
LimitNode
offset and count values from overflowing.
Presto Functions¶
Add support for TIMESTAMP WITH TIME ZONE input type to
format_datetime()
function.Add support for UNKNOWN key type to
map_keys()
andmap_values()
functions.Add support for DECIMAL types to
approx_distinct()
aggregate function.Add support for
cast(double|real as varchar)
to return scientific notation when magnitude of the input value is greater than or equal to 10^7, or less than 10^-3.Fix
find_first()
to return NULL when the input ArrayVector is NULL but has non-zero offsets and sizes.Fix
find_first()
to support input ArrayVectors that are only NULL or empty.Fix
find_first()
to return NULL for inputs NULL array and 0 index.Fix
find_first()
to throw an error for inputs empty array and invalid start index.Fix
array_sort()
to fail gracefully if the specified comparator lambda is not supported.Fix
transform_keys()
to check new keys for NULLs.Fix
set_union()
,set_agg()
to preserve the order of inputs.Fix
map()
to produce the correct output if input arrays have NULL rows but with invalid offsets and sizes.Fix accuracy of the DECIMAL type average computation. #7944
Spark Functions¶
Add
str_to_map()
,next_day()
,atan2()
functions.Add support for DECIMAL types to
add()
andsubtract()
functions.
Hive Connector¶
Add support for multiple S3 FileSystems. #7388
Add support to write dictionary and constant encoded vectors to Parquet by flattening them.
Add support to specify a schema when writing Parquet files. #6074
Add config
max_split_preload_per_driver
and remove thesplit_preload_per_driver
flag.Fix memory leak in HdfsBuilder.
Arrow¶
Fix exporting an REE array by setting the child name to the canonical name defined in the Arrow spec. #7802
Performance and Correctness¶
Add support for lambda functions to ExpressionFuzzer.
Add ExchangeFuzzer.
Build¶
Add support for docker image with Presto.
Add support for azure-storage-files-datalake version 12.8.0.
Allow specifying a custom curl version for the cpr library. #7853
Update aws-sdk-cpp version to 1.11.169 (from 1.10.57).
Credits¶
Aditi Pandit, Amit Dutta, Bikramjeet Vig, Chengcheng Jin, Christian Zentgraf, Daniel Munoz, Deepak Majeti, Ge Gao, Harvey Hunt, HolyLow, Hongze Zhang, Jacob Wujciak-Jens, Jia, Jia Ke, Jialiang Tan, Jimmy Lu, Jubin Chheda, Karteekmurthys, Ke, Kevin Wilfong, Krishna Pai, Krishna-Prasad-P-V, Laith Sakka, Ma-Jian1, Masha Basmanova, Orri Erling, PHILO-HE, Patrick Sullivan, Pedro Eugenio Rocha Pedreira, Pedro Pedreira, Pramod,Ravi Rahman, Richard Barnes, Sergey Pershin, Srikrishna Gopu, Wei He, Xiaoxuan Meng, Yangyang Gao, Yedidya Feldblum, Zac, aditi-pandit, binwei, duanmeng, hengjiang.ly, joey.ljy, rui-mo, shangjing.cxw, soumyaduriseti, xiaoxmeng, xiyu.zk, xumingming, yan ma, yangchuan ,yingsu00, zhli, zhli1142015, 高阳阳