January 2024 Update
Documentation
Add documentation about Hash Table.
Add documentation about memory management.
Core Library
Add metrics to track time spent in memory arbitration. #8497, #8482
Add metric to track average buffer time for exchange. #8534
Optimize count(distinct x) when x is of complex type. #8560
Optimize latency for exchange that uses arbitrary buffer. #8532, #8480
Optimize MallocAllocator to reduce lock contention. #8477
Fix aggregation over all-null keys with ignoreNullKeys = true. #8422
Fix race condition in task completion that caused "Output buffers for task not found" failures. #8357
Fix evaluation of CAST expression under TRY. #8365
Fix FlatVector<StringView>::copy for vectors with more than 2GB of data. #8516
Fix crash in FlatVector<bool>::ensureWritable. #8450
Fix interaction of spilling and yielding in Hash Join operator. #8520
Fix rawInputPositions metrics in Exchange operator. #8370
Presto Functions
Add from_ieee754_64(), multimap_from_entries(), ngrams() functions.
Add support for VARBINARY inputs to reverse() function.
Add support for arrays of complex types to array_min() and array_max() functions.
Add support for casting DOUBLE and VARCHAR as DECIMAL.
Add support for UNKNOWN key to map_agg() function.
Add support for timezone offsets to timezone_hour() and timezone_minute() functions. #8269
Optimize cast from JSON by using simdjson. #8216
Fix handling of timestamps with timezone in date_diff() function. #8540
Fix json_parse() for inputs with very large numbers. #8455
Fix kAtLeastN and kExactlyN fast paths in LIKE for inputs with multi-byte characters. #8150
Fix approx_distinct() aggregate function for TIMESTAMP inputs. #8164
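For readers unfamiliar with the three newly added Presto functions, the following is a minimal Python sketch of their semantics as described in the Presto function documentation. It is an illustration only, not the Velox implementation; the Python function names simply mirror the SQL names, and the handling of invalid arguments (for example, a non-positive n) is an assumption.

```python
import struct
from collections import defaultdict


def from_ieee754_64(data: bytes) -> float:
    """Decode 8 big-endian bytes as an IEEE 754 double-precision value."""
    if len(data) != 8:
        # Assumption: reject inputs that are not exactly 64 bits wide.
        raise ValueError("expected exactly 8 bytes")
    return struct.unpack(">d", data)[0]


def multimap_from_entries(entries):
    """Group (key, value) pairs so each key maps to a list of its values."""
    result = defaultdict(list)
    for key, value in entries:
        result[key].append(value)
    return dict(result)


def ngrams(items, n):
    """Return every run of n adjacent elements.

    If the input has n or fewer elements, the whole input is returned
    as a single n-gram.
    """
    if n <= 0:
        # Assumption: a non-positive n is treated as an error.
        raise ValueError("n must be positive")
    if len(items) <= n:
        return [list(items)]
    return [list(items[i:i + n]) for i in range(len(items) - n + 1)]
```

For example, `ngrams(["foo", "bar", "baz", "foo"], 2)` yields the three adjacent pairs, and `multimap_from_entries([(1, "x"), (2, "y"), (1, "z")])` collects both values seen for key 1 into one list.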
Spark Functions
Add from_unixtime(), find_in_set(), get_timestamp(), hour(), hex(), isnan(), replace() functions.
Add support for TINYINT and SMALLINT inputs to date_add() and date_sub() functions.
Add support for casting DOUBLE and VARCHAR as DECIMAL.
Hive Connector
Performance and Correctness
Add support for aggregations over distinct inputs to AggregationFuzzer.
Reduce memory usage of histogram metrics. #8458
Add Join Fuzzer run to CI that runs on each PR.
Add Aggregation Fuzzer run using Presto as source of truth to experimental CI.
Build System
Upgrade folly to v2023.12.04.00 (from v2022.11.14.00).
Upgrade fmt to 10.1.1 (from 8.0.1).
Credits
Amit Dutta, Benwei Shi, Bikramjeet Vig, Chen Zhang, Chengcheng Jin, Christian Zentgraf, Deepak Majeti, Ge Gao, Hongze Zhang, Jacob Wujciak-Jens, Jia Ke, Jialiang Tan, Jimmy Lu, Ke, Kevin Wilfong, Krishna Pai, Laith Sakka, Lu Niu, Ma, Rong, Masha Basmanova, Mike Lui, Orri Erling, PHILO-HE, Pedro Eugenio Rocha Pedreira, Pratik Joseph Dabre, Ravi Rahman, Richard Barnes, Schierbeck, Cody, Sergey Pershin, Sitao Lv, Taras Galkovskyi, Wei He, Yedidya Feldblum, Yuan Zhou, Yuping Fan, Zac Wen, aditi-pandit, binwei, duanmeng, hengjiang.ly, icejoywoo, lingbin, mwish, rui-mo, wypb, xiaoxmeng, xumingming, yangchuan, yingsu00, youxiduo, yuling.sh, zhli1142015, zky.zhoukeyong, zwangsheng