July 2024 Update¶
This month has 170 commits from 47 authors. Below are some of the highlights.
Documentation¶
Document Spark Types.
Document Empty Window Frames.
Core Library¶
Add support for Wave Driver and Aggregate operator on Nvidia GPUs.
Add support for multiple dynamic filters pushed to the same column by merging them.
Add support for polymorphic dwio::common::WriterOptions.
Add checkpoint support for SSD cache writes.
Add normalization rules for timezone offsets #10611
Add support for TPC-H Query 4 in TPCHQueryBuilder.
Fix SSD cache delta stats reported. #10408
Fix SSD cache recovery by preserving file id map.
Fix incorrect result of approx_percentile in window operations. #10368
Fix deadlock in MemoryPool when toString` is invoked. #10400
Fix try_cast to not suppress VeloxRuntimeErrors.
Fix AssignUniqueId` to avoid generating duplicate IDs. #10492
Fix range and multi-range filters to handle NaNs correctly. #10533
Fix memory leak in HashBuild operator.
Fix HashBuild memory over-use. #10534
Fix deadlock in join when arbitration is triggered.
Fix timezone conversions to use seconds instead of time point to reduce overflows.
Fix spilling of grouping set. #10548
Presto Functions¶
Add
secure_random()
,inverse_weibull_cdf()
,inverse_cauchy_cdf()
inverse_laplace_cdf()
functions.Fix NaN handling in comparison functions. #10165
Fix Nan handling in
min()
,max()
,min_by()
,max_by()
aggregate functions. #10583, #10586Fix
is_finite()
for NaN input. #10599Fix timezone handling in
to_iso8601()
for TIMESTAMP input. #10576
Spark Functions¶
Add
raise_error()
,levenshtein()
,repeat()
,json_object_keys()
,mask()
scalar functions.Add
min()
,max()
,collect_set()
aggregate functions.Add SIMD support for hash and comparison functions. #10301, #10273
Add support for fast path comparison function in join filter.
Fix CAST(STRING AS BOOLEAN) to only allow inputs supported by Spark.
Hive Connector¶
Add support for reading dictionary encoded INT96 TIMESTAMP types in Parquet.
Add support for reading BINARY as STRING in parquet.
Fix selective MAP and ARRAY column readers by passing requested type instead of file type.
Fix DELTA_BINARY_PACKED Parquet decoder. #10485
Fix Iceberg read with positional delete files. #10505
Performance and Correctness¶
Build System¶
Add support for monolithic build.
Add support for ARM64 CPU Neoverse N1/N2/V1 architectures.
Credits¶
5 Amit Dutta - Meta
6 Bikramjeet Vig - Meta
2 Bradley Dice - Nvidia
4 Christian Zentgraf - IBM
3 Daniel Hunte - Meta
9 Deepak Majeti - IBM
1 Guilherme Kunigami - Meta
1 Huameng (Michael) Jiang - Meta
1 Jacob Khaliqi - IBM
2 Jacob Wujciak-Jens - Voltron Data
1 Jia Ke - Intel
11 Jialiang Tan - Meta
8 Jimmy Lu - Meta
1 JineHelin404
4 Ke
1 Kevin Pis
7 Kevin Wilfong - Meta
6 Krishna Pai - Meta
2 Masha Basmanova - Meta
2 Orri Erling - Meta
1 PHILO-HE - Intel
15 Pedro Eugenio Rocha Pedreira - Meta
1 Reetika Agrawal - IBM
2 Richard Barnes - Meta
1 Shengxuan Liu - ByteDance
7 Wei He - Meta
4 Wills Feng - IBM
1 Xuedong Luan
1 YIMINGXU
1 Yuqi Gu
3 Zac Wen - Meta
1 Zuyu ZHANG
4 Aditi Pandit - IBM
1 Duanmeng - Tencent
2 gaoyangxiaozhu
3 hengjiang.ly - Alibaba Inc
2 joey.ljy - Alibaba Inc
1 kevincmchen
7 lingbin - Alibaba Inc
1 mwish
13 Rui Mo - Intel
5 xiaoxmeng - Meta
2 Ying Su - IBM
2 youxiduo
12 Zhen Li - Microsoft