July 2025 Update
This update was generated with the assistance of AI. While we strive for accuracy, please note that AI-generated content may not always be error-free. We encourage you to verify any information that you deem important.
Core Library
Switch to C++20 standard. #10866
Add ParallelProject node and operator from Verax. #14220
Add barriered execution support to AssignUniqueId. #14224
Add support for left join semantics to Unnest. #14095
Add support for converting OPAQUE vectors to variant. #14235
Add basic support for coercions to help with function type resolution. #14113
Optimize streaming aggregation by removing the max output batch size limit, reducing peak memory usage by 4x. #14238
Fix deadlock when dropping non-existent child memory pools. #14202
Fix Variant::hash and BaseVector::hashValueAt for arrays and maps. #14019
Fix ConstantTypedExpr equals/hash/toString methods for various data types. #14055
Fix AssignUniqueId needsInput logic. #14127
Presto Functions
Add dot_product() function for embedding similarity calculations.
Add seeded version of xxhash64() function.
Add SFM sketch functions: merge(), noisy_approx_set_sfm_from_index_and_zeros(), noisy_approx_distinct_sfm() and noisy_approx_set_sfm() aggregate functions.
Add quantile_at_value() and scale_qdigest() functions.
Add geometry_nearest_points(), ST_NumPoints(), ST_EnvelopeAsPts(), ST_Points() functions.
Add ST_Buffer(), ST_CoordDim(), ST_Envelope(), ST_ExteriorRing() functions.
Add ST_ConvexHull(), ST_Dimension(), ST_NumInteriorRing(), ST_NumGeometries() functions.
Add ST_GeometryN(), ST_InteriorRingN(), ST_StartPoint(), ST_EndPoint() functions.
Add ST_PointN(), ST_Length(), ST_IsClosed(), ST_Empty(), ST_IsRing() functions.
Add ST_Polygon() function.
Fix Geometry serialization/deserialization errors for GeometryCollections with empty geometries. #14243
Optimize flatten() as a VectorFunction to enable zero copy. #14215
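For readers unfamiliar with the computation behind the dot_product() entry above, the following is a minimal standalone sketch of a dot product over two embedding vectors. It is not the Velox implementation: the helper name dotProduct, the use of std::vector<double>, and the choice to throw on mismatched lengths are all assumptions made for illustration, and the actual function may handle types, nulls and size mismatches differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only (not part of Velox).
// Returns the sum of element-wise products of two equal-length vectors.
double dotProduct(const std::vector<double>& a, const std::vector<double>& b) {
  // Assumption: mismatched lengths are treated as an error in this sketch.
  if (a.size() != b.size()) {
    throw std::invalid_argument("dotProduct: vectors must have the same length");
  }
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

A larger result generally indicates that the two embeddings point in a more similar direction; when both vectors are normalized to unit length, the dot product equals their cosine similarity.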
Spark Functions
Add support for decimal type in from_json() function.
Add abs() function to handle ANSI mode differences from Presto.
Fix corr() aggregate function to return NaN instead of NULL when variance is zero. #13956
Fix covar_samp() aggregate function to return NaN instead of Inf when c2 is infinite. #13990
Fix get_json_object() function to normalize JSON paths properly. #13854
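The corr() entry above concerns inputs where one column has zero variance. As a rough illustration of why NaN is a natural result in that case, the sketch below computes a textbook Pearson correlation in plain C++. It is not the Velox or Spark implementation: the helper name pearsonCorr and the two-pass formula are assumptions chosen for clarity, and real aggregate implementations typically use a streaming algorithm and separate NULL handling.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not part of Velox or Spark).
// Two-pass Pearson correlation over the common prefix of x and y.
double pearsonCorr(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size() < y.size() ? x.size() : y.size();
  double meanX = 0.0;
  double meanY = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= static_cast<double>(n);
  meanY /= static_cast<double>(n);

  double cxy = 0.0; // sum of co-deviations
  double cxx = 0.0; // sum of squared deviations of x
  double cyy = 0.0; // sum of squared deviations of y
  for (std::size_t i = 0; i < n; ++i) {
    const double dx = x[i] - meanX;
    const double dy = y[i] - meanY;
    cxy += dx * dy;
    cxx += dx * dx;
    cyy += dy * dy;
  }
  // If either column is constant, both cxy and the denominator are 0.0,
  // and IEEE 754 division 0.0 / 0.0 yields NaN.
  return cxy / std::sqrt(cxx * cyy);
}
```

With a constant column such as x = {1, 1, 1}, every deviation of x is zero, so the numerator and the denominator both vanish and the division produces NaN under IEEE 754 arithmetic (assuming the code is not compiled with fast-math style flags).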
Connectors
Add metadata support and filter pushdown to TpchConnector. #14099
Add HDFS filesystem operations: remove, rmdir, rename, mkdir. #13948
Add S3 filesystem operations: exists and list. #13893
Add TokenProvider support to ConnectorQueryCtx for authentication. #13919
Add support for timestamp as Hive partition ID. #13494
Add text format write support for complex types: ROW, MAP, and ARRAY. #14064
Add escape character support for text parsing. #14130
Add backward compatibility support for TIMESTAMP in TextReader. #14063
Fix HiveDataSink to materialize input before writes to prevent lazy vector errors. #14085
Performance and Correctness
Credits
Amit Dutta, Bikramjeet Vig, Bowen Wu, Chengcheng Jin, Christian Zentgraf, Elodie Li, Eric Jia, Heidi Han, Henry Edwin Dikeman, Hongze Zhang, Jacob Khaliqi, Jacob Wujciak-Jens, James Gill, Jialiang Tan, Jimmy Lu, Joe Abraham, Ke Jia, Ke Wang, Kevin Wilfong, Konstantinos Karatsenidis, Krishna Pai, Libin Bai, Manikanta Loya, Masha Basmanova, Natasha Sehgal, Oliver Xu, Orri Erling, Patrick Sullivan, Pedro Eugenio Rocha Pedreira, Peter Enescu, Pramod Satya, Raaghav Ravishankar, Rajeev Dharmendra Singh, Rui Mo, Sutou Kouhei, Tony Liu, Vincent Crabtree, Xiao Du, Xiaoxuan Meng, Yi Cheng Lee, Yuxuan Chen, Zhen Li, Zhiying Liang, aditi-pandit, lingbin, nimesh.k, wecharyu, wraymo, zhli1142015, zml1206