February 2024 Update

Core Library

  • Add support for aggregations over distinct inputs to StreamingAggregation.

  • Add support for deserializing a single column in Presto page format.

  • Add support for deserializing an all-null column serialized as UNKNOWN type in Presto page format.

  • Add stats for null skew in join operator.

  • Convert TIMESTAMP_WITH_TIME_ZONE type to a primitive type.

  • Add background profiler that starts Linux perf on the Velox process.

  • Fix "out of range in dynamic array" error in Task::toJson.

  • Delete unused max_arbitrary_buffer_size config.

Presto Functions

Spark Functions

Hive Connector

  • Add ignore_missing_files config.

  • Add write support to ABFS file system.

  • Add proxy support to S3 file system.

Arrow

  • Add support for exporting UNKNOWN type to Arrow array.

  • Add support for converting Arrow run-end encoded (REE) arrays to Velox vectors.

Performance and Correctness

  • Add FieldReference benchmark.

  • Add Window fuzzer.

  • Fix "Too many open files" error in Join fuzzer.

Build System

  • Add VELOX_BUILD_MINIMAL_WITH_DWIO CMake option.

  • Move documentation, header, and format checks to GitHub Actions.

Credits

Aaron Feldman, Ankita Victor, Bikramjeet Vig, Christian Zentgraf, Daniel Munoz, David McKnight, Deepak Majeti, Ge Gao, Hongze Zhang, Jacob Wujciak-Jens, Jia Ke, Jialiang Tan, Jimmy Lu, Kevin Wilfong, Krishna Pai, Lu Niu, Masha Basmanova, Nick Terrell, Orri Erling, PHILO-HE, Pedro Pedreira, Pramod, Pranjal Shankhdhar, Richard Barnes, Schierbeck, Cody, Sergey Pershin, Wei He, Yedidya Feldblum, Zac Wen, Zhenyuan Zhao, aditi-pandit, duanmeng, gayangya, hengjiang.ly, hitarth, lingbin, mwish, rrando901, rui-mo, xiaodou, xiaoxmeng, xumingming, yingsu00, zhli1142015, 高阳阳