Iceberg Functions¶
Here is a list of all scalar Iceberg functions available in Velox. Function names link to function description.
These functions are used in partition transform. Refer to Iceberg documenation for details.
- bucket(numBuckets, input) integer¶
Returns an integer between 0 and numBuckets - 1, indicating the assigned bucket. Bucket partitioning is based on a 32-bit hash of the input, specifically using the x86 variant of the Murmur3 hash function with a seed of 0.
The function can be expressed in pseudo-code as below.
def bucket(numBuckets, input)= (murmur3_x86_32_hash(input) & Integer.MAX_VALUE) % numBuckets
The
numBucketsis of type INTEGER and must be greater than 0. Otherwise, an exception is thrown. Supported types forinputare INTEGER, BIGINT, DECIMAL, DATE, TIMESTAMP, VARCHAR, VARBINARY.SELECT bucket(128, 'abcd'); -- 4 SELECT bucket(100, 34L); -- 79
- days(input) date¶
Returns the date.
SELECT days(DATE '2017-12-01'); -- 2017-12-01 SELECT days(TIMESTAMP '2017-12-01 10:12:55.038194'); -- 2017-12-01 SELECT days(DATE '1969-12-31'); -- 1969-12-31
- hours(input) integer¶
Returns the number of hours since epoch (1970-01-01 00:00:00). Returns 0 for ‘1970-01-01 00:00:00’ timestamps. Returns negative value for timestamps before ‘1970-01-01 00:00:00’.
SELECT hours(TIMESTAMP '2017-12-01 10:12:55.038194'); -- 420034 SELECT hours(TIMESTAMP '1969-12-31 23:59:58.999999'); -- -1
- months(input) integer¶
Returns the number of months since epoch (1970-01-01). Returns 0 for ‘1970-01-01’ date and timestamps. Returns negative value for dates and timestamps before ‘1970-01-01’.
SELECT months(DATE '2017-12-01'); -- 575 SELECT months(TIMESTAMP '2017-12-01 10:12:55.038194'); -- 575 SELECT months(DATE '1960-01-01'); -- -120
- truncate(width, input) same type as input¶
Returns the truncated value of the input based on the specified width. For numeric values, truncate to the nearest lower multiple of
width, the truncate function is: input - (((input % width) + width) % width). Thewidthis used to truncate decimal values is applied using unscaled value to avoid additional (and potentially conflicting) parameters. For string values, it truncates a valid UTF-8 string with no more thanwidthcode points. In contrast to strings, binary values do not have an assumed encoding and are truncated towidthbytes.Argument
widthmust be a positive integer. Supported types forinputare: SHORTINT, TYNYINT, SMALLINT, INTEGER, BIGINT, DECIMAL, VARCHAR, VARBINARY.SELECT truncate(10, 11); -- 10 SELECT truncate(10, -11); -- -20 SELECT truncate(7, 22); -- 21 SELECT truncate(0, 11); -- error: Reason: (0 vs. 0) Invalid truncate width\nExpression: width <= 0 SELECT truncate(-3, 11); -- error: Reason: (-3 vs. 0) Invalid truncate width\nExpression: width <= 0 SELECT truncate(4, 'iceberg'); -- 'iceb' SELECT truncate(1, '测试'); -- 测 SELECT truncate(6, '测试'); -- 测试 SELECT truncate(6, cast('测试' as binary)); -- 测试_
- years(input) integer¶
Returns the number of years since epoch (1970-01-01). Returns 0 for ‘1970-01-01’ date and timestamps. Returns negative value for dates and timestamps before ‘1970-01-01’.
SELECT years(DATE '2017-12-01'); -- 47 SELECT years(TIMESTAMP '2017-12-01 10:12:55.038194'); -- 47 SELECT years(DATE '1960-01-01'); -- -10