=====================
View and Writer Types
=====================

View types and writer types are used as the input and output parameter types,
respectively, for complex and string types in the simple function interface of
both scalar and aggregate functions.

Inputs (View Types)
-------------------

Input complex types are represented in the simple function interface using
lightweight, lazy-access abstractions that enable efficient direct access to the
underlying data in Velox vectors. As mentioned earlier, the helper aliases
arg_type and null_free_arg_type can be used in function signatures to map Velox
types to the corresponding input types. The table below shows the actual types
that are used to represent inputs of different complex types.

==============================  =========================  ===============================
C++ Argument Type               C++ Actual Argument Type   Corresponding `std` type
==============================  =========================  ===============================
arg_type<Array<E>>              NullableArrayView<E>       std::vector<std::optional<E>>
arg_type<Map<K, V>>             NullableMapView<K, V>      std::map<K, std::optional<V>>
arg_type<Row<T...>>             NullableRowView<T...>      std::tuple<std::optional<T>...>
null_free_arg_type<Array<E>>    NullFreeArrayView<E>       std::vector<E>
null_free_arg_type<Map<K, V>>   NullFreeMapView<K, V>      std::map<K, V>
null_free_arg_type<Row<T...>>   NullFreeRowView<T...>      std::tuple<T...>
==============================  =========================  ===============================

The view types are designed to have interfaces similar to those of the std
containers; in most cases they can be used as drop-in replacements. The table
above shows the mapping between the Velox type and the corresponding std type.
For example, a *Map<Row<int, int>, Array<float>>* corresponds to a const
*std::map<std::tuple<int, int>, std::vector<float>>*.

All view types are cheap to copy; for example, the size of an ArrayView is at
most 16 bytes.

**OptionalAccessor<E>**:

OptionalAccessor is an *std::optional*-like object that provides lazy access to
the nullity and value of the underlying Velox vector at a specific index.
Currently, it is used to represent elements of nullable input arrays and values
of nullable input maps. Note that map keys are always assumed to be non-null in
Velox.

The object supports the following methods:

- arg_type<E> value() : unchecked access to the underlying value.
- arg_type<E> operator \*() : unchecked access to the underlying value.
- bool has_value() : return true if the value is not null.
- explicit operator bool() : return true if the value is not null.

The nullity and value accesses are decoupled. Hence, if the inputs are known to
be null-free, accessing the value does not incur the overhead of checking the
nullity. Likewise, checking the nullity does not require reading the value.
Note that, unlike the std containers, calls to value() and operator* return
r-values (temporaries) rather than l-values. The results can be bound to const
references or assigned to value variables, but cannot be bound to non-const
references.

OptionalAccessor<E> is assignable to and comparable with
std::optional<arg_type<E>> for primitive types. The following expressions are
valid, where array[0] is an optional accessor.

.. code-block:: c++

    std::optional<int> x = array[0];
    if (array[0] == std::nullopt) ...
    if (std::nullopt == array[0]) ...
    if (array[0] == std::optional<int>{1}) ...

**NullableArrayView and NullFreeArrayView**

NullableArrayView and NullFreeArrayView have interfaces similar to those of
*std::vector<std::optional<T>>* and *std::vector<T>*. The code below shows the
function arraySum, which uses a range-based for loop to iterate over the values.

.. code-block:: c++

    template <typename T>
    struct ArraySum {
      VELOX_DEFINE_FUNCTION_TYPES(T);

      // The output is written through a non-const reference.
      bool call(int64_t& output, const arg_type<Array<int64_t>>& array) {
        output = 0;
        for (const auto& element : array) {
          if (element.has_value()) {
            output += element.value();
          }
        }
        return true;
      }
    };

ArrayView supports the following:

- size_t **size** () : return the number of elements in the array.
- **operator[]** (size_t index) : access the element at index. It returns either null_free_arg_type<T> or OptionalAccessor<T>.
- ArrayView::Iterator **begin** () : iterator to the first element.
- ArrayView::Iterator **end** () : iterator indicating the end of iteration.
- bool **mayHaveNulls** () : constant-time check on the nullity of the underlying vector. When it returns false, there are definitely no nulls; a return value of true does not guarantee that nulls exist.
- ArrayView::SkipNullsContainer **skipNulls** () : return an iterable container that provides direct access to the non-null values in the underlying array. For example, the function above can be written as:

.. code-block:: c++

    template <typename T>
    struct ArraySum {
      VELOX_DEFINE_FUNCTION_TYPES(T);

      bool call(int64_t& output, const arg_type<Array<int64_t>>& array) {
        output = 0;
        for (const auto& value : array.skipNulls()) {
          output += value;
        }
        return true;
      }
    };

The skipNulls iterator checks the nullity at each index and skips nulls. A more
performant implementation skips reading the nullity altogether when
mayHaveNulls() is false.

.. code-block:: c++

    template <typename T>
    struct ArraySum {
      VELOX_DEFINE_FUNCTION_TYPES(T);

      bool call(int64_t& output, const arg_type<Array<int64_t>>& array) {
        output = 0;
        if (array.mayHaveNulls()) {
          for (const auto& value : array.skipNulls()) {
            output += value;
          }
          return true;
        }

        // No nulls, skip reading nullity.
        for (const auto& element : array) {
          output += element.value();
        }
        return true;
      }
    };

Note: calls to operator[], iterator dereferencing, and iterator pointer
dereferencing return r-values (temporaries), versus l-values in the std
containers. Hence, the results can be bound to const references or assigned to
value variables, but cannot be bound to non-const references.

**NullableMapView and NullFreeMapView**

NullableMapView and NullFreeMapView have interfaces similar to those of
std::map<K, std::optional<V>> and std::map<K, V>. The code below shows an
example function, mapSum, that sums the keys and values.

.. code-block:: c++

    template <typename T>
    struct MapSum {
      VELOX_DEFINE_FUNCTION_TYPES(T);

      bool call(int64_t& output, const arg_type<Map<int64_t, int64_t>>& map) {
        output = 0;
        for (const auto& [key, value] : map) {
          output += key;
          if (value.has_value()) {
            // Accumulate into the output, not into the accessor.
            output += value.value();
          }
        }
        return true;
      }
    };

MapView supports the following:

- MapView::Iterator **begin** () : iterator to the first map element.
- MapView::Iterator **end** () : iterator that indicates the end of iteration.
- size_t **size** () : number of elements in the map.
- MapView::Iterator **find** (const key_t& key): performs a linear search for the key and returns an iterator to the element if found; otherwise returns end(). Only supported for primitive key types.
- MapView::Iterator **operator[]** (const key_t& key): same as find, but throws an exception if the element is not found.

**MapView::Element**

MapView::Element is the type returned by dereferencing MapView::Iterator. It has
two members:

- first : arg_type<K> | null_free_arg_type<K>
- second : OptionalAccessor<V> | null_free_arg_type<V>

MapView::Element participates in structured bindings: auto [k, v] = \*map.begin();

Note: iterator dereferencing and iterator pointer dereferencing result in
temporaries. Hence, the results can be bound to const references or assigned to
value variables, but cannot be bound to non-const references.

Generic input types are implemented using GenericView, which supports the
following:

- uint64_t **hash** () const : returns a hash of the value; used to define std::hash<GenericView>(); allows GenericViews to be stored in folly::F14 sets and maps as well as STL sets and maps.
- bool **isNull** () const : returns true if the value is NULL.
- bool **operator==** (const GenericView& other) const : equality comparison with another GenericView.
- std::optional<int64_t> **compare** (const GenericView& other, const CompareFlags flags) const : comparison with another GenericView.
- TypeKind **kind** () const : returns the TypeKind of the value.
- const TypePtr& **type** () const : returns the Velox type of the value.
- std::string **toString** () const : returns a string representation of the value for logging and debugging.
- template <typename ToType> typename VectorReader<ToType>::exec_in_t **castTo** () const : cast to a concrete view type.
- template <typename ToType> std::optional<typename VectorReader<ToType>::exec_in_t> **tryCastTo** () const : best-effort attempt to cast to a concrete view type.

**Lifetime of temporaries in C++**

C++ allows temporaries (r-values) to be bound to const references by extending
their lifetime. However, only the temporary that is bound directly has its
lifetime extended, not every temporary in the expression on the right-hand
side. In other words, the lifetime of an intermediate temporary within an
expression is not extended.

For example, in the expression const auto& x = map.begin()->first, C++ does not
extend the lifetime of the result of map.begin(), since it is not what is being
bound. In such a case, any later use of x has undefined behavior.

.. code-block:: c++

    // Safe assignments: a single temporary on the right-hand side.
    const auto& a = array[0];
    const auto& b = *a;
    const auto& c = map.begin();
    const auto& d = c->first;

    // Unsafe assignments (undefined behavior).
    const auto& a = map.begin()->first;
    const auto& b = **it;

    // Safe and cheap: assign to a value.
    const auto a = map.begin()->first;
    const auto b = **it;

Note that in a range-based for loop, the range expression is bound to a
forwarding (universal) reference. Thus, the concern above applies to it as well.

.. code-block:: c++

    // Unsafe range loop.
    for (const auto& e : **it) {..}

    // Safe range loop.
    auto itt = *it;
    for (const auto& e : *itt) {..}

.. _outputs-write:

Outputs (Writer Types)
----------------------

Outputs of complex types are represented using special writers that are designed
to minimize data copying by writing directly to Velox vectors.

**ArrayWriter<V>**

- out_type<V>& **add_item** () : add a non-null item and return the writer of the added value.
- **add_null** (): add a null item.
- **reserve** (vector_size_t size): make sure space for `size` items is allocated in the underlying vector.
- vector_size_t **size** (): return the length of the array.
- **resize** (vector_size_t size): change the size of the array, reserving space for the new elements if needed.
- void **add_items** (const T& data): append data from any container with an std::vector-like interface.
- void **copy_from** (const T& data): assign data to match that of any container with an std::vector-like interface.
- void **add_items** (const NullFreeArrayView<V>& data): append data from an array view (faster than item by item).
- void **copy_from** (const NullFreeArrayView<V>& data): assign data from an array view (faster than item by item).
- void **add_items** (const NullableArrayView<V>& data): append data from an array view (faster than item by item).
- void **copy_from** (const NullableArrayView<V>& data): assign data from an array view (faster than item by item).

When V is primitive, the following functions are available, making the writer
usable as an std::vector<V>.

- **push_back** (std::optional<V>): add an item or a null.
- PrimitiveWriter<V> **operator[]** (vector_size_t index): return a primitive writer that is assignable to std::optional<V> for the item at index (should be called after a resize).
- PrimitiveWriter<V> **back** (): return a primitive writer that is assignable to std::optional<V> for the item at index length - 1.

**MapWriter<K, V>**

- **reserve** (vector_size_t size): make sure space for `size` entries is allocated in the underlying vector.
- std::tuple<out_type<K>&, out_type<V>&> **add_item** () : add a non-null item and return the writers of the key and value as a tuple.
- out_type<K>& **add_null** () : add a null item and return the key writer.
- vector_size_t **size** (): return the length of the map.
- void **add_items** (const T& data): append data from any container with an std::vector<std::tuple<K, V>>-like interface.
- void **copy_from** (const NullFreeMapView<K, V>& data): assign data from a map view (faster than item by item).
- void **copy_from** (const NullableMapView<K, V>& data): assign data from a map view (faster than item by item).

When K and V are primitives, the following functions are available, making the
writer usable as an std::vector<std::tuple<K, V>>.

- **resize** (vector_size_t size): change the size.
- **emplace** (K, std::optional<V>): add an element to the map.
- std::tuple<K&, PrimitiveWriter<V>> **operator[]** (vector_size_t index): returns a pair of writers for the element at index. The key writer is assignable to K, while the value writer is assignable to std::optional<V>.

**RowWriter<T...>**

- **set_null_at<I>** (): set null for the row item at compile-time index I.
- **get_writer_at<I>** (): set not null for the row item at compile-time index I, and return the writer for that row element.

When all types T... are primitives, the following functions are available.

- void **operator=** (const std::tuple<T...>& inputs): assignable from std::tuple<T...>.
- void **operator=** (const std::tuple<std::optional<T>...>& inputs): assignable from std::tuple<std::optional<T>...>.
- void **copy_from** (const std::tuple<T...>& inputs): same as the above.

When a given Ti is primitive, the following is valid.

- PrimitiveWriter<Ti> exec::get<I>(RowWriter<T...>): return a primitive writer for the item at index I that is assignable to std::optional<Ti>.

**PrimitiveWriter<T>**

Assignable from std::optional<T>, which allows writing a null or a value to the
primitive. Returned by complex writers when writing nullable primitives.

**StringWriter<>**

- void **reserve** (size_t newCapacity) : reserve space for the output string with a size of at least newCapacity.
- void **resize** (size_t newCapacity) : set the size of the string.
- char* **data** (): returns a pointer to the first char of the string; it can be written to directly (safe to write up to index capacity()-1).
- vector_size_t **capacity** (): returns the capacity of the string.
- vector_size_t **size** (): returns the size of the string.
- **operator+=** (const T& input): append data from char* or any type with data() and size().
- **append** (const T& input): append data from char* or any type with data() and size().
- **copy_from** (const T& input): append data from char* or any type with data() and size().

When the zero-copy optimization is enabled (see the zero-copy-string-result
section above), the following functions can be used.

- void **setEmpty** (): set to the empty string.
- void **setNoCopy** (const StringView& value): set the string to an input string without performing a deep copy.

**GenericWriter**

- TypeKind **kind** () const : returns the TypeKind of the value.
- const TypePtr& **type** () const : returns the Velox type of the value.
- void **copy_from** (const GenericView& view) : assign data from a GenericView.
- template <typename ToType> typename VectorWriter<ToType>::exec_out_t& **castTo** () : cast to a concrete writer type.
- template <typename ToType> typename VectorWriter<ToType>::exec_out_t* **tryCastTo** () : best-effort attempt to cast to a concrete writer type.

Limitations
-----------

1. If a function throws an exception while writing a complex type, then the
   output of the row being written, as well as the output of the next row, are
   undefined. Hence, it is recommended to avoid throwing exceptions after
   writing has started for a complex output within the function.
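
One way to follow this recommendation is to run every check that may throw
before the first write. The sketch below is not Velox code and makes no claim
about how Velox functions are implemented: MockArrayWriter is a hypothetical
stand-in, backed by std::vector, that only mimics the push_back and size
methods of an array writer, and copyNonNegative is an illustrative name. It
shows only the ordering of validation and writing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an array writer; NOT a Velox type.
struct MockArrayWriter {
  std::vector<std::optional<std::int64_t>> items;

  void push_back(std::optional<std::int64_t> value) {
    items.push_back(value);
  }

  std::size_t size() const {
    return items.size();
  }
};

// Copies input to out and rejects negative values.
// All validation happens before the first write, so a validation failure
// never leaves a partially written output behind.
inline void copyNonNegative(
    const std::vector<std::optional<std::int64_t>>& input,
    MockArrayWriter& out) {
  // Pass 1: validate only; nothing has been written yet.
  for (const auto& value : input) {
    if (value.has_value() && *value < 0) {
      throw std::invalid_argument("negative value in input");
    }
  }

  // Pass 2: write; no validation error can be raised from here on.
  const std::size_t sizeBefore = out.size();
  for (const auto& value : input) {
    out.push_back(value);
  }
  assert(out.size() == sizeBefore + input.size());
}
```

The extra pass over the input costs one more scan, but it keeps the writer
untouched whenever the input is rejected.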