String Functions¶
Note
These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8.
Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
The lower() and upper() functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
- chr(n) varchar¶
Returns the Unicode code point
nas a single character string.
- codepoint(string) integer¶
Returns the Unicode code point of the only character of
string.
- concat(string1, ..., stringN) varchar¶
Returns the concatenation of
string1,string2,...,stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).
- ends_with(string, substring) boolean¶
Returns whether
stringends_with withsubstring.
- from_utf8(binary) varchar¶
Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced with the Unicode replacement characterU+FFFD. Each invalid UTF-8 codepoint, including those of multi-byte long, is replaced with one replacement character.
- from_utf8(binary, replace) varchar
Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced with replace. Each invalid UTF-8 codepoint, including those of multi-byte long, is replaced with one replacement character. The replace argument can be either Unicode code point (bigint), a single character or empty string. When replace is an empty string invalid characters are removed.
- hamming_distance(string1, string2) bigint¶
Returns the Hamming distance of
string1andstring2, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.
- length(string) bigint¶
Returns the length of
stringin characters.
- levenshtein_distance(string_1, string_2) bigint¶
Returns the Levenshtein edit distance of 2 strings. I.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to convert
string_1tostring_2.
- lower(string) varchar¶
Converts
stringto lowercase.
- lpad(string, size, padstring) varchar¶
Left pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
- ltrim(string) varchar¶
Removes leading whitespace from string. See
trim()for the set of recognized whitespace characters.
- ltrim(string, chars) varchar
Removes the longest substring containing only characters in
charsfrom the beginning of thestring.SELECT ltrim('test', 't'); -- est SELECT ltrim('tetris', 'te'); -- ris
- replaceFirst(string, search, replace) varchar¶
Removes the first instances of
searchwithreplaceinstring.If
searchis an empty string, insertsreplacein front ofstring.
- replace(string, search) varchar¶
Removes all instances of
searchfromstring.
- replace(string, search, replace) varchar
Replaces all instances of
searchwithreplaceinstring.If
searchis an empty string, insertsreplacein front of every character and at the end of thestring.
- reverse(string) varchar
Returns input string with characters in reverse order.
- reverse(varbinary) varbinary
Returns input binary with bytes in reversed order.
- rpad(string, size, padstring) varchar¶
Right pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
- rtrim(string) varchar¶
Removes trailing whitespace from string. See
trim()for the set of recognized whitespace characters.
- rtrim(string, chars) varchar
Removes the longest substring containing only characters in
charsfrom the end of thestring.SELECT rtrim('test', 't'); -- tes SELECT rtrim('test...', '.'); -- test
- split(string, delimiter)¶
Splits
stringondelimiterand returns an array.
- split(string, delimiter, limit)
Splits
stringondelimiterand returns an array of size at mostlimit.The last element in the array always contains everything left in the string.
limitmust be a positive number.
- split_part(string, delimiter, index) string¶
Splits
stringondelimiterand returns the part at indexindex.Field indexes start with 1. If the index is larger than the number of fields, then null is returned.
- split_to_map(string, entryDelimiter, keyValueDelimiter) map<varchar, varchar>¶
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. Note thatentryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.entryDelimiterandkeyValueDelimitermust not be empty and must not be the same.entryDelimiteris allowed to be the trailing character.Raises an error if there are duplicate keys.
- split_to_map(string, entryDelimiter, keyValueDelimiter, function(K, V1, V2, R)) map<varchar, varchar>¶
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. Note thatentryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.function(K,V1,V2,R)is used to decide whether to keep first or last value for duplicate keys. (k, v1, v2) -> v1 keeps first value. (k, v1, v2) -> v2 keeps last value. Arbitrary functions are not supported.SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> v1)); -- {"a": "1", "b": "2"} SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> v2)); -- {"a": "3", "b": "2"}
- split_to_multimap(string, entryDelimiter, keyValueDelimiter) map<varchar, array<varchar>>¶
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map containing an array of values for each unique key.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. The values for each key will be in the same order as they appeared instring.Note that
entryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.
- starts_with(string, substring) boolean¶
Returns whether
stringstarts withsubstring.
- strpos(string, substring) bigint¶
Returns the starting position of the first instance of
substringinstring. Positions start with1. If not found,0is returned.
- strpos(string, substring, instance) bigint
Returns the position of the N-th
instanceofsubstringinstring.instancemust be a positive number. Positions start with1. If not found,0is returned. It takes into account overlapping strings when counting occurrences.SELECT strpos('aaa', 'aa', 2); -- 2
- strrpos(string, substring) bigint¶
Returns the starting position of the last instance of
substringinstring. Positions start with1. If not found,0is returned.
- strrpos(string, substring, instance) bigint
Returns the position of the N-th
instanceofsubstringinstringstarting from the end of the string.instancemust be a positive number. Positions start with1. If not found,0is returned. It takes into account overlapping strings when counting occurrences.SELECT strrpos('aaa', 'aa', 2); -- 1
- trail(string, N) varchar¶
Returns the last
Ncharacters of the inputstringup to at most the length ofstring.
- substr(string, start) varchar¶
- substring(string, start) varchar
Returns the rest of
stringfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string. Returns empty string if absolute value ofstartis greater then length of thestring.
- substr(string, start, length) varchar
- substring(string, start, length) varchar
Returns a substring from
stringof lengthlengthfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string. Returns empty string if absolute value of'startis greater then length of thestring.
- trim(string) varchar¶
Removes starting and ending whitespaces from
string.Recognized whitespace characters:
Code
Description
Code
Description
9
TAB (horizontal tab)
U+1680
Ogham Space Mark
10
LF (NL line feed, new line)
U+2000
En Quad
11
VT (vertical tab)
U+2001
Em Quad
12
FF (NP form feed, new page)
U+2002
En Space
13
CR (carriage return)
U+2003
Em Space
28
FS (file separator)
U+2004
Three-Per-Em Space
29
GS (group separator)
U+2005
Four-Per-Em Space
30
RS (record separator)
U+2006
Four-Per-Em Space
31
US (unit separator)
U+2008
Punctuation Space
32
Space
U+2009
Thin Space
_
_
U+200a
Hair Space
_
_
U+200a
Hair Space
_
_
U+2028
Line Separator
_
_
U+2029
Paragraph Separator
_
_
U+205f
Medium Mathematical Space
_
_
U+3000
Ideographic Space
- trim(string, chars) varchar
Removes the longest substring containing only characters in
charsfrom the beginning and end of thestring.SELECT trim('test', 't'); -- es SELECT trim('.t.e.s.t.', '.t'); -- e.s
- upper(string) varchar¶
Converts
stringto uppercase.
- word_stem(word) varchar¶
Returns the stem of
wordin the English language. If thewordis not an English word, thewordin lowercase is returned.
- word_stem(word, lang) varchar¶
Returns the stem of
wordin thelanglanguage. This function supports the following languages:lang
Language
caCatalandaDanishdeGermanenEnglishesSpanisheuBasquefiFinnishfrFrenchhuHungarianhyArmenianirIrishitItalianltLithuaniannlDutchnoNorwegianptPortugueseroRomanianruRussiansvSwedishtrTurkishIf the specified
langis not supported, this function throws a user error.
Unicode Functions¶
- normalize(string) varchar¶
Transforms
stringwith NFC normalization form.
- normalize(string, form) varchar¶
Reference: https://unicode.org/reports/tr15/#Norm_Forms Transforms
stringwith the specified normalization form.formmust be be one of the following keywords:Form
Description
NFDCanonical Decomposition
NFCCanonical Decomposition, followed by Canonical Composition
NFKDCompatibility Decomposition
NFKCCompatibility Decomposition, followed by Canonical Composition
Note
This SQL-standard function has special syntax and requires specifying
formas a keyword, not as a string.
- to_utf8(string) varbinary¶
Encodes
stringinto a UTF-8 varbinary representation.