String Functions

Unless specified otherwise, all functions return NULL if at least one of the arguments is NULL.

ascii(string) integer

Returns unicode code point of the first character of string. Returns 0 if string is empty.

bit_length(string/binary) integer

Returns the bit length for the specified string column.

SELECT bit_length('123'); -- 24
chr(n) varchar

Returns the Unicode code point n as a single character string. If n < 0, the result is an empty string. If n >= 256, the result is equivalent to chr(n % 256).

contains(left, right) boolean

Returns true if ‘right’ is found in ‘left’. Otherwise, returns false.

SELECT contains('Spark SQL', 'Spark'); -- true
SELECT contains('Spark SQL', 'SPARK'); -- false
SELECT contains('Spark SQL', null); -- NULL
SELECT contains(x'537061726b2053514c', x'537061726b'); -- true
conv(number, fromBase, toBase) varchar

Converts number represented as a string from fromBase to toBase. fromBase must be an INTEGER value between 2 and 36 inclusively. toBase must be an INTEGER value between 2 and 36 inclusively or between -36 and -2 inclusively. Otherwise, returns NULL. Returns a signed number if toBase is negative. Otherwise, returns an unsigned one. Returns NULL if number is empty. Skips leading spaces. number may contain other characters not valid for fromBase. All characters starting from the first invalid character till the end of the string are ignored. Only converts valid characters even though fromBase = toBase. Returns ‘0’ if no valid character is found.

SELECT conv('100', 2, 10); -- '4'
SELECT conv('-10', 16, -10); -- '-16'
SELECT conv("-1", 10, 16); -- 'FFFFFFFFFFFFFFFF'
SELECT conv("123", 10, 39); -- NULL
SELECT conv('', 16, 10); -- NULL
SELECT conv(' ', 2, 10); -- NULL
SELECT conv("11", 10, 16); -- 'B'
SELECT conv("11ABC", 10, 16); -- 'B'
SELECT conv("11abc", 10, 10); -- '11'
SELECT conv('H016F', 16, 10); -- '0'
endswith(left, right) boolean

Returns true if ‘left’ ends with ‘right’. Otherwise, returns false.

SELECT endswith('js SQL', 'SQL'); -- true
SELECT endswith('js SQL', 'js'); -- false
SELECT endswith('js SQL', NULL); -- NULL
find_in_set(str, strArray) integer

Returns 1-based index of the given string str in the comma-delimited list strArray. Returns 0, if the string was not found or if the given string str contains a comma.

SELECT find_in_set('ab', 'abc,b,ab,c,def'); -- 3
SELECT find_in_set('ab,', 'abc,b,ab,c,def'); -- 0
SELECT find_in_set('dfg', 'abc,b,ab,c,def'); -- 0
SELECT find_in_set('', ''); -- 1
SELECT find_in_set('', '123,'); -- 2
SELECT find_in_set('', ',123'); -- 1
SELECT find_in_set(NULL, ',123'); -- NULL
SELECT find_in_set("abc", NULL); -- NULL
instr(string, substring) integer

Returns the starting position of the first instance of substring in string. Positions start with 1. Returns 0 if ‘substring’ is not found.

left(string, length) string

Returns the leftmost length characters from the string. If length is less or equal than 0 the result is an empty string.

length(string) integer

Returns the length of string in characters.

lower(string) string

Returns string with all characters changed to lowercase.

SELECT lower('SparkSql'); -- sparksql
lpad(string, len, pad) string

Returns string, left-padded with pad to a length of len. If string is longer than len, the return value is shortened to len characters or bytes. If pad is not specified, string will be padded to the left with space characters if it is a character string, and with zeros if it is a byte sequence.

SELECT lpad('hi', 5, '??'); -- ???hi
SELECT lpad('hi', 1, '??'); -- h
SELECT lpad('hi', 4); --   hi
ltrim(string) varchar

Removes leading 0x20(space) characters from string.

SELECT ltrim('  data  '); -- "data  "
ltrim(trimCharacters, string) varchar

Removes specified leading characters from string. The specified character is any character contained in trimCharacters. trimCharacters can be empty and may contain duplicate characters.

SELECT ltrim('ps', 'spark'); -- "ark"
overlay(input, replace, pos, len) same as input

Replace a substring of input starting at pos character with replace and going for rest len characters of input. Types of input and replace must be the same. Supported types are: VARCHAR and VARBINARY. When input types are VARCHAR, len and pos are specified in characters, otherwise, bytes. Result is constructed from three parts. First part is first pos - 1 characters of input when pos if greater then zero, otherwise, empty string. Second part is replace. Third part is rest of input from indices pos + len which is 1-based, if len is negative, it will be set to size of replace, if pos + len is negative, it refers to -(pos + len)th element before the end of input.

SELECT overlay('Spark SQL', '_', 6, -1); -- "Spark_SQL"
SELECT overlay('Spark SQL', 'CORE', 7, -1); -- "Spark CORE"
SELECT overlay('Spark SQL', 'ANSI ', 7, 0); -- "Spark ANSI SQL"
SELECT overlay('Spark SQL', 'tructured', 2, 4); -- "Structured SQL"
SELECT overlay('Spark SQL', '_', -6, 3); -- "_Sql"
replace(input, replaced) varchar

Removes all instances of replaced from input. If replaced is an empty string, returns the original input string.

SELECT replace('ABCabc', ''); -- ABCabc
SELECT replace('ABCabc', 'bc'); -- ABCc
replace(input, replaced, replacement) varchar

Replaces all instances of replaced with replacement in input. If replaced is an empty string, returns the original input string.

SELECT replace('ABCabc', '', 'DEF'); -- ABCabc
SELECT replace('ABCabc', 'abc', ''); -- ABC
SELECT replace('ABCabc', 'abc', 'DEF'); -- ABCDEF
reverse(string) varchar

Returns input string with characters in reverse order.

rpad(string, len, pad) string

Returns string, right-padded with pad to a length of len. If string is longer than len, the return value is shortened to len characters. If pad is not specified, string will be padded to the right with space characters if it is a character string, and with zeros if it is a binary string.

SELECT lpad('hi', 5, '??'); -- ???hi
SELECT lpad('hi', 1, '??'); -- h
SELECT lpad('hi', 4); -- hi
rtrim(string) varchar

Removes trailing 0x20(space) characters from string.

SELECT rtrim('  data  '); -- "  data"
rtrim(trimCharacters, string) varchar

Removes specified trailing characters from string. The specified character is any character contained in trimCharacters. trimCharacters can be empty and may contain duplicate characters.

SELECT rtrim('kr', 'spark'); -- "spa"
soundex(string) string

Returns Soundex code of the string. If first character of string is not a letter, string is returned.

SELECT soundex('Miller'); -- "M460"
split(string, delimiter)

Splits string on delimiter and returns an array.

SELECT split('oneAtwoBthreeC', '[ABC]'); -- ["one","two","three",""]
SELECT split('one', ''); -- ["o", "n", "e", ""]
SELECT split('one', '1'); -- ["one"]
split(string, delimiter, limit)

Splits string on delimiter and returns an array of size at most limit.

SELECT split('oneAtwoBthreeC', '[ABC]', -1); -- ["one","two","three",""]
SELECT split('oneAtwoBthreeC', '[ABC]', 0); -- ["one", "two", "three", ""]
SELECT split('oneAtwoBthreeC', '[ABC]', 2); -- ["one","twoBthreeC"]
startswith(left, right) boolean

Returns true if ‘left’ starts with ‘right’. Otherwise, returns false.

SELECT startswith('js SQL', 'js'); -- true
SELECT startswith('js SQL', 'SQL'); -- false
SELECT startswith('js SQL', null); -- NULL
str_to_map(string, entryDelimiter, keyValueDelimiter)

Returns a map by splitting string into entries with entryDelimiter and splitting each entry into key/value with keyValueDelimiter. entryDelimiter and keyValueDelimiter must be constant strings with single ascii character. Allows keyValueDelimiter not found when splitting an entry. Throws exception when duplicate map keys are found for single row’s result, consistent with Spark’s default behavior.

SELECT str_to_map('a:1,b:2,c:3', ',', ':'); -- {"a":"1","b":"2","c":"3"}
SELECT str_to_map('a', ',', ':'); -- {"a":NULL}
SELECT str_to_map('', ',', ':'); -- {"":NULL}
SELECT str_to_map('a:1,b:2,c:3', ',', ','); -- {"a:1":NULL,"b:2":NULL,"c:3":NULL}
substring(string, start) varchar

Returns the rest of string from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string. When the starting position is 0, the meaning is to refer to the first character.Type of ‘start’ must be an INTEGER.

substring(string, start, length) varchar

Returns a substring from string of length length from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string. When the starting position is 0, the meaning is to refer to the first character. Type of ‘start’ must be an INTEGER.

SELECT substring('Spark SQL', 0, 2); -- Sp
SELECT substring('Spark SQL', 1, 2); -- Sp
SELECT substring('Spark SQL', 5, 0); -- ""
SELECT substring('Spark SQL', 5, -1); -- ""
SELECT substring('Spark SQL', 5, 10000); -- "k SQL"
SELECT substring('Spark SQL', -9, 3); -- "Spa"
SELECT substring('Spark SQL', -10, 3); -- "Sp"
SELECT substring('Spark SQL', -20, 3); -- ""
substring_index(string, delim, count) [same as string]

Returns the substring from string before count occurrences of the delimiter delim. Here the string can be VARCHAR or VARBINARY and return type matches type of string. If count is positive, returns everything to the left of the final delimiter (counting from the left). If count is negative, returns everything to the right of the final delimiter (counting from the right). If count is 0, returns empty string. If delim is not found or found fewer times than count, returns the original input string. delim is case-sensitive. It also takes into account overlapping strings.

SELECT substring_index('Spark.SQL', '.', 1); -- "Spark"
SELECT substring_index('Spark.SQL', '.', 0); -- ""
SELECT substring_index('Spark.SQL', '.', -1); -- "SQL"
SELECT substring_index('TEST.Spark.SQL', '.',2); -- "TEST.Spark"
SELECT substring_index('TEST.Spark.SQL', '', 0); -- ""
SELECT substring_index('TEST.Spark.SQL', '.', -2); -- "Spark.SQL"
SELECT substring_index('TEST.Spark.SQL', '.', 10); -- "TEST.Spark.SQL"
SELECT substring_index('TEST.Spark.SQL', '.', -12); -- "TEST.Spark.SQL"
SELECT substring_index('aaaaa', 'aa', 2); -- "a"
SELECT substring_index('aaaaa', 'aa', -4); -- "aaa"
SELECT substring_index('aaaaa', 'aa', 0); -- ""
SELECT substring_index('aaaaa', 'aa', 5); -- "aaaaa"
SELECT substring_index('aaaaa', 'aa', -5); -- "aaaaa"
translate(string, match, replace) varchar

Returns a new translated string. It translates the character in string by a character in replace. The character in replace is corresponding to the character in match. The translation will happen when any character in string matching with a character in match. If match's character size is larger than replace's, the extra characters in match will be removed from string. In addition, this function only considers the first occurrence of a character in match and uses its corresponding character in replace for translation.

SELECT translate('spark', 'sa', '12');  -- "1p2rk"
SELECT translate('spark', 'sa', '1');   -- "1prk"
SELECT translate('spark', 'ss', '12');  -- "1park"
trim(string) varchar

Removes leading and trailing 0x20(space) characters from string.

SELECT trim('  data  '); -- "data"
trim(trimCharacters, string) varchar

Removes specified leading and trailing characters from string. The specified character is any character contained in trimCharacters. trimCharacters can be empty and may contain duplicate characters.

SELECT trim('sprk', 'spark'); -- "a"
upper(string) string

Returns string with all characters changed to uppercase.

SELECT upper('SparkSql'); -- SPARKSQL