Updated on 2023-10-25 GMT+08:00

Overview

Table 1 lists the string functions supported by DLI.

Table 1 String functions

Function

Syntax

Value Type

Description

ascii

ascii(string <str>)

BIGINT

Returns the numeric value of the first character in a string.

concat

concat(array<T> <a>, array<T> <b>[,...]), concat(string <str1>, string <str2>[,...])

ARRAY or STRING

Concatenates the inputs. If the arguments are arrays, returns an array containing all elements of the input arrays; if they are strings, returns the concatenated string. Any number of arguments can be passed.

concat_ws

concat_ws(string <separator>, string <str1>, string <str2>[,...]), concat_ws(string <separator>, array<string> <a>)

STRING

Returns a string formed by concatenating the input strings, or the elements of the input string array, with separator inserted between adjacent values.

char_matchcount

char_matchcount(string <str1>, string <str2>)

BIGINT

Returns the number of characters in str1 that appear in str2.

encode

encode(string <str>, string <charset>)

BINARY

Encodes str using the character set specified by charset and returns the result.

find_in_set

find_in_set(string <str1>, string <str2>)

BIGINT

Returns the position (starting from 1) of str1 in str2, where str2 is a comma-separated (,) list of strings.

get_json_object

get_json_object(string <json>, string <path>)

STRING

Extracts the value at the specified JSON path (for example, $.a) from the JSON string json. Returns NULL if json is not valid JSON.

instr

instr(string <str>, string <substr>)

INT

Returns the position of the first occurrence of substr in str. Positions start from 1. Returns NULL if either argument is NULL, and 0 if substr does not occur in str.

instr1

instr1(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])

BIGINT

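Returns the position of str2 in str1. The optional start_position specifies where in str1 the search starts, and nth_appearance specifies which occurrence of str2 to locate.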

initcap

initcap(string A)

STRING

Converts the first letter of each word of a string to upper case and all other letters to lower case.

keyvalue

keyvalue(string <str>[, string <split1>, string <split2>], string <key>)

STRING

Splits str by split1, converts each group into a key-value pair by split2, and returns the value corresponding to the key.

length

length(string <str>)

BIGINT

Returns the length of a string.

lengthb

lengthb(string <str>)

BIGINT

Returns the length of a specified string in bytes.

levenshtein

levenshtein(string A, string B)

INT

Returns the Levenshtein distance between two strings, for example, levenshtein('kitten','sitting') = 3.

locate

locate(string <substr>, string <str>[, bigint <start_pos>])

BIGINT

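Returns the position (starting from 1) of the first occurrence of substr in str. If start_pos is specified, the search starts at that position.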

lower/lcase

lower(string A), lcase(string A)

STRING

Converts all characters of a string to lowercase.

lpad

lpad(string <str1>, int <length>, string <str2>)

STRING

Returns a string of the specified length. If str1 is shorter than length, it is left-padded with str2 to that length. If str1 is longer than length, it is truncated to length characters.

ltrim

ltrim([<trimChars>,] string <str>)

STRING

Removes leading spaces from str. If trimChars is specified, removes the leading characters contained in trimChars instead.

parse_url

parse_url(string urlString, string partToExtract [, string keyToExtract])

STRING

Returns the specified part of a given URL. Valid values of partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.

When the second parameter is set to QUERY, the third parameter can be used to extract the value of a specific parameter. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.

printf

printf(String format, Obj... args)

STRING

Returns a string formatted from args according to the printf-style format string format.

regexp_count

regexp_count(string <source>, string <pattern>[, bigint <start_position>])

BIGINT

Returns the number of substrings in source that match pattern, counting from start_position.

regexp_extract

regexp_extract(string <source>, string <pattern>[, bigint <groupid>])

STRING

Matches source against pattern and returns the substring captured by the capturing group specified by groupid.

replace

replace(string <str>, string <old>, string <new>)

STRING

Replaces every occurrence of old in str with new and returns the result.

regexp_replace

  • For Spark 2.4.5: regexp_replace(string <source>, string <pattern>, string <replace_string>)
  • For Spark 3.3.1: regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])

STRING

  • For Spark 2.4.5: Replaces every substring of source that matches pattern with replace_string and returns the result string.
  • For Spark 3.3.1: Replaces the occurrence-th substring of source that matches pattern, and every later matching substring, with replace_string and returns the result string.

regexp_replace1

regexp_replace1(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])

STRING

Replaces only the occurrence-th substring of source that matches pattern with replace_string and returns the result string.

regexp_instr

regexp_instr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])

BIGINT

Returns the start or end position of the occurrence-th substring of source that matches pattern, searching from start_position. return_option specifies whether the start or the end position is returned.

regexp_substr

regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])

STRING

Returns the occurrence-th substring of source that matches pattern, searching from start_position.

repeat

repeat(string <str>, bigint <n>)

STRING

Returns str repeated n times.

reverse

reverse(string <str>)

STRING

Returns a string in reverse order.

rpad

rpad(string <str1>, int <length>, string <str2>)

STRING

Right-pads str1 with str2 to the specified length. If str1 is longer than length, it is truncated to length characters.

rtrim

rtrim([<trimChars>, ]string <str>), rtrim(trailing [<trimChars>] from <str>)

STRING

Removes trailing spaces from str. If trimChars is specified, removes the trailing characters contained in trimChars instead.

soundex

soundex(string <str>)

STRING

Returns the soundex string from str, for example, soundex('Miller') = M460.

space

space(bigint <n>)

STRING

Returns a specified number of spaces.

substr/substring

substr(string <str>, bigint <start_position>[, bigint <length>]), substring(string <str>, bigint <start_position>[, bigint <length>])

STRING

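Returns the substring of str that starts at start_position (the first character is at position 1) and is length characters long. If length is omitted, returns all characters from start_position to the end of str.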

substring_index

substring_index(string <str>, string <separator>, int <count>)

STRING

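Returns the part of str before the count-th occurrence of separator. If count is positive, occurrences are counted from the left and everything to the left of that separator is returned. If count is negative, occurrences are counted from the right and everything to the right of that separator is returned.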

split_part

split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])

STRING

Splits str by separator and returns the start-th substring. If end is specified, returns the substrings from the start-th through the end-th.

translate

translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)

STRING

Replaces each character of input that appears in from with the character at the same position in to. Characters in from that have no counterpart in to are removed. For example, translate("abcde", "bcd", "BCD") returns aBCDe.

trim

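trim([<trimChars>, ]string <str>), trim([BOTH] [<trimChars>] from <str>)

STRING

Removes leading and trailing spaces from str. If trimChars is specified, removes the leading and trailing characters contained in trimChars instead.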

upper/ucase

upper(string A), ucase(string A)

STRING

Converts all characters of a string to uppercase.
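The concat_ws entry in Table 1 can be illustrated with a small Python model. It assumes Spark SQL behavior, which skips NULL arguments and NULL array elements. SQL NULL is modeled as None. The Python function is a hypothetical illustration, not a DLI API.

```python
from typing import Iterable, Optional, Union


def concat_ws(separator: str, *args: Union[Optional[str], Iterable[Optional[str]]]) -> str:
    """Model of concat_ws: join strings and/or string arrays with separator, skipping NULLs."""
    values = []
    for arg in args:
        if arg is None:
            continue  # NULL arguments are skipped (assumed Spark behavior)
        if isinstance(arg, str):
            values.append(arg)
        else:
            # An array argument contributes its non-NULL elements in order
            values.extend(v for v in arg if v is not None)
    return separator.join(values)


print(concat_ws("-", "a", None, "b"))
```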
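The find_in_set lookup can be sketched in Python as a scan of the comma-separated list. The sketch assumes Spark SQL semantics: positions are 1-based, 0 means "not found", and 0 is also returned when str1 itself contains a comma. The function is an illustrative model only.

```python
def find_in_set(str1: str, str2: str) -> int:
    """Model of find_in_set: 1-based position of str1 in the comma-separated list str2."""
    if "," in str1:
        return 0  # a value containing a comma can never match a list item
    for position, item in enumerate(str2.split(","), start=1):
        if item == str1:
            return position
    return 0  # not found


print(find_in_set("ab", "abc,b,ab,c,def"))
```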
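For simple paths, get_json_object can be modeled in Python with the standard json module. This simplified sketch is not DLI's implementation. It assumes Spark SQL semantics: string values come back without quotes, objects and numbers come back as compact JSON text, and a missing path or invalid JSON yields NULL (None). It supports only `$` and dotted object keys such as `$.a.b`, not array indexes or wildcards.

```python
import json
from typing import Optional


def get_json_object(json_str: str, path: str) -> Optional[str]:
    """Simplified model: supports only '$' and dotted object keys such as '$.a.b'."""
    try:
        node = json.loads(json_str)
    except ValueError:  # invalid JSON -> NULL
        return None
    if not path.startswith("$"):
        return None
    keys = [k for k in path[1:].split(".") if k]  # "$.a.b" -> ["a", "b"]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # path does not exist -> NULL
        node = node[key]
    if node is None:
        return None  # JSON null -> NULL
    if isinstance(node, str):
        return node  # string values are returned without quotes
    return json.dumps(node, separators=(",", ":"))  # objects, arrays, numbers as JSON text


print(get_json_object('{"a":{"b":1}}', "$.a"))
```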
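instr and locate both return 1-based positions and use 0 for "not found". They differ in argument order and in locate's optional start position. The Python sketch below models both, assuming Spark SQL semantics (NULL in, NULL out, and a start position below 1 gives 0). The function names mirror the SQL functions but are illustrative only.

```python
from typing import Optional


def instr(s: Optional[str], substr: Optional[str]) -> Optional[int]:
    """1-based position of the first substr in s; 0 if absent; None (NULL) if either input is NULL."""
    if s is None or substr is None:
        return None
    return s.find(substr) + 1  # str.find returns -1 when absent, so this yields 0


def locate(substr: Optional[str], s: Optional[str], start_pos: int = 1) -> Optional[int]:
    """Like instr, but with swapped arguments and a search that starts at 1-based start_pos."""
    if s is None or substr is None:
        return None
    if start_pos < 1:
        return 0
    return s.find(substr, start_pos - 1) + 1


print(instr("SparkSQL", "SQL"), locate("bar", "foobarbar", 5))
```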
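length counts characters, whereas lengthb counts bytes. The difference only shows up for non-ASCII text. The Python sketch below assumes strings are measured in UTF-8, which is an assumption about DLI's byte encoding rather than something stated in Table 1.

```python
def length(s: str) -> int:
    return len(s)  # number of characters (Unicode code points)


def lengthb(s: str) -> int:
    return len(s.encode("utf-8"))  # number of bytes, assuming UTF-8 encoding


print(length("中国"), lengthb("中国"))
```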
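The Levenshtein distance in the levenshtein entry can be computed with the classic dynamic-programming recurrence, in which each insertion, deletion, or substitution costs 1. The Python sketch below keeps only one previous row of the table. It illustrates the metric itself and is not DLI's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```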
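lpad and rpad can be modeled in Python by repeating the pad string and cutting it to the missing length. The sketch assumes Spark SQL semantics: a string longer than length is truncated, and an empty pad string leaves the input unchanged. The helper _fill is hypothetical.

```python
def _fill(pad: str, n: int) -> str:
    """Repeat pad and cut it to exactly n characters."""
    return (pad * (n // len(pad) + 1))[:n]


def lpad(s: str, length: int, pad: str) -> str:
    if length <= 0:
        return ""
    if len(s) >= length or not pad:
        return s[:length]  # too long (or nothing to pad with): truncate to length
    return _fill(pad, length - len(s)) + s


def rpad(s: str, length: int, pad: str) -> str:
    if length <= 0:
        return ""
    if len(s) >= length or not pad:
        return s[:length]
    return s + _fill(pad, length - len(s))


print(lpad("hi", 5, "??"), rpad("hi", 5, "ab"))
```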
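For trim, ltrim, and rtrim, trimChars is treated as a set of characters, not as a substring. Python's str.strip family has the same set semantics, so it can serve as a model. The sketch assumes Spark SQL behavior, where the default removes only the space character (not tabs or newlines). For convenience, the Python argument order differs from the SQL signatures.

```python
def trim(s: str, trim_chars: str = " ") -> str:
    """Remove any characters in trim_chars from both ends (default: spaces only)."""
    return s.strip(trim_chars)


def ltrim(s: str, trim_chars: str = " ") -> str:
    return s.lstrip(trim_chars)  # leading characters only


def rtrim(s: str, trim_chars: str = " ") -> str:
    return s.rstrip(trim_chars)  # trailing characters only


print(trim("SSparkSQLS", "SL"))
```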
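The parse_url examples in Table 1 can be reproduced with a Python model built on urllib.parse.urlsplit. This is an approximation, not DLI's implementation. It covers only PROTOCOL, HOST, PATH, QUERY (with an optional key), and REF. It returns None for other parts, and urlsplit lowercases the host name.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_url(url: str, part: str, key: Optional[str] = None) -> Optional[str]:
    """Simplified model supporting PROTOCOL, HOST, PATH, QUERY (optionally with a key), and REF."""
    u = urlsplit(url)
    if part == "QUERY" and key is not None:
        # Return the value of the first key=value pair whose name equals key
        for pair in u.query.split("&"):
            name, sep, value = pair.partition("=")
            if sep and name == key:
                return value
        return None
    parts = {
        "PROTOCOL": u.scheme,
        "HOST": u.hostname,
        "PATH": u.path,
        "QUERY": u.query,
        "REF": u.fragment,
    }
    return parts.get(part)  # None for parts this sketch does not model


print(parse_url("http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1", "QUERY", "k1"))
```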
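The role of groupid in regexp_extract can be shown with Python's re module. The sketch assumes Spark SQL semantics: groupid defaults to 1, 0 selects the whole match, and an empty string is returned when nothing matches. Python regular expressions differ from Java's in some edge cases, so treat this as illustrative.

```python
import re


def regexp_extract(source: str, pattern: str, groupid: int = 1) -> str:
    """Return group groupid of the first match of pattern in source (0 = whole match)."""
    match = re.search(pattern, source)
    if match is None:
        return ""  # no match: empty string
    group = match.group(groupid)  # raises IndexError if groupid exceeds the group count
    return "" if group is None else group  # a group that did not participate -> empty string


print(regexp_extract("100-200", r"(\d+)-(\d+)", 2))
```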
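The descriptions give regexp_replace (Spark 3.3.1) and regexp_replace1 different occurrence semantics. The first replaces the occurrence-th match and all later ones, and the second replaces only the occurrence-th match. The Python sketch below models that difference as described. The function names regexp_replace_from and regexp_replace_nth are hypothetical. The sketch assumes occurrence is 1-based and inserts replace_string literally, without group references.

```python
import re


def regexp_replace_from(source: str, pattern: str, replace_string: str, occurrence: int = 1) -> str:
    """regexp_replace (Spark 3.3.1) as described: replace the occurrence-th match and all later matches."""
    count = 0

    def substitute(match):
        nonlocal count
        count += 1  # 1-based index of the current match
        return replace_string if count >= occurrence else match.group(0)

    return re.sub(pattern, substitute, source)


def regexp_replace_nth(source: str, pattern: str, replace_string: str, occurrence: int = 1) -> str:
    """regexp_replace1 as described: replace only the occurrence-th match."""
    count = 0

    def substitute(match):
        nonlocal count
        count += 1
        return replace_string if count == occurrence else match.group(0)

    return re.sub(pattern, substitute, source)


print(regexp_replace_from("a1b2c3", r"\d", "#", 2), regexp_replace_nth("a1b2c3", r"\d", "#", 2))
```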
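The soundex('Miller') = M460 example follows the American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent equal codes, skip H and W, and pad to four characters. The Python sketch below follows those rules. It assumes Spark SQL behavior for inputs that do not start with an ASCII letter, which are returned unchanged.

```python
import string

# Soundex digit for each letter A-Z; "0" = vowels and Y, "7" = H and W (skipped)
_CODES = "01230127022455012623017202"


def soundex(s: str) -> str:
    """American Soundex code: first letter plus three digits, zero-padded."""
    if not s or s[0] not in string.ascii_letters:
        return s  # input that does not start with a letter is returned unchanged
    first = s[0].upper()
    result = first
    last_code = _CODES[ord(first) - ord("A")]
    for ch in s[1:]:
        if ch not in string.ascii_letters:
            last_code = "0"  # non-letters separate runs of equal codes
            continue
        code = _CODES[ord(ch.upper()) - ord("A")]
        if code == "7":
            continue  # H and W are ignored and do not separate equal codes
        if code != "0" and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code
    return result.ljust(4, "0")


print(soundex("Miller"))
```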
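The 1-based positions used by substr/substring map onto Python slicing as shown below. Two behaviors in this sketch are not stated in Table 1 and are assumptions taken from Spark SQL: a negative start_position counts from the end of the string, and position 0 behaves like position 1.

```python
from typing import Optional


def substr(s: str, start_position: int, length: Optional[int] = None) -> str:
    """Model of substr/substring with 1-based positions and an optional length."""
    n = len(s)
    if start_position > 0:
        start = start_position - 1  # 1-based -> 0-based
    elif start_position < 0:
        start = n + start_position  # negative positions count from the end (assumed)
    else:
        start = 0  # position 0 is treated like position 1 (assumed)
    end = n if length is None else start + length
    start = max(start, 0)
    if start >= end:
        return ""
    return s[start:end]


print(substr("Spark SQL", 5), substr("Spark SQL", 5, 1))
```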
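substring_index is easiest to understand as splitting on the separator and keeping the first count fields (positive count) or the last -count fields (negative count). The Python sketch below follows that reading. It assumes Spark SQL behavior for count = 0 or an empty separator, both of which return an empty string, and it is only exact for non-overlapping separators.

```python
def substring_index(s: str, separator: str, count: int) -> str:
    """Part of s before the count-th separator (count > 0, from the left)
    or after the |count|-th separator (count < 0, from the right)."""
    if count == 0 or separator == "":
        return ""
    parts = s.split(separator)
    if count > 0:
        return separator.join(parts[:count])  # keep the first `count` fields
    return separator.join(parts[count:])  # keep the last `-count` fields


print(substring_index("www.apache.org", ".", 2), substring_index("www.apache.org", ".", -2))
```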
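translate works character by character rather than on whole substrings. The Python sketch below models it with an explicit mapping. It assumes Hive/Spark semantics: characters of from beyond the length of to are deleted, and if a character appears more than once in from, its first position wins.

```python
def translate(input_str: str, from_chars: str, to_chars: str) -> str:
    """Model of translate: map each character of from_chars to the same position in to_chars."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch not in mapping:  # first occurrence in from_chars wins
            mapping[ch] = to_chars[i] if i < len(to_chars) else None  # None = delete
    out = []
    for ch in input_str:
        if ch not in mapping:
            out.append(ch)  # characters not in from_chars are kept
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # mapped characters are replaced; deleted ones are dropped
    return "".join(out)


print(translate("abcde", "bcd", "BCD"))
```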