URL Functions

Introduction

The URL extraction functions extract components from HTTP URLs (or any valid URIs conforming to RFC 3986). The following syntax is supported:

[protocol:][//host[:port]][path][?query][#fragment]

The extracted components do not contain URI syntax separators such as : , ? and #.

Consider for example the below URI:

http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related

scheme    = http
authority = www.ics.uci.edu
path      = /pub/ietf/uri/
query     = k1=v1
fragment  = Related

Invalid URI’s

Well formed URI’s should not contain ascii whitespace. Percent-encoded URI’s should be followed by two hexadecimal digits after the percent character “%”. All the url extract functions will return null when passed an invalid uri.

# Examples of url functions with Invalid URI's.

# Invalid URI due to whitespace
SELECT url_extract_path('foo '); -- NULL (1 row)
SELECT url_extract_host('http://www.foo.com '); -- NULL (1 row)

# Invalid URI due to improper escaping of '%'
SELECT url_extract_path('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%'); -- NULL (1 row)
SELECT url_extract_host('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%'); -- NULL (1 row)

Encoding Functions

url_encode(value) varchar

Escapes value by encoding it so that it can be safely included in URL query parameter names and values:

  • Alphanumeric characters are not encoded.

  • The characters ., -, * and _ are not encoded.

  • The ASCII space character is encoded as +.

  • All other characters are converted to UTF-8 and the bytes are encoded as the string %XX where XX is the uppercase hexadecimal value of the UTF-8 byte.

url_decode(value) varchar

Unescapes the URL encoded value. This function is the inverse of url_encode().