URL Functions
Introduction
The URL extraction functions extract components from HTTP URLs (or any valid URIs conforming to RFC 3986). The following syntax is supported:
[protocol:][//host[:port]][path][?query][#fragment]
The extracted components do not contain URI syntax separators such as :, ? and #.
For example, consider the following URI:
http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related
scheme = http
authority = www.ics.uci.edu
path = /pub/ietf/uri/
query = k1=v1
fragment = Related
Invalid URIs
Well-formed URIs must not contain ASCII whitespace. In percent-encoded URIs, each percent character “%” must be followed by two hexadecimal digits. All URL extraction functions return NULL when passed an invalid URI.
-- Examples of URL functions with invalid URIs.
-- Invalid URI due to whitespace
SELECT url_extract_path('foo '); -- NULL (1 row)
SELECT url_extract_host('http://www.foo.com '); -- NULL (1 row)
-- Invalid URI due to improper escaping of '%'
SELECT url_extract_path('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%'); -- NULL (1 row)
SELECT url_extract_host('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%'); -- NULL (1 row)
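When the only defect is stray leading or trailing whitespace, one possible workaround is to strip it before extraction. The following is a minimal sketch, assuming the standard trim() string function is available in the same SQL dialect; trim() is not one of the URL functions described on this page.

```sql
-- Without trimming, the trailing space makes the URI invalid.
SELECT url_extract_host('http://www.foo.com ');       -- NULL

-- Assumption: trim() removes leading and trailing whitespace.
SELECT url_extract_host(trim('http://www.foo.com ')); -- www.foo.com
```

This does not help with whitespace inside the URI or with malformed percent escapes; those inputs still yield NULL.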
Extraction Functions
- url_extract_fragment(url) → varchar
Returns the fragment identifier from url.
- url_extract_host(url) → varchar
Returns the host from url.
- url_extract_parameter(url, name) → varchar
Returns the value of the first query string parameter named name from url. Parameter extraction is handled in the typical manner as specified by RFC 1866, section 8.2.1.
- url_extract_path(url) → varchar
Returns the path from url.
- url_extract_port(url) → bigint
Returns the port number from url. Returns NULL if the port is missing.
- url_extract_protocol(url) → varchar
Returns the protocol from url.
- url_extract_query(url) → varchar
Returns the query string from url.
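As an illustration, the extraction functions can be applied to the example URI from the introduction. The second group uses made-up URLs on example.com (an assumption chosen only for illustration) to show a URL with an explicit port and a repeated query parameter.

```sql
-- Components of the example URI from the introduction.
SELECT url_extract_protocol('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related'); -- http
SELECT url_extract_host('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related');     -- www.ics.uci.edu
SELECT url_extract_path('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related');     -- /pub/ietf/uri/
SELECT url_extract_query('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related');    -- k1=v1
SELECT url_extract_fragment('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related'); -- Related
SELECT url_extract_port('http://www.ics.uci.edu/pub/ietf/uri/?k1=v1#Related');     -- NULL (no port given)

-- Hypothetical URLs: an explicit port, and a parameter that appears twice.
SELECT url_extract_port('http://example.com:8080/search');                            -- 8080
SELECT url_extract_parameter('http://example.com/search?q=first&q=second&p=2', 'q');  -- first
SELECT url_extract_parameter('http://example.com/search?q=first&q=second&p=2', 'p');  -- 2
```

Note that url_extract_port returns a bigint, while url_extract_parameter always returns a varchar, so the value 2 above is a string.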
Encoding Functions
- url_encode(value) → varchar
Escapes value by encoding it so that it can be safely included in URL query parameter names and values:
  - Alphanumeric characters are not encoded.
  - The characters ., -, * and _ are not encoded.
  - The ASCII space character is encoded as +.
  - All other characters are converted to UTF-8 and the bytes are encoded as the string %XX where XX is the uppercase hexadecimal value of the UTF-8 byte.
- url_decode(value) → varchar
Unescapes the URL encoded value. This function is the inverse of url_encode().
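The encoding rules above can be sketched with a few calls. The input strings are arbitrary examples chosen for illustration, and the expected results follow directly from the rules as stated.

```sql
-- Space becomes '+'; alphanumerics and . - * _ pass through unchanged.
SELECT url_encode('a b');            -- a+b
SELECT url_encode('file_name-1.*');  -- file_name-1.*

-- Reserved characters are percent-encoded with uppercase hex digits.
SELECT url_encode('k1=v1&k2');       -- k1%3Dv1%26k2

-- Non-ASCII characters are encoded byte by byte from their UTF-8 form.
SELECT url_encode('é');              -- %C3%A9

-- url_decode reverses the encoding.
SELECT url_decode('a+b%21');             -- a b!
SELECT url_decode(url_encode('x y/z'));  -- x y/z
```

Encoding each parameter name and value separately, rather than a whole query string at once, keeps the = and & separators intact.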