Class URI
java.lang.Object
org.apache.sling.resourceresolver.impl.helper.URI
- All Implemented Interfaces:
Serializable, Cloneable, Comparable<URI>
The interface for the URI (Uniform Resource Identifiers) version of RFC 2396.
This class supports parsing a URI reference in a way that can be extended for specific protocols, taking into account the character encoding of the transport protocol and the charset of the document.
A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics.
Implementers should be careful not to escape or unescape the same string more
than once, since unescaping an already unescaped string might lead to
misinterpreting a percent data character as another escaped character, or
vice versa in the case of escaping an already escaped string.
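The hazard of unescaping twice is easy to reproduce. The following is a minimal sketch, not code from this class: it uses the JDK's java.net.URLDecoder, which implements application/x-www-form-urlencoded decoding (it also turns '+' into a space, so it only approximates URI unescaping), and the class name DoubleUnescapeDemo is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: an escaped string must be unescaped exactly once.
class DoubleUnescapeDemo {
    // Approximates URI unescaping with the JDK form decoder (Java 10+ overload).
    static String unescapeOnce(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String escaped = "100%2541";          // the data "100%41", with '%' escaped as "%25"
        String once = unescapeOnce(escaped);  // "100%41": the correct original data
        String twice = unescapeOnce(once);    // "100A": "%41" is now misread as an escape
        System.out.println(once + " vs " + twice);
    }
}
```

Unescaping once recovers the data "100%41"; unescaping a second time silently changes it to "100A".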
To avoid these problems, the following data types are used:
  URI character sequence: char
  octet sequence: byte
  original character sequence: String
So a URI is a sequence of characters held in a char array, which is not always represented as a sequence of octets in a byte array.
URI Syntactic Components
- In general, written as follows:
  Absolute URI = <scheme>:<scheme-specific-part>
  Generic URI = <scheme>://<authority><path>?<query>
- Syntax
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  hier_part = ( net_path | abs_path ) [ "?" query ]
  net_path = "//" authority [ abs_path ]
  abs_path = "/" path_segments
The following examples illustrate URIs that are in common use:
  ftp://ftp.is.co.za/rfc/rfc1808.txt
     -- ftp scheme for File Transfer Protocol services
  gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
     -- gopher scheme for Gopher and Gopher+ Protocol services
  http://www.math.uio.no/faq/compression-faq/part1.html
     -- http scheme for Hypertext Transfer Protocol services
  mailto:mduerst@ifi.unizh.ch
     -- mailto scheme for electronic mail addresses
  news:comp.infosystems.www.servers.unix
     -- news scheme for USENET news groups and articles
  telnet://melvyl.ucop.edu/
     -- telnet scheme for interactive services via the TELNET Protocol
Please note that there are many modifications from URL (RFC 1738) and relative URL (RFC 1808).
The expressions for a URI
For escaped URI forms
  - URI(char[]) // constructor
  - char[] getRawXxx() // method
  - String getEscapedXxx() // method
  - String toString() // method
For unescaped URI forms
  - URI(String) // constructor
  - String getXXX() // method
This class is a slightly modified version of the URI class distributed with Http Client 3.1. The changes involve removing dependencies on other Http Client classes and on the Commons Codec library. To this end, the following methods have been added to this class:
- getBytes, getAsciiString, getString, getAsciiBytes have been copied from the Http Client 3.1 EncodingUtils class.
- encodeUrl and decodeUrl have been copied from the Commons Codec URLCodec class.
URIException
.- See Also:
-
Nested Class Summary
Modifier and Type | Class | Description
static class | DefaultCharsetChanged | Signals, as a normal (non-error) operation, that the default charset has changed, so that the user can be alerted to the change.
static class | | A mapping to determine the (somewhat arbitrarily) preferred charset for a given locale.
-
Field Summary
Modifier and Type | Field | Description
protected char[] | _authority | The authority.
protected char[] | _fragment | The fragment.
protected char[] | _host | The host.
protected boolean | _is_abs_path |
protected boolean | _is_hier_part |
protected boolean | _is_hostname |
protected boolean | _is_IPv4address |
protected boolean | _is_IPv6reference |
protected boolean | _is_net_path |
protected boolean | _is_opaque_part |
protected boolean | _is_reg_name |
protected boolean | _is_rel_path |
protected boolean | _is_server |
protected char[] | _opaque | The opaque.
protected char[] | _path | The path.
protected int | _port | The port.
protected char[] | _query | The query.
protected char[] | _scheme | The scheme.
protected char[] | _uri | This Uniform Resource Identifier (URI).
protected char[] | _userinfo | The userinfo.
protected static final BitSet | abs_path | URI absolute path.
protected static final BitSet | absoluteURI | BitSet for absoluteURI.
static final BitSet | allowed_abs_path | Those characters that are allowed for the abs_path.
static final BitSet | allowed_authority | Those characters that are allowed for the authority component.
static final BitSet | allowed_fragment | Those characters that are allowed for the fragment component.
static final BitSet | allowed_host | Those characters that are allowed for the host component.
static final BitSet | allowed_IPv6reference | Those characters that are allowed for the IPv6reference component.
static final BitSet | allowed_opaque_part | Those characters that are allowed for the opaque_part.
static final BitSet | allowed_query | Those characters that are allowed for the query component.
static final BitSet | allowed_reg_name | Those characters that are allowed for the reg_name.
static final BitSet | allowed_rel_path | Those characters that are allowed for the rel_path.
static final BitSet | allowed_userinfo | Those characters that are allowed for the userinfo component.
static final BitSet | allowed_within_authority | Those characters that are allowed for the authority component.
static final BitSet | allowed_within_path | Those characters that are allowed within the path.
static final BitSet | allowed_within_query | Those characters that are allowed within the query component.
static final BitSet | allowed_within_userinfo | Those characters that are allowed within the userinfo component.
protected static final BitSet | alpha | BitSet for alpha.
protected static final BitSet | alphanum | BitSet for alphanum (join of alpha & digit).
protected static final BitSet | authority | BitSet for authority.
static final BitSet | control | BitSet for control.
protected static String | defaultDocumentCharset | The default charset of the document.
protected static String | defaultDocumentCharsetByLocale |
protected static String | defaultDocumentCharsetByPlatform |
protected static String | defaultProtocolCharset | The default charset of the protocol.
static final BitSet | delims | BitSet for delims.
protected static final BitSet | digit | BitSet for digit.
static final BitSet | disallowed_opaque_part | Disallowed opaque_part before escaping.
static final BitSet | disallowed_rel_path | Disallowed rel_path before escaping.
protected static final BitSet | domainlabel | BitSet for domainlabel.
protected static final BitSet | escaped | BitSet for escaped.
protected static final BitSet | fragment | BitSet for fragment (alias for uric).
protected int | hash | Cache the hash code for this URI.
protected static final BitSet | hex | BitSet for hex.
protected static final BitSet | hier_part | BitSet for hier_part.
protected static final BitSet | host | BitSet for host.
protected static final BitSet | hostname | BitSet for hostname.
protected static final BitSet | hostport | BitSet for hostport.
protected static final BitSet | IPv4address | BitSet that combines digit and dot for IPv4address.
protected static final BitSet | IPv6address | RFC 2373.
protected static final BitSet | IPv6reference | RFC 2732, 2373.
protected static final BitSet | mark | BitSet for mark.
protected static final BitSet | net_path | BitSet for net_path.
protected static final BitSet | opaque_part | URI bitset that combines uric_no_slash and uric.
protected static final BitSet | param | BitSet for param (alias for pchar).
protected static final BitSet | path | URI bitset that combines absolute path and opaque part.
protected static final BitSet | path_segments | BitSet for path segments.
protected static final BitSet | pchar | BitSet for pchar.
protected static final BitSet | percent | The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
protected static final BitSet | port | Port, a logical alias for digit.
protected String | protocolCharset | The charset of the protocol used by this URI instance.
protected static final BitSet | query | BitSet for query (alias for uric).
protected static final BitSet | reg_name | BitSet for reg_name.
protected static final BitSet | rel_path | BitSet for rel_path.
protected static final BitSet | rel_segment | BitSet for rel_segment.
protected static final BitSet | relativeURI | BitSet for relativeURI.
protected static final BitSet | reserved | BitSet for reserved.
protected static final char[] | rootPath | The root path.
protected static final BitSet | scheme | BitSet for scheme.
protected static final BitSet | segment | BitSet for segment.
protected static final BitSet | server | BitSet for server.
static final BitSet | space | BitSet for space.
protected static final BitSet | toplabel | BitSet for toplabel.
protected static final BitSet | unreserved | Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
static final BitSet | unwise | BitSet for unwise.
protected static final BitSet | URI_reference | BitSet for URI-reference.
protected static final BitSet | uric | BitSet for uric.
protected static final BitSet | uric_no_slash | URI bitset for encoding typical non-slash characters.
protected static final BitSet | userinfo | BitSet for userinfo.
static final BitSet | within_userinfo | BitSet for characters within the userinfo component, such as user and password.
-
Constructor Summary
ModifierConstructorDescriptionprotected
URI()
Create an instance for internal use.
Construct a URI from a string with the given charset.
Construct a URI from a string with the given charset.
Construct a general URI from the given components.
Construct a general URI from the given components.
Construct a general URI from the given components.
Construct a general URI from the given components.
URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment)
Construct a general URI from the given components.
Construct a general URI from the given components.
Construct a general URI from the given components.
Construct a general URI with the given relative URI string.
Construct a general URI with the given relative URI.
-
Method Summary
Modifier and TypeMethodDescriptionclone()
Create and return a copy of this object, the URI-reference containing the userinfo component.int
Compare this URI to another object.protected static String
Decodes URI encoded string.protected static String
Decodes URI encoded string.protected static char[]
Encodes URI string.protected boolean
equals
(char[] first, char[] second) Test if the first array is equal to the second array.boolean
Test whether an object is equal to this URI.
Get the level above this hierarchy level.
getAsciiBytes
(String data) Converts the specified string to a byte array of ASCII characters.
Get the authority.
Get the current hierarchy level.static String
Get the recommended default charset of the document.static String
Get the default charset of the document by locale.static String
Get the default charset of the document by platform.static String
Get the default charset of the protocol.
Get the level above this hierarchy level.
Get the escaped authority.
Get the escaped current hierarchy level.
Get the escaped fragment.
Get the escaped basename of the path.
Get the escaped path.
Get the escaped query.
Get the escaped query.
Get the URI character sequence.
Get the escaped URI reference string.
Get the escaped userinfo.
Get the fragment.getHost()
Get the host.getName()
Get the basename of the path.getPath()
Get the path.Get the path and query.int
getPort()
Get the port.
Get the protocol charset used by this URI instance.getQuery()
Get the query.char[]
Get the level above this hierarchy level.char[]
Get the raw-escaped authority.char[]
Get the raw-escaped current hierarchy level.protected char[]
getRawCurrentHierPath
(char[] path) Get the raw-escaped current hierarchy level in the given path.char[]
Get the raw-escaped fragment.char[]
Get the host.char[]
Get the raw-escaped basename of the path.char[]
Get the raw-escaped path.char[]
Get the raw-escaped path and query.char[]
Get the raw-escaped query.char[]
Get the scheme.char[]
Get the URI character sequence.char[]
Get the URI reference character sequence.char[]
Get the raw-escaped userinfo.
Get the scheme.static String
Converts the byte array of HTTP content characters to a string.getURI()
Get the URI character sequence.
Get the original URI reference string.
Get the userinfo.boolean
Tell whether or not this URI has authority.boolean
Tell whether or not this URI has fragment.int
hashCode()
Return a hash code for this URI.boolean
hasQuery()
Tell whether or not this URI has query.boolean
Tell whether or not this URI has userinfo.protected int
indexFirstOf
(char[] s, char delim) Get the index of the first occurrence of the delimiter in the given array.protected int
indexFirstOf
(char[] s, char delim, int offset) Get the index of the first occurrence of the delimiter in the given array, starting at offset.protected int
indexFirstOf
(String s, String delims) Get the index of the first occurrence of any of the given delimiters in the string.protected int
indexFirstOf
(String s, String delims, int offset) Get the index of the first occurrence of any of the given delimiters in the string, starting at offset.boolean
Tell whether or not this URI is absolute.boolean
Tell whether or not the relativeURI or hier_part of this URI is abs_path.boolean
Tell whether or not the absoluteURI of this URI is hier_part.boolean
Tell whether or not the host part of this URI is hostname.boolean
Tell whether or not the host part of this URI is IPv4address.boolean
Tell whether or not the host part of this URI is IPv6reference.boolean
Tell whether or not the relativeURI or hier_part of this URI is net_path.boolean
Tell whether or not the absoluteURI of this URI is opaque_part.boolean
Tell whether or not the authority component of this URI is reg_name.boolean
Tell whether or not this URI is relative.boolean
Tell whether or not the relativeURI of this URI is rel_path.boolean
isServer()
Tell whether or not the authority component of this URI is server.void
Normalizes the path part of this URI.protected char[]
normalize
(char[] path) Normalize the given hier path part.protected void
parseAuthority
(String original, boolean escaped) Parse the authority component.protected void
parseUriReference
(String original, boolean escaped) To avoid any possibility of conflict with non-ASCII characters, parse a URI reference as a String with the character encoding of the local system or the document.protected boolean
prevalidate
(String component, BitSet disallowed) Pre-validate the unescaped URI string within a specific component.protected char[]
removeFragmentIdentifier
(char[] component) Remove the fragment identifier of the given component.protected char[]
resolvePath
(char[] basePath, char[] relPath) Resolve the base and relative path.static void
setDefaultDocumentCharset
(String charset) Set the default charset of the document.static void
setDefaultProtocolCharset
(String charset) Set the default charset of the protocol.void
setEscapedAuthority
(String escapedAuthority) Set the authority.void
setEscapedFragment
(String escapedFragment) Set the escaped fragment string.void
setEscapedPath
(String escapedPath) Set the escaped path.void
setEscapedQuery
(String escapedQuery) Set the escaped query string.void
setFragment
(String fragment) Set the fragment.void
Set the path.void
Set the query.void
setRawAuthority
(char[] escapedAuthority) Set the authority.void
setRawFragment
(char[] escapedFragment) Set the raw-escaped fragment.void
setRawPath
(char[] escapedPath) Set the raw-escaped path.void
setRawQuery
(char[] escapedQuery) Set the raw-escaped query.protected void
setURI()
Once it's parsed successfully, set this URI.toString()
Get the escaped URI string.protected boolean
Validate the URI characters within a specific component.protected boolean
Validate the URI characters within a specific component.
-
Field Details
-
hash
protected int hashCache the hash code for this URI. -
_uri
protected char[] _uriThis Uniform Resource Identifier (URI). The URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. -
protocolCharset
The charset of the protocol used by this URI instance. -
defaultProtocolCharset
The default charset of the protocol. RFC 2277, 2396 -
defaultDocumentCharset
The default charset of the document. RFC 2277, 2396 The platform's charset is used for the document by default. -
defaultDocumentCharsetByLocale
-
defaultDocumentCharsetByPlatform
-
_scheme
protected char[] _schemeThe scheme. -
_opaque
protected char[] _opaqueThe opaque. -
_authority
protected char[] _authorityThe authority. -
_userinfo
protected char[] _userinfoThe userinfo. -
_host
protected char[] _hostThe host. -
_port
protected int _portThe port. -
_path
protected char[] _pathThe path. -
_query
protected char[] _queryThe query. -
_fragment
protected char[] _fragmentThe fragment. -
rootPath
protected static final char[] rootPathThe root path. -
percent
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI. -
digit
BitSet for digit.digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
-
alpha
BitSet for alpha.alpha = lowalpha | upalpha
-
alphanum
BitSet for alphanum (join of alpha & digit).alphanum = alpha | digit
-
hex
BitSet for hex.hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
-
escaped
BitSet for escaped.escaped = "%" hex hex
-
mark
BitSet for mark.mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
-
unreserved
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.unreserved = alphanum | mark
-
reserved
BitSet for reserved.reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
uric
BitSet for uric.uric = reserved | unreserved | escaped
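The grammar rules above translate directly into character sets. The following is a hypothetical sketch (UriCharClasses and its helpers are illustrative, not this class's fields) that builds each set as a java.util.BitSet and composes sets with BitSet.or(), as in uric = reserved | unreserved | escaped. A character set can only approximate escaped = "%" hex hex: it admits '%' and the hex digits but cannot check their order.

```java
import java.util.BitSet;

// Hypothetical sketch: RFC 2396 character classes as BitSets, combined with
// BitSet.or() in the same way the grammar combines rules with "|".
class UriCharClasses {
    static BitSet of(String chars) {
        BitSet set = new BitSet(256);
        for (char c : chars.toCharArray()) {
            set.set(c);
        }
        return set;
    }

    static BitSet union(BitSet... sets) {
        BitSet result = new BitSet(256);
        for (BitSet s : sets) {
            result.or(s);
        }
        return result;
    }

    static final BitSet DIGIT = of("0123456789");
    static final BitSet ALPHA = of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
    static final BitSet ALPHANUM = union(ALPHA, DIGIT);             // alphanum = alpha | digit
    static final BitSet HEX = union(DIGIT, of("ABCDEFabcdef"));      // hex = digit | "A".."F" | "a".."f"
    static final BitSet ESCAPED = union(of("%"), HEX);               // characters usable in "%" hex hex
    static final BitSet MARK = of("-_.!~*'()");                      // mark
    static final BitSet UNRESERVED = union(ALPHANUM, MARK);          // unreserved = alphanum | mark
    static final BitSet RESERVED = of(";/?:@&=+$,");                 // reserved
    static final BitSet URIC = union(RESERVED, UNRESERVED, ESCAPED); // uric = reserved | unreserved | escaped

    // True if every character of s is in the given set.
    static boolean allIn(String s, BitSet allowed) {
        for (char c : s.toCharArray()) {
            if (!allowed.get(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allIn("a/b?c=%20d", URIC)); // true
        System.out.println(allIn("a b", URIC));        // false: space is not a uric
    }
}
```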
-
fragment
BitSet for fragment (alias for uric).fragment = *uric
-
query
BitSet for query (alias for uric).query = *uric
-
pchar
BitSet for pchar.pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
param
BitSet for param (alias for pchar).param = *pchar
-
segment
BitSet for segment.segment = *pchar *( ";" param )
-
path_segments
BitSet for path segments.path_segments = segment *( "/" segment )
-
abs_path
URI absolute path.abs_path = "/" path_segments
-
uric_no_slash
URI bitset for encoding typical non-slash characters.uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
opaque_part
URI bitset that combines uric_no_slash and uric.opaque_part = uric_no_slash *uric
-
path
URI bitset that combines absolute path and opaque part.path = [ abs_path | opaque_part ]
-
port
Port, a logical alias for digit. -
IPv4address
BitSet that combines digit and dot for IPv4address.IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
-
IPv6address
RFC 2373.IPv6address = hexpart [ ":" IPv4address ]
-
IPv6reference
RFC 2732, 2373.IPv6reference = "[" IPv6address "]"
-
toplabel
BitSet for toplabel.toplabel = alpha | alpha *( alphanum | "-" ) alphanum
-
domainlabel
BitSet for domainlabel.domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
-
hostname
BitSet for hostname.hostname = *( domainlabel "." ) toplabel [ "." ]
-
host
BitSet for host.host = hostname | IPv4address | IPv6reference
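As an illustration of the host rules, here is a hypothetical classifier (HostKind is not part of this class) that maps a host string onto hostname, IPv4address, or IPv6reference using java.util.regex. It follows the RFC 2396 grammar quoted above, so, like the grammar, it does not range-check IPv4 octets; its IPv6 pattern is deliberately loose.

```java
import java.util.regex.Pattern;

// Hypothetical classifier following: host = hostname | IPv4address | IPv6reference
class HostKind {
    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit (no range check)
    private static final Pattern IPV4 = Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");
    // IPv6reference = "[" IPv6address "]" (loosely: hex digits, ':' and '.')
    private static final Pattern IPV6_REFERENCE = Pattern.compile("\\[[0-9A-Fa-f:.]+\\]");
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAINLABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOPLABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAINLABEL + "\\.)*" + TOPLABEL + "\\.?");

    static String classify(String host) {
        if (IPV4.matcher(host).matches()) {
            return "IPv4address";
        }
        if (IPV6_REFERENCE.matcher(host).matches()) {
            return "IPv6reference";
        }
        if (HOSTNAME.matcher(host).matches()) {
            return "hostname";
        }
        return "invalid";
    }

    public static void main(String[] args) {
        for (String h : new String[] {"jakarta.apache.org", "192.168.0.1", "[::1]", "bad_host"}) {
            System.out.println(h + " -> " + classify(h));
        }
    }
}
```

Checking IPv4address first matters only for readability here: a toplabel must start with a letter, so a dotted-digit string can never match hostname anyway.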
-
hostport
BitSet for hostport.hostport = host [ ":" port ]
-
userinfo
Bitset for userinfo.userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
-
within_userinfo
BitSet for characters within the userinfo component, such as user and password. -
server
Bitset for server.server = [ [ userinfo "@" ] hostport ]
-
reg_name
BitSet for reg_name.reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
-
authority
BitSet for authority.authority = server | reg_name
-
scheme
BitSet for scheme.scheme = alpha *( alpha | digit | "+" | "-" | "." )
-
rel_segment
BitSet for rel_segment.rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
-
rel_path
BitSet for rel_path.rel_path = rel_segment [ abs_path ]
-
net_path
BitSet for net_path.net_path = "//" authority [ abs_path ]
-
hier_part
BitSet for hier_part.hier_part = ( net_path | abs_path ) [ "?" query ]
-
relativeURI
BitSet for relativeURI.relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
-
absoluteURI
BitSet for absoluteURI.absoluteURI = scheme ":" ( hier_part | opaque_part )
-
URI_reference
BitSet for URI-reference.URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
-
control
BitSet for control. -
space
BitSet for space. -
delims
BitSet for delims. -
unwise
BitSet for unwise. -
disallowed_rel_path
Disallowed rel_path before escaping. -
disallowed_opaque_part
Disallowed opaque_part before escaping. -
allowed_authority
Those characters that are allowed for the authority component. -
allowed_opaque_part
Those characters that are allowed for the opaque_part. -
allowed_reg_name
Those characters that are allowed for the reg_name. -
allowed_userinfo
Those characters that are allowed for the userinfo component. -
allowed_within_userinfo
Those characters that are allowed within the userinfo component. -
allowed_IPv6reference
Those characters that are allowed for the IPv6reference component. The characters '[', ']' in IPv6reference should be excluded. -
allowed_host
Those characters that are allowed for the host component. The characters '[', ']' in IPv6reference should be excluded. -
allowed_within_authority
Those characters that are allowed for the authority component. -
allowed_abs_path
Those characters that are allowed for the abs_path. -
allowed_rel_path
Those characters that are allowed for the rel_path. -
allowed_within_path
Those characters that are allowed within the path. -
allowed_query
Those characters that are allowed for the query component. -
allowed_within_query
Those characters that are allowed within the query component. -
allowed_fragment
Those characters that are allowed for the fragment component. -
_is_hier_part
protected boolean _is_hier_part -
_is_opaque_part
protected boolean _is_opaque_part -
_is_net_path
protected boolean _is_net_path -
_is_abs_path
protected boolean _is_abs_path -
_is_rel_path
protected boolean _is_rel_path -
_is_reg_name
protected boolean _is_reg_name -
_is_server
protected boolean _is_server -
_is_hostname
protected boolean _is_hostname -
_is_IPv4address
protected boolean _is_IPv4address -
_is_IPv6reference
protected boolean _is_IPv6reference
-
-
Constructor Details
-
URI
protected URI()
Create an instance for internal use. -
URI
Construct a URI from a string with the given charset. The input string can be either in escaped or unescaped form.
- Parameters:
s
- URI character sequence
escaped
- true if URI character sequence is in escaped form; false otherwise
charset
- the charset string to do escape encoding, if required
- Throws:
URIException
- If the URI cannot be created.
NullPointerException
- if input string is null
- Since:
- 3.0
-
URI
Construct a URI from a string with the given charset. The input string can be either in escaped or unescaped form.
- Parameters:
s
- URI character sequence
escaped
- true if URI character sequence is in escaped form; false otherwise
- Throws:
URIException
- If the URI cannot be created.
NullPointerException
- if input string is null
- Since:
- 3.0
-
URI
Construct a general URI from the given components.
  URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  opaque_part = uric_no_slash *uric
- Parameters:
scheme
- the scheme string
schemeSpecificPart
- scheme_specific_part
fragment
- the fragment string
- Throws:
URIException
- If the URI cannot be created.
-
URI
public URI(String scheme, String authority, String path, String query, String fragment) throws URIException
Construct a general URI from the given components.
  URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
  hier_part = ( net_path | abs_path ) [ "?" query ]
- Parameters:
scheme
- the scheme string
authority
- the authority string
path
- the path string
query
- the query string
fragment
- the fragment string
- Throws:
URIException
- If the new URI cannot be created.
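For comparison, the JDK's java.net.URI (a different class from this one) has a constructor taking the same five components. The sketch below relies only on standard JDK behaviour and shows the components being recombined as scheme "://" authority path "?" query "#" fragment, with the illegal space in the path quoted. ComposeDemo is a hypothetical name, and whether this class escapes components identically is not asserted here.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch using the JDK's java.net.URI (not this class). Its five-argument
// constructor recombines the components as
//   scheme ":" "//" authority path "?" query "#" fragment
// and quotes characters that are illegal in a component, such as a space.
class ComposeDemo {
    static String compose(String scheme, String authority, String path,
                          String query, String fragment) throws URISyntaxException {
        return new URI(scheme, authority, path, query, fragment).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // prints http://www.apache.org/a%20b?q=1#top
        System.out.println(compose("http", "www.apache.org", "/a b", "q=1", "top"));
    }
}
```

A null query or fragment is simply omitted from the result.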
-
URI
Construct a general URI from the given components.
- Parameters:
scheme
- the scheme string
userinfo
- the userinfo string
host
- the host string
port
- the port number
- Throws:
URIException
- If the new URI cannot be created.
-
URI
Construct a general URI from the given components.
- Parameters:
scheme
- the scheme string
userinfo
- the userinfo string
host
- the host string
port
- the port number
path
- the path string
- Throws:
URIException
- If the new URI cannot be created.
-
URI
public URI(String scheme, String userinfo, String host, int port, String path, String query) throws URIException
Construct a general URI from the given components.
- Parameters:
scheme
- the scheme string
userinfo
- the userinfo string
host
- the host string
port
- the port number
path
- the path string
query
- the query string
- Throws:
URIException
- If the new URI cannot be created.
-
URI
public URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment) throws URIException
Construct a general URI from the given components.
- Parameters:
scheme
- the scheme string
userinfo
- the userinfo string
host
- the host string
port
- the port number
path
- the path string
query
- the query string
fragment
- the fragment string
- Throws:
URIException
- If the new URI cannot be created.
-
URI
Construct a general URI from the given components.
- Parameters:
scheme
- the scheme string
host
- the host string
path
- the path string
fragment
- the fragment string
- Throws:
URIException
- If the new URI cannot be created.
-
URI
Construct a general URI with the given relative URI string.
- Parameters:
base
- the base URI
relative
- the relative URI string
escaped
- true if URI character sequence is in escaped form; false otherwise
- Throws:
URIException
- If the new URI cannot be created.
- Since:
- 3.0
-
URI
Construct a general URI with the given relative URI.
  URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
  relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
Resolving relative references against the base URI http://a/b/c/d;p?q gives:
  g:h = g:h
  g = http://a/b/c/g
  ./g = http://a/b/c/g
  g/ = http://a/b/c/g/
  /g = http://a/g
  //g = http://g
  ?y = http://a/b/c/?y
  g?y = http://a/b/c/g?y
  #s = (current document)#s
  g#s = http://a/b/c/g#s
  g?y#s = http://a/b/c/g?y#s
  ;x = http://a/b/c/;x
  g;x = http://a/b/c/g;x
  g;x?y#s = http://a/b/c/g;x?y#s
  . = http://a/b/c/
  ./ = http://a/b/c/
  .. = http://a/b/
  ../ = http://a/b/
  ../g = http://a/b/g
  ../.. = http://a/
  ../../ = http://a/
  ../../g = http://a/g
- Parameters:
base
- the base URI
relative
- the relative URI
- Throws:
URIException
- If the new URI cannot be created.
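Several rows of this table can be checked with the JDK's java.net.URI.resolve(String), which implements RFC 2396 resolution. This is a sketch against the JDK class, not against this constructor; ResolveDemo is a hypothetical name. The JDK is known to differ from RFC 2396 in some corner cases (for example an empty relative reference), so only ordinary rows are used.

```java
import java.net.URI;

// Sketch using the JDK's java.net.URI.resolve(String) with the base URI
// of the table above.
class ResolveDemo {
    static final URI BASE = URI.create("http://a/b/c/d;p?q");

    static String resolve(String relative) {
        return BASE.resolve(relative).toString();
    }

    public static void main(String[] args) {
        String[] relatives = {"g", "./g", "/g", "//g", "g?y#s", "../g", "../../g"};
        for (String rel : relatives) {
            System.out.println(rel + " = " + resolve(rel));
        }
    }
}
```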
-
-
Method Details
-
encode
Encodes URI string. This is a two-step mapping, one from original characters to octets, and subsequently a second from octets to URI characters:
  original character sequence -> octet sequence -> URI character sequence
- Parameters:
original
- the original character sequence
allowed
- those characters that are allowed within a component
charset
- the protocol charset
- Returns:
- URI character sequence
- Throws:
URIException
- null component or unsupported character encoding
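A minimal sketch of the two-step mapping, assuming the allowed characters are given as a BitSet over octet values that contains only ASCII characters (PercentEncoder is a hypothetical name; this is not this class's implementation). Each octet produced by the charset is either copied, if allowed, or written as "%" followed by two upper-case hex digits.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch of: original characters -> octets -> URI characters.
// Assumes `allowed` contains only ASCII characters.
class PercentEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String original, BitSet allowed, Charset charset) {
        byte[] octets = original.getBytes(charset);  // step 1: characters -> octets
        StringBuilder out = new StringBuilder(octets.length);
        for (byte b : octets) {
            int octet = b & 0xFF;                     // read the byte as unsigned 0..255
            if (allowed.get(octet)) {
                out.append((char) octet);             // step 2: allowed octet kept as-is
            } else {
                out.append('%')                       // step 2: any other octet becomes %XX
                   .append(HEX[octet >> 4])
                   .append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet(256);
        for (char c = 'a'; c <= 'z'; c++) {
            allowed.set(c);
        }
        allowed.set('/');
        System.out.println(encode("a b/c", allowed, StandardCharsets.UTF_8)); // a%20b/c
    }
}
```

Because '%' is never in a sensible allowed set, encoding an already escaped string turns each "%" into "%25", which is exactly the double-escaping hazard described in the class overview.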
-
decode
Decodes URI encoded string. This is a two-step mapping, one from URI characters to octets, and subsequently a second from octets to original characters:
  URI character sequence -> octet sequence -> original character sequence
- Parameters:
component
- the URI character sequence
charset
- the protocol charset
- Returns:
- original character sequence
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
decode
Decodes URI encoded string. This is a two-step mapping, one from URI characters to octets, and subsequently a second from octets to original characters:
  URI character sequence -> octet sequence -> original character sequence
- Parameters:
component
- the URI character sequence
charset
- the protocol charset
- Returns:
- original character sequence
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
- Since:
- 3.0
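The reverse mapping can be sketched the same way. In this hypothetical PercentDecoder (not this class's implementation), an incomplete trailing escape such as "%4" is rejected with IllegalArgumentException, which stands in for URIException so that the sketch stays self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of: URI characters -> octets -> original characters.
class PercentDecoder {
    static String decode(String component, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            if (c != '%') {
                octets.write(c);   // URI characters are ASCII: one char is one octet
                continue;
            }
            if (i + 2 >= component.length()) {
                throw new IllegalArgumentException("incomplete trailing escape pattern");
            }
            int hi = Character.digit(component.charAt(i + 1), 16);
            int lo = Character.digit(component.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid escape at index " + i);
            }
            octets.write((hi << 4) | lo);   // "%XX" -> one octet
            i += 2;
        }
        return new String(octets.toByteArray(), charset);   // octets -> characters
    }

    public static void main(String[] args) {
        System.out.println(decode("a%20b%2Fc", StandardCharsets.UTF_8)); // a b/c
    }
}
```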
-
prevalidate
Pre-validate the unescaped URI string within a specific component.
- Parameters:
component
- the component string within the component
disallowed
- those characters disallowed within the component
- Returns:
- true if the component contains none of the disallowed characters; false if the component is undefined or contains a disallowed character
-
validate
Validate the URI characters within a specific component. This check must be performed after escape encoding, or on a component that does not include escaped characters.
- Parameters:
component
- the character sequence within the component
generous
- those characters that are allowed within a component
- Returns:
- true if it is a correct URI character sequence
-
validate
Validate the URI characters within a specific component. This check must be performed after escape encoding, or on a component that does not include escaped characters. The check is not strict but generous; stricter validation can be performed before this method is called.
- Parameters:
component
- the character sequence within the component
soffset
- the starting offset of the given component
eoffset
- the ending offset of the given component; if -1, it means the length of the component
generous
- those characters that are allowed within a component
- Returns:
- true if it is a correct URI character sequence
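A hypothetical sketch of this lenient check (ComponentValidator is illustrative, not this class's code): every character from soffset to eoffset must be in the generous set. In this sketch eoffset is treated as inclusive, and -1 selects the end of the component as documented above.

```java
import java.util.BitSet;

// Hypothetical sketch of a lenient ("generous") validation of
// component[soffset..eoffset]; eoffset is inclusive and -1 means "to the end".
class ComponentValidator {
    static boolean validate(char[] component, int soffset, int eoffset, BitSet generous) {
        int end = (eoffset == -1) ? component.length - 1 : eoffset;
        for (int i = soffset; i <= end; i++) {
            if (!generous.get(component[i])) {
                return false;   // a character outside the allowed set
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet digits = new BitSet();
        for (char c = '0'; c <= '9'; c++) {
            digits.set(c);
        }
        char[] text = "ab123cd".toCharArray();
        System.out.println(validate(text, 2, 4, digits));  // true: "123"
        System.out.println(validate(text, 0, -1, digits)); // false: 'a' is not a digit
    }
}
```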
-
parseUriReference
To avoid any possibility of conflict with non-ASCII characters, parse a URI reference as a String with the character encoding of the local system or the document. The following regular expression breaks down a URI reference into its components (the numbers mark the capturing groups):
  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9
For example, applied to http://jakarta.apache.org/ietf/uri/#Related it yields:
  $1 = http:
  scheme = $2 = http
  $3 = //jakarta.apache.org
  authority = $4 = jakarta.apache.org
  path = $5 = /ietf/uri/
  $6 = <undefined>
  query = $7 = <undefined>
  $8 = #Related
  fragment = $9 = Related
- Parameters:
original
- the original character sequence
escaped
- true if original is escaped
- Throws:
URIException
- If an error occurs.
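The regular expression can be exercised directly with java.util.regex. This sketch (UriReferenceSplitter is a hypothetical name, not this class's parser) returns the scheme, authority, path, query, and fragment groups, with null for undefined components.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split a URI reference with the regular expression quoted above.
class UriReferenceSplitter {
    static final Pattern URI_REFERENCE =
            Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment}; null marks an undefined component.
    static String[] split(String reference) {
        Matcher m = URI_REFERENCE.matcher(reference);
        if (!m.matches()) {
            // Only possible when the fragment contains a line terminator, which '.' does not match.
            throw new IllegalArgumentException("not a URI reference: " + reference);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        String[] parts = split("http://jakarta.apache.org/ietf/uri/#Related");
        // prints [http, jakarta.apache.org, /ietf/uri/, null, Related]
        System.out.println(java.util.Arrays.toString(parts));
    }
}
```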
-
indexFirstOf
Get the index of the first occurrence of any of the given delimiters in the string.
- Parameters:
s
- the string to be indexed
delims
- the delimiters used to index
- Returns:
- the earliest index at which a delimiter occurs, if there are delimiters
-
indexFirstOf
Get the index of the first occurrence of any of the given delimiters in the string, starting at offset.
- Parameters:
s
- the string to be indexed
delims
- the delimiters used to index
offset
- the index to start searching from
- Returns:
- the earliest index at which a delimiter occurs, if there are delimiters
-
indexFirstOf
protected int indexFirstOf(char[] s, char delim)
Get the index of the first occurrence of the delimiter in the given array.
- Parameters:
s
- the character array to be indexed
delim
- the delimiter used to index
- Returns:
- the earliest index at which the delimiter occurs, if there is one
-
indexFirstOf
protected int indexFirstOf(char[] s, char delim, int offset)
Get the index of the first occurrence of the delimiter in the given array, starting at offset.
- Parameters:
s
- the character array to be indexed
delim
- the delimiter used to index
offset
- The offset.
- Returns:
- the earliest index at which the delimiter occurs, if there is one
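A hypothetical sketch of the documented search (Delimiters is illustrative, not this class's code): it returns the smallest index, at or after offset, at which any delimiter character occurs. Returning -1 when no delimiter is found and clamping a negative offset to zero are assumptions of this sketch.

```java
// Hypothetical sketch: earliest index (at or after offset) of any delimiter.
class Delimiters {
    static int indexFirstOf(String s, String delims, int offset) {
        if (s == null || s.isEmpty() || delims == null || delims.isEmpty()) {
            return -1;                     // nothing to search
        }
        int from = Math.max(offset, 0);    // assumption: negative offsets start at 0
        int earliest = -1;
        for (int i = 0; i < delims.length(); i++) {
            int at = s.indexOf(delims.charAt(i), from);
            if (at >= 0 && (earliest == -1 || at < earliest)) {
                earliest = at;             // keep the smallest match seen so far
            }
        }
        return earliest;
    }

    static int indexFirstOf(String s, String delims) {
        return indexFirstOf(s, delims, 0);
    }

    public static void main(String[] args) {
        System.out.println(indexFirstOf("a/b?c#d", "?#"));    // 3
        System.out.println(indexFirstOf("a/b?c#d", "?#", 4)); // 5
        System.out.println(indexFirstOf("abc", "?#"));        // -1
    }
}
```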
-
parseAuthority
Parse the authority component.
- Parameters:
original
- the original character sequence of the authority component
escaped
- true if original is escaped
- Throws:
URIException
- If an error occurs.
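A hypothetical sketch of splitting a server-based authority into userinfo, host, and port, following server = [ [ userinfo "@" ] hostport ] and hostport = host [ ":" port ]. AuthorityParts is illustrative only: it keeps the brackets of an IPv6reference and ignores the colons inside them, reports a missing port as -1 (an assumption of this sketch), and does not handle registry-based authorities (reg_name).

```java
// Hypothetical sketch of splitting a server-based authority:
//   server = [ [ userinfo "@" ] hostport ],  hostport = host [ ":" port ]
class AuthorityParts {
    final String userinfo; // null when absent
    final String host;
    final int port;        // -1 when absent (an assumption of this sketch)

    private AuthorityParts(String userinfo, String host, int port) {
        this.userinfo = userinfo;
        this.host = host;
        this.port = port;
    }

    static AuthorityParts parse(String authority) {
        String userinfo = null;
        String hostport = authority;
        int at = authority.indexOf('@');          // userinfo may not contain '@'
        if (at >= 0) {
            userinfo = authority.substring(0, at);
            hostport = authority.substring(at + 1);
        }
        // Look for the port separator after the closing ']' of an IPv6reference,
        // so that the colons inside "[::1]" are not mistaken for it.
        int searchFrom = hostport.startsWith("[") ? Math.max(hostport.indexOf(']'), 0) : 0;
        int colon = hostport.indexOf(':', searchFrom);
        String host = hostport;
        int port = -1;
        if (colon >= 0) {
            host = hostport.substring(0, colon);
            String digits = hostport.substring(colon + 1);
            if (!digits.isEmpty()) {
                port = Integer.parseInt(digits);  // NumberFormatException if not numeric
            }
        }
        return new AuthorityParts(userinfo, host, port);
    }

    public static void main(String[] args) {
        AuthorityParts p = parse("user:pw@host:8080");
        System.out.println(p.userinfo + " | " + p.host + " | " + p.port); // user:pw | host | 8080
    }
}
```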
-
setURI
protected void setURI()
Once it has been parsed successfully, set this URI.
-
isAbsoluteURI
public boolean isAbsoluteURI()
Tell whether or not this URI is absolute.
- Returns:
- true iff this URI is absoluteURI
-
isRelativeURI
public boolean isRelativeURI()
Tell whether or not this URI is relative.
- Returns:
- true iff this URI is relativeURI
-
isHierPart
public boolean isHierPart()
Tell whether or not the absoluteURI of this URI is hier_part.
- Returns:
- true iff the absoluteURI is hier_part
-
isOpaquePart
public boolean isOpaquePart()
Tell whether or not the absoluteURI of this URI is opaque_part.
- Returns:
- true iff the absoluteURI is opaque_part
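The JDK's java.net.URI draws the same distinctions through isAbsolute() and isOpaque(). The following sketch uses that class (not this one), with the hypothetical helper UriKinds, on examples from the class description.

```java
import java.net.URI;

// Sketch using the JDK's java.net.URI (not this class) to classify references
// as relativeURI, or as absoluteURI with a hier_part or an opaque_part.
class UriKinds {
    static String kind(String reference) {
        URI uri = URI.create(reference);
        if (!uri.isAbsolute()) {
            return "relativeURI";   // no scheme
        }
        // An absolute URI is opaque when its scheme-specific part does not start with '/'.
        return uri.isOpaque() ? "absoluteURI/opaque_part" : "absoluteURI/hier_part";
    }

    public static void main(String[] args) {
        String[] examples = {
            "http://www.math.uio.no/faq/compression-faq/part1.html",
            "mailto:mduerst@ifi.unizh.ch",
            "news:comp.infosystems.www.servers.unix",
            "../g"
        };
        for (String e : examples) {
            System.out.println(e + " -> " + kind(e));
        }
    }
}
```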
-
isNetPath
public boolean isNetPath()Tell whether or not the relativeURI or heir_part of this URI is net_path. It's the same function as the has_authority() method.- Returns:
- true iif the relativeURI or heir_part is net_path
- See Also:
-
isAbsPath
public boolean isAbsPath()Tell whether or not the relativeURI or hier_part of this URI is abs_path.- Returns:
- true iif the relativeURI or hier_part is abs_path
-
isRelPath
public boolean isRelPath()Tell whether or not the relativeURI of this URI is rel_path.- Returns:
- true iif the relativeURI is rel_path
-
hasAuthority
public boolean hasAuthority()
Tell whether or not this URI has an authority. It is equivalent to the isNetPath() method.
- Returns:
- true if and only if this URI has an authority
-
isRegName
public boolean isRegName()
Tell whether or not the authority component of this URI is reg_name.
- Returns:
- true if and only if the authority component is reg_name
-
isServer
public boolean isServer()
Tell whether or not the authority component of this URI is server.
- Returns:
- true if and only if the authority component is server
-
hasUserinfo
public boolean hasUserinfo()
Tell whether or not this URI has userinfo.
- Returns:
- true if and only if this URI has userinfo
-
isHostname
public boolean isHostname()
Tell whether or not the host part of this URI is a hostname.
- Returns:
- true if and only if the host part is a hostname
-
isIPv4address
public boolean isIPv4address()
Tell whether or not the host part of this URI is an IPv4address.
- Returns:
- true if and only if the host part is an IPv4address
-
isIPv6reference
public boolean isIPv6reference()
Tell whether or not the host part of this URI is an IPv6reference.
- Returns:
- true if and only if the host part is an IPv6reference
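The authority and host predicates above can be illustrated with the JDK's `java.net.URI`, used here as a stand-in for this class. The helper `HostKind` is hypothetical; it relies on the documented `java.net.URI` behaviour of falling back to a registry-based authority (with `getHost()` returning null) when the authority cannot be parsed as server-based, and of keeping the brackets around IPv6 literals.

```java
import java.net.URI;

class HostKind {
    // Reports which RFC 2396 alternative the authority/host of a reference matches.
    static String hostKind(String reference) {
        URI u = URI.create(reference);
        if (u.getRawAuthority() == null) {
            return "no authority";
        }
        String host = u.getHost();
        if (host == null) {
            return "reg_name";                     // registry-based authority
        }
        if (host.startsWith("[")) {
            return "IPv6reference";                // java.net.URI keeps the brackets
        }
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return "IPv4address";
        }
        return "hostname";
    }

    public static void main(String[] args) {
        URI u = URI.create("http://user@www.example.com:8080/");
        System.out.println(u.getRawUserInfo() + " / " + hostKind(u.toString())); // user / hostname
    }
}
```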
-
hasQuery
public boolean hasQuery()
Tell whether or not this URI has a query.
- Returns:
- true if and only if this URI has a query
-
hasFragment
public boolean hasFragment()
Tell whether or not this URI has a fragment.
- Returns:
- true if and only if this URI has a fragment
-
setDefaultProtocolCharset
Set the default charset of the protocol. The character set used to store files SHALL remain a local decision and MAY depend on the capability of local operating systems. Prior to the exchange of URIs they SHOULD be converted into an ISO/IEC 10646 format and UTF-8 encoded. This approach, while allowing international exchange of URIs, still allows backward compatibility with older systems, because the code set positions for ASCII characters are identical to the one-byte sequences in UTF-8. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used. The setter always succeeds, but it always throws a DefaultCharsetChanged exception to report the change, so API programmers should call it as follows:
    import org.apache.sling.resourceresolver.impl.helper.URI;
    import org.apache.sling.resourceresolver.impl.helper.URI.DefaultCharsetChanged;
    ...
    try {
        URI.setDefaultProtocolCharset("UTF-8");
    } catch (DefaultCharsetChanged cc) {
        // CASE 1: the exception can be ignored when the charset was set by the user
        if (cc.getReasonCode() == DefaultCharsetChanged.PROTOCOL_CHARSET) {
            // CASE 2: let the user know the default protocol charset changed
        } else {
            // CASE 3: let the user know the default document charset changed
        }
    }
The API programmer is responsible for setting the correct charset, and each application should remember which charset it supports.
- Parameters:
charset
- the default charset for each protocol
- Throws:
URI.DefaultCharsetChanged
- default charset changed
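The recommendation above (convert to UTF-8 before exchanging URIs) can be sketched as follows. This is a generic illustration, not this class's escaping routine: the hypothetical `Utf8Escaper` percent-encodes every UTF-8 octet outside the RFC 2396 "unreserved" set, whereas real component escaping allows more characters depending on the component.

```java
import java.nio.charset.StandardCharsets;

class Utf8Escaper {
    // Percent-encodes the UTF-8 octets of original, leaving only RFC 2396
    // "unreserved" characters (alphanum and - _ . ! ~ * ' ( )) as they are.
    static String escape(String original) {
        StringBuilder out = new StringBuilder();
        for (byte b : original.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;                      // treat the byte as unsigned
            if (isUnreserved(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    private static boolean isUnreserved(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || "-_.!~*'()".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(escape("Los Angeles")); // Los%20Angeles
        System.out.println(escape("\uD55C"));      // %ED%95%9C (one Korean syllable, 3 octets)
    }
}
```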
-
getDefaultProtocolCharset
Get the default charset of the protocol. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used. Working globally requires either support for a number of character sets and the ability to convert between them, or the use of a single preferred character set. For global compatibility it is STRONGLY RECOMMENDED that clients and servers use UTF-8 encoding when exchanging URIs.
- Returns:
- the default charset string
-
getProtocolCharset
Get the protocol charset used by this URI instance. It is set by the constructor for this instance; if it was not set by the constructor, the default protocol charset is returned.
- Returns:
- the protocol charset string
-
setDefaultDocumentCharset
Set the default charset of the document. Note that a URI may contain characters from mixed character sets (e.g. ftp://host/KoreanNamespace/ChineseResource). To handle the bidirectional display of these character sets, the protocol charset could simply be used again, because extracting BIDI control characters inserted at different points during composition is not yet implemented. The setter always succeeds, but it always throws a DefaultCharsetChanged exception to report the change, so API programmers should call it as follows:
    import org.apache.sling.resourceresolver.impl.helper.URI;
    import org.apache.sling.resourceresolver.impl.helper.URI.DefaultCharsetChanged;
    ...
    try {
        URI.setDefaultDocumentCharset("EUC-KR");
    } catch (DefaultCharsetChanged cc) {
        // CASE 1: the exception can be ignored when the charset was set by the user
        if (cc.getReasonCode() == DefaultCharsetChanged.DOCUMENT_CHARSET) {
            // CASE 2: let the user know the default document charset changed
        } else {
            // CASE 3: let the user know the default protocol charset changed
        }
    }
The API programmer is responsible for setting the correct charset, and each application should remember which charset it supports.
- Parameters:
charset
- the default charset for the document
- Throws:
URI.DefaultCharsetChanged
- default charset changed
-
getDefaultDocumentCharset
Get the recommended default charset of the document.
- Returns:
- the default charset string
-
getDefaultDocumentCharsetByLocale
Get the default charset of the document by locale.
- Returns:
- the default charset string by locale
-
getDefaultDocumentCharsetByPlatform
Get the default charset of the document by platform.
- Returns:
- the default charset string by platform
-
getRawScheme
public char[] getRawScheme()
Get the scheme.
- Returns:
- the scheme
-
getScheme
Get the scheme.
- Returns:
- the scheme, or null if the scheme is undefined
-
setRawAuthority
Set the authority. It can be one of server, hostport, hostname, IPv4address, IPv6reference or reg_name.
authority = server | reg_name
- Parameters:
escapedAuthority
- the raw escaped authority
- Throws:
URIException
- If parseAuthority(java.lang.String,boolean) fails
NullPointerException
- null authority
-
setEscapedAuthority
Set the authority. It can be one of server, hostport, hostname, IPv4address, IPv6reference or reg_name. Note that there is no setAuthority method, because of escape-encoding concerns.
- Parameters:
escapedAuthority
- the escaped authority string
- Throws:
URIException
- If parseAuthority(java.lang.String,boolean) fails
-
getRawAuthority
public char[] getRawAuthority()
Get the raw-escaped authority.
- Returns:
- the raw-escaped authority
-
getEscapedAuthority
Get the escaped authority.
- Returns:
- the escaped authority
-
getAuthority
Get the authority.
- Returns:
- the authority
- Throws:
URIException
- If decode(char[], java.lang.String) fails
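The decoded getters (getAuthority, getUserinfo, getHost, getPath) rely on a decode step that turns "%XY" escapes back into octets and then into characters of the protocol charset. The sketch below is a hypothetical standalone version: the class `Unescaper` is illustrative, it takes a `Charset` rather than a charset name, and an `IllegalArgumentException` stands in for the `URIException` documented above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Unescaper {
    // Each "%XY" becomes one octet; all octets are then decoded with the given charset.
    static String decode(String escaped, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '%') {
                if (i + 2 >= escaped.length()) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int hi = Character.digit(escaped.charAt(i + 1), 16);
                int lo = Character.digit(escaped.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                octets.write((hi << 4) | lo);
                i += 2;                            // skip the two hex digits
            } else {
                octets.write(c);                   // the escaped form is assumed to be ASCII
            }
        }
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("j%20doe@www.example.com", StandardCharsets.UTF_8)); // j doe@www.example.com
    }
}
```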
-
getRawUserinfo
public char[] getRawUserinfo()
Get the raw-escaped userinfo.
- Returns:
- the raw-escaped userinfo
-
getEscapedUserinfo
Get the escaped userinfo.
- Returns:
- the escaped userinfo
-
getUserinfo
Get the userinfo.
- Returns:
- the userinfo
- Throws:
URIException
- If decode(char[], java.lang.String) fails
-
getRawHost
public char[] getRawHost()
Get the host.
host = hostname | IPv4address | IPv6reference
- Returns:
- the host
-
getHost
Get the host.
host = hostname | IPv4address | IPv6reference
- Returns:
- the host
- Throws:
URIException
- If decode(char[], java.lang.String) fails
-
getPort
public int getPort()
Get the port. To get the default port of a specific protocol, use the protocol-specific class that extends URI. This applies to URIs with a server-based naming authority.
- Returns:
- the port; -1 means that the default port for the scheme applies, or that the server-based naming authority is not supported in the specific URI.
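Since getPort returns -1 when no explicit port is present and leaves default ports to protocol-specific subclasses, a caller typically falls back to a per-scheme default. The sketch below shows one way to do that with the JDK's `java.net.URI` (whose getPort also returns -1 when no port is given); the class `PortLookup` and its scheme-to-port table are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

class PortLookup {
    // Hypothetical table of well-known default ports; a protocol-specific
    // subclass would supply its own default instead.
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("http", 80, "https", 443, "ftp", 21);

    // Returns the explicit port if present, else the scheme's default, else -1.
    static int effectivePort(URI uri) {
        int port = uri.getPort();                  // -1 when no port is given
        if (port != -1) {
            return port;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return -1;
        }
        return DEFAULT_PORTS.getOrDefault(scheme.toLowerCase(Locale.ROOT), -1);
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(URI.create("http://example.com/")));       // 80
        System.out.println(effectivePort(URI.create("https://example.com:8443/"))); // 8443
    }
}
```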
-
setRawPath
Set the raw-escaped path.
- Parameters:
escapedPath
- the path character sequence
- Throws:
URIException
- encoding error, or the path is not proper for the initial instance
-
setEscapedPath
Set the escaped path.
- Parameters:
escapedPath
- the escaped path string
- Throws:
URIException
- encoding error, or the path is not proper for the initial instance
-
setPath
Set the path.
- Parameters:
path
- the path string
- Throws:
URIException
- set incorrectly, or fragment only
-
resolvePath
Resolve the base and relative path.
- Parameters:
basePath
- a character array of the basePath
relPath
- a character array of the relPath
- Returns:
- the resolved path
- Throws:
URIException
- no higher path level left to be resolved
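To see what resolving a relative reference against a base produces, the JDK's `java.net.URI.resolve`, which implements RFC 2396 section 5.2, can serve as a reference point. This is a stand-in, not this class's resolvePath(); the examples use the base URI from RFC 2396 Appendix C, and the class `ResolveDemo` is hypothetical.

```java
import java.net.URI;

class ResolveDemo {
    // Resolves a relative reference against a base URI (RFC 2396, section 5.2)
    // using java.net.URI as a stand-in.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";        // base URI from RFC 2396, Appendix C
        for (String rel : new String[] {"g", "../g", "../../g", "/g", "g?y"}) {
            System.out.println(rel + " -> " + resolve(base, rel));
        }
    }
}
```

The base path is cut after its last '/', the relative path is appended, and the "." and ".." segments are then removed.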
-
getRawCurrentHierPath
Get the raw-escaped current hierarchy level in the given path. If the last namespace is a collection, the path string should end with a slash ('/').
- Parameters:
path
- the path
- Returns:
- the current hierarchy level
- Throws:
URIException
- no hierarchy level
-
getRawCurrentHierPath
Get the raw-escaped current hierarchy level.
- Returns:
- the raw-escaped current hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
-
getEscapedCurrentHierPath
Get the escaped current hierarchy level.
- Returns:
- the escaped current hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
-
getCurrentHierPath
Get the current hierarchy level.
- Returns:
- the current hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
-
getRawAboveHierPath
Get the level above this hierarchy level.
- Returns:
- the raw above hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
-
getEscapedAboveHierPath
Get the level above this hierarchy level.
- Returns:
- the escaped above hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
-
getAboveHierPath
Get the level above this hierarchy level.
- Returns:
- the above hierarchy level
- Throws:
URIException
- If getRawCurrentHierPath(char[]) fails.
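The hierarchy-level getters can be pictured on plain path strings. The sketch below is an assumption about the semantics, not this class's code: the hypothetical `HierPath` takes the current level to be everything before the last '/' ("/" when only the leading slash remains), the level above to be the current level of the current level, and the basename (compare getName) to be the part after the last '/'. An `IllegalArgumentException` stands in for the "no hierarchy level" `URIException`.

```java
class HierPath {
    // Current level: everything before the last '/', or "/" if that is the leading slash.
    // A collection path ends with '/', so "/a/b/" and "/a/b/c" both yield "/a/b".
    static String currentHierPath(String path) {
        int last = path.lastIndexOf('/');
        if (last == -1) {
            throw new IllegalArgumentException("no hierarchy level: " + path);
        }
        return last == 0 ? "/" : path.substring(0, last);
    }

    // The level above the current one.
    static String aboveHierPath(String path) {
        return currentHierPath(currentHierPath(path));
    }

    // Basename: the part after the last '/' (empty for a collection path).
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String path = "/faq/compression-faq/part1.html";
        System.out.println(currentHierPath(path)); // /faq/compression-faq
        System.out.println(aboveHierPath(path));   // /faq
        System.out.println(name(path));            // part1.html
    }
}
```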
-
getRawPath
public char[] getRawPath()
Get the raw-escaped path.
path = [ abs_path | opaque_part ]
- Returns:
- the raw-escaped path
-
getEscapedPath
Get the escaped path.
path = [ abs_path | opaque_part ]
abs_path = "/" path_segments
opaque_part = uric_no_slash *uric
- Returns:
- the escaped path string
-
getPath
Get the path.
path = [ abs_path | opaque_part ]
- Returns:
- the path string
- Throws:
URIException
- If decode(char[], java.lang.String) fails.
-
getRawName
public char[] getRawName()
Get the raw-escaped basename of the path.
- Returns:
- the raw-escaped basename
-
getEscapedName
Get the escaped basename of the path.
- Returns:
- the escaped basename string
-
getName
Get the basename of the path.
- Returns:
- the basename string
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
getRawPathQuery
public char[] getRawPathQuery()
Get the raw-escaped path and query.
- Returns:
- the raw-escaped path and query
-
getEscapedPathQuery
Get the escaped path and query.
- Returns:
- the escaped path and query string
-
getPathQuery
Get the path and query.
- Returns:
- the path and query string.
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
setRawQuery
Set the raw-escaped query.
- Parameters:
escapedQuery
- the raw-escaped query
- Throws:
URIException
- escaped query not valid
-
setEscapedQuery
Set the escaped query string.
- Parameters:
escapedQuery
- the escaped query string
- Throws:
URIException
- escaped query not valid
-
setQuery
Set the query. If the query string contains no reserved special characters ("&", "=", "+", ",", and "$") that could be misinterpreted within a query component, it is recommended to encode the whole query with this method. Additional APIs for the special purposes served by these reserved characters in each protocol are implemented in the protocol classes that extend URI, so refer to the same-named APIs implemented in each specific protocol class.
- Parameters:
query
- the query string.
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
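When names or values do contain reserved characters such as "&" or "=", encoding the whole query at once would make them indistinguishable from the delimiters, so each name and value has to be encoded on its own. The sketch below shows this with the JDK's `java.net.URLEncoder`; note that URLEncoder applies application/x-www-form-urlencoded rules (a space becomes "+"), which is common for HTTP queries but is an assumption here. The class `QueryBuilder` is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryBuilder {
    // Encodes each name and value separately, so '&', '=', '+', ',' or '$'
    // inside them cannot be mistaken for query delimiters.
    static String build(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();   // keeps insertion order
        params.put("q", "a&b=c");
        params.put("lang", "en us");
        System.out.println(build(params)); // q=a%26b%3Dc&lang=en+us
    }
}
```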
-
getRawQuery
public char[] getRawQuery()
Get the raw-escaped query.
- Returns:
- the raw-escaped query
-
getEscapedQuery
Get the escaped query.
- Returns:
- the escaped query string
-
getQuery
Get the query.
- Returns:
- the query string.
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
setRawFragment
Set the raw-escaped fragment.
- Parameters:
escapedFragment
- the raw-escaped fragment
- Throws:
URIException
- escaped fragment not valid
-
setEscapedFragment
Set the escaped fragment string.
- Parameters:
escapedFragment
- the escaped fragment string
- Throws:
URIException
- escaped fragment not valid
-
setFragment
Set the fragment.
- Parameters:
fragment
- the fragment string.
- Throws:
URIException
- If an error occurs.
-
getRawFragment
public char[] getRawFragment()
Get the raw-escaped fragment. The optional fragment identifier is not part of a URI, but is often used in conjunction with a URI. The format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.
- Returns:
- the raw-escaped fragment
-
getEscapedFragment
Get the escaped fragment.
- Returns:
- the escaped fragment string
-
getFragment
Get the fragment.
- Returns:
- the fragment string
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
removeFragmentIdentifier
protected char[] removeFragmentIdentifier(char[] component)
Remove the fragment identifier of the given component.
- Parameters:
component
- the component in which a fragment may be included
- Returns:
- the component with the fragment identifier removed
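Removing the fragment identifier amounts to cutting the component at its first '#'. A minimal sketch, assuming that null and fragment-free components are returned unchanged; the class `Fragments` is hypothetical.

```java
import java.util.Arrays;

class Fragments {
    // Returns component without its "#fragment" suffix; null and
    // fragment-free input are returned unchanged.
    static char[] removeFragmentIdentifier(char[] component) {
        if (component == null) {
            return null;
        }
        for (int i = 0; i < component.length; i++) {
            if (component[i] == '#') {
                return Arrays.copyOf(component, i);   // keep the characters before '#'
            }
        }
        return component;
    }

    public static void main(String[] args) {
        System.out.println(new String(removeFragmentIdentifier("/a/b#sec".toCharArray()))); // /a/b
    }
}
```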
-
normalize
Normalize the given hier path part. Algorithm taken from the URI reference parser at http://www.apache.org/~fielding/uri/rev-2002/issues.html.
- Parameters:
path
- the path to normalize
- Returns:
- the normalized path
- Throws:
URIException
- no higher path level left to be normalized
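Normalization removes "." and ".." segments from a hierarchical path. The sketch below uses a simple stack of segments rather than the referenced parser's algorithm, so treat it as an illustration of the effect only. The class `PathNormalizer` is hypothetical; it assumes relative paths are returned unchanged, and it throws an `IllegalArgumentException` (standing in for `URIException`) when ".." would climb above the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PathNormalizer {
    // Removes "." and ".." segments from an absolute path (one starting with '/').
    static String normalize(String path) {
        if (!path.startsWith("/")) {
            return path;                           // relative paths are left unchanged
        }
        Deque<String> segments = new ArrayDeque<>();
        String[] parts = path.substring(1).split("/", -1);   // -1 keeps a trailing empty segment
        for (int i = 0; i < parts.length; i++) {
            String segment = parts[i];
            boolean last = i == parts.length - 1;
            if (segment.equals(".")) {
                if (last) {
                    segments.addLast("");          // "/a/." becomes "/a/"
                }
            } else if (segment.equals("..")) {
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("no higher path level: " + path);
                }
                segments.removeLast();
                if (last) {
                    segments.addLast("");          // "/a/b/.." becomes "/a/"
                }
            } else {
                segments.addLast(segment);
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        System.out.println(normalize("/a/b/c/./../../g")); // /a/g
        System.out.println(normalize("/a/b/.."));          // /a/
    }
}
```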
-
normalize
Normalizes the path part of this URI. Normalization is only meant to be performed on URIs with an absolute path; calling this method on a URI with a relative path has no effect.
- Throws:
URIException
- no higher path level left to be normalized
-
equals
protected boolean equals(char[] first, char[] second)
Test if the first array is equal to the second array.
- Parameters:
first
- the first character array
second
- the second character array
- Returns:
- true if they're equal
-
equals
Test whether this URI is equal to another object.
-
hashCode
public int hashCode()
Return a hash code for this URI.
-
compareTo
Compare this URI to another object.
- Specified by:
compareTo
in interface Comparable<URI>
- Parameters:
another
- the object to be compared.
- Returns:
- 0 if they are the same, or -1 if the comparison fails; the authority components are compared first
- Throws:
ClassCastException
- if the argument is not a URI
-
clone
Create and return a copy of this object, that is, the URI-reference including the userinfo component. Note that the whole URI-reference including the userinfo component cannot be obtained as a String; to copy an identical URI object including the userinfo component, use this method.
- Overrides:
clone
in class Object
- Returns:
- a clone of this instance
- Throws:
CloneNotSupportedException
-
getRawURI
public char[] getRawURI()
Get the URI character sequence, in raw-escaped form. This is useful when the URI is to be transported by a protocol. It is clearly unwise to use a URL that contains a password which is intended to be secret. In particular, the use of a password within the 'userinfo' component of a URL is strongly disrecommended except in those rare cases where the 'password' parameter is intended to be public. To get the individual parts of the userinfo, use the methods of the specific URL subclass, since the format depends on the specific URL.
- Returns:
- the URI character sequence
-
getEscapedURI
Get the URI character sequence, in escaped form. This is useful when the URI is to be transported by a protocol.
- Returns:
- the escaped URI string
-
getURI
Get the original (unescaped) URI character sequence.
- Returns:
- the original URI string
- Throws:
URIException
- incomplete trailing escape pattern or unsupported character encoding
-
getRawURIReference
public char[] getRawURIReference()
Get the URI reference character sequence.
- Returns:
- the URI reference character sequence
-
getEscapedURIReference
Get the escaped URI reference string.
- Returns:
- the escaped URI reference string
-
getURIReference
Get the original URI reference string.
- Returns:
- the original URI reference string
- Throws:
URIException
- If decode(char[], java.lang.String) fails.
-
toString
Get the escaped URI string. For security reasons, documents use the URI-reference form only without the userinfo component, as in http://jakarta.apache.org/, although a URI-reference with a userinfo component can still be parsed. In other words, this URI and any of its subclasses must not expose a URI-reference expression that contains the userinfo component, such as http://user:password@hostport/restricted_zone.
This means that the API client programmer must extract the user and password needed for access manually. Subclasses may support this for the individual parts, but not for a whole URI-reference expression.
-
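The toString() contract above (never expose the userinfo of a URI-reference) is also a good rule when logging URIs in general. The sketch below rebuilds a hierarchical URI from its raw components without the userinfo, using the JDK's `java.net.URI` rather than this class; the class `SafeUri` is hypothetical.

```java
import java.net.URI;

class SafeUri {
    // Rebuilds a hierarchical URI from its raw (escaped) components, leaving out
    // the userinfo so that credentials do not end up in logs or documents.
    static String withoutUserinfo(URI u) {
        if (u.isOpaque() || u.getRawUserInfo() == null) {
            return u.toString();                   // nothing to strip
        }
        StringBuilder sb = new StringBuilder();
        if (u.getScheme() != null) {
            sb.append(u.getScheme()).append(':');
        }
        sb.append("//").append(u.getHost());       // userinfo implies a server-based authority
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        URI u = URI.create("http://user:password@hostport/restricted_zone");
        System.out.println(withoutUserinfo(u)); // http://hostport/restricted_zone
    }
}
```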
getString
Converts the byte array of HTTP content characters to a string. If the specified charset is not supported, the default system encoding is used.
- Parameters:
data
- the byte array to be decoded
charset
- the desired character encoding
- Returns:
- The result of the conversion.
- Since:
- 3.0
-
getAsciiBytes
Converts the specified string to a byte array of ASCII characters.
- Parameters:
data
- the string to be encoded- Returns:
- The string as a byte array.
- Since:
- 3.0
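The two conversions above map directly onto JDK charset handling. In the sketch below, the class `Encoding` is hypothetical, and catching `IllegalCharsetNameException` in addition to an unsupported charset is an assumption beyond the text. Note that `String.getBytes(StandardCharsets.US_ASCII)` replaces characters outside ASCII with '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class Encoding {
    // Decodes data with the named charset; an unsupported (or malformed) charset
    // name falls back to the platform default charset.
    static String getString(byte[] data, String charset) {
        Charset cs;
        try {
            cs = Charset.forName(charset);
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            cs = Charset.defaultCharset();
        }
        return new String(data, cs);
    }

    // Encodes data as US-ASCII; characters outside ASCII become '?'.
    static byte[] getAsciiBytes(String data) {
        return data.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] ascii = getAsciiBytes("URI");
        System.out.println(getString(ascii, "no-such-charset")); // URI
    }
}
```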
-