URL
Uniform Resource Locator | |
---|---|
Abbreviation | URL |
Status | Published |
First published | 1994 |
Latest version | Living Standard 2023 |
Organization | Web Hypertext Application Technology Working Group (WHATWG) |
Series | Request for Comments (RFC) |
Editors | Anne van Kesteren |
Authors | Tim Berners-Lee |
Base standards | |
Related standards | URI, URN |
Domain | World Wide Web |
License | CC BY 4.0 |
Website | url |
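A Uniform Resource Locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (HTTP/HTTPS) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.
Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).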
History
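Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF),[7] as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992.[7][8]
The format combines the pre-existing system of domain names with file path syntax, where slashes are used to separate directory and file names. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//).[9]
Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout,[9] and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary.[10]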
Early WorldWideWeb collaborators including Berners-Lee originally proposed the use of UDIs: Universal Document Identifiers. An early (1993) draft of the HTML Specification[11] referred to "Universal" Resource Locators. This was dropped some time between June 1994 (RFC 1630) and October 1994 (draft-ietf-uri-url-08.txt).[12] In his book Weaving the Web, Berners-Lee emphasizes his preference for the original inclusion of "universal" in the expansion rather than the word "uniform", to which it was later changed, and he gives a brief account of the contention that led to the change.
Syntax
Every HTTP URL conforms to the syntax of a generic URI. The URI generic syntax consists of five components organized hierarchically in order of decreasing significance from left to right:[13]
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
A component is undefined if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined.[14] A component is empty if it has no characters; the scheme component is always non-empty.[13]
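The distinction between an undefined and an empty component can be illustrated with a short sketch. It uses a regular expression adapted from RFC 3986, Appendix B; the function name `split_uri` and the dictionary keys are illustrative choices, not part of any standard API. An unmatched group (`None`) stands for an undefined component, while an empty string stands for a component that is defined but empty.

```python
import re

# Pattern adapted from RFC 3986, Appendix B. Each optional component sits in
# its own group, so a missing delimiter leaves that group unmatched (None).
URI_PATTERN = re.compile(
    r"^(?:([^:/?#]+):)?"   # scheme, up to the first ":"
    r"(?://([^/?#]*))?"    # authority, after "//"
    r"([^?#]*)"            # path (always matches, possibly empty)
    r"(?:\?([^#]*))?"      # query, after "?"
    r"(?:#(.*))?"          # fragment, after "#"
)

COMPONENT_NAMES = ("scheme", "authority", "path", "query", "fragment")


def split_uri(uri):
    """Split a URI into its five generic components (hypothetical helper).

    None means the component is undefined (its delimiter is absent);
    "" means the component is defined but empty.
    """
    match = URI_PATTERN.match(uri)
    return dict(zip(COMPONENT_NAMES, match.groups()))


if __name__ == "__main__":
    # Query and fragment delimiters are absent, so both are undefined.
    print(split_uri("http://www.example.com/index.html"))
    # The "?" is present with nothing after it: the query is defined but empty.
    print(split_uri("http://example.com?"))
```

The pattern also accepts relative references, for which the scheme group is unmatched; a full URI as described above always has a scheme.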
The authority component consists of subcomponents:
authority = [userinfo "@"] host [":" port]
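As a minimal sketch, the authority subcomponents can be read with `urlsplit` from Python's standard `urllib.parse` module. The helper name `split_authority` and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def split_authority(url):
    """Return (user, host, port) for a URL; absent parts are None.

    Hypothetical helper. urlsplit lowercases the host and strips the
    brackets around an IPv6 literal; the port is returned as an int.
    """
    parts = urlsplit(url)
    return parts.username, parts.hostname, parts.port


if __name__ == "__main__":
    print(split_authority("https://alice@example.com:8443/index.html"))
    print(split_authority("http://[2001:db8::1]:8080/"))
    print(split_authority("http://www.example.com/"))
```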
The URI comprises:
- A non-empty scheme component followed by a colon (:), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include http, https, ftp, mailto, file, and data.
- An optional authority component preceded by two slashes (//), comprising:
  - An optional userinfo subcomponent followed by an at symbol (@), that may consist of a user name and an optional password preceded by a colon (:). Use of the format username:password in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (:) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
  - A host subcomponent, consisting of either a registered name (including but not limited to a hostname) or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([]).[16][c]
  - An optional port subcomponent preceded by a colon (:), consisting of decimal digits.
- A path component, consisting of a sequence of path segments separated by a slash (/). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (//) in the path component. A path component may resemble or map exactly to a file system path but does not always imply a relation to one. If an authority component is defined, then the path component must either be empty or begin with a slash (/). If an authority component is undefined, then the path cannot begin with an empty segment (that is, with two slashes, //), since the following characters would be interpreted as an authority component.[18]
  - By convention, in http and https URIs, the last part of a path is named pathinfo and is optional. It is composed of zero or more path segments that do not refer to an existing physical resource name (e.g. a file, an internal module program or an executable program) but to a logical part (e.g. a command or a qualifier part) that has to be passed separately to the first part of the path, which identifies an executable module or program managed by a web server. This is often used to select dynamic content (a document, etc.) or to tailor it as requested (see also: CGI and PATH_INFO, etc.).
    - Example:
      - URI: "http://www.example.com/questions/3456/my-document"
      - where "/questions" is the first part of the path (an executable module or program) and "/3456/my-document" is the second part of the path, named pathinfo, which is passed to the executable module or program named "/questions" to select the requested document.
  - An http or https URI containing a pathinfo part without a query part may also be referred to as a 'clean URL' whose last part may be a 'slug'.
Query delimiter | Example |
---|---|
Ampersand (&) | key1=value1&key2=value2 |
Semicolon (;)[d] | key1=value1;key2=value2 |
- An optional query component preceded by a question mark (?), consisting of a query string of non-hierarchical data. Its syntax is not well defined, but by convention it is most often a sequence of attribute–value pairs separated by a delimiter.
- An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.
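The query and fragment components from the list above can be extracted with a short sketch based on Python's standard `urllib.parse` module. The helper name `query_and_fragment` and the example URLs are illustrative assumptions. The `separator` argument of `parse_qsl` is available only in recent Python releases (3.9.2 and later, plus some security backports), so older interpreters may reject it.

```python
from urllib.parse import parse_qsl, urlsplit


def query_and_fragment(url, delimiter="&"):
    """Return (list of (key, value) pairs, fragment) for a URL.

    Hypothetical helper. The delimiter is not fixed by the URI syntax;
    "&" and ";" are the two conventions shown in the table above.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, separator=delimiter)
    return pairs, parts.fragment


if __name__ == "__main__":
    print(query_and_fragment("http://www.example.com/page?key1=value1&key2=value2#History"))
    print(query_and_fragment("http://www.example.com/page?key1=value1;key2=value2", delimiter=";"))
```

Note that `urlsplit` reports a missing fragment as an empty string, so this sketch does not distinguish an undefined fragment from an empty one.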
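A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.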
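The scheme rule given in the component list above (a letter followed by any combination of letters, digits, plus, period, or hyphen, with lowercase as the canonical form) can be checked with a small regular expression. This is a sketch only; the name `canonical_scheme` is a hypothetical helper, not a standard function.

```python
import re

# A letter, then any mix of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def canonical_scheme(scheme):
    """Validate a scheme name and return its lowercase canonical form."""
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not a valid URI scheme: {scheme!r}")
    # Schemes are case-insensitive, so "HTTPS" and "https" are equivalent.
    return scheme.lower()


if __name__ == "__main__":
    print(canonical_scheme("HTTPS"))
    print(canonical_scheme("svn+ssh"))
```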
Internationalized URL
Internet users are distributed throughout the world using a wide variety of languages and alphabets, and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are the domain name and path.[20][21]
The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically convert the domain name into punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- prefix indicates that the label was not originally ASCII.[22]
The URL path name can also be specified by the user in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page.[20]
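Both conversions can be sketched with Python's standard library, using the example host and path from the text. The function names are illustrative. Python's built-in `idna` codec implements the older IDNA 2003 rules, so its output may differ from what a current browser produces for some names.

```python
from urllib.parse import quote, unquote


def host_to_ascii(host):
    """Convert an internationalized host name to its ASCII (xn--) form."""
    # The codec encodes each dot-separated label separately.
    return host.encode("idna").decode("ascii")


def path_to_ascii(path):
    """Percent-encode the UTF-8 bytes of non-ASCII characters in a path."""
    # quote() leaves "/" and unreserved characters such as "." untouched.
    return quote(path)


if __name__ == "__main__":
    print(host_to_ascii("例子.卷筒纸"))
    print(path_to_ascii("/引き割り.html"))
    # prints /%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
    print(unquote("/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html"))
```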
Protocol-relative URLs
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use the protocol of the current page, typically HTTP or HTTPS.[23][24]
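How a protocol-relative URL picks up the scheme of the current page can be sketched with `urljoin` from Python's standard `urllib.parse` module. The page addresses below and the helper name `resolve` are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, link):
    """Resolve a link found on a page against that page's own URL."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    # The same link resolves to a different scheme depending on the page.
    print(resolve("https://www.example.org/article", "//example.com/lib.js"))
    # prints https://example.com/lib.js
    print(resolve("http://www.example.org/article", "//example.com/lib.js"))
    # prints http://example.com/lib.js
```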
See also
- Hyperlink
- PURL – Persistent URL
- CURIE (Compact URI)
- URI fragment
- Internet resource locator (IRL)
- Internationalized Resource Identifier (IRI)
- Clean URL
- Typosquatting
- Uniform Resource Identifier
- URI normalization
- Use of slashes in networking
Notes
- ^ A URL implies the means to access an indicated resource and is denoted by a protocol or an access mechanism, which is not true of every URI.[5][4] Thus http://www.example.com is a URL, while www.example.com is not.[6]
- ^ For URIs relating to resources on the World Wide Web, some web browsers allow .0 portions of dot-decimal notation to be dropped or raw integer IP addresses to be used.[17]
Citations
- ^ W3C (2009).
- ^ "Forward and Backslashes in URLs". zzz.buzz. Archived from the original on 2018-09-04. Retrieved 2018-09-19.
- ^ RFC 3986 (2005).
- ^ a b Joint W3C/IETF URI Planning Interest Group (2002).
- ^ RFC 2396 (1998).
- ^ Miessler, Daniel. "The Difference Between URLs and URIs". Archived from the original on 2017-03-17. Retrieved 2017-03-16.
- ^ a b W3C (1994).
- ^ IETF (1992).
- ^ a b Berners-Lee (2015).
- ^ BBC News (2009).
- Connolly, Daniel "Dan" (March 1993). Hypertext Markup Language (draft RFCxxx) (Technical report). p. 28. Archived from the original on 2017-10-23. Retrieved 2017-10-23.
- McCahill, Mark Perry (October 1994). Uniform Resource Locators (URL) (Technical report). (This Internet-Draft was published as a Proposed Standard RFC, RFC 1738 (1994)) Cited in Ang, C. S.; Martin, D. C. (January 1995). Constituent Component Interface++ (Technical report). UCSF Library and Center for Knowledge Management. Archived from the original on 2017-10-23. Retrieved 2017-10-23.
- ^ a b RFC 3986 (2005), §3.
- ^ RFC 3986 (2005), §5.2.1.
- ^ IETF (2015).
- ^ RFC 3986 (2005), §3.2.2.
- ^ Lawrence (2014).
- ^ RFC 2396 (1998), §3.3.
- ^ RFC 1866 (1995), §8.2.1.
- ^ a b W3C (2008).
- ^ W3C (2014).
- ^ IANA (2003).
- ISBN 978-1-48220903-7. Retrieved 2015-10-12.
- ISBN 978-1-11808130-3. Retrieved 2015-10-12.
References
- "Berners-Lee "sorry" for slashes". BBC News. 2009-10-14. Archived from the original on 2020-06-05. Retrieved 2010-02-14.
- "Living Documents BoF Minutes". World Wide Web Consortium. 1992-03-18. Archived from the original on 2012-11-22. Retrieved 2011-12-26.
- Berners-Lee, Tim (1994-03-21). "Uniform Resource Locators (URL): A Syntax for the Expression of Access Information of Objects on the Network". World Wide Web Consortium. Archived from the original on 2015-09-09. Retrieved 2015-09-13.
- . Retrieved 2015-08-31.
- Berners-Lee, Tim (2015) [2000]. "Why the //, #, etc?". Frequently asked questions. World Wide Web Consortium. Archived from the original on 2020-05-14. Retrieved 2010-02-03.
- Sperberg-McQueen, C. Michael, eds. (2009-05-21). "Web addresses in HTML 5". World Wide Web Consortium. Archived from the original on 2015-07-10. Retrieved 2015-09-13.
- IANA (2003-02-14). "Completion of IANA Selection of IDNA Prefix". IETF-Announce mailing list. Archived from the original on 2004-12-08. Retrieved 2015-09-03.
- from the original on 2011-08-27. Retrieved 2015-09-13.
- . Retrieved 2015-08-31.
- Hansen, Tony; Hardie, Ted (June 2015). Thaler, Dave (ed.). Guidelines and Registration Procedures for URI Schemes.
- . Retrieved 2015-09-13.
- . Retrieved 2015-08-31.
- "An Introduction to Multilingual Web Addresses". 2008-05-09. Archived from the original on 2015-01-05. Retrieved 2015-01-11.
- Phillip, A. (2014). "What is Happening with "International URLs"". World Wide Web Consortium. Archived from the original on 2015-02-17. Retrieved 2015-01-11.
- Lawrence, Eric (2014-03-06). "Browser Arcana: IP Literals in URLs". docs.microsoft.com. Archived from the original on 2020-06-22. Retrieved 2020-06-22.