Internationalized Resource Identifier
Internationalized Resource Identifier | |
---|---|
Abbreviation | IRI |
Status | Proposed Standard |
Year started | 22 April 2002 |
First published | 22 April 2002 |
Latest version | 21 January 2020 |
Organization | IETF |
Authors | |
Base standards | |
Domain | RFC 3987 |
The Internationalized Resource Identifier (IRI) is an
Syntax
IRIs extend URIs by using the
Compatibility
IRIs are mapped to URIs to retain backward compatibility with systems that do not support the new format.[6]
For applications and protocols that cannot consume IRIs directly, an IRI that is not already in Unicode should first be converted to Unicode and normalized using canonical composition (NFC).
All non-ASCII code points in the IRI should next be encoded as UTF-8, and the resulting bytes percent-encoded, to produce a valid URI.
Example: The IRI https://en.wiktionary.org/wiki/Ῥόδος becomes the URI https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82
ASCII code points that are invalid URI characters may be encoded the same way, depending on implementation.[6]
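The mapping described above can be sketched in Python as follows. This is a minimal illustration, not a complete RFC 3987 implementation: the function name `iri_to_uri` is hypothetical, NFC is applied unconditionally for simplicity (the text calls for it only when the input was not already Unicode), ASCII characters are passed through unchanged, and host names are not given any special treatment.

```python
import unicodedata
from urllib.parse import quote


def iri_to_uri(iri):
    """Map an IRI to a URI by percent-encoding its non-ASCII code points.

    Hypothetical helper for illustration only.
    """
    # Assumption: normalize every input to NFC, even if it is already Unicode.
    normalized = unicodedata.normalize("NFC", iri)
    # quote() encodes a character as UTF-8 and percent-encodes each byte.
    # ASCII characters are left exactly as they are.
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in normalized
    )


# The example from the text, written with escapes to avoid encoding ambiguity.
iri = "https://en.wiktionary.org/wiki/\u1fec\u03cc\u03b4\u03bf\u03c2"
print(iri_to_uri(iri))
# → https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82
```

Processing one character at a time keeps reserved ASCII characters such as `/` and `:` intact, so only the non-ASCII part of the identifier changes.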
This conversion is easily reversible; by definition, converting an IRI to a URI and back again will yield an IRI that is semantically equivalent to the original IRI, even though it may differ in exact representation.[7]
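A rough sketch of the reverse direction in Python is shown below. The function name `uri_to_iri` is hypothetical, and the approach is a simplification: it decodes only percent-encoded runs that form valid UTF-8, leaves percent-encoded ASCII escaped so that reserved characters keep their meaning, and does not apply the further character exclusions that a full implementation would need.

```python
import re
from urllib.parse import unquote_to_bytes

# One or more consecutive percent-encoded bytes, e.g. "%CF%8C".
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def uri_to_iri(uri):
    """Turn percent-encoded UTF-8 sequences back into non-ASCII characters.

    Hypothetical helper for illustration only.
    """
    def decode_run(match):
        raw = unquote_to_bytes(match.group(0))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the escapes untouched.
            return match.group(0)
        # Keep ASCII bytes escaped (e.g. "%2F" must not become "/").
        return "".join(
            ch if ord(ch) >= 128 else "%{:02X}".format(ord(ch))
            for ch in text
        )

    return _ESCAPED_RUN.sub(decode_run, uri)


uri = "https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82"
assert uri_to_iri(uri) == "https://en.wiktionary.org/wiki/\u1fec\u03cc\u03b4\u03bf\u03c2"
```

Because ASCII escapes are rewritten with uppercase hex digits, the result can differ in exact representation from an earlier form (for example `%2f` becomes `%2F`) while remaining equivalent.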
Some protocols may impose further transformations; e.g. Punycode for DNS labels.
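As an illustration of the Punycode step, Python's built-in `idna` codec converts a host name with non-ASCII labels into its ASCII form. Note that this codec implements the older IDNA 2003 rules, and that the helper name `host_to_ascii` and the host `bücher.example` are chosen here purely for demonstration.

```python
def host_to_ascii(host):
    """Encode each non-ASCII DNS label of a host name using Punycode.

    Hypothetical helper; relies on Python's built-in IDNA 2003 codec.
    """
    return host.encode("idna").decode("ascii")


print(host_to_ascii("b\u00fccher.example"))
# → xn--bcher-kva.example

# The codec also reverses the transformation.
print(b"xn--bcher-kva.example".decode("idna") == "b\u00fccher.example")
# → True
```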
Advantages
Displaying resource identifiers in different languages mainly helps users who are unfamiliar with the Latin (A–Z) alphabet. Assuming that it isn't too difficult for anyone to replicate arbitrary Unicode on their keyboards, this can make the
Disadvantages
Mixing IRIs and
www.myfictionαlbank.com
and point that IRI to a malicious site. This is known as an IDN homograph attack.
While a URI does not provide people with a way to specify web resources using their own alphabets, an IRI does not make clear how web resources can be accessed with keyboards that cannot generate the requisite internationalized characters. In this respect, IRIs are handled much like other software that may require a non-keyboard input method when dealing with text in various languages.
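The spoofed host name in the homograph example above can be inspected programmatically. The following Python sketch lists the non-ASCII characters in a host name together with their Unicode names, and applies a deliberately crude check for hosts that mix ASCII letters with other letters. Both function names are hypothetical, and the check is only a heuristic: it also flags many legitimate internationalized names and is no substitute for a proper confusable-character analysis.

```python
import unicodedata


def describe_non_ascii(host):
    """Return (character, Unicode name) pairs for each non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "<unnamed>"))
        for ch in host
        if ord(ch) > 127
    ]


def mixes_ascii_and_other_letters(host):
    """Crude heuristic: True if ASCII and non-ASCII letters appear together."""
    has_ascii = any(ch.isascii() and ch.isalpha() for ch in host)
    has_other = any(not ch.isascii() and ch.isalpha() for ch in host)
    return has_ascii and has_other


# The host from the example, with the look-alike letter written as an escape.
host = "www.myfiction\u03b1lbank.com"
print(describe_non_ascii(host))
# → [('α', 'GREEK SMALL LETTER ALPHA')]
print(mixes_ascii_and_other_letters(host))
# → True
```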
See also
- IDN (Internationalized Domain Name)
- Semantic Web
- Punycode
- XRI (Extensible Resource Identifier)
References
- Gangemi, Aldo; Presutti, Valentina (2006). "The bourne identity of a web resource" (PDF). Proceedings of Identity Reference and the Web Workshop (IRW). Laboratory for Applied Ontology: 3.
Notice that IRIs (Internationalized Resource Identifier) [11] are supposed to replace URIs in next future.
- Suignard, Michel (January 2005). "Internationalized Resource Identifiers (IRIs)". tools.ietf.org. Retrieved 2018-06-09.
This document defines a new protocol element, the Internationalized Resource Identifier (IRI), as a complement to the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined, which means that IRIs can be used instead of URIs, where appropriate, to identify resources. The approach of defining a new protocol element was chosen instead of extending or changing the definition of URIs.
- Suignard, Michel (January 2005). "Internationalized Resource Identifiers (IRIs)". tools.ietf.org. Retrieved 2018-06-09.
This document defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines "internationalized" versions corresponding to other constructs from [RFC3986], such as URI references. The syntax of IRIs is defined in section 2, and the relationship between IRIs and URIs in section 3.
- Suignard, Michel (January 2005). "Internationalized Resource Identifiers (IRIs)". tools.ietf.org. Retrieved 2018-06-09.
- Suignard, Michel (January 2005). "Internationalized Resource Identifiers (IRIs)". tools.ietf.org. Retrieved 2018-06-09.
- Duerst, M. (2005). "RFC 3987". Network Working Group. Standards Track. Retrieved 12 October 2014.
- ISBN 978-3-540-92912-3. Retrieved 12 October 2014.
- Clark, Kendall (2003-05-07). "Internationalizing the URI". O’Reilly Media, Inc. Retrieved 12 October 2014.