Digraphs and trigraphs (programming)
In computer programming, digraphs and trigraphs are sequences of two and three characters, respectively, that appear in source code and, according to a programming language's specification, should be treated as if they were single characters. Trigraphs were removed from C++ in C++17 and from C in C23, and they are rarely used in practice in C or in any other mainstream language (the language J is an exception). With Unicode/UTF-8, or even plain ASCII, language designers no longer need trigraphs, which were widely considered a burden, and have little need for digraphs, which likely have very few users in those languages.
Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire
{
and }
.
History
The basic character set of the
Implementations
Trigraphs are not commonly encountered outside compiler test suites.[2] Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor (TRIGRAPH.EXE), to be used only when trigraph processing was desired (the rationale was to maximise speed of compilation).
Language support
Different systems define different sets of digraphs and trigraphs, as described below.
ALGOL
Early versions of
:= for ← (assignment) and >= for ≥ (greater than or equal).
Pascal
The Pascal programming language supports the digraphs (., .), (* and *) for [, ], { and } respectively. Unlike all other cases mentioned here, (* and *) were and still are in wide use. However, many compilers treat them as a different type of comment block rather than as actual digraphs; that is, a comment started with (* cannot be closed with } and vice versa.
J
The . (dot) and : (colon) characters are used to inflect ASCII symbols, effectively interpreting unigraphs, digraphs or rarely trigraphs as standalone "symbols".[3] Unlike the use of digraphs and trigraphs in C and C++, there are no single-character equivalents to these in J.
C
Trigraph | Equivalent |
---|---|
??= | # |
??/ | \ |
??' | ^ |
??( | [ |
??) | ] |
??! | \| |
??< | { |
??> | } |
??- | ~ |
The C preprocessor (used for C and with slight differences in C++; see below) replaces all occurrences of the nine trigraph sequences in this table by their single-character equivalents before any other processing (until C23[4]).[5][6]
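The replacement step described above can be sketched as a single left-to-right pass over the source text. This is only an illustration: the names `trigraph_equivalent` and `replace_trigraphs` are hypothetical, and a real preprocessor works on its own input buffers and also has to keep track of source positions for diagnostics.

```c
#include <assert.h>
#include <stddef.h>

/* Map the third character of a candidate sequence (two question marks
   followed by `third`) to its replacement. Returns 0 when the sequence
   is not one of the nine trigraphs. */
static char trigraph_equivalent(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: rewrite `text` in place, replacing every trigraph
   with its single-character equivalent. Returns the new length. */
size_t replace_trigraphs(char *text)
{
    assert(text != NULL);
    size_t in = 0, out = 0;
    while (text[in] != '\0') {
        char eq = 0;
        /* text[in + 2] is only read when text[in + 1] is a question mark,
           so the scan never runs past the terminating NUL. */
        if (text[in] == '?' && text[in + 1] == '?')
            eq = trigraph_equivalent(text[in + 2]);
        if (eq != 0) {
            text[out++] = eq;   /* three input characters become one */
            in += 3;
        } else {
            text[out++] = text[in++];
        }
    }
    text[out] = '\0';
    return out;
}
```

Because the scan advances one character at a time when no trigraph matches, a run of three question marks followed by `-` comes out as one question mark followed by `~`.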
A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. This is particularly a problem for the classic Mac OS, where the constant '????' may be used as a file type or creator. To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".
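The two safe spellings can be compared in a small C sketch. The text chosen here ends in two question marks and an exclamation mark, which would form a trigraph if written naively; the identifiers `by_concatenation`, `by_escape` and `has_literal_question_marks` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Both literals spell: w h a t ? ? !  (seven characters).
   Adjacent string literals are joined only after trigraph replacement
   has already happened, so neither spelling can form a trigraph. */
const char by_concatenation[] = "what?" "?!";
const char by_escape[] = "what?\?!";

/* Returns 1 if `s` is exactly the intended seven-character text. */
int has_literal_question_marks(const char *s)
{
    /* The expected text is built character by character so that this
       check does not depend on either spelling being tested. */
    static const char expected[] = { 'w', 'h', 'a', 't', '?', '?', '!', '\0' };
    assert(s != NULL);
    return strcmp(s, expected) == 0;
}
```

Both arrays should pass the check whether or not the compiler has trigraph processing enabled.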
??? is not itself a trigraph sequence, but when followed by a character such as - it will be interpreted as ? + ??-, as in the example below which has 16 ?s before the /.
The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example:
// Will the next line be executed????????????????/
a++;
which is a single logical comment line (used in C++ and C99), and
/??/
* A comment *??/
/
which is a correctly formed block comment. The concept can be used to check for trigraphs as in the following C99 example, where only one return statement will be executed.
int trigraphsavailable(void) // returns 0 or 1; language standard C99 or later
{
    // are trigraphs available??/
    return 0;
    return 1;
}
Digraph | Equivalent |
---|---|
<: | [ |
:> | ] |
<% | { |
%> | } |
%: | # |
In 1994, a normative amendment to the C standard, included in C99, supplied digraphs as more readable alternatives to five of the trigraphs.
Unlike trigraphs, digraphs are handled during
%:%: replacing the preprocessor concatenation token ##. If a digraph sequence occurs inside another token, for example a quoted string or a character constant, it will not be replaced.
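As a rough illustration of the digraph table, the following C fragment uses digraphs for every bracket, brace and `#`. It assumes a compiler that accepts the 1994 amendment or any later C standard; `sum_with_digraphs` and `not_replaced` are hypothetical names.

```c
%:include <assert.h>
%:include <stddef.h>

/* Hypothetical example: add up `n` integers, written with digraphs
   in place of square brackets, braces and the number sign. */
int sum_with_digraphs(const int values<::>, size_t n)
<%
    int total = 0;
    assert(values != NULL);
    for (size_t i = 0; i < n; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* Inside a string literal the same character pairs are ordinary text
   and stay exactly as written. */
const char not_replaced<::> = "<: :> <% %> %:";
```

The function behaves like one written with the usual punctuation, while the string literal keeps its fourteen characters unchanged.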
C++
Token | Equivalent |
---|---|
compl | ~ |
not | ! |
bitand | & |
bitor | \| |
and | && |
or | \|\| |
xor | ^ |
and_eq | &= |
or_eq | \|= |
xor_eq | ^= |
not_eq | != |
C++ (through C++14, see below) behaves like C, including the C99 additions, but with additional tokens listed in the table.[7]
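In C these spellings are not built-in tokens, but the standard header `<iso646.h>` provides them as macros, so the effect of the table can be tried from C as well. The sketch below assumes that header is available; the function names `clear_then_toggle` and `outside_range` are hypothetical.

```c
#include <assert.h>
#include <iso646.h>  /* defines and, or, not, xor, bitand, bitor, compl, ... as macros */

/* Hypothetical helper: clear the bits set in `mask`, then toggle the
   bits set in `flip`. Same as (value & ~mask) ^ flip. */
unsigned clear_then_toggle(unsigned value, unsigned mask, unsigned flip)
{
    return (value bitand compl mask) xor flip;
}

/* Hypothetical helper: returns 1 when x lies outside [lo, hi], else 0.
   Same as !(x >= lo && x <= hi). */
int outside_range(int x, int lo, int hi)
{
    assert(lo <= hi);
    return not (x >= lo and x <= hi);
}
```

In C++ the header is unnecessary because the same words are part of the language itself.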
As a note, %:%: is treated as a single token, rather than two occurrences of %:.
In the sequence <::, if the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:. This is done so certain uses of templates are not broken by the substitution.
The C++ Standard makes this comment with regard to the term "digraph":[8]
The term "digraph" (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as "digraphs".
Trigraphs were proposed for deprecation in
RPL
Application support
Vim
The
can be displayed by typing :dig.
GNU Screen
GNU Screen has a digraph command, bound to Ctrl+A Ctrl+V by default.[20]
Lotus
Lotus 1-2-3 for DOS uses Alt+F1 as a compose key to allow easier input of many special characters of the Lotus International Character Set (LICS)[21] and Lotus Multi-Byte Character Set (LMBCS).
See also
- Compose key
- List of XML and HTML character entity references
- Escape sequence
- Escape sequences in C
- C alternative tokens
References
- Rationale for International Standard - Programming Languages - C (PDF). Revision 5.10. pp. 20–21.
- Jones, Derek M. "Sentence 117". The New C Standard: An Economic and Cultural Commentary.
- Hui, Roger. "Vocabulary". jsoftware.com. Archived from the original on 2019-04-02. Retrieved 2015-04-16.
- "Removing trigraphs??!" (PDF).
- ISBN 0-470-84573-2.
- "Rationale for International Standard - Programming Languages - C" (PDF). 5.10. April 2003. Archived (PDF) from the original on 2016-06-06. Retrieved 2010-10-17.
- ISBN 0-201-54330-3.
- Du Toit, Stefanus, ed. (2012-01-16). "Working Draft, Standard for Programming Language C++" (PDF). N3337. Archived (PDF) from the original on 2019-05-08. Retrieved 2019-05-08.
- "C++0X, CD 1, National Body Comments" (PDF). 2009-01-30. SC22/WG21 N2837 comment UK 11. Archived (PDF) from the original on 2017-08-01. Retrieved 2019-05-12.
- Wong, Michael; Tong, Hubert; Klarer, Robert; McIntosh, Ian; Mak, Raymond; Cambly, Christopher; LaBonté, Alain (2009-06-19). "Comment on Proposed Trigraph Deprecation" (PDF). N2910. Archived (PDF) from the original on 2017-08-01. Retrieved 2019-05-12.
- Smith, Richard (2014-05-06). "Removing trigraphs??!". N3981. Archived from the original on 2018-07-09. Retrieved 2019-05-12.
- Wong, Michael; Tong, Hubert; Bhakta, Rajan; Inglis, Derek (2014-10-10). "IBM comment on preparing for a Trigraph-adverse future in C++17" (PDF). IBM paper N4210. Archived (PDF) from the original on 2018-09-11. Retrieved 2019-05-12.
- HP 82240B Infrared Printer (1 ed.). Corvallis, OR, USA: Hewlett-Packard. August 1989. HP reorder number 82240-90014. Archived from the original on 2016-08-14. Retrieved 2016-08-01.
- HP 48G Series – User's Guide (UG) (8 ed.). Hewlett-Packard. December 1994 [1993]. pp. 2–5, 27–16. HP 00048-90126, (00048-90104). Archived from the original on 2016-08-06. Retrieved 2015-09-06.
- HP 50g / 49g+ / 48gII graphing calculator advanced user's reference manual (AUR) (2 ed.). Hewlett-Packard. 2009-07-14 [2005]. pp. J-1, J-2. HP F2228-90010. Archived from the original on 2018-07-08. Retrieved 2015-10-10.
- "HP RPL TIO Table". holyjoe.org. Archived from the original on 2016-05-23. Retrieved 2015-01-23.
- Heinz, Sr., Michael W. (2005). "HP-ASCII and Trigraphs". Archived from the original on 2016-08-02. Retrieved 2016-08-02.
- Finseth, Craig A. (2012-02-25). "chars". Archived from the original on 2017-12-21. Retrieved 2017-12-21.
- "Vim documentation: *digraphs-default*". 2011-01-15. Archived from the original on 2018-12-20. Retrieved 2019-05-12.
- "Digraph - Screen User's Manual". Archived from the original on 2018-12-31. Retrieved 2019-05-12.
- Hewlett-Packard Company, Corvallis Division. June 1991 [March 1991]. F0001-90003. Archived (PDF) from the original on 2016-11-28. Retrieved 2016-11-27.
External links
- RFC 1345