Substitute character

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

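In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often written ^Z). Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent un-decodable inputs when the output encoding is compatible with it.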
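The Unicode recommendation can be illustrated with Python's built-in codecs. The following is a minimal sketch, not a reference implementation: it assumes UTF-8 input, and the helper names decode_with_replacement and can_represent_replacement_char are hypothetical. The first helper lets the standard "replace" error handler insert U+FFFD for each undecodable sequence. The second checks whether an output encoding can hold U+FFFD at all, which is the "when the output encoding is compatible with it" condition.

```python
def decode_with_replacement(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, inserting U+FFFD for each undecodable sequence."""
    return data.decode(encoding, errors="replace")


def can_represent_replacement_char(output_encoding: str) -> bool:
    """Return True if U+FFFD can be written in the given output encoding."""
    try:
        "\ufffd".encode(output_encoding)
    except UnicodeEncodeError:
        return False
    return True


raw = b"caf\xe9 ok"  # a lone 0xE9 byte is not valid UTF-8
print(ascii(decode_with_replacement(raw)))      # 'caf\ufffd ok'
print(can_represent_replacement_char("utf-8"))  # True
print(can_represent_replacement_char("ascii"))  # False
```

When the check fails (for example, with a 7-bit ASCII output), U+FFFD cannot be emitted, and a character such as SUB is one possible fallback.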

Uses

End of file

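Historically, in the CP/M 1 and 2 operating systems (and derivatives such as MP/M), it was necessary to mark the end of a text file explicitly, because the native file system could not record the exact file size by itself: files were allocated in records of a fixed size, typically leaving some allocated but unused space at the end of each file. Under CP/M this extra space was filled with 1A hex (Control-Z) characters. The extended CP/M file systems used by CP/M 3 and higher (and derivatives such as Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files,^[8]^[9] so this was no longer a requirement, but it remained a convention (especially for text files) in order to ensure backward compatibility.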
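The padding convention can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CP/M's own code: it assumes CP/M's 128-byte logical record and that every unused byte of the last record is filled with SUB (0x1A). The function name pad_to_records is hypothetical.

```python
RECORD_SIZE = 128  # CP/M logical record size in bytes (assumed for this sketch)
SUB = 0x1A         # the substitute character, Control-Z


def pad_to_records(data: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Fill the unused tail of the last record with SUB bytes."""
    remainder = len(data) % record_size
    if remainder == 0:
        return data  # already ends on a record boundary; nothing to pad
    return data + bytes([SUB]) * (record_size - remainder)


text = b"HELLO, WORLD\r\n"
padded = pad_to_records(text)
print(len(text), len(padded))                                    # 14 128
print(padded[len(text):] == bytes([SUB]) * (128 - len(text)))    # True
```

When the data already ends exactly on a record boundary, this sketch adds no marker. A reader then has to rely on the real end of file reported by the operating system, as the CP/M documentation quoted in the references describes.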

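In DOS and Windows (and other CP/M derivatives), this character is also used to indicate the end of a character stream, and thereby to terminate user input in an interactive command line window (and as such, it is often used to finish console input redirection, e.g. as initiated by the command COPY CON: TYPEDTXT.TXT).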

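While no longer technically required to indicate the end of a file, as of 2017 many text editors and programming languages still support this convention, or at least cope with the character properly in text files. In such cases it is often termed a "soft" EOF, because it does not necessarily mark the physical end of the file. Some APIs of those systems, however, use the character to denote the actual end of a file.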
Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT, etc.),^[citation needed] and alternative methods must be adopted, e.g. opening the file in binary mode or using the FileSystemObject to read beyond it.
Character 26 was used to mark "end of file" even though ASCII calls this character Substitute and has other characters to indicate the end of a file. Character 28, called "File Separator", has also been used for similar purposes.

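The difference between a text-mode read that honours a "soft" EOF and a binary read can be sketched as follows. The sketch assumes a single-byte, ASCII-compatible encoding. In an encoding such as UTF-16, a 0x1A byte can be half of an ordinary character, so truncating at it would be wrong. The function name read_text_until_soft_eof is hypothetical.

```python
SUB = b"\x1a"  # Control-Z, used as a "soft" end-of-file marker


def read_text_until_soft_eof(data: bytes, encoding: str = "ascii") -> str:
    """Emulate a text-mode reader: ignore everything from the first SUB on."""
    cut = data.find(SUB)
    if cut != -1:
        data = data[:cut]
    return data.decode(encoding)


# Text followed by SUB padding and some arbitrary trailing bytes
stored = b"LINE 1\r\nLINE 2\r\n" + SUB * 3 + b"old junk"
print(repr(read_text_until_soft_eof(stored)))  # 'LINE 1\r\nLINE 2\r\n'
print(len(stored))                              # 27: a binary read sees every byte
```

In practice the bytes would come from a file opened in binary mode (open(path, "rb")). This mirrors the workaround described above: read the raw bytes, then decide explicitly whether to stop at the marker.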
Other uses

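In Unix-like systems, Ctrl+Z is commonly used to suspend the foreground process (by sending it the SIGTSTP signal); the suspended job can later be resumed with the shell commands fg (in the foreground) or bg (in the background).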
The Unicode Security Considerations report^[12] recommends this character as a safe replacement for unmappable characters during character set conversion.
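In many GUIs and applications, Control+Z (⌘ Command+Z on macOS) is used to undo the last action, a convention that traces back to the keyboard shortcuts devised at Xerox PARC to control text editing.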
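The Unicode Security Considerations recommendation of SUB as a replacement for unmappable characters can be sketched with a custom codec error handler in Python. codecs.register_error and the (replacement, resume_position) return convention are standard Python. The handler name "substitute-with-sub" and the helper convert_with_sub are hypothetical names chosen for this example. The sketch emits one SUB per unmappable character.

```python
import codecs


def _sub_for_unencodable(exc: UnicodeError):
    """Error handler: emit one SUB (U+001A) per character the codec cannot encode."""
    if isinstance(exc, UnicodeEncodeError):
        return ("\x1a" * (exc.end - exc.start), exc.end)
    raise exc  # decoding errors are outside the scope of this sketch


# "substitute-with-sub" is a hypothetical handler name chosen for this sketch.
codecs.register_error("substitute-with-sub", _sub_for_unencodable)


def convert_with_sub(text: str, target_encoding: str = "ascii") -> bytes:
    """Convert text to a legacy encoding, replacing unmappable characters with SUB."""
    return text.encode(target_encoding, errors="substitute-with-sub")


print(convert_with_sub("na\u00efve \u20ac5"))  # b'na\x1ave \x1a5'
```

Unlike Python's built-in "replace" handler, which substitutes "?" when encoding, SUB cannot be mistaken for ordinary punctuation in the converted text.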

Representation

ASCII and Unicode representation of "substitute":

Octal code: 32

Decimal code: 26

Hexadecimal code: 1A, U+001A

Mnemonic symbol: SUB

Binary value: 11010
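The values above can be derived from the single code 26 in Python. The sketch also shows two related conventions. In caret notation, a C0 control code is written as "^" followed by the character whose code is the control code XOR 0x40: 0x1A XOR 0x40 = 0x5A, "Z", which is why the character is typed as Ctrl+Z. The Unicode control picture for a C0 code sits at U+2400 plus that code, giving ␚ (U+241A). The helper name caret_notation is hypothetical.

```python
SUB = 0x1A  # decimal 26


def caret_notation(code: int) -> str:
    """Caret form of a C0 control code or DEL: '^' plus the code XOR 0x40."""
    if not (0 <= code < 0x20 or code == 0x7F):
        raise ValueError("not a C0 control code or DEL")
    return "^" + chr(code ^ 0x40)


forms = {
    "octal": format(SUB, "o"),
    "decimal": str(SUB),
    "hexadecimal": format(SUB, "X"),
    "code point": f"U+{SUB:04X}",
    "binary": format(SUB, "b"),
    "caret": caret_notation(SUB),
    "control picture": f"U+{0x2400 + SUB:04X}",
}

for name, value in forms.items():
    print(f"{name:>15}: {value}")
```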

See also

ISO 646

U+FFFD (Unicode replacement character �)

Access key

Control-C

Control-G

Control-V

Control-X

Control-\

Keyboard shortcut

List of file signatures

.notdef, a symbol (sometimes called by the slang term tofu) used to represent a missing character
Noto fonts, a Google project to eliminate missing characters

References

^ "Keyboard shortcuts for Windows". Microsoft Support. Microsoft. Retrieved 2012-06-02.

^ "Table of IO Device Characteristics - Console or Teletypewriters". PDP-6 Multiprogramming System Manual (PDF). Maynard, Massachusetts, USA: Digital Equipment Corporation (DEC). 1965. p. 43. DEC-6-0-EX-SYS-UM-IP-PRE00. Archived (PDF) from the original on 2014-07-14. Retrieved 2014-07-10. (1+84+10 pages)

^ "5.1.1.1. Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line)". PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors (PDF). Vol. 3. Digital Equipment Corporation (DEC). 1969. pp. 5-3 – 5-6 [5-5 (431)]. Archived (PDF) from the original on 2011-11-15. Retrieved 2014-07-10. (207 pages)

^ Elliott, John C. (1998). "CP/M 1.4 disc formats". Archived from the original on 2020-11-14. Retrieved 2021-11-18.

^ Elliott, John C. (1998). "CP/M 2.2 disc formats". Archived from the original on 2020-11-05. Retrieved 2021-11-18.

^ Digital Research (1979). [...] control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. [...] (56 pages)

^ Hogan (1982). [...] end-of-file marker is possible because CONTROL-z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. [...] [1][2]

^ Elliott, John C. (1998). "CP/M 3.1 disc formats". Archived from the original on 2021-10-26. Retrieved 2021-11-18.

^ Elliott, John C. (1998). "CP/M 4.1 disc formats". Archived from the original on 2020-11-05. Retrieved 2021-11-18.

^ CSV-1203 format specification Archived 2016-05-16 at the Portuguese Web Archive

^ "Quick Reference: Unix Commands". IT Connect. University of Washington. Retrieved 2012-06-02.

^ Unicode Security Considerations report

Further reading

Federal Standard 1037C

Retrieved from "https://en.wikipedia.org/w/index.php?title=Substitute_character&oldid=1210835871"

