Endianness
This article needs additional citations for verification. (July 2020) |
In
Computers store information in various-sized groups of binary bits. Each group is assigned a number, called its address, that the computer uses to access that data. On most modern computers, the smallest data group with an address is eight bits long and is called a byte. Larger groups comprise two or more bytes, for example, a 32-bit word contains four bytes. There are two possible ways a computer could number the individual bytes in a larger group, starting at either end. Both types of endianness are in widespread use in digital electronic engineering. The initial choice of endianness of a new design is often arbitrary, but later technology revisions and updates perpetuate the existing endianness to maintain backward compatibility.
A big-endian system stores the
Big-endianness is the dominant ordering in networking protocols, such as in the
Characteristics
Computer memory consists of a sequence of storage cells (smallest addressable units); in machines that support byte addressing, those units are called bytes. Each byte is identified and accessed in hardware and software by its memory address. If the total number of bytes in memory is n, then addresses are enumerated from 0 to n − 1.
Computer programs often use data structures or fields that may consist of more data than can be stored in one byte. In the context of this article where its type cannot be arbitrarily complicated, a "field" consists of a consecutive sequence of bytes and represents a "simple data value" which – at least potentially – can be manipulated by one single hardware instruction. On most systems, the address of a multi-byte simple data value is the address of its first byte (the byte with the lowest address). An exception to this rule is e.g. the Add instruction of the IBM 1401 which addresses variable-length fields at their low-order (highest-addressed) position with their lengths being defined by a word mark set at their high-order (lowest-addressed) position. When an operation such as addition is performed, the processor begins at the low-order positions at the high addresses of the two fields and works its way down to the high-order.[citation needed]
Another important attribute of a byte being part of a "field" is its "significance". These attributes of the parts of a field play an important role in the sequence the bytes are accessed by the computer hardware, more precisely: by the low-level algorithms contributing to the results of a computer instruction.
Numbers
Positional number systems (mostly base 2, or less often base 10) are the predominant way of representing and particularly of manipulating integer data by computers. In pure form this is valid for moderate sized non-negative integers, e.g. of C data type unsigned
. In such a number system, the value of a digit which it contributes to the whole number is determined not only by its value as a single digit, but also by the position it holds in the complete number, called its significance. These positions can be mapped to memory mainly in two ways:[12]
- Decreasing numeric significance with increasing memory addresses (or increasing time), known as big-endian and
- Increasing numeric significance with increasing memory addresses (or increasing time), known as little-endian.
In these expressions, the term "end" is meant as the extremity where the big resp. little significance is written first, namely where the field starts.
The integer data that are directly supported by the computer hardware have a fixed width of a low power of 2, e.g. 8 bits ≙ 1 byte, 16 bits ≙ 2 bytes, 32 bits ≙ 4 bytes, 64 bits ≙ 8 bytes, 128 bits ≙ 16 bytes. The low-level access sequence to the bytes of such a field depends on the operation to be performed. The least-significant byte is accessed first for addition, subtraction and multiplication. The most-significant byte is accessed first for division and comparison. See § Calculation order.
Text
When character (text) strings are to be compared with one another, e.g. in order to support some mechanism like
Integer numbers written as text are always represented most significant digit first in memory, which is similar to big-endian, independently of
Byte addressing
When memory bytes are printed sequentially from left to right (e.g. in a hex dump), little-endian representation of integers has the significance increasing from left to right. In other words, it appears backwards when visualized, which can be counter-intuitive.
This behavior arises, for example, in FourCC or similar techniques that involve packing characters into an integer, so that it becomes a sequences of specific characters in memory. For example, take the string "JOHN", stored in hexadecimal ASCII. On big-endian machines, the value appears left-to-right, coinciding with the correct string order for reading the result ("J O H N"). But on a little-endian machine, one would see "N H O J". Middle-endian machines complicate this even further; for example, on the PDP-11, the 32-bit value is stored as two 16-bit words "JO" "HN" in big-endian, with the characters in the 16-bit words being stored in little-endian, resulting in "O J N H".[citation needed]
Byte swapping
Byte-swapping consists of rearranging bytes to change endianness. Many compilers provide built-ins that are likely to be compiled into native processor instructions (bswap
/movbe
), such as __builtin_bswap32
. Software interfaces for swapping include:
- Standard network endianness functions (from/to BE, up to 32-bit).[13] Windows has a 64-bit extension in
winsock2.h
. - BSD and Glibc
endian.h
functions (from/to BE and LE, up to 64-bit).[14] - macOS
OSByteOrder.h
macros (from/to BE and LE, up to 64-bit). - The
std::byteswap
function in C++23.[15]
Some
Some compilers have built-in facilities for byte swapping. For example, the Intel Fortran compiler supports the non-standard CONVERT
specifier when opening a file, e.g.: OPEN(unit, CONVERT='BIG_ENDIAN',...)
. Other compilers have options for generating code that globally enables the conversion for all file IO operations. This permits the reuse of code on a system with the opposite endianness without code modification.
Considerations
Simplified access to part of a field
On most systems, the address of a multi-byte value is the address of its first byte (the byte with the lowest address); little-endian systems of that type have the property that, for sufficiently low data values, the same value can be read from memory at different lengths without using different addresses (even when
4A 00 00 00
can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value. Although this little-endian property is rarely used directly by high-level programmers, it is occasionally employed by code optimizers as well as by assembly language programmers. While not allowed by C++, such type punning code is allowed as "implementation-defined" by the C11 standard[19] and commonly used[20] in code interacting with hardware.[21]Calculation order
Some operations in
Addition, subtraction, and multiplication start at the least significant digit position and propagate the carry to the subsequent more significant position. On most systems, the address of a multi-byte value is the address of its first byte (the byte with the lowest address). The implementation of these operations is marginally simpler using little-endian machines where this first byte contains the least significant digit.
Comparison and division start at the most significant digit and propagate a possible carry to the subsequent less significant digits. For fixed-length numerical values (typically of length 1,2,4,8,16), the implementation of these operations is marginally simpler on big-endian machines.
Some big-endian processors (e.g. the IBM System/360 and its successors) contain hardware instructions for lexicographically comparing varying length character strings.
The normal data transport by an assignment statement is in principle independent of the endianness of the processor.
Hardware
Many historical and extant processors use a big-endian memory representation, either exclusively or as a design option. The
The
, and many other processors and processor families are also little-endian.The Intel
The IA-32 and x86-64 instruction set architectures use the little-endian format. Other instruction set architectures that follow this convention, allowing only little-endian mode, include Nios II, Andes Technology NDS32, and Qualcomm Hexagon.
Some instruction set architectures are "bi-endian" and allow running software of either endianness; these include Power ISA, SPARC, ARM AArch64, C-Sky, and RISC-V. IBM AIX and IBM i run in big-endian mode on bi-endian Power ISA; Linux originally ran in big-endian mode, but by 2019, IBM had transitioned to little-endian mode for Linux to ease the porting of Linux software from x86 to Power.[25][26] SPARC has no relevant little-endian deployment, as both Oracle Solaris and Linux run in big-endian mode on bi-endian SPARC systems, and can be considered big-endian in practice. ARM, C-Sky, and RISC-V have no relevant big-endian deployments, and can be considered little-endian in practice.
Bi-endianness
Some architectures (including
Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some systems, the default endianness is selected by hardware on the motherboard and cannot be changed via software (e.g. the Alpha, which runs only in big-endian mode on the Cray T3E).
The term bi-endian refers primarily to how a processor treats data accesses. Instruction accesses (fetches of instruction words) on a given processor may still assume a fixed endianness, even if data accesses are fully bi-endian, though this is not always the case, such as on Intel's IA-64-based Itanium CPU, which allows both.
Some nominally bi-endian CPUs require motherboard help to fully switch endianness. For instance, the 32-bit desktop-oriented
Some CPUs, such as many PowerPC processors intended for embedded use and almost all SPARC processors, allow per-page choice of endianness.
SPARC processors since the late 1990s (SPARC v9 compliant processors) allow data endianness to be chosen with each individual instruction that loads from or stores to memory.
The
Many processors have instructions to convert a word in a register to the opposite endianness, that is, they swap the order of the bytes in a 16-, 32- or 64-bit word.
Recent Intel x86 and x86-64 architecture CPUs have a MOVBE instruction (Intel Core since generation 4, after Atom),[28] which fetches a big-endian format word from memory or writes a word into memory in big-endian format. These processors are otherwise thoroughly little-endian.
There are also devices which use different formats in different places. For instance, the BQ27421 Texas Instruments battery gauge uses the little-endian format for its registers and the big-endian format for its random-access memory.
Floating point
Although many processors use little-endian storage for all types of data (integer, floating point), there are a number of hardware architectures where
Variable-length data
Most instructions considered so far contain the size (lengths) of their
Middle-endian
Numerous other orderings, generically called middle-endian or mixed-endian, are possible.
The
UNIX was one of the first systems to allow the same code to be compiled for platforms with different internal representations. One of the first programs converted was supposed to print out Unix
, but on the Series/1 it printed nUxi
instead.[32]
A way to interpret this endianness is that it stores a 32-bit integer as two little-endian 16-bit words, with a big-endian word ordering:
byte offset | 8-bit value | 16-bit little-endian value |
---|---|---|
0 | 0Bh | 0A0Bh |
1 | 0Ah | |
2 | 0Dh | 0C0Dh |
3 | 0Ch |
Software
Logic design
Hardware description languages (HDLs) used to express digital logic often support arbitrary endianness, with arbitrary granularity. For example, in SystemVerilog, a word can be defined as little-endian or big-endian.[citation needed]
Files and filesystems
The recognition of endianness is important when reading a file or filesystem created on a computer with different endianness.
Fortran sequential unformatted files created with one endianness usually cannot be read on a system using the other endianness because Fortran usually implements a record (defined as the data written by a single Fortran statement) as data preceded and succeeded by count fields, which are integers equal to the number of bytes in the data. An attempt to read such a file using Fortran on a system of the other endianness results in a run-time error, because the count fields are incorrect.
Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF
; a little-endian should start with FF FE 00 00
.
Application binary data formats, such as
TIFF image files are an example of the second strategy, whose header instructs the application about the endianness of their internal binary integers. If a file starts with the signature MM
it means that integers are represented as big-endian, while II
means little-endian. Those signatures need a single 16-bit word each, and they are palindromes, so they are endianness independent. I
stands for Intel and M
stands for Motorola. Intel CPUs are little-endian, while Motorola 680x0 CPUs are big-endian. This explicit signature allows a TIFF reader program to swap bytes if necessary when a given file was generated by a TIFF writer program running on a computer with a different endianness.
As a consequence of its original implementation on the Intel 8080 platform, the operating system-independent File Allocation Table (FAT) file system is defined with little-endian byte ordering, even on platforms using another endianness natively, necessitating byte-swap operations for maintaining the FAT on these platforms.
Networking
Many
However, not all protocols use big-endian byte order as the network order. The
The
While the high-level network protocols usually consider the byte (mostly meant as
See also
- Bit order– Convention to identify bit positions
References
- IEEE Computer, October 1981 issue.
- ^ Swift, Jonathan (1726). "A Voyage to Lilliput, Chapter IV". Gulliver's Travels.
- ISBN 978-1-488-67207-1
- ^ Understanding big and little endian byte order
- ^ Byte Ordering PPC
- ^ Writing endian-independent code in C
- The Internet Society.
- ^ Cary, David. "Endian FAQ". Archived from the original on 2017-11-09. Retrieved 2010-10-11.
- S2CID 24291134.
- ^ Blanc, Bertrand; Maaraoui, Bob (December 2005). "Endianness or Where is Byte 0?" (PDF). Retrieved 2008-12-21.
- . Retrieved 2021-08-16.
- ISBN 978-0-13-291652-3. Retrieved 18 May 2013.
- ^ Linux Programmer's Manual – Library Functions –
- ^ Linux Programmer's Manual – Library Functions –
- ^ "std::byteswap". en.cppreference.com. Retrieved 3 October 2023.
- ^ "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z" (PDF). Intel. September 2016. p. 3–112. Archived (PDF) from the original on 2022-10-09. Retrieved 2017-02-05.
- ^ "i960® VH Processor Developer's Manual" (PDF). Intel. October 1998. Retrieved 2024-04-02.
- ARM Holdings.
- ^ "C11 standard". ISO. Section 6.5.2.3 "Structure and Union members", §3 and footnote 95. Retrieved 15 August 2018.
- ^ "3.10 Options That Control Optimization: -fstrict-aliasing". GNU Compiler Collection (GCC). Free Software Foundation. Retrieved 15 August 2018.
- ^ Torvalds, Linus (5 Jun 2018). "[GIT PULL] Device properties framework update for v4.18-rc1". Linux Kernel (Mailing list). Retrieved 15 August 2018.
- ^ House, David; Faggin, Federico; Feeney, Hal; Gelbach, Ed; Hoff, Ted; Mazor, Stan; Smith, Hank (2006-09-21). "Oral History Panel on the Development and Promotion of the Intel 8008 Microprocessor" (PDF). Computer History Museum. p. b5. Retrieved 23 April 2014.
- ISBN 978-0-596-51447-1. Retrieved 21 May 2013.
- ^ "Cx51 User's Guide: E. Byte Ordering". keil.com.
- ^ Jeff Scheel (2016-06-16). "Little endian and Linux on IBM Power Systems". IBM. Retrieved 2022-03-27.
- ^ Timothy Prickett Morgan (10 June 2019). "The Transition To RHEL 8 Begins On Power Systems". ITJungle. Retrieved 26 March 2022.
- ^ "Differences between BE-32 and BE-8 buses".
- ^ "How to detect New Instruction support in the 4th generation Intel® Core™ processor family" (PDF). Retrieved 2 May 2017.
- ^ Savard, John J. G. (2018) [2005], "Floating-Point Formats", quadibloc, archived from the original on 2018-07-03, retrieved 2018-07-16
- ^ "pack – convert a list into a binary representation".
- ^ PDP-11/45 Processor Handbook (PDF). Digital Equipment Corporation. 1973. p. 165. Archived (PDF) from the original on 2022-10-09.
- S2CID 15558835.
- ^ AMD64 Architecture Programmer's Manual Volume 2: System Programming (PDF) (Technical report). 2013. p. 80. Archived from the original (PDF) on 2018-02-18.
- ^ "Microsoft Office Excel 97 - 2007 Binary File Format Specification (*.xls 97-2007 format)". Microsoft Corporation. 2007.
- ^ Matt Ahrens (2016). FreeBSD Kernel Internals: An Intensive Code Walkthrough. OpenZFS Documentation/Read Write Lecture.
- . Retrieved 2012-03-02.
- ^ Ethernet POWERLINK Standardisation Group (2012), EPSG Working Draft Proposal 301: Ethernet POWERLINK Communication Profile Specification Version 1.1.4, chapter 6.1.1.
- ^ IEEE and The Open Group (2018). "3. System Interfaces". The Open Group Base Specifications Issue 7. Vol. 2. p. 1120. Retrieved 2021-04-09.
- ^ "htonl(3) - Linux man page". linux.die.net. Retrieved 2021-04-09.
External links
- The dictionary definition of endianness at Wiktionary