Machine code
In
Each instruction causes the CPU to perform a very specific task, such as a load, a store, a
Early CPUs had specific machine code that might break backward compatibility with each new CPU released. The notion of an instruction set architecture (ISA) defines and specifies the behavior and encoding in memory of the instruction set of the system, without specifying its exact implementation. This acts as an abstraction layer, enabling compatibility within the same family of CPUs, so that machine code written or generated according to the ISA for the family will run on all CPUs in the family, including future CPUs.
In general, each architecture family (e.g. x86, ARM) has its own ISA, and hence its own specific machine code language. There are exceptions, such as the VAX architecture, which included optional support of the PDP-11 instruction set, and IA-64, which included optional support of the IA-32 instruction set. Another example is the PowerPC 615, a processor designed to natively process both PowerPC and x86 instructions.
Machine code is a strictly numerical language, and is the lowest-level interface to the CPU intended for a programmer.
The majority of practical programs today are written in higher-level languages. Those programs are either translated into machine code by a compiler, or are interpreted by an interpreter, usually after being translated into an intermediate code, such as a bytecode, that is then interpreted.[nb 1]
Machine code is by definition the lowest level of programming detail visible to the programmer, but internally many processors use microcode or optimize and transform machine code instructions into sequences of micro-ops. Microcode and micro-ops are not generally considered to be machine code; except on some machines, the user cannot write microcode or micro-ops, and the operation of microcode and the transformation of machine-code instructions into micro-ops happen transparently to the programmer, except for performance-related side effects.
Instruction set
Every processor or processor family has its own
A processor's instruction set may have fixed-length or variable-length instructions. How the patterns are organized varies with the particular architecture and type of instruction. Most instructions have one or more opcode fields that specify the basic instruction type (such as arithmetic, logical, jump, etc.), the operation (such as add or compare), and other fields that may give the type of the operand(s), the addressing mode(s), the addressing offset(s) or index, or the operand value itself (such constant operands contained in an instruction are called immediate).[2]
Not all machines or individual instructions have explicit operands. On a machine with a single
Programs
A
Assembly languages
A much more human friendly rendition of machine language, called
DEC B
.[4]
Example
The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long.[5]: 299 The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional field funct to determine the exact operation. The fields used in these types are:
   6      5     5     5     5      6     bits
[  op  |  rs |  rt |  rd |shamt| funct ]  R-type
[  op  |  rs |  rt | address/immediate ]  I-type
[  op  |        target address         ]  J-type
rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.[5]: 299–301
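The field layout above can be sketched as a small decoder. This is a minimal illustration rather than a complete MIPS decoder: the function name `decode_fields` is hypothetical, and it assumes that op 0 marks an R-type instruction, that ops 2 and 3 are the J-type jumps, and that everything else can be treated as I-type, which ignores coprocessor and other special formats.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the bit fields described above."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")

    op = (word >> 26) & 0x3F                 # highest 6 bits
    if op == 0:                              # assumed R-type: funct selects the operation
        return {
            "type": "R",
            "op": op,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):                         # assumed J-type: 26-bit target field
        return {"type": "J", "op": op, "target": word & 0x3FFFFFF}
    return {                                 # simplification: treat the rest as I-type
        "type": "I",
        "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


# A word with op=0, rs=1, rt=2, rd=6, shamt=0, funct=32
print(decode_fields(0b000000_00001_00010_00110_00000_100000))
```

Shifting right and masking is the usual way to pull a fixed-position field out of a word; the mask width (0x3F for 6 bits, 0x1F for 5 bits) matches the field widths in the layout.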
For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:[5]: 554
[  op  |  rs |  rt |  rd |shamt| funct ]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary
Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:[5]: 552
[  op  |  rs |  rt | address/immediate ]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary
Jumping to the address 1024:[5]: 552
[  op  |        target address         ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary
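The three encodings above can be reproduced by packing the decimal field values into a 32-bit word. The following is a sketch only; the helper names `encode_r`, `encode_i`, and `encode_j` are hypothetical, and the jump example places the value 1024 directly in the target field, exactly as the table above does.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, immediate):
    """Pack I-type fields (6, 5, 5, 16 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """Pack J-type fields (6, 26 bits) into one 32-bit word."""
    return (op << 26) | (target & 0x3FFFFFF)


examples = {
    "add registers 1 and 2 into 6": encode_r(0, 1, 2, 6, 0, 32),
    "load into 8 from 68 past register 3": encode_i(35, 3, 8, 68),
    "jump with target field 1024": encode_j(2, 1024),
}

for description, word in examples.items():
    # Show each word as 32 binary digits and as 8 hexadecimal digits.
    print(f"{description}: {word:032b} 0x{word:08X}")
```

Read left to right, the binary digits printed for each word match the binary rows in the tables above.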
Overlapping instructions
On processor architectures with
In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in
It is also sometimes used as a
The principle is also utilized in shared code sequences of
This property is also used to find
Relationship to microcode
In some computers, the machine code of the
Using microcode to implement an emulator enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.
Relationship to bytecode
Machine code is generally different from bytecode (also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, as is the case with Java processors.
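As one concrete illustration of bytecode that is interpreted rather than run directly by the CPU, CPython compiles functions to its own bytecode, which the standard-library `dis` module can display. This example is specific to CPython, the function `add` is a hypothetical sample, and the exact opcode names printed differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# List the bytecode instructions the interpreter executes for add().
for instruction in dis.get_instructions(add):
    print(instruction.offset, instruction.opname, instruction.argrepr)

# The same instructions as raw bytes; these are interpreter opcodes,
# not machine code for any hardware CPU.
print(add.__code__.co_code.hex())
```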
Machine code and assembly code are sometimes called native code when referring to platform-dependent parts of language features or libraries.[16]
Storing in memory
From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture.
The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard-coded value when the CPU is first powered on, so the CPU executes whatever machine code happens to be at that address.
Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this is not valid machine code. This will typically trigger an architecture-specific protection fault.
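The role of the program counter can be sketched with a toy fetch-and-execute loop. Everything here is hypothetical and invented for illustration: the four opcodes, the two-cell instruction format, the single accumulator, and the reset address of 0 do not correspond to any real architecture.

```python
# Hypothetical toy opcodes; each instruction occupies two memory cells:
# the opcode followed by one operand.
HALT, LOADI, ADDI, JNZ = 0, 1, 2, 3


def run(memory, reset_vector=0, max_steps=1000):
    """Execute toy machine code and return (accumulator, visited addresses)."""
    pc = reset_vector          # program counter starts at a fixed address
    acc = 0
    trace = []
    for _ in range(max_steps):
        opcode, operand = memory[pc], memory[pc + 1]   # fetch
        trace.append(pc)
        pc += 2                                        # default: next instruction
        if opcode == HALT:
            return acc, trace
        elif opcode == LOADI:
            acc = operand
        elif opcode == ADDI:
            acc = (acc + operand) & 0xFF               # 8-bit wraparound
        elif opcode == JNZ:
            if acc != 0:
                pc = operand                           # branch: overwrite the counter
        else:
            raise ValueError(f"invalid opcode {opcode} at address {pc - 2}")
    raise RuntimeError("step limit reached")


# Count down from 3 to 0: adding 255 is subtracting 1 modulo 256.
program = [LOADI, 3, ADDI, 255, JNZ, 2, HALT, 0]
acc, trace = run(program)
print(acc, trace)
```

The trace shows the counter advancing sequentially until the branch instruction rewrites it, and the `ValueError` stands in for the fault a real CPU raises when the counter lands on bytes that are not a valid instruction.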
The CPU is often told, by page permissions in a paging-based system, whether the current page actually holds machine code, by means of an execute bit; pages have multiple such permission bits (readable, writable, etc.) for various housekeeping functionality. E.g. on
Similarly, in a segment-based system, segment descriptors can indicate whether a segment can contain executable code and in what rings that code can run.
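The execute-permission check described for paging-based systems can be modeled in a few lines. This is a simplified sketch, not how any particular MMU or operating system represents page tables: the names `Perm`, `ProtectionFault`, and `check_fetch`, the dictionary used as a page table, and the 4096-byte page size are all assumptions made for the example.

```python
from enum import Flag, auto


class Perm(Flag):
    """Hypothetical per-page permission bits."""
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


class ProtectionFault(Exception):
    """Raised when an instruction fetch hits a non-executable page."""


PAGE_SIZE = 4096  # assumed page size in bytes


def check_fetch(page_table, address):
    """Return the page number if code may be fetched from address."""
    page = address // PAGE_SIZE
    perms = page_table.get(page, Perm(0))    # unmapped pages have no permissions
    if Perm.EXECUTE not in perms:
        raise ProtectionFault(f"page {page} is not executable")
    return page


page_table = {
    0: Perm.READ | Perm.EXECUTE,   # code page
    1: Perm.READ | Perm.WRITE,     # data page
}

print(check_fetch(page_table, 0x0010))       # fetch from the code page succeeds
try:
    check_fetch(page_table, 0x1000)          # fetch from the data page
except ProtectionFault as fault:
    print("fault:", fault)
```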
From the point of view of a
Readability by humans
Pamela Samuelson wrote that machine code is so unreadable that the United States Copyright Office cannot identify whether a particular encoded program is an original work of authorship.[17] However, the US Copyright Office does allow for copyright registration of computer programs,[18] and a program's machine code can sometimes be decompiled in order to make its functioning more easily understandable to humans.[19] The output of a decompiler or disassembler will nevertheless be missing the comments and symbolic references, so while the output may be easier to read than the object code, it will still be more difficult to read than the original source code. This problem does not exist for object-code formats like SQUOZE, where the source code is included in the file.
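What a disassembler can and cannot recover is visible even in a tiny sketch. The hypothetical function `disassemble` below handles only the three MIPS instruction words used in the example section of this article, assumes the standard mnemonics `add`, `lw`, and `j` for those opcodes, and does not sign-extend the immediate field.

```python
def disassemble(word: int) -> str:
    """Render one 32-bit MIPS word as assembly text (three opcodes only)."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F

    if op == 0 and funct == 32:
        return f"add ${rd}, ${rs}, ${rt}"
    if op == 35:
        return f"lw ${rt}, {word & 0xFFFF}(${rs})"
    if op == 2:
        return f"j {word & 0x3FFFFFF}"
    return f".word 0x{word:08x}"             # unknown: emit the raw value


machine_code = [
    0b000000_00001_00010_00110_00000_100000,
    0b100011_00011_01000_00000_00001_000100,
    0b000010_00000_00000_00000_10000_000000,
]

for word in machine_code:
    print(disassemble(word))
```

The output names registers and numeric offsets only; any variable names, labels, or comments the original programmer wrote are not present in the machine code and so cannot be reconstructed.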
Cognitive science professor Douglas Hofstadter has compared machine code to genetic code, saying that "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."[20]
See also
- Assembly language
- Endianness
- List of machine languages
- Machine code monitor
- Overhead code
- P-code machine
- Reduced instruction set computing (RISC)
- Very long instruction word
- Teaching Machine Code: Micro-Professor MPF-I
Notes
- .
- fat binaries.
- folding techniques to still fit everything into a physical sector of only 512 bytes without giving up any of their extended functionality.
References
- ISBN 9789332570405.
- Kjell, Bradley. "Immediate Operand".
- ISBN 0-262-54178-5. Retrieved 2023-03-05.
- ISBN 0-89588-094-6. Retrieved 2023-03-05.
- ISBN 978-0-12-370497-9. Retrieved 2023-03-05.
- (PDF) from the original on 2018-09-04. Retrieved 2021-12-25. (12 pages)
- ISBN 978-3-540-79127-0. (22 pages)
- (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (10 pages)
- Jakubowski, Mariusz H. (February 2016). "Graph Based Model for Software Tamper Protection". Microsoft. Archived from the original on 2019-10-31. Retrieved 2023-08-19.
- (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (1+xvii+1+152 pages)
- "Unintended Instructions on x86". Hacker News. 2021. Archived from the original on 2021-12-25. Retrieved 2021-12-24.
- Kinder, Johannes (2010-09-24). Static Analysis of x86 Executables [Statische Analyse von Programmen in x86 Maschinensprache] (PDF) (Dissertation). Munich, Germany: Technische Universität Darmstadt. D17. Archived from the original on 2020-11-12. Retrieved 2021-12-25. (199 pages)
- "What is "overlapping instructions" obfuscation?". Reverse Engineering Stack Exchange. 2013-04-07. Archived from the original on 2021-12-25. Retrieved 2021-12-25.
- Gates, William "Bill" Henry, Personal communication (NB. According to Jacob et al.)
- ACM Press. Archived (PDF) from the original on 2021-12-15. Retrieved 2021-12-24.
- "Managed, Unmanaged, Native: What Kind of Code Is This?". developer.com. 2003-04-28. Retrieved 2008-09-02.
- PMID 10268940.
- US Copyright Office. August 2008. Retrieved 2014-02-23.
- "What is decompile? - Definition from WhatIs.com". WhatIs.com. Retrieved 2016-12-26.
- Hofstadter, Douglas R. (1980). Gödel, Escher, Bach: An Eternal Golden Braid. p. 290.
Further reading
- ISBN 1-55860-281-X.
- ISBN 0-13-020435-8.
- Brookshear, J. Glenn (2007). Computer Science: An Overview. ISBN 978-0-321-38701-1.