MAJC
Designer | Sun Microsystems |
---|---|
Introduced | 1990s |
Design | VLIW |
MAJC (Microprocessor Architecture for Java Computing) was a Sun Microsystems multi-core, multithreaded, very long instruction word (VLIW) microprocessor design from the mid-to-late 1990s. Originally called the UltraJava processor, the MAJC processor was targeted at running Java programs, whose "late compiling" allowed Sun to make several favourable design decisions. The processor was used in two commercial graphics cards from Sun. Lessons learned about multithreading on a multi-core processor provided a basis for later OpenSPARC implementations such as the UltraSPARC T1.
Design elements
Move instruction scheduling to the compiler
Like other VLIW designs, notably
Generalized functional units
Most processors include a number of separate "subprocessors" known as functional units that are tuned to operating on a particular type of data. For instance, a modern CPU typically has two or three functional units dedicated to processing
Variable-length instruction packets
Another difference is that MAJC allowed for variable-length "
NOP
, which simply takes up space. Although variable-length instruction packets added some complexity to the CPU, they reduced code size and thus the number of expensive cache misses by increasing the amount of code in the cache at any one time.
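The code-size argument can be made concrete with a small calculation. The sketch below compares the bytes needed when every packet is padded to full width with NOPs against the bytes needed when each packet stores only its useful instructions. The word size, the maximum packet width, and the sample packet mix are assumptions chosen for illustration, not figures taken from this article.

```python
WORD_BYTES = 4   # assumption: 32-bit instruction words
MAX_SLOTS = 4    # assumption: at most four instructions per packet


def _check(packets):
    """Reject packet sizes that could not exist under the assumed width."""
    for n in packets:
        if not 1 <= n <= MAX_SLOTS:
            raise ValueError(f"packet size {n} outside 1..{MAX_SLOTS}")


def fixed_size(packets):
    """Bytes used when every packet is padded to MAX_SLOTS with NOPs."""
    _check(packets)
    return len(packets) * MAX_SLOTS * WORD_BYTES


def variable_size(packets):
    """Bytes used when each packet holds only its useful instructions."""
    _check(packets)
    return sum(packets) * WORD_BYTES


# Hypothetical instruction stream: useful instructions per packet.
SAMPLE = [4, 1, 2, 1, 3, 1]
```

For the hypothetical `SAMPLE` stream, `fixed_size(SAMPLE)` is 96 bytes and `variable_size(SAMPLE)` is 48 bytes, so twice as much useful code fits in a cache of the same size. Real savings depend on how often the compiler can fill every slot.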
Avoiding interlocks and stalls
The primary difference was the way that the MAJC design required the compiler to avoid interlocks: pauses in execution while the result of one instruction is computed so that the next one can run. For instance, if the processor is fed the instructions C = A + B, E = C + D, then the second instruction can be run only after the first completes. Most processors include interlocks in the design to stall and reschedule these sorts of dependent instructions, allowing some other instructions to run while the value of C is being calculated. However, these interlocks are very expensive in terms of chip real estate, and represent the majority of the instruction scheduler's logic.
In order for the compiler to avoid these interlocks, it would have to know exactly how long each instruction would take to complete. For instance, if a particular implementation took three cycles to complete a floating-point multiplication, MAJC compilers would attempt to schedule in other instructions that did not depend on its result to fill those three cycles. A change in the actual implementation might reduce this delay to only two cycles, however, and the compiler would need to be aware of this change.
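The link between instruction latency and the emitted schedule can be illustrated with a toy compile-time scheduler. This is a minimal sketch, not Sun's compiler algorithm or the MAJC instruction set: the `Instr` type, the `schedule` function, the opcode names, and the latency tables are all hypothetical. It assumes a single-issue machine without interlocks and a program in which each register is written only once.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written (assumed to be written only once)
    op: str       # opcode, used to look up latency
    srcs: tuple   # registers read


def schedule(program, latency):
    """Greedy single-issue scheduler for a machine without interlocks.

    Each cycle, issue the earliest pending instruction whose operands are
    already available; if none is ready, emit an explicit "nop".
    Returns the list of destination names (or "nop") in issue order.
    """
    produced = {ins.dest for ins in program}
    ready_at = {}            # register -> first cycle its value can be read
    pending = list(program)
    slots = []
    cycle = 0
    while pending:
        issued = None
        for ins in pending:
            # Live-in registers (never written by the program) are always ready.
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle
                   for s in ins.srcs):
                issued = ins
                break
        if issued is None:
            slots.append("nop")
        else:
            pending.remove(issued)
            ready_at[issued.dest] = cycle + latency[issued.op]
            slots.append(issued.dest)
        cycle += 1
    return slots


# Hypothetical program: c = a * b; e = c + d; g = a + d; h = b + d
PROGRAM = [
    Instr("c", "fmul", ("a", "b")),
    Instr("e", "add", ("c", "d")),
    Instr("g", "add", ("a", "d")),
    Instr("h", "add", ("b", "d")),
]
```

With a three-cycle multiply, `schedule(PROGRAM, {"fmul": 3, "add": 1})` returns `["c", "g", "h", "e"]`; with a two-cycle multiply it returns `["c", "g", "e", "h"]`. The same source program yields a different instruction order for each implementation, which is why the compiler has to know the exact latencies of the chip it targets.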
This means that the compiler was not tied to MAJC as a whole, but a particular implementation of MAJC, each individual CPU based on the MAJC design. This would normally be a serious logistical problem; consider the number of different variations of the Intel
In reality, scheduling instructions in this fashion turns out to be a very difficult problem. In real-world use, processors that attempt to do this scheduling at runtime encounter numerous occasions when the data needed is outside the cache, and there is no other instruction in the program that isn't also dependent on such data. In these cases the processor might stall for long periods, waiting on main memory. The VLIW approach does not help much in this regard; although the compiler might be able to spend more time looking for instructions to run, that doesn't mean that it can actually find one.
MAJC attempted to address this problem through the ability to execute code from other threads if the current thread stalled on memory. Switching threads is normally a very expensive process known as a
MAJC took this idea one step further, and tried to prefetch data and instructions needed for threads while they were stalled. Most processors include similar functionality for parts of an instruction stream, known as speculative execution, where the processor runs both of the possible outcomes of a branch while waiting for the deciding variable to be calculated. MAJC instead continued to run the thread as if it were not stalled, using this execution to find and then load any data or instructions that would soon be needed when the thread stopped stalling. Sun referred to this as Space-Time Computing (STC), and it is a speculative multithreading design.
Processors up to this point had tried to extract parallelism in a single thread, a technique that was reaching its limits in terms of diminishing returns. It seems that in a general sense the MAJC design attempted to avoid stalls by running across threads (and programs) as opposed to looking for parallelism in a single thread. VLIW is generally expected to be somewhat worse in terms of stalls because it is difficult to understand runtime behavior at compile-time, making the MAJC approach in dealing with this problem particularly interesting.
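The benefit of running another thread while one is stalled on memory can be sketched with a toy simulation. It is not a model of the MAJC hardware: the `run` function, the trace encoding, the zero-cost thread switch, and the fixed miss latency are all simplifying assumptions made for illustration.

```python
def run(threads, miss_latency):
    """Simulate one core that runs another thread when the current one stalls.

    threads: list of strings; "c" is one cycle of computation, "m" is a
    load that misses the cache and leaves its thread unable to run for
    miss_latency cycles. Each cycle the lowest-numbered runnable thread
    issues one operation; switching threads is assumed to be free.
    Returns (total_cycles, busy_cycles).
    """
    pc = [0] * len(threads)     # next operation index per thread
    wake = [0] * len(threads)   # cycle at which each thread may run again
    cycle = busy = 0
    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        runnable = [t for t in range(len(threads))
                    if pc[t] < len(threads[t]) and wake[t] <= cycle]
        if runnable:
            t = runnable[0]
            op = threads[t][pc[t]]
            pc[t] += 1
            busy += 1
            if op == "m":
                # The thread sleeps until the missing data arrives.
                wake[t] = cycle + 1 + miss_latency
        cycle += 1
    return cycle, busy
```

With a hypothetical four-cycle miss, `run(["cmcc"], 4)` returns `(8, 4)`: the core is busy for only half of the eight cycles. Adding a second identical thread, `run(["cmcc", "cmcc"], 4)` returns `(10, 8)`, so twice the work finishes in ten cycles rather than sixteen because one thread's computation overlaps the other's memory wait.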
Implementations
Sun built a single model of the MAJC, the two-core MAJC 5200, which was the heart of Sun's XVR-1000 and XVR-4000 workstation graphics boards. However, many of the multicore and multithreading design ideas, notably the use of multiple threads to reduce stalling delays, worked their way into the Sun SPARC processor line, as well as designs from other companies. Additionally, the MAJC idea of designing the processor to run as many threads as possible, as opposed to instructions, appears to be the basis of the later UltraSPARC T1 (code-named Niagara) design.
See also
- Thread-level parallelism
- picoJava
Further reading
- Case, Brian (25 October 1999). "Sun Makes MAJC with Mirrors". Microprocessor Report.
- Gwennap, Linley (13 September 1999). "MAJC gives VLIW a new twist". Microprocessor Report.
- Halfhill, Tom (24 August 1999). "Sun Reveals secrets of 'Magic'". Microprocessor Report.