Cell software development
Software development for the
Linux on Cell
An open-source software strategy was adopted to accelerate the development of a Cell BE ecosystem and to provide an environment for developing Cell applications, including a GCC-based Cell compiler, binutils, and a port of the Linux operating system.[1]
Octopiler
Octopiler is
Software portability
Adapting VMX for SPU
Differences between VMX and SPU
The VMX (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.
feature | VMX | SPU |
---|---|---|
word size | 32 bits | 32 bits |
number of registers | 32 | 128 |
register width | 128-bit quadword | 128-bit quadword |
integer formats | 8, 16, 32 | 8, 16, 32, 64 |
saturation support | yes | no |
byte ordering | big (default), little | big endian |
floating point modes | Java, non-Java | single precision, IEEE double |
memory alignment | quadword only | quadword only |
The VMX
The IBM PPE Vector/SIMD manual does not define operations for double-precision floating point, though IBM has published material implying certain double-precision performance numbers associated with the Cell PPE VMX technology.
Intrinsics
Compilers for Cell provide intrinsics that expose useful SPU instructions in C and C++. Instructions that differ only in operand type (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic, which selects the proper instruction based on the type of its operands.
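The type-driven selection described here can be imitated in portable C11 with `_Generic`. The following is only a sketch of the idea, not the SPU intrinsic header: the types `v4i32` and `v4f32`, the helpers, and the `vadd` macro are hypothetical names invented for illustration, and the vectors are emulated as plain structs rather than hardware registers.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector types, emulated as plain structs. */
typedef struct { int32_t lane[4]; } v4i32;  /* four 32-bit integers */
typedef struct { float   lane[4]; } v4f32;  /* four single-precision floats */

/* Type-specific helpers, standing in for distinct machine instructions
   (for example an integer add versus a floating-point add). */
static v4i32 add_v4i32(v4i32 a, v4i32 b) {
    v4i32 r;
    for (int i = 0; i < 4; i++) {
        /* Add as unsigned so that overflow wraps instead of being undefined. */
        r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
    }
    return r;
}

static v4f32 add_v4f32(v4f32 a, v4f32 b) {
    v4f32 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* One generic name; the type of the first operand picks the helper
   at compile time. */
#define vadd(a, b) \
    _Generic((a), v4i32: add_v4i32, v4f32: add_v4f32)((a), (b))
```

With this in place, `vadd(x, y)` compiles to the integer helper when `x` is a `v4i32` and to the floating-point helper when it is a `v4f32`, so calling code does not name the operand type explicitly.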
Porting VMX code for SPU
There is a great body of code which has been developed for other
In some cases it is possible to port existing VMX code directly. If the VMX code is highly generic (makes few assumptions about the execution environment) the translation can be relatively straightforward. The two processors specify a different
In many cases, however, a directly equivalent instruction does not exist, and the workaround may or may not be obvious. For example, if saturation behavior is required on the SPU, it can be implemented with additional SPU instructions, at some cost in efficiency. At the other extreme, if Java floating-point semantics are required, they are almost impossible to achieve on the SPU processor; obtaining the same result might require writing an entirely different algorithm from scratch.
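As an illustration of the saturation workaround, the sketch below builds an unsigned saturating add out of three non-saturating steps per lane: a wrapping add, a compare, and a bitwise select. It is written as scalar C over eight 16-bit lanes so that it runs anywhere; the function name `add_sat_u16` and the lane layout are assumptions made for this example, and a real SPU version would express the same three steps with vector instructions.

```c
#include <stdint.h>

#define LANES 8  /* eight 16-bit lanes fill one 128-bit quadword */

/* Saturating unsigned add built from non-saturating steps. */
static void add_sat_u16(const uint16_t a[LANES], const uint16_t b[LANES],
                        uint16_t out[LANES]) {
    for (int i = 0; i < LANES; i++) {
        /* Step 1: ordinary add, wrapping modulo 2^16. */
        uint16_t sum = (uint16_t)(a[i] + b[i]);
        /* Step 2: compare. If the add wrapped, the sum is smaller than
           an operand; turn that into an all-ones or all-zeros mask. */
        uint16_t overflow = (uint16_t)-(sum < a[i]);
        /* Step 3: select. OR-ing with the mask forces 0xFFFF in lanes
           that wrapped and leaves the other lanes unchanged. */
        out[i] = (uint16_t)(sum | overflow);
    }
}
```

The extra compare and select are the "loss of efficiency" mentioned above: one saturating instruction on VMX becomes roughly three operations per vector in this scheme.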
The most important conceptual similarity between VMX and the SPU architecture is that both support the same vectorization model. For this reason, most algorithms adapted to VMX (AltiVec) will also adapt successfully to the SPU architecture.
Local store exploitation
Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies.
Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.[5]
Streaming computations can be accommodated efficiently by software-pipelining memory block transfers with a multi-buffering strategy.[1]
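A minimal double-buffering sketch follows, assuming two local buffers that alternate between "being filled" and "being processed". The `fetch` function is a hypothetical stand-in for an asynchronous DMA transfer into the local store; here it is a plain `memcpy` that completes immediately, so the code shows only the control structure, not real overlap of transfer and computation. `CHUNK` and `stream_sum` are likewise names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per block transfer (tiny, for illustration) */

/* Stand-in for an asynchronous "get" from main memory to local store.
   It completes immediately here; on real hardware it would run in the
   background while the other buffer is being processed. */
static void fetch(float *local, const float *main_mem, size_t count) {
    memcpy(local, main_mem, count * sizeof *local);
}

/* Sum n floats while streaming them through two alternating buffers. */
static float stream_sum(const float *data, size_t n) {
    float buf[2][CHUNK];
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    float total = 0.0f;

    if (nchunks == 0) {
        return total;
    }

    /* Prime the pipeline with the first block. */
    fetch(buf[0], data, n < CHUNK ? n : CHUNK);

    for (size_t c = 0; c < nchunks; c++) {
        size_t cur = c & 1;      /* buffer holding the current block */
        size_t nxt = cur ^ 1;    /* buffer to fill with the next block */
        size_t len = (c + 1 == nchunks) ? n - c * CHUNK : CHUNK;

        if (c + 1 < nchunks) {
            /* Start the next transfer before touching the current data. */
            size_t off = (c + 1) * CHUNK;
            size_t next_len = (n - off < CHUNK) ? n - off : CHUNK;
            fetch(buf[nxt], data + off, next_len);
        }

        /* On real hardware, wait here for the transfer into buf[cur]. */
        for (size_t i = 0; i < len; i++) {
            total += buf[cur][i];
        }
    }
    return total;
}
```

The design choice is that the request for block c+1 is issued before block c is processed, which is what lets transfer latency hide behind computation when the transfer is truly asynchronous. Using more than two buffers extends the same pattern.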
A software cache offers a solution for random accesses.[6]
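The idea can be sketched as a small direct-mapped, read-only cache managed entirely in software. This is an illustrative assumption about how such a cache might be organized, not the design from the cited work: the `sw_cache` type, its functions, and the line and slot sizes are hypothetical, the backing array stands in for main memory, and its length is assumed to be a multiple of `LINE_WORDS`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 4   /* words per cache line */
#define NUM_LINES  8   /* lines held in the (emulated) local store */

typedef struct {
    const int32_t *backing;               /* stand-in for main memory */
    int32_t  data[NUM_LINES][LINE_WORDS]; /* cached lines */
    size_t   tag[NUM_LINES];              /* memory line held by each slot */
    int      valid[NUM_LINES];            /* nonzero once a slot is filled */
    unsigned misses;                      /* refill counter, for inspection */
} sw_cache;

static void sw_cache_init(sw_cache *c, const int32_t *backing) {
    memset(c, 0, sizeof *c);
    c->backing = backing;
}

/* Read one word. On a miss the whole line is copied in, which would be
   a block transfer from main memory on real hardware. */
static int32_t sw_cache_read(sw_cache *c, size_t index) {
    size_t line = index / LINE_WORDS;   /* which memory line is wanted */
    size_t slot = line % NUM_LINES;     /* direct-mapped placement */

    if (!c->valid[slot] || c->tag[slot] != line) {
        memcpy(c->data[slot], c->backing + line * LINE_WORDS,
               sizeof c->data[slot]);
        c->tag[slot] = line;
        c->valid[slot] = 1;
        c->misses++;
    }
    return c->data[slot][index % LINE_WORDS];
}
```

Every access pays for the tag check even on a hit, which is the overhead a software cache trades for the ability to serve unpredictable access patterns from the local store.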
More sophisticated applications can use multiple strategies for different data types.[7]
References
- The Cell Project at IBM Research
- Optimizing Compiler for a CELL Processor
- Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture
- Compiler Technology for Scalable Architectures
- [1] "An Open Source Environment for Cell Broadband Engine System Software" (PDF). June 2007.
- [2] IBM Research Project - Compiler Technology for Scalable Architectures
- [3] IBM Systems Journal - Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture, 2017-10-23, archived from the original on 2006-04-11
- [4] IBM's Octopiler, or, why the PS3 is running late, ArsTechnica, 2006-02-26
- [5] "Synergistic Processing in Cell's Multicore Architecture" (PDF). March 2006.
- [6] "Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture" (PDF). January 2006.
- [7] "Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor" (PDF). March 2008.