Digital signal processor

Motorola 56001

with 25 MHz which was directly accessible via an interface.

A digital signal processor (DSP) is a specialized

disk drives and high-definition television (HDTV) products.^[3]

The goal of a DSP is usually to measure, filter or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but may not be able to keep up with such processing continuously in real time. Also, dedicated DSPs usually have better power efficiency, so they are more suitable in portable devices such as mobile phones because of power consumption constraints.^[5] DSPs often use special memory architectures that are able to fetch multiple data or instructions at the same time.

Overview

Digital signal processing (DSP) algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
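The latency constraint can be made concrete with a rough budget calculation. The sketch below is illustrative only: the per-operation times of 390 ns and 21 ns are the multiply–accumulate figures this article quotes for the TMS32010 and for a typical second-generation DSP, while the 8 kHz sample rate, the function names, and the assumption that each filter tap costs exactly one multiply–accumulate with no other overhead are simplifications introduced here.

```python
def macs_per_sample(sample_rate_hz: int, mac_time_ns: int) -> int:
    """Number of back-to-back multiply-accumulates that fit in one sample period.

    Integer nanoseconds are used to avoid floating-point rounding surprises.
    """
    sample_period_ns = 1_000_000_000 // sample_rate_hz
    return sample_period_ns // mac_time_ns


def fits_in_real_time(taps: int, sample_rate_hz: int, mac_time_ns: int) -> bool:
    """True if an FIR filter with `taps` coefficients can finish before the
    next sample arrives, assuming one MAC per tap and no other overhead."""
    return taps <= macs_per_sample(sample_rate_hz, mac_time_ns)


if __name__ == "__main__":
    # Hypothetical 8 kHz voiceband stream
    print(macs_per_sample(8000, 390))        # budget at 390 ns per MAC
    print(macs_per_sample(8000, 21))         # budget at 21 ns per MAC
    print(fits_in_real_time(512, 8000, 390))
```

Under these assumptions, a filter that needs more operations per sample than the budget allows cannot run in real time, no matter how much total processing time is available later.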

Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power efficiency constraints.^[5] A specialized DSP, however, will tend to provide a lower-cost solution, with better performance, lower latency, and no requirements for specialized cooling or large batteries.^{[citation needed]}

Such performance improvements have led to the introduction of digital signal processing in commercial

SES launched in 2018, were both built by Airbus Defence and Space with 25% of capacity using DSP.^[6]

The architecture of a DSP is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.

Architecture

Software architecture

By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation but an operation that might require multiple ARM or x86 instructions to compute might require only one instruction in a DSP optimized instruction set.

One implication for software architecture is that hand-optimized routines (assembly programs) are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations, hand-optimized assembly code is more efficient, and many common algorithms involved in DSP calculations are hand-written in order to take full advantage of the architectural optimizations.

Instruction sets

Multiply–accumulate (MAC) operations, including fused multiply–add (FMA)
- used extensively in all kinds of matrix operations, such as convolution for filtering, dot product, and polynomial evaluation
- fundamental DSP algorithms depend heavily on multiply–accumulate performance, for example FIR filters and the fast Fourier transform (FFT)
Related instructions:
- SIMD
- VLIW
Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.^{[citation needed]}
Multiple arithmetic units may require memory architectures to support several accesses per instruction cycle – typically supporting reading 2 data values from 2 separate data buses and the next instruction (from the instruction cache, or a 3rd program memory) simultaneously.^[7]^[8]^[9]^[10]
Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing, for example zero-overhead looping^[11]^[12] and hardware loop buffers.^[13]^[14]
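The way multiply–accumulate operations and modulo addressing work together can be sketched with a small FIR filter. This is an illustrative software model, not the instruction sequence of any particular DSP: the class name `FirFilter`, its fields, and the use of Python floats are assumptions made for the example. On a real DSP the inner loop body would typically map to a single MAC instruction, and the `% n` wrap would be done by the address generation hardware.

```python
class FirFilter:
    """Direct-form FIR filter using a circular (ring) buffer as delay line."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0.0] * len(self.coeffs)  # past input samples
        self.head = 0                          # index of the newest sample

    def step(self, x):
        """Consume one input sample and return one output sample."""
        n = len(self.coeffs)
        # Modulo addressing: advance the write position and wrap around.
        self.head = (self.head + 1) % n
        self.delay[self.head] = x

        acc = 0.0
        idx = self.head
        for c in self.coeffs:
            acc += c * self.delay[idx]  # one multiply-accumulate per tap
            idx = (idx - 1) % n         # step back to the next older sample
        return acc


if __name__ == "__main__":
    fir = FirFilter([1.0, 2.0, 3.0])
    # Feeding a unit impulse returns the coefficients in order.
    print([fir.step(x) for x in [1.0, 0.0, 0.0, 0.0]])
```

The ring buffer avoids shifting every stored sample on each step; only the index moves, which is the property hardware modulo addressing exploits.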

Data instructions

Saturation arithmetic, in which operations that produce overflows saturate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum+1 does not overflow to minimum as in many general-purpose CPUs; instead it stays at maximum). Sometimes various sticky-bit operation modes are available.
Fixed-point arithmetic is often used to speed up arithmetic processing.
Single-cycle operations to increase the benefits of pipelining.
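The difference between saturating and wrapping arithmetic can be shown with 16-bit signed values. The sketch below is a software model under stated assumptions: the Q15 format (one sign bit, 15 fractional bits) is chosen here as a common fixed-point convention and is not named in the text above, and the helper names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def wrap16(value: int) -> int:
    """Two's-complement wraparound, as on many general-purpose CPUs."""
    return ((value + 32768) % 65536) - 32768


def saturate16(value: int) -> int:
    """Clamp to the representable 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


def sat_add(a: int, b: int) -> int:
    """Saturating 16-bit addition."""
    return saturate16(a + b)


def q15_mul(a: int, b: int) -> int:
    """Q15 fixed-point multiply: full product, shift right by 15, saturate."""
    return saturate16((a * b) >> 15)


if __name__ == "__main__":
    print(wrap16(INT16_MAX + 1))    # wraps to the minimum value
    print(sat_add(INT16_MAX, 1))    # stays at the maximum value
    print(q15_mul(16384, 16384))    # 0.5 * 0.5 in Q15
```

For audio and similar signals, clipping at full scale is usually far less objectionable than the sign flip caused by wraparound, which is one reason saturation is provided in hardware.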

Program flow

Floating-point unit integrated directly into the datapath

Pipelined architecture
Highly parallel multiplier–accumulators (MAC units)
Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations

Hardware architecture

Memory architecture

DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data or instructions at the same time, such as the Harvard architecture or Modified von Neumann architecture, which use separate program and data memories (sometimes even concurrent access on multiple data buses).

DSPs can sometimes rely on supporting code to know about cache hierarchies and the associated delays. This is a tradeoff that allows for better performance^{[clarification needed]}. In addition, extensive use of DMA is employed.

Addressing and virtual memory

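DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.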

Hardware modulo addressing
- Allows circular buffers to be implemented without having to test for wrapping
Bit-reversed addressing, a special addressing mode
- useful for calculating FFTs
Exclusion of a memory management unit
Address generation unit
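Bit-reversed addressing can be illustrated in software. A radix-2 FFT reads or writes its data in an order where the binary digits of each index are reversed; a DSP's address generation unit can produce that sequence directly, whereas the sketch below computes it explicitly. The function names are hypothetical and the code is a model of the index pattern only, not of any specific hardware.

```python
from typing import List


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` binary digits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the low bit into result
        index >>= 1
    return result


def bit_reversed_order(n: int) -> List[int]:
    """Index sequence used to reorder n samples for a radix-2 FFT.

    n must be a power of two.
    """
    if n < 1 or (n & (n - 1)) != 0:
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))
```

For eight points, index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on; indices whose bit patterns are palindromes stay in place.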

History

Development

In 1976, Richard Wiggins proposed the

TMS5100,^[15] the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform speech synthesis.^[16] The chip was made possible with a 7 µm PMOS fabrication process.^[17]

In 1978, American Microsystems (AMI) released the S2811.^[3]^[4] The AMI S2811 "signal processing peripheral", like many later DSPs, has a hardware multiplier that enables it to do a multiply–accumulate operation in a single instruction.^[18] The S2811 was the first integrated circuit chip specifically designed as a DSP, and was fabricated using vertical metal oxide semiconductor (VMOS, V-groove MOS), a technology that had previously not been mass-produced.^[4] It was designed as a microprocessor peripheral for the Motorola 6800,^[3] and it had to be initialized by the host. The S2811 was not successful in the market.

In 1979, Intel released the 2920 as an "analog signal processor".^[19] It had an on-chip ADC/DAC with an internal signal processor, but it did not have a hardware multiplier and was not successful in the market.

In 1980, the first stand-alone, complete DSPs –

voiceband applications, was one of the most commercially successful early DSPs.^[3]

The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.^[citation needed]

Another DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.

About five years later, the second generation of DSPs began to spread. They had 3 memories for storing two operands simultaneously and included hardware to accelerate tight loops; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model only required about 21 ns for a MAC. Members of this generation were for example the AT&T DSP16A or the Motorola 56000.

The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 and the TMS320C80.

The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. Clock speeds also increased, and a 3 ns MAC became possible.

Modern DSPs

Modern signal processors yield greater performance; this is due in part to both technological and architectural advancements like lower design rules, fast-access two-level cache, (E)DMA circuitry, and a wider bus system. Not all DSPs provide the same speed and many kinds of signal processors exist, each one of them being better suited for a specific task, ranging in price from about US$1.50 to US$300.

millions of instructions per second), use VLIW (very long instruction word), perform eight operations per clock-cycle and are compatible with a broad range of external peripherals and various buses (PCI/serial/etc). TMS320C6474 chips each have three such DSPs, and the newest generation C6000 chips support floating point as well as fixed point processing.

Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.

XMOS produces a multi-core, multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other over buffered channels that are capable of up to 80 Mbit/s. The devices are easily programmable in C and aim at bridging the gap between conventional microcontrollers and FPGAs.

CEVA, Inc. produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture, with 16-bit or 32-bit word-widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a unique combination of VLIW and vector architectures with 32 16-bit MACs.

μCLinux, velocity and Nucleus RTOS while operating on real-time data. The SHARC-based ADSP-210xx provides both delayed branches and non-delayed branches.^[21]

SoC, but NXP also provides a range of flexible single core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding.

CSR produces the Quatro family of SoCs that contain one or more custom Imaging DSPs optimized for processing document image data for scanner and copier applications.

Microchip Technology produces the PIC24-based dsPIC line of DSPs. Introduced in 2004, the dsPIC is designed for applications needing a true DSP as well as a true microcontroller, such as motor control and power supplies. The dsPIC runs at up to 40 MIPS, and has support for 16-bit fixed-point MAC, bit-reversed and modulo addressing, as well as DMA.

Most DSPs use fixed-point arithmetic, because in real-world signal processing the additional range provided by floating point is not needed, and there is a large speed benefit and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
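The range trade-off can be illustrated with a small numeric sketch. It assumes a 16-bit Q15 fixed-point format, which represents values from -1 up to just under 1 in steps of 1/32768; that format and the helper names are choices made for this example rather than something the text specifies.

```python
def to_q15(x: float) -> int:
    """Quantize a real value to Q15, saturating outside [-1, 1)."""
    return max(-32768, min(32767, round(x * 32768)))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / 32768


if __name__ == "__main__":
    for value in (0.5, 1e-6, 2.5):
        q = to_q15(value)
        print(value, "->", q, "->", from_q15(q))
```

A value such as 0.5 survives the round trip exactly, a very small value such as 1e-6 falls below the step size and becomes zero, and a value such as 2.5 is clipped to just under 1. With fixed point the programmer has to scale signals so that neither effect matters, which is the extra software effort that floating-point hardware removes.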

Generally, DSPs are dedicated integrated circuits; however DSP functionality can also be produced by using field-programmable gate array chips (FPGAs).

Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include an ARM Cortex-A8 and C6000 DSP.

In communications, a new breed of DSPs that fuses DSP functions with hardware acceleration functions is making its way into the mainstream. Such modem processors include ASOCS ModemX and CEVA's XC4000.

In May 2018, Huarui-2, designed by the Nanjing Research Institute of Electronics Technology of China Electronics Technology Group, passed acceptance. With a processing speed of 0.4 TFLOPS, the chip can achieve better performance than current mainstream DSP chips.^[22] The design team has begun to create Huarui-3, which has a processing speed at the TFLOPS level and support for artificial intelligence.^[23]

References

[1] OL 10070096M.

[2] ISBN 978-0849310812 – via Google Books.

[3] "1979: Single Chip Digital Signal Processor Introduced". The Silicon Engine. Computer History Museum. Retrieved 14 October 2019.

[4] Taranovich, Steve (August 27, 2012). "30 years of DSP: From a child's toy to 4G and beyond". EDN. Retrieved 14 October 2019.

[5] Ingrid Verbauwhede; Patrick Schaumont; Christian Piguet; Bart Kienhuis (2005-12-24). "Architectures and Design techniques for energy efficient embedded DSP and multimedia processing" (PDF). rijndael.ece.vt.edu. Retrieved 2017-06-13.

[6] Beyond Frontiers. Broadgate Publications (September 2016). p. 22.

[7] "Memory and DSP Processors".

[8] "DSP processors: memory architectures". Archived from the original on 2020-02-17. Retrieved 2020-03-03.

[9] "Architecture of the Digital Signal Processor".

[10] "ARC XY Memory DSP Option".

[11] "Zero Overhead Loops".

[12] "ADSP-BF533 Blackfin Processor Hardware Reference". p. 4-15.

[13] "Understanding Advanced Processor Features Promotes Efficient Coding".

[14] "Techniques for Effectively Exploiting a Zero Overhead Loop Buffer".

[15] "Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978". IEEE Milestones. IEEE. Retrieved 2012-03-02.

[16] Bogdanowicz, A. (2009-10-06). "IEEE Milestones Honor Three". The Institute. IEEE. Archived from the original on 2016-03-04. Retrieved 2012-03-02.

[17] ISBN 9781351831567.

[18] Alberto Luis Andres. "Digital Graphic Audio Equalizer". p. 48.

[19] "Archived copy" (PDF). Archived from the original (PDF) on 2020-09-29. Retrieved 2019-02-17.

[20] "NEC Electronics Inc. μPD77C20A, 7720A, 77P20 Digital Signal Processors". p. 1. Retrieved 2023-11-13.

[21] "Introduction of ADSP-21000 Family digital signal processors" (PDF). p. 6. Retrieved 2023-12-01.

[22] 科技日报 (Science and Technology Daily). Retrieved 2 July 2018.

[23] 王珏玢. "全国产芯片华睿２号通过"核高基"验收-新华网". Xinhua News Agency. Nanjing. Archived from the original on May 26, 2018. Retrieved 2 July 2018.

External links

DSP Online Book

Pocket Guide to Processors for DSP - Berkeley Design Technology, INC
