Harvard architecture

The Harvard architecture is a computer architecture with separate storage and signal pathways for instructions and data. It is often contrasted with the von Neumann architecture, where program instructions and data share the same memory and pathways.

The term is often stated as having originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in electro-mechanical counters. These early machines had data storage entirely contained within the central processing unit, and provided no access to the instruction storage as data. Programs needed to be loaded by an operator; the processor could not initialize itself. However, in the only peer-reviewed published paper on the topic, The Myth of the Harvard Architecture, published in the IEEE Annals of the History of Computing,^[1] the author demonstrates that:

'The term "Harvard architecture" was coined decades later, in the context of microcontroller design' and only 'retrospectively applied to the Harvard machines and subsequently applied to RISC microprocessors with separated caches';

'The so-called "Harvard" and "von Neumann" architectures are often portrayed as a dichotomy, but the various devices labeled as the former have far more in common with the latter than they do with each other';

'In short [the Harvard architecture] isn't an architecture and didn't derive from work at Harvard'.

Modern processors appear to the user to be systems with von Neumann architectures, with the program code stored in the same main memory as the data. For performance reasons, internally and largely invisible to the user, most designs have separate processor caches for the instructions and data, with separate pathways into the processor for each. This is one form of what is known as the modified Harvard architecture.

Harvard architectures have traditionally been split into two address spaces. Designs with three address spaces, all accessed in each cycle, also exist,[2] though they are rare.

Memory details

In a Harvard architecture, there is no need to make the two memories share characteristics. In particular, the word width, timing, implementation technology, and memory address structure can differ. In some systems, instructions for pre-programmed tasks can be stored in read-only memory while data memory generally requires read-write memory. In some systems, there is much more instruction memory than data memory so instruction addresses are wider than data addresses.

Contrast with von Neumann architectures

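In a system with a pure von Neumann architecture, instructions and data share one memory and one pathway, so the CPU cannot fetch an instruction and access data at the same time. In a Harvard architecture, instruction fetches and data accesses do not contend for a single memory pathway.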
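The effect of pathway contention can be illustrated with a rough cycle count. The following is a minimal sketch under strong assumptions that are not from the article: every memory operation takes exactly one cycle, an instruction makes at most a few data accesses, and with separate pathways the fetch of the next instruction overlaps the data access of the current one. The function names and the sample program are hypothetical.

```python
def cycles_shared_pathway(data_accesses_per_instruction):
    """Single memory pathway: every fetch and every data access takes its own cycle."""
    fetches = len(data_accesses_per_instruction)
    data_accesses = sum(data_accesses_per_instruction)
    return fetches + data_accesses


def cycles_separate_pathways(data_accesses_per_instruction):
    """Separate pathways: the next fetch overlaps the current data access.

    Each instruction therefore occupies max(1, data accesses) cycles.
    Pipeline fill and all other stalls are ignored.
    """
    return sum(max(1, n) for n in data_accesses_per_instruction)


# Hypothetical six-instruction program: 1 = touches data memory, 0 = does not.
program = [1, 0, 1, 1, 0, 1]
shared = cycles_shared_pathway(program)       # 6 fetches + 4 data accesses
separate = cycles_separate_pathways(program)  # data accesses hidden behind fetches
print(shared, separate)
```

Under these assumptions the shared pathway needs 10 cycles and the separate pathways need 6. Real processors differ widely, so the numbers only show the direction of the effect.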

Also, a Harvard architecture machine has distinct code and data address spaces: instruction address zero is not the same as data address zero. Instruction address zero might identify a twenty-four-bit value, while data address zero might indicate an eight-bit byte that is not part of that twenty-four-bit value.
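The idea of two independent address spaces can be made concrete with a toy model. The sketch below is illustrative only: the class name, method names and memory sizes are assumptions, and the 24-bit and 8-bit widths are taken from the example in the preceding paragraph.

```python
class HarvardMemory:
    """Toy model of two independent address spaces with different word widths."""

    INSTRUCTION_BITS = 24  # width of one instruction word (from the text's example)
    DATA_BITS = 8          # width of one data word (from the text's example)

    def __init__(self, program, data_size):
        mask = (1 << self.INSTRUCTION_BITS) - 1
        # A tuple models instruction storage that cannot be written as data.
        self.instructions = tuple(word & mask for word in program)
        self.data = [0] * data_size

    def fetch(self, address):
        """Read from the instruction address space."""
        return self.instructions[address]

    def load(self, address):
        """Read from the data address space."""
        return self.data[address]

    def store(self, address, value):
        """Write to the data address space, truncating to the data width."""
        self.data[address] = value & ((1 << self.DATA_BITS) - 1)


memory = HarvardMemory(program=[0xABCDEF, 0x123456], data_size=4)
memory.store(0, 0x7F)

# Address zero names a different location in each space.
instruction_at_zero = memory.fetch(0)
data_at_zero = memory.load(0)
print(hex(instruction_at_zero), hex(data_at_zero))
```

Writing to data address zero leaves instruction address zero untouched, because the two addresses never refer to the same storage.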

Contrast with modified Harvard architecture

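A modified Harvard architecture relaxes the strict separation between instructions and data. The most common modification, separate instruction and data caches backed by a common address space, is widespread in modern processors, such as ARM architecture, Power ISA and x86 processors. It is sometimes loosely called a Harvard architecture, overlooking the fact that it is actually "modified".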

Another modification provides a pathway between the instruction memory (such as ROM or flash memory) and the CPU, so that words in the instruction memory can be treated as read-only data. Special machine language instructions are provided to read data from the instruction memory, or the instruction memory can be accessed using a peripheral interface.^[a]

(This is distinct from instructions which themselves embed constant data, although for individual constants the two mechanisms can substitute for each other.)
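The extra pathway from instruction memory to the CPU can be sketched with a toy model. Everything named here is hypothetical: the class, the `load_program_byte` method (which stands in for a special machine instruction that reads instruction memory as data) and the byte values used as "code".

```python
class ModifiedHarvard:
    """Toy model: separate program and data memories plus a read-only side path."""

    def __init__(self, program_memory, data_size):
        self._program = bytes(program_memory)  # immutable, models ROM or flash
        self.data = bytearray(data_size)       # read/write data memory

    def fetch(self, address):
        """Normal instruction fetch from program memory."""
        return self._program[address]

    def load_program_byte(self, address):
        """Extra pathway: read a byte of program memory as if it were data."""
        return self._program[address]

    def load(self, address):
        return self.data[address]

    def store(self, address, value):
        self.data[address] = value & 0xFF


# Placeholder "code" bytes followed by a constant table in the same memory.
code = bytes([0x01, 0x02, 0x03])
message = b"hello"
cpu = ModifiedHarvard(code + message, data_size=16)

# Read the constant table in place, without copying it into data memory.
table_start = len(code)
text = bytes(cpu.load_program_byte(table_start + i) for i in range(len(message)))
print(text)
```

The constant string is read directly from program memory, so the data memory stays free for read/write variables.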

Speed

In recent years, the speed of the CPU has grown many times in comparison to the access speed of the main memory. Care needs to be taken to reduce the number of times main memory is accessed in order to maintain performance. If, for instance, every instruction run in the CPU requires an access to memory, the computer gains nothing from increased CPU speed, a problem referred to as being memory bound.

It is possible to make extremely fast memory, but this is only practical for small amounts of memory for cost, power and signal routing reasons. The solution is to provide a small amount of very fast memory known as a CPU cache which holds recently accessed data. As long as the data that the CPU needs is in the cache, the performance is much higher than it is when the CPU has to get the data from the main memory. On the other hand, a cache mainly helps with programs or data that are accessed repeatedly, it is limited in size, and it has other potential problems associated with it.^[b]
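The benefit and the limits of a cache can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular processor: the cache is direct-mapped with 4 lines of 4 addresses each, a hit costs 1 cycle, and a miss costs an additional 100 cycles. All names and numbers are hypothetical.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_size=4):
    """Count hits and misses for a direct-mapped cache (block number used as tag)."""
    lines = [None] * num_lines
    hits = 0
    for address in addresses:
        block = address // line_size
        index = block % num_lines
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block  # miss: fill the line from main memory
    return hits, len(addresses) - hits


def average_access_time(hits, misses, hit_time=1, miss_penalty=100):
    """Average cycles per access, using the assumed latencies above."""
    total = hits * hit_time + misses * (hit_time + miss_penalty)
    return total / (hits + misses)


# Repetitive pattern: the same 8 addresses read 10 times.
loop_hits, loop_misses = simulate_direct_mapped(list(range(8)) * 10)
loop_time = average_access_time(loop_hits, loop_misses)

# Streaming pattern: 80 accesses, each to a new block, so nothing is reused.
stream_hits, stream_misses = simulate_direct_mapped(range(0, 320, 4))
stream_time = average_access_time(stream_hits, stream_misses)

print(loop_hits, loop_misses, loop_time)
print(stream_hits, stream_misses, stream_time)
```

With these assumptions the repetitive pattern misses only twice and averages 3.5 cycles per access, while the streaming pattern misses every time and averages 101 cycles, which is the case where the cache does not help.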

Internal vs. external design

Modern high-performance CPU chip designs incorporate aspects of both Harvard and von Neumann architecture. In particular, the "split cache" version of the modified Harvard architecture is very common. CPU cache memory is divided into an instruction cache and a data cache. A Harvard architecture is used when the CPU accesses the cache. In the case of a cache miss, however, the data is retrieved from the main memory, which is not formally divided into separate instruction and data sections, although it may well have separate memory controllers used for concurrent access to RAM, ROM and (NOR) flash memory.

Thus, while a von Neumann architecture is visible in some contexts, such as when data and code come through the same memory controller, the hardware implementation gains the efficiencies of the Harvard architecture for cache accesses and at least some main memory accesses.

In addition, CPUs often have write buffers which let CPUs proceed after writes to non-cached regions. The von Neumann nature of memory is then visible when instructions are written as data by the CPU and software must ensure that the caches (data and instruction) and write buffer are synchronized before trying to execute those just-written instructions.
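The synchronization requirement can be shown with a toy model of a split cache in front of a unified memory. The sketch below is a simplification under assumptions that are not from the article: the data cache is write-back, caches hold single words keyed by address, and the hypothetical `sync` method stands in for whatever cache-maintenance operations a real system provides.

```python
class SplitCacheSystem:
    """Toy model: unified main memory behind separate instruction and data caches."""

    def __init__(self, memory):
        self.memory = list(memory)  # unified main memory
        self.icache = {}            # address -> cached instruction word
        self.dcache = {}            # address -> written word not yet in main memory

    def fetch(self, address):
        """Instruction fetch: served from the instruction cache once filled."""
        if address not in self.icache:
            self.icache[address] = self.memory[address]
        return self.icache[address]

    def load(self, address):
        """Data read: sees pending writes held in the data cache."""
        return self.dcache.get(address, self.memory[address])

    def store(self, address, value):
        """Data write: held in the data cache until synchronized."""
        self.dcache[address] = value

    def sync(self, address):
        """Write the data cache entry back and invalidate the instruction cache entry."""
        if address in self.dcache:
            self.memory[address] = self.dcache.pop(address)
        self.icache.pop(address, None)


machine = SplitCacheSystem(["NOP", "NOP"])
before = machine.fetch(0)   # fills the instruction cache
machine.store(0, "ADD")     # new instruction written as data
stale = machine.fetch(0)    # instruction cache still holds the old word
machine.sync(0)
fresh = machine.fetch(0)    # now sees the newly written instruction
print(before, stale, fresh)
```

Until `sync` runs, the data path and the instruction path disagree about the contents of address zero, which is the situation software has to prevent before executing freshly written code.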

Modern uses of the Harvard architecture

The principal advantage of the pure Harvard architecture—simultaneous access to more than one memory system—has been reduced by modified Harvard processors using modern CPU cache systems. Relatively pure Harvard architecture machines are used mostly in applications where trade-offs, like the cost and power savings from omitting caches, outweigh the programming penalties from featuring distinct code and data address spaces.

Texas Instruments TMS320 C55x processors, for one example, feature multiple parallel data buses (two write, three read) and one instruction bus.
Another example is the AVR by Atmel Corp (now part of Microchip Technology).

Even in these cases, it is common to employ special instructions in order to access program memory as though it were data for read-only tables, or for reprogramming; those processors are modified Harvard architecture processors.

Notes

^ The IAP lines of 8051-compatible microcontrollers from STC have dual ported Flash memory, with one of the two ports hooked to the instruction bus of the processor core, and the other port made available in the special function register region.
Intel 80486.^[4]^: 26–34^[5]

References

^ S2CID 252018052.

^ "Kalimba DSP: User guide" (PDF). July 2006. p. 18. Retrieved 2022-09-23. this is a three-bank Harvard architecture.

^ "386 vs. 030: the Crowded Fast Lane". Dr. Dobb's Journal, January 1988.

^ OCLC 28966593.

^ "Embedded Systems Programming: Perils of the PC Cache". users.ece.cmu.edu. Archived from the original on January 15, 2020. Retrieved 2022-05-26.


