Memory access pattern
In
Memory access patterns also have implications for
Examples
Sequential
The simplest extreme is the
Strided
Linear
A linear access pattern is closely related to "strided", where a memory address may be computed from a linear combination of some index. Stepping through indices sequentially with a linear pattern yields strided access. A linear access pattern for writes (with any access pattern for non-overlapping reads) may guarantee that an algorithm can be parallelised, which is exploited in systems supporting compute kernels.
Nearest neighbor
Nearest neighbor memory access patterns appear in simulation, and are related to sequential or strided patterns. An algorithm may traverse a data structure using information from the nearest neighbors of a data element (in one or more dimensions) to perform a calculation. These are common in physics simulations operating on grids.
2D spatially coherent
In
Scatter
A
However, it may be harder to parallelise since there is no guarantee the writes do not interact,[27] and many systems are still designed assuming that a hardware cache will coalesce many small writes into larger ones.
In the past,
The PlayStation 2 console used conventional inverse texture mapping, but handled any scatter/gather processing "on-chip" using EDRAM, whilst 3D model (and a lot of texture data) from main memory was fed sequentially by DMA. This is why it lacked support for indexed primitives, and sometimes needed to manage textures "up front" in the display list.
Gather
In a
Compared to scatter, the disadvantage is that caching (and bypassing latencies) is now essential for efficient reads of small elements, however it is easier to parallelise since the writes are guaranteed to not overlap. As such the gather approach is more common for
Combined gather and scatter
An algorithm may gather data from one source, perform some computation in local or on chip memory, and scatter results elsewhere. This is essentially the full operation of a
Random
At the opposite extreme is a truly random memory access pattern. A few multiprocessor systems are specialised to deal with these.[28] The PGAS approach may help by sorting operations by data on the fly (useful when the problem *is* figuring out the locality of unsorted data).[21] Data structures which rely heavily on pointer chasing can often produce poor locality of reference, although sorting can sometimes help. Given a truly random memory access pattern, it may be possible to break it down (including scatter or gather stages, or other intermediate sorting) which may improve the locality overall; this is often a prerequisite for parallelizing.
Approaches
Data-oriented design
Contrast with locality of reference
Locality of reference refers to a property exhibited by memory access patterns. A programmer will change the memory access pattern (by reworking algorithms) to improve the locality of reference,[30] and/or to increase potential for parallelism.[26] A programmer or system designer may create frameworks or abstractions (e.g., C++ templates or higher-order functions) that encapsulate a specific memory access pattern.[31][32]
Different considerations for memory access patterns appear in parallelism beyond locality of reference, namely the separation of reads and writes. E.g.: even if the reads and writes are "perfectly" local, it can be impossible to parallelise due to dependencies; separating the reads and writes into separate areas yields a different memory access pattern, maybe initially appear worse in pure locality terms, but desirable to leverage modern parallel hardware.[26]
Locality of reference may also refer to individual variables (e.g., the ability of a
See also
- Gather-scatter (vector addressing)
- Locality of reference
- Parallel computing
- Working set
References
- ^ "data oriented design" (PDF).
- S2CID 15997131. NLM unique id 101212014.
- ISBN 9780128091951.
- ^ "Analysis of Energy and Performance of Code Transformations for PGAS-based Data Access Patterns" (PDF).
- ^ "enhancing cache coherent architectures with memory access patterns for embedded many-core systems" (PDF).
- ^ "intel terascale" (PDF).
- ISBN 9780769504506.
- ^ "QUAD a memory access pattern analyser" (PDF).
- ^ "Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems" (PDF).
- )
- S2CID 16476418.
- ^ "Putting Your Data and Code in Order: Data and layout".
- S2CID 16065152.
- ^ "Memory Access Pattern Protection for Resource-constrained Devices" (PDF).
- ^ "understanding cache attacks" (PDF).
- ^ "protecting data in the cloud".
- ^ "boosting-cloud-security-with----oblivious-ram". 24 September 2013.proposed RAM design avoiding memory-access-pattern vulnerabilities
- ^ Chuck Paridon. "Storage Performance Benchmarking Guidelines - Part I: Workload Design" (PDF).
In practice, IO access patterns are as numerous as the stars
- ^ "optimizing for tiling and data locality" (PDF).paper covers loop tiling and implication for parallel code
- ^ "Optimal 2D Data Partitioning for DMA Transfers on MPSoCs" (PDF).
- ^ a b "partitioned global address space programming". YouTube.covers cases where PGAS is a win, where data may not be already sorted, e.g., dealing with complex graphs - see "science across the irregularity spectrum".
- ^ "Quantifying Locality In The Memory Access Patterns of HPC Applications" (PDF).mentions nearest neighbor access patterns in clusters
- ^ "The Design and Analysis of a Cache Architecture for Texture Mapping" (PDF).see morton order,texture access pattern
- ^ "morton order to accelerate texturing" (PDF).
- ^ "Morton-order Matrices Deserve Compilers' Support Technical Report 533" (PDF).discusses the importance of morton order for matrices
- ^ a b c d "gpgpu scatter vs gather". Archived from the original on 2016-06-14. Retrieved 2016-06-13.
- ^ ISBN 9780123849892.deals with "scatter memory access patterns" and "gather memory access patterns" in the text
- ^ "Cray and HPCC: Benchmark Developments and Results from the Past Year" (PDF).see global random access results for Cray X1. vector architecture for hiding latencies, not so sensitive to cache coherency
- ^ "data oriented design" (PDF).
- ^ "optimize-data-structures-and-memory-access-patterns-to-improve-data-locality".
- ^ "Template-based Memory Access Engine for Accelerators in SoCs" (PDF).
- ^ "Multi-Target Vectorization With MTPS C++ Generic Library" (PDF).a C++ template library for producing optimised memory access patterns