ARM big.LITTLE
ARM big.LITTLE is a
In October 2011, big.LITTLE was announced along with the
The problem that big.LITTLE solves
For a given library of CMOS logic, active power increases as the logic switches more per second, while leakage increases with the number of transistors. So, CPUs designed to run fast are different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. For example, it might use a smaller (fewer transistors) memory cache, or a simpler microarchitecture such as removing out-of-order execution. big.LITTLE is a way to optimize for both cases: Power and speed, in the same system.
In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.
Run-state migration
There are three ways
Clustered switching
The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the
In-kernel switcher (CPU migration)
CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly
A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 System-on-Chip.
Heterogeneous multi-processing (global task scheduling)
The most powerful use model of big.LITTLE architecture is heterogeneous multi-processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.[9]
This model has been implemented in the
Scheduling
The paired arrangement allows for switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq
in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'Little' core and the high-end slots will activate the 'Big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012.[13]
Alternatively, all the cores may be exposed to the
Advantages of global task scheduling
- Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased.
- Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS.
- The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores).
- The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.
Successor
In May 2017, ARM announced DynamIQ as the successor to big.LITTLE.[16] DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8 for Armv8.2 CPUs, 12 for Armv9 and 14 for Armv9.2[17] and allows for varying core designs within a single cluster, and up to 32 total clusters. The technology also offers more fine grained per core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially only supported by the Cortex-A75 and Cortex-A55 CPU cores and their successors.
References
- ^ "big.LITTLE technology". ARM.com. Archived from the original on 22 October 2012. Retrieved 17 October 2012.
- ARM Holdings. 19 October 2011. Retrieved 31 October 2012.
- ARM Holdings. Retrieved 31 October 2012.
- ^ "ARM's new Cortex-A12 is ready to power 2014's $200 midrange smartphones". The Verge. April 2014.
- ^ "ARM Cortex A17: An Evolved Cortex A12 for the Mainstream in 2015". AnandTech. April 2014.
- ARM Holdings. Archived from the originalon 10 September 2013. Retrieved 17 September 2013.
- ^ George Grey (10 July 2013). "big.LITTLE Software Update". Linaro. Archived from the original on 4 October 2013. Retrieved 17 September 2013.
- ^ Peter Clarke (6 August 2013). "Benchmarking ARM's big-little architecture". Retrieved 17 September 2013.
- ARM Holdings, September 2013, archived from the original(PDF) on 17 April 2012, retrieved 17 September 2013
- ^ Brian Klug (11 September 2013). "Samsung Announces big.LITTLE MP Support in Exynos 5420". AnandTech. Retrieved 16 September 2013.
- ^ "Samsung Unveils New Products from its System LSI Business at Mobile World Congress". Samsung Tomorrow. Archived from the original on 16 March 2014. Retrieved 26 February 2013.
- ^ "The future is here: iPhone X". Apple Newsroom. Retrieved 25 February 2018.
- ^ McKenney, Paul (12 June 2012). "A big.LITTLE scheduler update". LWN.net.
- ^ Perret, Quentin (25 February 2019). "Energy Aware Scheduling merged in Linux 5.0". community.arm.com.
- ^ "Energy Aware Scheduling". The Linux Kernel documentation.
- ^ Humrick, Matt (29 May 2017). "Exploring Dynamiq and ARM's New CPUs". Anandtech. Retrieved 10 July 2017.
- ^ Ltd, Arm. "DynamIQ – Arm®". Arm | The Architecture for the Digital World. Retrieved 18 October 2023.
Further reading
- David Zinman (25 January 2013). "big.LITTLE MP status Jan 25, 2013". LWN.net. Retrieved 25 January 2013.
- Nicolas Pitre (15 February 2012). "Linux support for ARM big.LITTLE". LWN.net. Retrieved 18 October 2012.
- Paul McKenney (12 June 2012). "A big.LITTLE scheduler update". LWN.net. Retrieved 18 October 2012.
- Jake Edge (5 September 2012). "KS2012: ARM: A big.LITTLE update". LWN.net. Retrieved 18 October 2012.
- Jon Stokes (20 October 2011). "ARM's new Cortex A7 is tailor-made for Android superphones". Ars Technica. Retrieved 31 October 2012.
- Andrew Cunningham (30 October 2012). "ARM goes 64-bit with new Cortex-A53 and Cortex-A57 designs". Ars Technica. Retrieved 31 October 2012.
External links
- big.LITTLE Processing
- big.LITTLE Processing with ARM CortexTM-A15 & Cortex-A7 (PDF) (full technical explanation)