SMP, Asymmetric Multiprocessing, and the HSA Foundation

Kurt Shuler, September 27, 2012

(For more information on SMP’s inability to scale well, read Jack Ganssle’s 2008 embedded.com article, “The Nulticore effect,” or the IEEE Spectrum/Sandia Labs article, “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.”)

Processor companies serving the mobility and consumer electronics markets have avoided purely SMP solutions and instead have implemented asymmetric multiprocessing (AMP) architectures. An example of AMP is a mobile phone modem baseband SoC that contains an ARM processor and a DSP to handle control and signal processing, respectively. We also see AMP architectures in today’s mobile phone application processors, which usually have multiple CPU cores and separate graphics cores, video cores, audio cores and imaging cores.

Battery Size and Heat Drive Asymmetric Multiprocessing in Mobility Devices

The mobility world has always been forced to use “the best core for the job” because of the constraints imposed by battery size and heat dissipation. So architectures in mobility have always been created from a baseline expectation of heterogeneous core AMP.

[Figure: Qualcomm Snapdragon S4 block diagram]

This is in contrast to the server and PC markets, which have relatively unlimited (at least compared to a mobile phone) power consumption and heat dissipation capabilities. In these markets, it has always been easier to add more cores of the same type, connect them using cache coherency, and reuse the legacy software to run on top.

Things are starting to change, though, as the SMP approach starts to wear thin. For example, for server farms that power the likes of Google and Facebook, power consumption and heat dissipation have become huge cost and environmental issues. And in the PC space, we have run into a “GHz wall” where the only way to have a step function increase in performance is to have different cores optimized for different workload types.

Why hasn’t AMP been implemented in the PC and server markets?

It’s hard.

In mobility designs, each heterogeneous processing core, whether graphics, audio, DSP, etc., usually has a custom firmware and software stack associated with it. This software must be integrated to communicate with the CPU cores’ operating system, which necessitates coding work in the OS hardware abstraction layer and drivers.

Furthermore, these heterogeneous cores do not have a single view of system memory, so complicated synchronization schemes are usually implemented in hardware and software. Context switching and preemption are difficult to implement.
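To make the memory problem concrete, here is a minimal sketch of what happens when a core works from its own private copy of the data. It is a toy model, not any vendor's API: the Accelerator class and its copy_in, scale and copy_out methods are hypothetical names I made up for illustration, and plain Python lists stand in for host and device memory.

```python
class Accelerator:
    """Hypothetical accelerator core with its own private memory."""

    def __init__(self):
        self.local = []  # device-side memory, invisible to the CPU

    def copy_in(self, host_buffer):
        # Explicit host -> device transfer: the data is duplicated.
        self.local = list(host_buffer)

    def scale(self, factor):
        # The accelerator only ever touches its private copy.
        self.local = [x * factor for x in self.local]

    def copy_out(self):
        # Explicit device -> host transfer.
        return list(self.local)


host = [1, 2, 3]
acc = Accelerator()
acc.copy_in(host)

host[0] = 100   # the CPU updates its buffer after the transfer...
acc.scale(2)    # ...but the accelerator still computes on the old values
result = acc.copy_out()

print(result)   # the update to host[0] is not reflected here
print(host)
```

Every place where the two copies can drift apart like this needs an explicit transfer or a synchronization point, which is the kind of bookkeeping the hardware and software schemes mentioned above exist to manage.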

And most importantly, each of these cores requires an expert programmer to code it, someone conversant in a particular core’s instruction set and tool chains.

As a result, asymmetric multiprocessing has thrived in the relatively closed-to-developers/ISVs mobility and consumer electronics worlds while SMP has flourished in the wide open world of PCs and servers.

The Heterogeneous System Architecture Foundation

The HSA Foundation is a non-profit organization that intends to make it easier for the world to adopt AMP architectures.

Its goals are to:

Make heterogeneous programming easy and a first-class pervasive complement to CPU computing
Continue to increase the power efficiency of heterogeneous systems (AMP), keeping it the platform of choice from smartphones to the cloud
Bring to market strong development solutions (tools, libraries, OS runtimes) to drive innovative advanced content and applications
Foster growth of heterogeneous computing talent through HSA developer training and academic programs to drive both learning and innovation

To achieve these goals, HSA will have to innovate by providing a technical framework and architecture to address the following issues:

Unified Programming Model – Today, CPU and GPU (or other accelerator) cores are programmed separately, with the GPU treated as a remote processor. HSA will allow developers to target the CPU or GPU by writing in task-parallel languages, like the ones they use today when writing for multicore CPUs.
Unified Address Space – HSA supports virtual address translation amongst the heterogeneous cores with an HSA-specific memory management unit (HMMU). HSA compute engines will use the same pageable virtual address space as used by CPUs today.
Queuing – CPUs, GPUs and other cores can queue tasks to each other and to themselves through an HSA runtime. Queuing can be managed in hardware to avoid OS system calls and enable very low latency communication between cores.
Preemption and Context Switching – HSA enables job preemption, job scheduling and fault handling capabilities to overcome potential problems created by rogue or faulted processes.
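The queuing idea can be sketched as a fixed-size ring buffer that one core writes and another core drains. This is only a single-threaded toy model under my own assumptions, not the HSA runtime: the names Packet, UserModeQueue, enqueue and drain are hypothetical, and advancing a write index stands in for whatever hardware signaling a real implementation would use.

```python
class Packet:
    """One unit of work: a callable plus its argument (hypothetical format)."""

    def __init__(self, kernel, arg):
        self.kernel = kernel
        self.arg = arg


class UserModeQueue:
    """Fixed-size ring buffer shared by a producer core and a consumer core."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_index = 0  # advanced by the producer (e.g. a CPU core)
        self.read_index = 0   # advanced by the consumer (e.g. a GPU core)

    def enqueue(self, packet):
        """Producer side: place a packet in the ring without an OS call."""
        if self.write_index - self.read_index == self.capacity:
            return False  # queue is full; the caller has to retry later
        self.slots[self.write_index % self.capacity] = packet
        self.write_index += 1  # publishing the index stands in for a doorbell
        return True

    def drain(self):
        """Consumer side: run every packet published so far, in order."""
        results = []
        while self.read_index < self.write_index:
            slot = self.read_index % self.capacity
            packet = self.slots[slot]
            results.append(packet.kernel(packet.arg))
            self.slots[slot] = None  # free the slot for reuse
            self.read_index += 1
        return results


queue = UserModeQueue(capacity=2)
queue.enqueue(Packet(lambda x: x * 2, 3))
queue.enqueue(Packet(lambda x: x + 1, 10))
accepted = queue.enqueue(Packet(lambda x: x, 0))  # rejected: ring is full
print(accepted, queue.drain())
```

The point of the design is that the producer only writes to memory that both sides can see, so submitting work does not require a trip through the operating system. A real multi-core version would also need atomic index updates and memory ordering guarantees, which this sketch leaves out.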

How will HSA do this?

HSA’s goals and the issues it has chosen to address are admirable, but are difficult to achieve. In my next article I’ll discuss the means by which the HSA Foundation will simplify heterogeneous asymmetric processing. Specifically, I’ll introduce the HSA solution stack, comprising the HSA Assembler, Runtime, Finalizer, and Kernel Driver as well as HSA software libraries and intermediate languages.

Sources

Ganssle, Jack. “The Nulticore effect.” Embedded.com, 8 December 2008.
Moore, Samuel K. “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.” IEEE Spectrum, November 2008.
Kyriazis, George (AMD). “Heterogeneous System Architecture: A Technical Review.” Whitepaper, HSA Foundation, August 2012.
Processor core performance graph is from “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.” IEEE Spectrum, November 2008 and Sandia Labs.
Qualcomm Snapdragon S4 block diagram is from https://www.cnx-software.com/wp-content/uploads/2011/10/qualcomm_snapdragon_s4_block_diagram.jpg.
HSA Solution Stack diagram is from Phil Rogers’ presentation at the AMD Fusion 2012 conference titled, “The Programmer’s Guide to a Universe of Possibility: The Heterogeneous System Architecture.”

—Kurt Shuler is vice president of marketing at Arteris.
