Sources of Variability

This project addresses four sources of variability. They impact the power, performance and reliability of hardware. These sources of variability differ in the extent to which they can be a priori characterized, and in the time scales over which they are expressed:

 

Manufacturing: The authoritative International Technology Roadmap for Semiconductors (ITRS) highlights power and performance variability management in the next decade as a “red brick” (i.e., a problem with no known solutions) for the design of computing hardware. On the memory/storage front, recent measurements by Variability Expedition researchers show 27 percent and 57 percent operation energy variation and 50 percent bit error rate variation between nominally identical flash devices (i.e., with the same part number).


Environment: ITRS projects Vdd variation of 10 percent, while the operating temperature can range from -30°C to 175°C (e.g., in automotive contexts), resulting in several orders of magnitude of sleep power variation and several tens of percent of performance change.


Vendor: Parts with almost identical specifications can have substantially different power, performance or reliability characteristics, as shown in the figure below. This variability is a concern because single-vendor sourcing is difficult for large-volume systems.


Aging: Wires and transistors in modern integrated circuits suffer wear-out leading to power and performance changes over time. Degradation can be noticeable within a few milliseconds and may not saturate for several years.

 

 

 

Research Projects

The Variability Expedition is tackling a host of major challenges. A dozen mostly multi-institutional, collaborative projects are spread across five general categories of research. For details on an individual project, click on its title under 'Project Links' in each category.

 

Measurement | Modeling

Abstract: One of the first steps that we have undertaken as part of the Variability Expedition is to find examples of hardware variability in current hardware, both at the individual component level and at the platform level. This variability can take the form of observed differences across presumably identical parts in power consumption, performance, error rates, etc. Our thesis is that as technology scales and feature sizes shrink even further, the variability in hardware will only increase. By studying variations in current-generation hardware, we hope to identify, quantify and eventually model this variability so that it can be exposed to higher layers of the software stack, which can then adapt to either reduce its effect or leverage it for goals such as improving fault tolerance, improving efficiency and lowering power consumption.
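As a rough illustration of what quantifying variability across presumably identical parts might look like, the sketch below computes two simple spread metrics over a set of measurements. The function name `spread_metrics` and the sample readings are hypothetical and chosen only for illustration; they are not Expedition data or tooling.

```python
from statistics import mean, pstdev


def spread_metrics(readings):
    """Return (max-vs-min spread relative to the minimum, coefficient of variation)."""
    if not readings or min(readings) <= 0:
        raise ValueError("need at least one reading, all positive")
    lo, hi = min(readings), max(readings)
    rel_range = (hi - lo) / lo               # e.g. 0.5 means the worst part draws 50% more
    cv = pstdev(readings) / mean(readings)   # population std. dev. over the mean
    return rel_range, cv


# Hypothetical sleep-power readings (microwatts) from five parts with the same part number.
sample_uw = [100.0, 110.0, 125.0, 140.0, 150.0]
rel_range, cv = spread_metrics(sample_uw)
print(f"max/min spread: {rel_range:.0%}, coefficient of variation: {cv:.1%}")
```

Reporting both numbers is a design choice: the max-vs-min spread captures the worst case a designer must guard-band against, while the coefficient of variation summarizes how the population as a whole is distributed.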


Participating Campuses: UCSD, UCLA, University of Michigan

Project Links:

• Variability in Current Computing Platforms
• OS-Level Proactive Thermal Management
• Mitigating Variability in Solid-State Storage Devices
• Circuit-Level Monitors for Power, Performance, Aging and Reliability
• Variability-Aware Memory Management


Design Tools | Testing

Abstract: The projects in this research thrust aim to dramatically reduce hardware design and test costs for computing systems while achieving the maximum performance potential at minimum energy and total cost. For the hardware design flow, the objectives are efficient design and test methods, given that the goal is no longer to optimize yield for fixed specifications, but rather to ensure that designs exhibit well-behaved variability characteristics that a well-configured software stack can easily exploit.



Participating Campuses: UCLA, Stanford, University of Michigan

Project Links:

• Aging Management at the Circuit, Architecture and Runtime Levels
• Circuit-Level Monitors for Power, Performance, Aging and Reliability


Micro-Architecture | Compilers 

Abstract: The projects in this thrust area will focus on a software stack that can respond to application needs as data and environment (platform characteristics and behavior) change. The eventual goal is to meet those needs through a responsive architecture and programmer interface that will extend the traditionally fixed instruction set architecture (ISA) specification of a computing machine to one where the ISA functionality and performance are mutable across different instances of the hardware, different invocations of an application, and within the lifetime of an application. The compilers developed by this project will combine semantic analysis of application behavior with compiler strategies that exploit both static (off-line) and lightweight on-the-fly techniques to adapt applications so that they can leverage the underlying variability in hardware.

 

Participating Campuses: UCSD, UCLA, UC Irvine, Stanford

Project Links:

• Maintaining Quality of Service of Computation with Variability
• Imprecise Computation for Energy Savings
• Aging Management at the Circuit, Architecture and Runtime Levels
• Abstractions for Representing Software-Visible Manifestations of Hardware Variations
• Circuit-Level Monitors for Power, Performance, Aging and Reliability


Runtime Support 

Abstract: The software stack can take several different types of run-time actions in response to hardware variability, such as altering the workload, using alternative hardware resources, changing the algorithm, or altering the operational setting of the hardware. A key difference from software mechanisms developed for fault-tolerant computing is that a variability-aware software stack can take actions preemptively, based on statistically predicted variability behavior of the hardware, and consider not simply functional and temporal correctness but also factors such as energy efficiency. The diverse forms in which variability occurs, and the strong dependence of the response on the current application and system context, present challenges to the software stack.
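One way to picture a preemptive, statistically informed action of the kind described above is a runtime that selects an operating setting from predicted behavior rather than waiting for a fault to occur. The sketch below is purely hypothetical: the `OperatingPoint` type, the `choose_operating_point` function, the selection rule and all numbers are assumptions made for illustration, not the Expedition's mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    energy_per_task_mj: float    # predicted energy per task, in millijoules (assumed)
    p_timing_violation: float    # predicted probability of a timing error (assumed)


def choose_operating_point(points, max_violation_prob):
    """Pick the lowest-energy point whose predicted error probability is acceptable.

    If no point meets the bound, fall back to the most reliable point.
    """
    if not points:
        raise ValueError("no operating points supplied")
    safe = [p for p in points if p.p_timing_violation <= max_violation_prob]
    if safe:
        return min(safe, key=lambda p: p.energy_per_task_mj)
    return min(points, key=lambda p: p.p_timing_violation)


# Hypothetical predictions for one hardware instance.
points = [
    OperatingPoint("low-voltage", 1.0, 1e-3),
    OperatingPoint("nominal", 1.6, 1e-6),
    OperatingPoint("guard-banded", 2.2, 1e-9),
]
print(choose_operating_point(points, max_violation_prob=1e-5).name)
```

The rule weighs energy efficiency alongside correctness: it accepts any setting whose predicted error probability is within the application's tolerance and then minimizes energy among those, which is one simple reading of considering factors beyond functional and temporal correctness.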

 

Participating Campuses: UC Irvine, UIUC 

Project Links:

• Aging Management at the Circuit, Architecture and Runtime Levels
• Abstractions for Representing Software-Visible Manifestations of Hardware Variations
• Revisiting Fault-Tolerance in Processors
• Variability-Aware Runtime Adaptation in Long-Running Embedded Systems
• Variability-Aware Memory Management


Application | Testbeds 


Abstract: Certain applications, such as search engines, speech recognition, and multimedia rendering, are tolerant of output quality variations, offering the system opportunities to trade output quality against system performance, resource cost, and energy usage through adaptive duty-cycling. One of the major goals of the Variability Expedition is to experiment with different types and classes of platforms and experimental testbeds that enable others not only to measure and observe hardware variability, but also to test new solutions in domains with potential for broad societal, economic and educational impact. These planned testbeds draw on different classes of systems (embedded, mobile, server) and different designs (off-the-shelf integrated circuits, FPGA-based, custom-designed ICs). The testbeds are designed to eventually become network-accessible shared instruments for all Expedition partners and for the broader research community. The testbeds will serve as highly visible demonstration vehicles, and as a way to evaluate the success of the Variability Expedition’s research and knowledge-transfer goals.
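The trade between output quality and energy through adaptive duty-cycling can be sketched with a simple average-power model. The example below assumes that a device alternates between an active state and a sleep state and must stay within an average power budget; the function name `max_duty_cycle`, the board names and the power figures are hypothetical.

```python
def max_duty_cycle(budget_mw, active_mw, sleep_mw):
    """Largest fraction of time a device can stay active within an average power budget.

    Solves d * active_mw + (1 - d) * sleep_mw <= budget_mw for d, clamped to [0, 1].
    """
    if active_mw <= sleep_mw:
        raise ValueError("active power must exceed sleep power")
    d = (budget_mw - sleep_mw) / (active_mw - sleep_mw)
    return max(0.0, min(1.0, d))


# Hypothetical (active, sleep) power in milliwatts for two nominally identical boards.
instances = {"board-A": (50.0, 0.5), "board-B": (50.0, 2.0)}
for name, (active, sleep) in instances.items():
    print(name, round(max_duty_cycle(5.0, active, sleep), 4))
```

Under this model, the board with the higher sleep power can afford less active time for the same budget, so a quality-tolerant application running on it would process a smaller share of its input. That is the sense in which instance-specific power measurements can drive the adaptation.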


Participating Campuses: UCSD, UCLA

Project Links:

• Variability in Current Computing Platforms
• OS-Level Proactive Thermal Management
• Hardware-Signature Based Application Adaptation
• Variability-Aware Memory Management