Circuit-Level Monitors for Power, Performance, Aging and Reliability


This project develops the hardware components of the software-driven variability solutions proposed in the Expedition. The hardware outcomes will provide data about, and mechanisms to cope with, the underlying variations, including aging, temporal variation, process, and temperature. We seek compact, accurate, and in many cases in situ sensing techniques that give software better visibility into the technology platform.

Another goal of this project is to develop inexpensive performance monitoring strategies so that hardware performance signatures can be generated and reported to software. Current work focuses on algorithmically synthesizing multiple ring oscillators which collectively allow accurate prediction of frequency under process, Vdd and temperature variations.
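As a rough illustration of how several ring-oscillator readings could be combined into a single frequency estimate, the sketch below fits a linear model by least squares. It is not the synthesis algorithm used in this project: the function names, the linear form of the model, and the sample readings (in GHz) are all assumptions made for the example.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ro_model(ro_freqs, target_freqs):
    """Return least-squares weights, with the intercept as the last entry."""
    rows = [list(f) + [1.0] for f in ro_freqs]  # append intercept column
    k = len(rows[0])
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, target_freqs)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(weights, ro_reading):
    """Estimate the target frequency from one set of RO readings."""
    return sum(w * f for w, f in zip(weights, ro_reading)) + weights[-1]


# Hypothetical readings from two ROs on four chips (GHz), and the measured
# frequency of the design on the same chips. Synthetic data, not from silicon.
RO_SAMPLES = [(1.00, 0.80), (0.95, 0.82), (1.02, 0.76), (0.90, 0.70)]
TARGET_GHZ = [0.890, 0.866, 0.890, 0.800]

weights = fit_ro_model(RO_SAMPLES, TARGET_GHZ)
print(predict(weights, (0.98, 0.79)))
```

In practice the set of oscillators would be chosen so that their sensitivities to process, Vdd and temperature together span those of the design being monitored; a plain linear fit like this one is only the simplest way to combine the readings.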

As part of this project, we proposed a low-power unified oxide and NBTI degradation sensor designed in a 45nm process node. The cell power consumption is 105 lower than a previously proposed sensor. The unified nature enables efficient reliability monitoring with reduced sensor deployment effort and area overhead. Using the sensor, Dynamic NBTI Management (DNM) has been implemented for the first time. DNM trades the excess ‘reliability margin’ present in the design, which arises from better-than-worst-case operating conditions, for performance. For the typical case shown in the paper, DNM allows an average boost of 90mV in accelerated supply voltage while bringing the excess NBTI margin down from 22.5mV to 8mV, where the total budget for NBTI was 66mV.

We also proposed an active learning framework to extract process variation from measurements and reduce test cost. Several techniques were developed to model the variation. By reusing a priori knowledge from earlier wafers, only a partial test needs to be conducted on forthcoming wafers to meet the required accuracy at reduced test cost. Experimental results show that the framework achieves 2-3% relative error using only ~30% of the test structures for two industrial processes.

A paper titled “A confidence-driven model for error-resilient computing” was presented at DATE 2011 describing a confidence-driven approach to highly variable and unreliable computing platforms. The technique relies on intelligent combination of temporal and spatial redundancy to extract the benefits of both while minimizing each of their overheads. The work has been mapped to FPGA for emulation of a CORDIC processor and shows relatively low overheads with high reliability in the face of large failure rates. The work is joint with Intel and other faculty at Michigan (Zhang/Blaauw).
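One simple way to picture a confidence-driven scheme is to keep re-executing a computation and accept a result only once it leads every competing result by a chosen margin. The sketch below is an assumed, simplified reading of that idea, not the mechanism from the DATE paper; `confident_result` and its `threshold` parameter are hypothetical names. The `attempts` iterable can carry results from repeated executions on one unit (temporal redundancy), from several units in parallel (spatial redundancy), or from a mix of both.

```python
from collections import Counter


def confident_result(attempts, threshold):
    """Consume results until one value leads all others by `threshold` votes.

    Returns (value, executions_used), or (None, executions_used) if the
    attempts run out before any value becomes trustworthy enough.
    """
    votes = Counter()
    used = 0
    for value in attempts:
        used += 1
        votes[value] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= threshold:
            return ranked[0][0], used
    return None, used


# Error-free case: two agreeing executions are enough at threshold 2.
print(confident_result([7, 7], threshold=2))
# One faulty execution (the 3) costs extra executions before 7 is accepted.
print(confident_result([7, 3, 7, 7], threshold=2))
```

The appeal of this style of scheme is that the redundancy overhead adapts to the observed error rate: error-free runs finish after the minimum number of executions, and extra work is spent only when results disagree.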

We previously developed a novel pulse-width modulation (PWM) scheme to improve the energy-performance tradeoff in global on-chip interconnect. The technique relies on the ability to distinguish between the timing characteristics of pulses, so variability can cause functional failures rather than only parametric shifts. Hence we developed a self-calibration approach that tunes individual PWM encoders and decoders to maximize process window and performance. The approach was demonstrated in hardware (65nm CMOS, 5mm on-chip links), showing 11% performance improvement and 2.5X lower spread in delay after self-calibration.


Publications:

"Crosstalk-aware PWM-based on-chip links with self-calibration in 65nm CMOS," J-S. Seo, D. Blaauw, and D. Sylvester. IEEE Journal of Solid-State Circuits, 05-12-11

"A Confidence-Driven Model for Error-Resilient Computing," C-H Chen, Y Kim, Z Zhang, D Blaauw and D Sylvester, U of Michigan, Ann Arbor; H Naeimi and S Sandhu, Intel. Proc. Design, Automation & Test in Europe (DATE'11), 03-17-11

"Active learning framework for post-silicon variation extraction and test cost reduction," C. Zhuo, K. Agarwal, D. Sylvester, and D. Blaauw. Proc. IEEE/ACM International Conference on Computer-Aided Design 2010, pp. 508-515, 11-11-10

"Dynamic NBTI management using a 45nm multi-degradation sensor," P. Singh, E. Karl, D. Sylvester, and D. Blaauw. Proc. IEEE Custom Integrated Circuits Conference 2010, 09-21-10

Milestones:

Taped out an ARM Cortex M3 along with the SynROs in IBM 45SOI in March 2011.


Plans/Outlook:

We await silicon results. The simulation results look very promising.


Details of dynamic NBTI management using a subset (32) of compact BTI sensors to predict the wearout of a different set of sensors (224, serving the role of “circuit under test” for this experiment). The left figure shows the measured data of a representative sensor and the fit to the data used to extrapolate to end of lifetime and make decisions on the maximum voltage allowed (staircase data inset). At right are the resulting distributions of degradation in the full set of 256 sensors, including results without voltage boosting (black) and with boosting (red) to enhance performance by ~10% in this case while preserving lifetime in all sensors given the bound of 66mV maximum ΔVth.
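The fit-and-extrapolate step can be sketched as follows, assuming the commonly used power-law model ΔVth = a·t^n for NBTI degradation. The model choice, the function names, the synthetic sensor data, and the 10-year lifetime are assumptions made for illustration; only the 66mV budget comes from the text above.

```python
import math


def fit_power_law(times_s, dvth_mv):
    """Fit dVth = a * t**n by linear regression in log-log space."""
    xs = [math.log(t) for t in times_s]
    ys = [math.log(v) for v in dvth_mv]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                          # slope = time exponent
    a = math.exp(mean_y - n * mean_x)      # intercept = prefactor
    return a, n


def within_budget(a, n, lifetime_s, budget_mv):
    """True if the extrapolated end-of-life shift stays inside the budget."""
    return a * lifetime_s ** n <= budget_mv


# Synthetic sensor readings (mV) at four stress times (s); not measured data.
TIMES_S = [1e2, 1e3, 1e4, 1e5]
DVTH_MV = [1.0 * t ** 0.2 for t in TIMES_S]

TEN_YEARS_S = 10 * 365 * 24 * 3600  # assumed target lifetime
BUDGET_MV = 66.0                    # NBTI budget quoted in the text

a, n = fit_power_law(TIMES_S, DVTH_MV)
print(a * TEN_YEARS_S ** n, within_budget(a, n, TEN_YEARS_S, BUDGET_MV))
```

A management loop built on this check would raise the supply voltage while the extrapolated shift stays under the budget and back off once it does not, which is one plausible way to arrive at a staircase-shaped voltage schedule.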


Category:

Measurement / Modeling

Design Tools / Testing

Micro-Architecture / Compilers


Campus:

UCLA

UMich


People:

PIs: Dennis Sylvester (UMich), Puneet Gupta (UCLA); Graduate Students: Dave Fick (UMich), Liangzhen Lai (UCLA), Tuck-Bon Chan (UCLA, now at UCSD)



Awards:

Intel/CICC Student Scholarship Award, “Dynamic NBTI management using a 45nm multi-degradation sensor” (http://blaauw.eecs.umich.edu/getFile.php?id=426&sid=b7c408d25fe797e9f741afd322f5faef)



