MEDIATEK

Facing Test Challenges in Upcoming AI/5G-based Systems – Opportunity for Radical Ideas

SWTW Asia 2018

Harry H. Chen, IC Testing Scientist, Computing and Artificial Intelligence Technology Group

October 19, 2018

### Outline

- MediaTek Overview
- AI/5G-enabled Systems
- Test Challenges
- Systems Orientation
- New Opportunities











ILITEK ILI Technology Corp.






MStar Semiconductor, Inc.

Nephos Inc.

Richtek Technology Corp.

SigmaStar Technology, Inc.





World's 4<sup>th</sup> largest fabless IC design company

13<sup>th</sup> among the top 25 semiconductor companies globally, top 4 in Asia

USD 7.8 Bn MediaTek Group revenue in 2017

Ship about 1.5 billion chips a year





### MediaTek is all around you

Annually, around one out of five households globally acquires a product powered by a MediaTek SoC



Source: 2015 World Bank data, United Nations Statistical Division. Population and Vital Statistics Report (various years).

2018 Copyright © MediaTek Inc. All rights reserved.



### **Core Technologies Enable Intelligent Devices with Leading Customers**



### MediaTek, the Edge AI enabler

#### MediaTek is the ONLY company that supports a broad range of consumer products

- Built on our wide-ranging technology portfolio

- Across different platforms and OSes (Android, Linux, RTOS, others)



### Future of Smart Everything

- AI: Computational Intelligence
- 5G: High Data Rate Mobility



### AI + 5G require semiconductor advances



### **DL demands heavy-duty computation**

Performance per watt on various architectures:

| Architecture | Performance per watt |
| --- | --- |
| CPU | 1.5 GOPs/W |
| GPU | 24.0 GOPs/W |
| ASIC | 452 GOPs/W |
| Brain | 20 PetaOPs @ 20 W |

Giga = $10^{9}$, Peta = $10^{15}$ = 1M Giga

- Wasted energy moving data: processor $\Leftrightarrow$ memory
- Brain has CIM architecture: analog, asynchronous, event-driven

[Figure labels: Delay, Energy Product]



Leading-edge nm SoC
Processor + HBM SiP

Courtesy: MF Chang, NTHU

AI is trending towards approximate computing

CIM = Compute-In-Memory
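To put the performance-per-watt figures on one scale, the short sketch below converts each entry to operations per second per watt and computes how far each architecture is from the brain figure. It uses only the numbers quoted on the slide; the function and variable names are illustrative.

```python
# Figures quoted on the slide, normalized to operations per second per watt.
GIGA = 10**9
PETA = 10**15

ops_per_watt = {
    "CPU": 1.5 * GIGA,
    "GPU": 24.0 * GIGA,
    "ASIC": 452 * GIGA,
    "Brain": 20 * PETA / 20,  # 20 PetaOPs at 20 W
}

def efficiency_gap(reference: str, target: str = "Brain") -> float:
    """Return how many times more efficient `target` is than `reference`."""
    return ops_per_watt[target] / ops_per_watt[reference]

for name in ("CPU", "GPU", "ASIC"):
    print(f"Brain vs {name}: {efficiency_gap(name):,.0f}x")
```

On these numbers the brain works out to 1 PetaOP/W, roughly three orders of magnitude beyond the ASIC entry.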


### An example of CIM using Resistive-RAM

### 1T1R Matrix-vector Multiplication

- Input: WL signal
- Weight: RRAM
- No leakage current

Courtesy: MF Chang, NTHU

- Variability and reliability issues remain to be solved
- But new device types are coming
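A minimal sketch of the matrix-vector multiplication idea, under simplifying assumptions: each word line (WL) input is binary and gates its row of access transistors, each RRAM cell stores a weight as a conductance, and each bit line sums the cell currents. The read voltage, the conductance values, and the Gaussian variation model are all hypothetical and are not taken from the slide.

```python
import random

V_READ = 0.2  # assumed read voltage across each selected cell, in volts

def crossbar_currents(wl_inputs, conductances, v_read=V_READ):
    """Bit-line currents of an idealized 1T1R array.

    wl_inputs[i] is 1 when word line i turns its access transistors on, else 0.
    conductances[i][j] is the RRAM conductance (siemens) storing weight (i, j).
    Rows that are off contribute nothing, a simplified view of "no leakage current".
    """
    n_cols = len(conductances[0])
    return [
        v_read * sum(x * row[j] for x, row in zip(wl_inputs, conductances))
        for j in range(n_cols)
    ]

def with_variation(conductances, sigma, rng):
    """Scale every cell by (1 + N(0, sigma)); an assumed, simplified variability model."""
    return [[g * (1.0 + rng.gauss(0.0, sigma)) for g in row] for row in conductances]

# Hypothetical array: 3 word lines (inputs) by 2 bit lines (outputs).
WL = [1, 0, 1]
G = [[2e-6, 1e-6],
     [4e-6, 3e-6],
     [1e-6, 5e-6]]

ideal = crossbar_currents(WL, G)
varied = crossbar_currents(WL, with_variation(G, sigma=0.1, rng=random.Random(0)))
print("ideal bit-line currents (A): ", ideal)
print("with cell variation (A):     ", varied)
```

Comparing `ideal` with `varied` gives a feel for why device variability matters: the analog sum shifts with every cell, so the computed result is only approximate.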



### AI + 5G require semiconductor advances



### **Testing advanced technologies for AI + 5G**

#### Hard and Expensive!

- Testing of SoC and SiP as individual components
- Testing the complex end-use system containing the components

#### Self-driving car

- Electro-mechanical
- HW + SW
- Optical, RF, GPS
- Networking
- Integrated decisions



Source: McKinsey

### **Component testing challenges**

- Increased process variation and aging effects
- Power restrictions shrink operating margins
- More subtle escapes into the system ("marginal defects")
- KGD assembly, interconnect, test access
- Multi-domain: analog, digital, optical, mechanical
- Power and thermal integrity
- Non-matched component interactions
- Failure diagnosis









### Unfavorable test cost trend

"Leveraging Advanced Manufacturing to Address Challenges in the Automotive Memory Market," SWTW 2018. Brett Debenham, Sr. Director, Test/Probe Central Engineering, Micron

"Emerging Test Methods -- How Auto IC Requirements, Adaptive Testing and Multichip Products are Changing the Industry." Phil Nigh, Test Strategy & Methodology, GLOBALFOUNDRIES (phil.nigh@globalfoundries.com)

[Figure: Pad Size/Probe Card Complexity Trend; Bond Pad Size/Pitch/Thickness]



Since SWTW 2013...

- ASIC / SoC / CPU testing: will the probe electrical environment allow us to drive all testing content back to wafer test?
  - Target probe size, pitch, probe count, and current density?
    - 20 µm size? 40 µm pitch? 30K probe count?
  - Noise (e.g., full test without package-level capacitors)
  - HSS test at >28G ... probe & external loopback?
  - RF frequency requirements ... >40 GHz
- Test temperature requirements (automotive ... >105 °C?)

- DRAM 1TD wafer test is mainstream
- -40 °C to 150 °C automotive temperature requirements
- Reduced probe pad size and pitch
- Higher die-per-wafer, higher parallelism = greater power and thermal management overhead

### Is 100% component testing good enough?

#### Short answer: NO

- Does not guarantee quality and reliability of the integrated system
- Dramatic increase in system complexity!
  - SoC, SiP hardware devices
  - Embedded software, HW/SW interactions
  - Additional requirements related to safety and security

#### Semiconductor Engineering: Automakers Take On More Responsibility

October 8, 2018, by Kevin Fogarty

"Carmakers traditionally have left verification, validation and testing of chips and subsystems to their suppliers. ... But as the amount of electronic content increases and the complexity of these systems grows, not to mention the deployment of these devices in safety-critical systems, carmakers are taking on a much bigger role to ensure these components work in the context of other systems."

### **Testing complex SoC**

- Deep-nm: structural testing wall ~500 DPPM
- Less control of process & lithography variations

#### → Marginal Defects

- Failure only under certain (Volt, Temp, Workload) conditions
- Escape production scan test under stable ATE environment
- Multi-core/power/clock design complexity
  - SW interacts with weak HW under tight power supply margin
- Structural DFT blocks system-level interactions

#### → Test Gap

Cross-domain interactions, stressed scenarios
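The test gap can be illustrated with a toy model: a unit that fails only when low voltage, high temperature, and a heavy workload coincide passes at a single stable ATE condition but is exposed by a sweep over condition corners. Everything below (the failure rule, thresholds, workload names, and the ATE test point) is hypothetical and only meant to make the (Volt, Temp, Workload) idea concrete.

```python
from itertools import product

def unit_passes(voltage, temp_c, workload):
    """Hypothetical marginal unit: fails only when all three stresses coincide."""
    return not (voltage < 0.75 and temp_c > 85 and workload == "multi_core_burst")

# Assumed stable production test point on the ATE.
ATE_CONDITION = (0.80, 25, "scan")

# Assumed corner grid for a system-level sweep.
VOLTAGES = [0.70, 0.75, 0.80, 0.85]
TEMPS_C = [-40, 25, 105]
WORKLOADS = ["scan", "single_core", "multi_core_burst"]

def failing_corners():
    """Return every (voltage, temperature, workload) corner where the unit fails."""
    return [c for c in product(VOLTAGES, TEMPS_C, WORKLOADS) if not unit_passes(*c)]

print("passes at ATE condition:", unit_passes(*ATE_CONDITION))
print("failing corners:", failing_corners())
```

Only one of the 36 corners fails in this toy model, which is the sense in which such a defect is marginal: it is invisible unless the right cross-domain stress combination is applied.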

#### ATE Structural Test vs. System Traffic Test



Courtesy: Zoe Conroy, Cisco

### **System-level failures**

- Unanticipated scenarios involving marginal HW & SW
- Application-dependent, compound failing causes



Bill Eklow on Microprocessor Reliability (https://youtu.be/rbQGpsIB7rE)

Distinguished manufacturing engineer at Cisco, Bill Eklow examines <u>application-enabled defects</u>

- ... scaling gives rise to design complexity and more subtle defects
- ... system failures due to escapes from ATE running traditional tests, i.e., NTFs (no-trouble-found)
- Ex: soft errors induced by a particular application's memory access behavior
- ... need new system-level test and diagnosis methods

### Re-think the role of testing in complex systems

#### A complex system of systems

- external interactions
- internal monitoring
- self regulation & healing: autonomic nervous & immune systems



Same principles should guide the future role of test

- external functions
- embedded sensors everywhere
- continuous testing, adapt, repair

#### *It's already started ...*

[Figure labels: Human brain; DL neural network]



### **Research in self-aware systems**



A. Jantsch, et al., "Self-Awareness in Systems on Chip - A Survey," IEEE Design & Test, vol. 34, no. 6, pp. 8-26, Nov. 2017.

Testing becomes part of the system design spec, reducing or eliminating the need for, and the associated cost of, external test





C-W. Wu, et al., "Symbiotic system models for efficient IOT system design and test," ITC-Asia 2017.

B-Y. Lin, et al., "Highly reliable and low-cost symbiotic IOT devices and systems," ITC 2017.

### Growing need for system-level test (SLT)

- Augment, not replace, structural test
- Run system application scenarios (system functional test)
- Catch subtle escapes (save reputation, RMA expense)
- ATE (LV) overkill recovery (boost test yield)
- But SLT has associated costs
  - Test development, debug, diagnosis
  - Long test times can limit volume throughput (addressed by massively parallel SLT platforms @5000 UPH)
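The throughput concern is simple arithmetic: units per hour scale with the number of parallel test sites and inversely with per-unit test time. The sketch below works out how many sites a 5000 UPH target implies; the 300-second SLT time is an assumed example value, not a figure from the talk.

```python
import math

def units_per_hour(sites: int, test_time_s: float, handling_time_s: float = 0.0) -> float:
    """Throughput of a platform testing `sites` units in parallel."""
    return sites * 3600.0 / (test_time_s + handling_time_s)

def sites_needed(target_uph: float, test_time_s: float, handling_time_s: float = 0.0) -> int:
    """Smallest number of parallel sites that reaches `target_uph`."""
    return math.ceil(target_uph * (test_time_s + handling_time_s) / 3600.0)

ASSUMED_SLT_TIME_S = 300.0  # hypothetical 5-minute system-level test
n_sites = sites_needed(5000, ASSUMED_SLT_TIME_S)
print(f"{n_sites} sites give {units_per_hour(n_sites, ASSUMED_SLT_TIME_S):.0f} UPH")
```

Under that assumption a single site delivers only 12 UPH, so reaching 5000 UPH takes several hundred sites, which is why massively parallel platforms come up.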

#### SLT challenges – lack of methodology

| | Structural Test | SLT |
| --- | --- | --- |
| Fault model | SA, TD (IDDQ/BR/SDD/CA) | X |
| Fault coverage | $\checkmark$ | X |
| DFT | SCAN, BIST | X |
| ATPG | $\checkmark$ | X |
| Failure diagnosis | Deterministic | No-trouble-found, hard |

SA = stuck-at, TD = transition delay, IDDQ = CMOS static leakage, BR = interconnect bridging, SDD = small delay defect, CA = cell aware

### **Multi-facet SLT methodology**

1. System-level fault model
   - Coverage, fault simulation, ATPG
2. System-level DFT
   - Reduce error latency
   - More accurate diagnosis
3. System-level data analytics
   - Enable adaptive SLT

### System-level fault model (SL-FM)



Random soft bit errors during memory access in some application scenarios

- Not "fault at single fixed location"
- Faulty behavior is probabilistic
- Focus on likely scenarios of faulty behavior
  - Implicated components and their interactions
  - Create targeted patterns to exercise scenarios
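One way to make "faulty behavior is probabilistic" concrete: if each memory access in a scenario independently suffers a soft bit error with probability p, the chance of at least one error over n accesses is 1 - (1 - p)^n. The sketch below ranks scenarios by that probability so the likeliest ones can be targeted first. The scenario names, error rates, access counts, and the independence assumption are all hypothetical.

```python
import math

def failure_probability(p_bit_error: float, accesses: int) -> float:
    """P(at least one error) = 1 - (1 - p)^n, computed stably for tiny p."""
    if p_bit_error >= 1.0:
        return 1.0
    return -math.expm1(accesses * math.log1p(-p_bit_error))

# Hypothetical scenarios: (per-access error probability, number of accesses).
SCENARIOS = {
    "video_decode_dma_burst": (1e-9, 5_000_000),
    "idle_background_sync": (1e-12, 10_000),
    "multi_core_cache_thrash": (5e-9, 20_000_000),
}

def rank_scenarios(scenarios):
    """Return (name, failure probability) pairs, likeliest failure first."""
    scored = {name: failure_probability(p, n) for name, (p, n) in scenarios.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

for name, prob in rank_scenarios(SCENARIOS):
    print(f"{name}: {prob:.3e}")
```

A real system-level fault model would also have to capture which components and interactions are implicated; this sketch covers only the probabilistic ranking.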

### Model at system level of abstraction, not low structural level

- Components
- Resources
- Dependencies
- Constraints
- Actions
- Scenarios



Source: PERSPEC model, Cadence Design Systems
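As a rough illustration of modeling at this level, the sketch below describes actions in terms of the component that performs them, the resource they occupy, and their dependencies, then enumerates the orderings that satisfy the dependencies and one constraint. It is plain Python with made-up names and is not the Perspec tool or its modeling language.

```python
from dataclasses import dataclass
from itertools import combinations, permutations

@dataclass(frozen=True)
class Action:
    name: str
    component: str          # block that performs the action
    resource: str           # shared resource the action occupies
    depends_on: tuple = ()  # names of actions that must run earlier

# Hypothetical system: all names are illustrative.
ACTIONS = [
    Action("load_weights", "dma", "dram_port"),
    Action("run_inference", "npu", "sram_bank", depends_on=("load_weights",)),
    Action("send_result", "modem", "dram_port", depends_on=("run_inference",)),
    Action("read_sensor", "sensor_hub", "i2c_bus"),
]

def sensor_before_send(names):
    """Example constraint: the sensor must be read before the result is sent."""
    return names.index("read_sensor") < names.index("send_result")

CONSTRAINTS = [sensor_before_send]

def respects_dependencies(sequence):
    seen = set()
    for action in sequence:
        if any(dep not in seen for dep in action.depends_on):
            return False
        seen.add(action.name)
    return True

def legal_scenarios(actions, constraints):
    """Enumerate action orderings that satisfy dependencies and constraints."""
    scenarios = []
    for seq in permutations(actions):
        names = tuple(a.name for a in seq)
        if respects_dependencies(seq) and all(c(names) for c in constraints):
            scenarios.append(names)
    return scenarios

def shared_resource_pairs(actions):
    """Pairs of actions that contend for the same resource (candidate interactions)."""
    return [(a.name, b.name) for a, b in combinations(actions, 2) if a.resource == b.resource]

print(legal_scenarios(ACTIONS, CONSTRAINTS))
print(shared_resource_pairs(ACTIONS))
```

Brute-force enumeration only works for a handful of actions; the point is that scenarios fall out of a declarative description rather than being written by hand.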



### System-level testability (SL-DFT)

- Complex system failures pose difficulties
  - Big effort to attribute failures to SW (80%) or HW causes
  - Long error latency impedes accurate diagnosis
- SL-DFT adds controllability and observability without disrupting normal system operation
- Embedded sensors and logic for system profiling and post-silicon debug
- Embedded SW can be made self-checking to reduce error latency and enable better diagnosis

### **QED (Quick Error Detection) for post-silicon debug**

#### Subhasish Mitra, Stanford University

- Various SW instrumentation techniques to reduce error latency
- Add small HW checkers for faster performance
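The effect of software instrumentation on error latency can be shown with a toy duplicate-and-check loop: a computation is carried out on a main copy and a shadow copy, and the two are compared every few steps. This is only loosely inspired by the idea on the slide and is not the actual QED transformation; the bit-flip injection and the check interval are assumptions for illustration.

```python
def run_with_checks(values, check_interval, flip_at=None):
    """Accumulate `values` in a main and a shadow copy, comparing periodically.

    Returns the step at which a mismatch is detected, or None if none occurs.
    `flip_at` injects a simulated hardware bit flip into the main copy.
    """
    main = shadow = 0
    for step, value in enumerate(values, start=1):
        main += value
        shadow += value              # duplicated computation
        if step == flip_at:
            main ^= 1 << 3           # injected error (simulated)
        if step % check_interval == 0 and main != shadow:
            return step
    # Final comparison, standing in for an end-of-test result check.
    return None if main == shadow else len(values)

values = list(range(1000))
error_step = 137
for interval in (1000, 10):
    detected = run_with_checks(values, interval, flip_at=error_step)
    print(f"check every {interval} steps: latency {detected - error_step} steps")
```

With a check only at the end, the error surfaces 863 steps after it occurs; checking every 10 steps cuts that to 3, which keeps detection close to the failing operation and makes diagnosis easier.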



### **SLT flow**

- Build SL-FM scenarios
  - Failure diagnosis aided by SL-DFT
  - Test gaps
  - Margin analysis
- SL-TG produces short, targeted patterns to cover SL-FM scenarios
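Choosing a short set of patterns that still covers the modeled scenarios can be framed as a set-cover problem. The sketch below uses a greedy heuristic, which is a swapped-in technique for illustration and not a method described in the talk; pattern and scenario names are hypothetical.

```python
def select_patterns(pattern_coverage, scenarios):
    """Greedy set cover: repeatedly pick the pattern covering the most uncovered scenarios.

    Returns (chosen patterns, scenarios left uncovered). Leftover scenarios are test gaps.
    """
    uncovered = set(scenarios)
    chosen = []
    while uncovered and pattern_coverage:
        best = max(pattern_coverage, key=lambda p: len(pattern_coverage[p] & uncovered))
        gained = pattern_coverage[best] & uncovered
        if not gained:
            break  # no remaining pattern helps
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Hypothetical scenarios from a system-level fault model.
SCENARIOS = ["dram_contention", "dvfs_droop", "thermal_throttle", "modem_npu_noise", "secure_boot_race"]

# Hypothetical candidate patterns and the scenarios each one exercises.
PATTERNS = {
    "boot_and_idle": {"dram_contention"},
    "camera_ai_uplink": {"dram_contention", "dvfs_droop", "modem_npu_noise"},
    "thermal_soak_game": {"modem_npu_noise", "thermal_throttle"},
    "dvfs_sweep": {"dvfs_droop"},
}

chosen, gaps = select_patterns(PATTERNS, SCENARIOS)
print("patterns:", chosen)
print("test gaps:", gaps)
```

Here two patterns cover four of the five scenarios, and the uncovered one is reported as a gap that needs a new pattern.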



### Adaptive SLT $\rightarrow \downarrow$ cost, $\uparrow$ quality

- Sharing data across the supply chain
- Requires a trust relationship among the players
- What if component suppliers also compete with each other?
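A minimal sketch of what adaptive SLT could look like if upstream test data were shared: each unit's earlier measurements feed a simple risk score that selects a short, standard, or extended SLT. The data fields, thresholds, and three-tier plan are all assumptions made for illustration, not something specified in the talk.

```python
from dataclasses import dataclass

@dataclass
class UpstreamData:
    """Hypothetical per-unit data shared from earlier test steps."""
    vmin_margin_mv: float   # margin above minimum operating voltage
    leakage_sigma: float    # leakage deviation from the lot median, in sigmas
    neighbors_failed: int   # failing dies adjacent on the wafer map

def slt_plan(data: UpstreamData) -> str:
    """Map simple risk indicators to an SLT tier (thresholds are assumed)."""
    risk = 0
    risk += data.vmin_margin_mv < 30
    risk += abs(data.leakage_sigma) > 3
    risk += data.neighbors_failed >= 2
    if risk == 0:
        return "short"
    if risk == 1:
        return "standard"
    return "extended"

print(slt_plan(UpstreamData(vmin_margin_mv=55, leakage_sigma=0.4, neighbors_failed=0)))
print(slt_plan(UpstreamData(vmin_margin_mv=12, leakage_sigma=3.8, neighbors_failed=1)))
```

Healthy-looking units get the short test (lower cost) while suspicious ones get more stress (higher quality), but the scheme only works if the upstream data is actually shared.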



### Can wireless alleviate wafer probe test?

- High volume of test data generated by embedded sensors
- Probe size/pitch/count are already under heavy strain
- Why not wireless test data access? To complement probe, not replace it





S. Shamim, et al., "A Wireless Interconnection Framework for Seamless Inter and Intra-chip Communication in Multichip Systems," IEEE Trans. on Computers, vol. 66, no. 3, pp. 389-402, Mar. 2017.

Y-T. Hsing, et al., "Economic Analysis of the HOY Wireless Test Methodology," IEEE Design & Test, vol. 27, no. 3, pp. 20-30, May 2010.

### Conclusion

- Complex AI/5G-based systems pose an enormous test challenge
- Existing test methods become too costly to scale up
- Must shift to system-oriented approach to test
- Explore radical opportunities or prepare to be disrupted
- AI + 5G may also provide boost to big test data analytics
- System-orientation will require more trust and collaboration