tinyML Summit 2020

Enabling Ultra-low Power Machine Learning at the Edge

February 12-13, 2020

About the tinyML Summit

Following the success of the inaugural tinyML Summit 2019, the tinyML committee invites low power machine learning experts from industry, academia, start-ups and government labs all over the globe to join the tinyML Summit 2020, to share the “latest and greatest” in the field and to collectively drive the whole ecosystem forward.

Tiny machine learning is broadly defined as a fast-growing field of machine learning technologies and applications, including hardware (dedicated integrated circuits), algorithms and software, capable of performing on-device analytics of sensor data (vision, audio, IMU, biomedical, etc.) at extremely low power, typically in the mW range and below. It thereby enables a variety of always-on use-cases and targets battery-operated devices. The inaugural tinyML Summit in March 2019 showed very strong interest from the community, with active participation of senior experts from 90 companies. It revealed that: (i) hardware capable of tiny machine learning is becoming “good enough” for many commercial applications, and new architectures (e.g. in-memory compute) are on the horizon; (ii) significant progress has been made on algorithms, networks and models down to 100 kB and below; and (iii) initial low power applications are emerging in the vision and audio space. There is growing momentum demonstrated by technical progress and ecosystem development.

tinyML Summit 2020 will continue the tradition of high quality invited talks, poster and demo presentations, open and stimulating discussions, and significant networking opportunities. It will cover the whole stack of technologies (Systems-Hardware-Algorithms-Software-Applications) at a deep technical level, a unique feature of the tinyML Summits. While the majority of the participants and speakers will come from industry, leading-edge academic research will also be represented, as an important ingredient of the evolving tiny machine learning ecosystem. In 2020, special attention will be given to recent progress on algorithm development and on tiny machine learning use-cases and applications. The program will be organized in four technical sessions: Hardware, Systems, Algorithms & Software, and Applications. There will be approximately twenty invited presentations selected by the Technical Program Committee, along with dedicated poster sessions and demos by tiny machine learning companies and sponsors. Overview and hands-on tutorials on hardware and software developments will be available the day before the main technical program starts. Registration will open in October 2019.


Hyatt Regency San Francisco Airport

1333 Bayshore Highway, Burlingame, CA 94010

Contact us



7:30 am to 8:30 am

Registration and Breakfast

8:30 am to 10:00 am

Tutorial 1

NVIDIA Deep Learning Accelerator (NVDLA)

Frans SIJSTERMANS, Vice President Multimedia Arch/ASIC, NVIDIA


Robin Paul PRAKASH, Senior System Architect, NVIDIA

Abstract (English)

Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is a complex and challenging problem. Innovation is required in both the HW and SW domains, and this workshop includes topics from both. It will cover the NVDLA HW design and methodology, which leverage domain-specific concepts to achieve performance scalability as well as best-in-class computational efficiency. We will also cover the deep learning compiler concepts used to convert NVDLA’s raw performance into accessible performance. By the end of this workshop, attendees will be able to deploy their own NVDLA in the cloud and execute real-time inference with NVDLA’s open-source SW toolchain.

10:00 am to 10:30 am

Break
10:30 am to 12:00 pm

Tutorial 2

Algorithmic and SW Techniques for Designing and Implementing Energy Efficient CNNs

Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse

12:00 pm to 1:00 pm

Lunch
1:00 pm to 2:30 pm

Tutorial 3

SW Frameworks for tinyML: TF-Lite

Pete WARDEN, Technical Lead, Google

Abstract (English)

This workshop will show you how to run a magic wand and other machine learning examples in the TensorFlow Lite for Microcontrollers framework.

3:00 pm to 4:30 pm

Tutorial 4

Enabling Intelligent Edge Devices With Ultra Low-Power Arm MCUs and TensorFlow Lite

Wei XIAO, Principal Evangelist, Engineering Management, Artificial Intelligence, Arm

Abstract (English)

Advances in processing power and machine learning algorithms enable us to run machine learning models on tiny far-edge devices. Arm’s latest improvements in SIMD and DSP extensions, as well as our collaboration with the Google TensorFlow Lite team, are pushing machine smarts into our tiniest microcontrollers used in intelligent wireless sensors.

In this hands-on workshop, attendees will build a machine learning application with TensorFlow Lite Micro on Arm Cortex-M devices, then optimize the solution to unleash the power of Arm microcontrollers.

6:00 pm to 9:00 pm

VIP Reception

For Summit Speakers, Panelists, Tutorial Instructors, Sponsors and Committee Members

7:00 am to 8:00 am

Registration and Breakfast

8:00 am to 8:15 am

Welcome and Opening Remarks

Session Chair: Evgeni GOUSEV, Senior Director, Qualcomm Research

8:15 am to 10:20 am

Session #1: Advancing the Frontier of Deep Learning Algorithms at the Edge – Part I

Session Moderator: Kurt KEUTZER, Full Professor, University of California, Berkeley

Hardware-Aware Neural Architecture Search and Compression for Efficient Deep Learning

Han CAI, PhD Student, MIT

Abstract (English)

Efficient deep learning computing requires algorithm and hardware co-design to enable specialization. However, the extra degree of freedom creates a much larger design space, which human engineers can hardly exhaust by heuristics. We propose AutoML techniques to architect efficient neural networks: automatically designing small and fast models (ProxylessNAS), automatic channel pruning (AMC), and automatic mixed-precision quantization (HAQ). We demonstrate that such learning-based, automated design achieves superior performance and efficiency compared to rule-based human design. Moreover, we shorten the design cycle by 200× relative to previous work, so that we can afford to design specialized neural network models for different hardware platforms. Finally, we accelerate computation-intensive AI applications, including TSM for efficient video recognition and PVCNN for efficient 3D recognition on point clouds.

Bio: Song Han is an assistant professor at MIT EECS. Dr. Han received his Ph.D. in Electrical Engineering from Stanford, advised by Prof. Bill Dally. His research focuses on efficient deep learning computing. He proposed “Deep Compression” and the “EIE Accelerator”, which impacted the industry, and his work received best paper awards at ICLR’16 and FPGA’17. He was the co-founder and chief scientist of DeePhi Tech, which was acquired by Xilinx.
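The mixed-precision quantization that methods like HAQ automate builds on a simple primitive: uniformly rounding a layer's weights to a symmetric k-bit grid. The sketch below is a minimal, self-contained illustration of that primitive (not the ProxylessNAS/AMC/HAQ code); automated schemes then search for the best `bits` per layer.

```python
def quantize(weights, bits):
    """Uniformly quantize a list of floats to a symmetric k-bit grid
    and return the dequantized values the network would compute with.
    Minimal illustration only; per-layer bit-width selection is what
    approaches like HAQ automate."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]

weights = [0.42, -1.3, 0.07, 0.9]
w8 = quantize(weights, 8)   # tiny rounding error
w2 = quantize(weights, 2)   # coarse: only values in {-s, 0, s}
```

At 8 bits the reconstruction error is negligible; at 2 bits entire weights collapse to a three-level grid, which is why accuracy-aware bit allocation matters.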

Resource Efficient ML in a Few KBs of RAM

Prateek JAIN, Sr. Principal Researcher, Microsoft

Abstract (English)

Several critical applications require ML inference on resource-constrained devices, especially in Internet of Things domains such as smart cities and smart homes. Many of these problems reduce to time-series classification. Unfortunately, existing techniques for time-series classification, such as recurrent neural networks, are very difficult to deploy on tiny devices due to computation and memory bottlenecks. In this talk, we will discuss two new methods, FastGRNN and SRNN, that enable time-series inference on devices as small as the Arduino Uno, which has 2 KB of RAM. Our methods can provide as much as 70x speed-up and compression over state-of-the-art methods like LSTM and GRU, while also providing strong theoretical guarantees.
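FastGRNN's small footprint comes from sharing one pair of weight matrices between the gate and the candidate state, plus two trained scalars. A plain-Python sketch of a single cell step, following the published update equations (dimensions, values, and the helper `matvec` are illustrative, not the authors' code):

```python
import math

def fastgrnn_cell(x, h, W, U, bz, bh, zeta=1.0, nu=1e-4):
    """One FastGRNN step. W (input) and U (recurrent) are *shared*
    between the gate z and candidate c, roughly halving the
    parameters of a GRU; zeta and nu are trained scalars."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    # Single shared pre-activation, reused by gate and candidate.
    pre = [a + b for a, b in zip(matvec(W, x), matvec(U, h))]
    z = [1 / (1 + math.exp(-(p + b))) for p, b in zip(pre, bz)]   # gate
    c = [math.tanh(p + b) for p, b in zip(pre, bh)]               # candidate
    # Weighted residual update keeps the hidden state stable.
    return [(zeta * (1 - zi) + nu) * ci + zi * hi
            for zi, ci, hi in zip(z, c, h)]

h = fastgrnn_cell([0.5, -0.2], [0.0, 0.0],
                  [[0.1, 0.2], [0.3, 0.4]], [[0.05, 0.0], [0.0, 0.05]],
                  [0.0, 0.0], [0.0, 0.0])
```

Because the nonlinearities are computed from one shared pre-activation, the cell needs only two matrix-vector products per step, which is what makes a 2 KB deployment plausible.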

  • YouTube

TinyML Audio Algorithms

Shih-Chii LIU, Professor, Institute of Neuroinformatics, University of Zurich

Abstract (English)

Development of audio TinyML algorithms and their implementation on embedded hardware have enabled pre-ASIC study of the power-latency tradeoff of deep network architectures and feature representations. This talk presents algorithmic studies of TinyML deep networks on audio tasks including keyword spotting and speaker verification. We will compare input features generated from the asynchronous events of a spiking audio sensor with input features generated from sampled audio, and compare the throughput and energy efficiency of FPGA implementations of recurrent neural network architectures used for different TinyML audio tasks. We will also discuss the projected system cost of implementing these networks on an ASIC recurrent network accelerator.

  • YouTube

MobileNets on the Edge

Andrew HOWARD, Staff Software Engineer, Google AI

Abstract (English)

MobileNets are used widely across industry and academia for mobile and embedded vision applications. In this talk I will present the background and technical details that led to the design choices for MobileNet models, including what makes them extremely efficient and well matched to the mobile use case. I will then present the newest MobileNetV3 models, which combine neural architecture search with new network design elements, and their application to classification, object detection and semantic segmentation. I will conclude with results from adapting MobileNets to microcontrollers for IoT applications on the edge.

Efficient Deep Learning on the Edge

Bichen WU, Research Scientist, Facebook

10:20 am to 11:00 am

Demo Pitches

11:00 am to 11:15 am

Break
11:15 am to 12:00 pm

Panel Discussion:

How to Build a tinyML Company

Moderator: Chris ROWEN, VP of Engineering, Cisco

Rajeev MADHAVAN, Founder and General Partner, Clear Ventures

Mike PINELIS, President and CEO, Microtech Ventures, Inc.

Vidya RAMAN, Early-stage Investor, Enterprise Infrastructure and Cybersecurity software

Albert WANG, Investment Director, Qualcomm Ventures & AI Fund

Abstract (English)

When TinyML methods and use-cases push on power, they inherently break new ground on the cost, volume and ubiquity of smart systems. Innovations that democratize and proliferate intelligent behavior are consistent hallmarks of disruptive technology. We’ve invited a remarkable group of experienced technology investors to discuss the opportunities and pitfalls of new ventures building products and business models around TinyML.

12:00 pm to 12:30 pm

Day 1 Poster Pitches

12:30 pm to 1:30 pm

Lunch
2:15 pm to 4:00 pm

Session 2: tinyML Systems and Applications

Session Chair: Hoi-Jun YOO, Professor, KAIST

Next Generation Machine Learning for Mobile and Embedded Platforms

Sang WON LEE, Co-founder and CEO, Qeexo

Abstract (English)

Modern-day machine learning systems typically involve large networks/models that require huge cloud servers for compute capacity, which has significant privacy implications and results in high costs. Small, microcontroller-powered devices are being overlooked for ML applications because achieving high model accuracy within their memory and latency constraints is a challenge. With the right set of tools and algorithms, this need not be the case. The technology presented here leverages these devices, which occupy our world in the billions, and makes them intelligent with lightweight machine learning. A platform has been developed that enables development of models optimized for these low power devices. This talk will discuss some exemplary commercial uses of this technology and the processes and suite of tools that are used to deliver new interactive experiences to hundreds of millions of end users.

Some Micro Robots that Need ML

Kris PISTER, Professor, University of California, Berkeley

Abstract (English)

All of the components necessary for making swarms of autonomous micro robots, or silicon insects, now exist. Like their biological inspiration, they will be able to scavenge and store energy, lift many times their own weight, sense and interact with their environment, and collaborate to achieve goals beyond the abilities of an individual. Two characteristics distinguish them from their biological cousins: communication, and intelligence. Silicon insects will use RF mesh networks which will enable them to communicate over distances and at bit rates orders of magnitude higher than real insects, and indeed to reach back to cloud-based resources unavailable to their chitinous friends. But their native intelligence will pale in comparison to the lowliest ant or laziest bee. Here we have an opportunity for TinyML.

  • YouTube

Voice Separation With tinyML on the Edge

Niels PONTOPPIDAN, Research Manager, Augmented Hearing Science at Eriksholm Research Center

Abstract (English)

With recent advances in many areas of tinyML, several use cases where tinyML is an absolute requirement have emerged. Hearing devices are one area where tinyML holds the potential to radically transform the functionality of a 1 mW always-on device. People with hearing problems can benefit from hearing-device processing that separates competing voices into individual channels, followed by resynthesis of the auditory scene with spatial augmentation. The first successful segregation enhancement of competing voices required deep neural networks to achieve enough separation for the spatial augmentation to enhance segregation. Furthermore, the processing latency must stay below 20 ms, preferably less, so the processing must take place at the ear level without uplink and downlink latencies. Thus, for voice separation to work on the ears of people with hearing problems, tinyML is a necessity.

Perception Needs for Extended Reality Headsets

Ashwin SWAMINATHAN, Senior Director, Perception at Magic Leap

Abstract (English)

This talk presents the importance of computer vision and deep learning techniques for spatial computing, and specifically for an extended reality headset to be an effective spatial computing platform. The four fundamental perception modalities are introduced: head pose tracking, world reconstruction, eye tracking and hand tracking, with emphasis on the two main general themes: understanding the world (spatial localization, environment mapping) and understanding the user’s intent (eye, gaze and hands). This talk will provide a deep dive into the main modalities along with key challenges, compute needs and open problems.

4:00 pm to 5:00 pm

Poster and Demo Presentations/Networking

5:15 pm to 6:15 pm


6:30 pm to 8:00 pm


7:00 am to 8:00 am

Registration and Breakfast
8:00 am to 8:30 am


8:30 am to 10:15 am

Session 3: tinyML Hardware

Session Chair: Edith BEIGNÉ, Silicon Research Director, Facebook

Thinking Big with Tiny ML: Low Power High Performance DNN Accelerators for Mobile and IoT Applications

Hoi-Jun YOO, Professor, KAIST

Abstract (English)

The artificial intelligence (AI) revolution is spreading widely, even to the IoT, with the help of 5G wireless communication. Compared to cloud-based or edge-based AI applications, Internet-of-Things (IoT) applications require more autonomous, adaptive, and cooperative operation with extremely limited power, computing and memory resources, and without stable communication channels. AI, especially the deep neural network (DNN), is the key technology to support such autonomy and adaptivity of IoT machines in an unpredictable environment with limited available information. IoT machines should contain not only inference but also training capabilities, to adapt to environmental changes based on their experiences. Therefore, software and hardware co-optimization for DNN training is necessary for low-power and high-speed accelerators, in the same way it brought a dramatic increase in the performance of DNN inference accelerators. In addition, deep reinforcement learning (DRL) accelerators will be an essential part of this trend, showing substantial benefits for making continuous decisions in unknown environments where labeled data is difficult to acquire.

Energy-efficient On-device Processing for Next-generation Endpoint ML

Tomas EDSÖ, Senior Principal Design Engineer, Arm

Abstract (English)

This talk will show how Arm IP, including future processors based on Arm Helium technology, along with comprehensive software libraries and a widely supported ecosystem, provides new performance levels for system designers creating the ML solutions of tomorrow. Combining signal processing and neural network acceleration brings multi-fold efficiency gains compared to existing microcontroller systems. Coupled with enabling software, Arm IP enables mass deployment of next-generation AI platforms for IoT everywhere, within reach of every developer. We will discuss real-world examples and benchmarks to demonstrate the scalable performance of Arm system-on-chip (SoC) technologies and to help you choose the right IP for your application. Attendees will walk away with an understanding of how to accelerate software development on a single Arm Cortex-M toolchain with optimized software libraries and the choice offered by the Arm ecosystem, all resulting in more efficient, more secure solutions for AI in IoT that are easier to develop, deploy and maintain.

Robust Always-On Battery Powered Voice with Highly Efficient Edge Neural Compute

Stephen BAILEY, CTO, Syntiant

  • YouTube

A ½ mWatt, 128-MAC Sparsity-Aware Neural Processing Unit for Classification and Semantic Segmentation

Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor

Abstract (English)

This presentation describes an energy-efficient neural processing unit for battery-operated devices. The architecture exploits three forms of parallelism in computing convolutional and fully connected layers to achieve object detection at the far edge. We will present the underlying technology of co-designing neural net models and neural net accelerators to achieve the right tradeoff for the highest energy efficiency. This 128-MAC structure is capable of running a low-precision, modified 2-bit Group-Net network that performs image classification and accurate semantic segmentation at 23 frames per second while operating at one half of one milliwatt.

10:15 am to 10:45 am

Break
10:45 am to 11:30 am

Panel Discussion:

The Role of NVM, Emerging Memories and In-Memory Compute for Edge AI

Boris MURMANN, Professor of Electrical Engineering, Stanford University

Geoffrey BURR, Distinguished Research Staff Member, IBM Almaden Research Center

Manar EL-CHAMMAS, VP of Engineering, Omni Design Technologies, Inc.

Jae-sun SEO, Associate Professor, Arizona State University

Joseph WANG, Sr Director of Engineering, Qualcomm

Abstract (English)

The design of energy-efficient hardware for edge AI revolves primarily around data movement and memory access. Especially for small models, it is not only feasible to keep all memory on chip, but also to reduce data movement further using in-memory compute cores. The community is currently investigating a variety of options in this space using compute-SRAM, Flash, RRAM, MRAM, phase-change, and other emerging memory technologies. Will these ideas lead to a breakthrough in hardware efficiency? Will they deliver the programmability needed for practical deployment and real-world applications? A panel of experts from industry, academia and research institutes will debate these questions and spell out their predictions for our audience.

11:30 am to 12:00 pm

Day 2 Poster Pitches

12:00 pm to 1:00 pm

Lunch
1:00 pm to 2:15 pm

Poster and Demo Presentations/Networking

2:15 pm to 4:00 pm

Session 4: Advancing the Frontier of Deep Learning Algorithms at the Edge – Part II

Session Chair: Ofer DEKEL, Partner Research Area Manager, Microsoft Research

Deep Model Compression and Acceleration Towards On-Sensor AI

Changkyu CHOI, Senior Vice President, Samsung Advanced Institute of Technology

Abstract (English)

ML inference on embedded hardware has been attracting attention for many applications. Limitations on computing power, memory usage, and power consumption are major bottlenecks to deploying deep neural networks on resource-constrained devices. Existing techniques to mitigate these limitations often involve a power-latency tradeoff and yield degraded accuracy. This talk presents algorithmic studies of deep model compression and computational acceleration on facial recognition tasks, including face detection and anti-spoofing. A trainable quantizer is proposed that learns the intervals used to quantize activations and weights. This quantization-interval learning allows the quantized networks to maintain the accuracy of full-precision (32-bit) networks with activation and weight bit-widths as low as 4 bits. The effectiveness of our trainable quantizer on the ImageNet dataset will be demonstrated with various network architectures such as AlexNet, VGG-16, ResNet-50, and Inception-V3. Furthermore, this talk proposes an ‘On-sensor AI’ computing architecture that exploits the advantages of 4-bit (or lower) operations on activations and weights. A 4-bit MAC operation can be implemented using AND and COUNT operations only. The speed of an AND/COUNT implementation is similar to that of a multiplier/adder implementation, but it becomes considerably faster as the bit-width drops to 3 bits or less. Moreover, replacing the multiplier and adder with logical AND and COUNT operators can reduce the transistor count by over 40 times. This ‘On-sensor AI’ approach is challenging but promising, and will pave the way to deploying highly accurate tinyML models in things in everyday life.
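The AND/COUNT idea above can be checked with a short sketch: decompose each operand into bit-planes, and the dot product becomes a sum of population counts of AND-ed planes. This is an illustrative software reconstruction of the arithmetic identity, not the presented hardware design:

```python
def and_count_dot(a, b, bits=4):
    """Dot product of two vectors of unsigned `bits`-wide integers
    using only AND and COUNT (popcount) over bit-planes. Cost grows
    as bits*bits popcounts, which is why the trick pays off only at
    4-bit precision and below."""
    acc = 0
    for i in range(bits):                # bit-plane i of a
        # Pack bit i of every element of `a` into one machine word.
        plane_a = sum(((x >> i) & 1) << k for k, x in enumerate(a))
        for j in range(bits):            # bit-plane j of b
            plane_b = sum(((x >> j) & 1) << k for k, x in enumerate(b))
            # One AND plus one population count replaces the multipliers.
            acc += bin(plane_a & plane_b).count("1") << (i + j)
    return acc

a, b = [3, 7, 1, 15], [2, 5, 9, 4]
assert and_count_dot(a, b) == sum(x * y for x, y in zip(a, b))
```

The identity holds because each element is a weighted sum of its bits, so every cross term 2^(i+j)·a_i·b_j reduces to counting positions where both bit-planes are 1.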

Making Optimizing and Deploying Tiny Machine Learning on STM32 Microcontrollers Easy

Matthieu DURERIN, AI Applications Manager, STMicroelectronics

Abstract (English)

Providing machine learning based algorithms that run efficiently on tiny devices like microcontrollers is already a challenge. Making it easy and affordable for data scientists and embedded software experts is a key step for market adoption of this technology. STMicroelectronics has developed the STM32Cube.AI tool so that customers have an easy path to enabling neural networks on any device across the broad STM32 microcontroller portfolio. The tool maps and runs pre-trained neural networks on STM32 microcontrollers and supports a wide range of popular deep-learning training tools, such as Keras, TensorFlow Lite and Caffe, as well as the ONNX format. It also takes advantage of quantization by supporting both post-training quantization and quantization-aware trained models. The presentation will include the latest features of the STM32Cube.AI tool. To complement STM32Cube.AI, STMicroelectronics has developed software packages for quick and easy prototyping with end-to-end audio, motion and vision examples. Audio and motion use cases, such as human activity recognition and audio-scene classification, run on STM32L4 ultra-low-power microcontrollers; computer vision examples, like food classification, run on STM32H7 microcontrollers. The examples cover a wide range of options, such as quantized or float models and different memory configurations.

  • YouTube

Optimizing Inference Efficiency for Tiny DNNs

Harris TEAGUE, Principal Engineer, Qualcomm, Inc.

Abstract (English)

In this talk, I will explore some of the ways we are working to improve model inference efficiency for tiny devices, where power, area, memory, and compute resources are limited. I will present results for a few of these: compute scheduling optimization, model compression, quantized inference, and in-memory computing. Finally, I will discuss our plans for the next research steps to further understand and develop the technology.

  • YouTube

tinyMLPerf: Benchmarking Ultra-low Power Machine Learning Systems

Vijay JANAPA REDDI, Associate Professor, Harvard University

Abstract (English)

Tiny machine learning (ML) is poised to drive enormous growth within the IoT hardware and software industry. Measuring the performance of these rapidly proliferating systems and comparing them in a meaningful way presents a considerable challenge; the complexity and dynamism of the field obscure the measurement of progress and make embedded ML application and system design and deployment intractable. To foster more systematic development while enabling innovation, a fair, replicable, and robust method of evaluating tinyML systems is required; a reliable and widely accepted tinyML benchmark is needed. To fulfill this need, tinyMLPerf is a community-driven effort to extend the scope of the existing MLPerf benchmark suite (mlperf.org) to include tinyML systems. With the broad support of over 75 member organizations, the tinyMLPerf group has begun the process of creating a benchmarking suite for tinyML systems. The talk presents the goals, objectives, and lessons learned (thus far), and welcomes others to join and contribute to tinyMLPerf.

  • YouTube

Using ML for ML to Span the Gamut of TinyML Hardware

Jason KNIGHT, Co-founder and CPO, OctoML

  • YouTube

4:00 pm to 5:00 pm

Poster and Demo Presentations/Networking

Schedule subject to change without notice.



Committee

General Chair — Qualcomm Research
General Chair — KU Leuven
Committee members — Microsoft Research; University of California, Berkeley; Stanford University; Hoi-Jun YOO, KAIST

Speakers

Stephen BAILEY, Syntiant
Geoffrey BURR, IBM Almaden Research Center
Manar EL-CHAMMAS, Omni Design Technologies, Inc.
Changkyu CHOI, Samsung Advanced Institute of Technology
Matthieu DURERIN, STMicroelectronics
Tomas EDSÖ, Arm
Song HAN, MIT
Joseph HASSOUN, Samsung Semiconductor
Andrew HOWARD, Google AI
Prateek JAIN, Microsoft
Jinwon LEE
Shih-Chii LIU, Institute of Neuroinformatics, University of Zurich
Rajeev MADHAVAN, Clear Ventures
Boris MURMANN, Stanford University
Mike PINELIS, Microtech Ventures, Inc.
Kris PISTER, University of California, Berkeley
Niels PONTOPPIDAN, Augmented Hearing Science at Eriksholm Research Center
Robin Paul PRAKASH, NVIDIA
Vidya RAMAN, Enterprise Infrastructure and Cybersecurity software
Vijay JANAPA REDDI, Harvard University
Jae-sun SEO, Arizona State University
Ashwin SWAMINATHAN, Perception at Magic Leap
Harris TEAGUE, Qualcomm, Inc.
Albert WANG, Qualcomm Ventures & AI Fund
Joseph WANG, Qualcomm
Hoi-Jun YOO, KAIST