About the tinyML Summit
Following the success of the inaugural tinyML Summit 2019, the tinyML Committee invites low-power machine learning experts from industry, academia, start-ups, and government labs around the globe to join the tinyML Summit 2020, to share the “latest & greatest” in the field and to collectively drive the whole ecosystem forward.
Tiny machine learning is broadly defined as a fast-growing field of machine learning technologies and applications, including hardware (dedicated integrated circuits), algorithms, and software, capable of performing on-device analytics of sensor data (vision, audio, IMU, biomedical, etc.) at extremely low power, typically in the mW range and below, thereby enabling a variety of always-on use cases on battery-operated devices. The inaugural tinyML Summit in March 2019 showed very strong interest from the community, with active participation of senior experts from 90 companies. It revealed that: (i) hardware capable of tiny machine learning is becoming “good enough” for many commercial applications, and new architectures (e.g. in-memory compute) are on the horizon; (ii) significant progress has been made on algorithms, networks, and models down to 100 kB and below; and (iii) initial low-power applications are emerging in the vision and audio space. The growing momentum is demonstrated by both technical progress and ecosystem development.
tinyML Summit 2020 will continue the tradition of high-quality invited talks, poster and demo presentations, open and stimulating discussions, and significant networking opportunities. It will cover the whole stack of technologies (Systems-Hardware-Algorithms-Software-Applications) at a deep technical level, a unique feature of the tinyML Summits. While the majority of the participants and speakers will come from industry, leading-edge academic research will also be represented, as an important ingredient of the evolving tiny machine learning ecosystem. In 2020, special attention will be given to recent progress on algorithm development and on tiny machine learning use cases and applications. The program will be organized in four technical sessions: Hardware, Systems, Algorithms & Software, and Applications. There will be approximately twenty invited presentations selected by the Technical Program Committee, along with dedicated poster sessions and demos by tiny machine learning companies and sponsors. Overview and hands-on tutorials on hardware and software developments will be available the day before the main technical program starts. Registration will open in October 2019.
Hyatt Regency San Francisco Airport
1333 Bayshore Highway, Burlingame, CA 94010
7:30 am to 8:30 am
Registration and Breakfast
8:30 am to 10:00 am
NVIDIA Deep Learning Accelerator (NVDLA)
Frans SIJSTERMANS, Vice President Multimedia Arch/ASIC, NVIDIA
Mitch HARWELL, DLA SW, NVIDIA
Robin Paul PRAKASH, Senior System Architect, NVIDIA
Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is a complex and challenging problem. Innovation is required in both the HW and SW domains, and this workshop includes topics from both. It will cover the NVDLA HW design and methodology, leveraging domain-specific concepts to achieve performance scalability as well as best-in-class computational efficiency. We will also cover the deep learning compiler concepts used to convert NVDLA’s raw performance into accessible performance. By the completion of this workshop, attendees will be able to deploy their own NVDLA in the cloud and execute real-time inference with NVDLA’s open-source SW toolchain.
10:00 am to 10:30 am
10:30 am to 12:00 pm
12:00 pm to 1:00 pm
1:00 pm to 2:30 pm
SW Frameworks for tinyML: TF-Lite
Pete WARDEN, Technical Lead, Google
This workshop will show you how to run a magic wand and other machine learning examples in the TensorFlow Lite for Microcontrollers framework.
3:00 pm to 4:30 pm
Enabling Intelligent Edge Devices With Ultra Low-Power Arm MCUs and TensorFlow Lite
Wei XIAO, Principal Evangelist, Engineering Management, Artificial Intelligence, Arm
Advances in processing power and machine learning algorithms enable us to run machine learning models on tiny far-edge devices. Arm’s latest improvements in SIMD and DSP extensions, as well as our collaboration with the Google TensorFlow Lite team, are pushing machine smarts into our tiniest microcontrollers used in intelligent wireless sensors.
In this hands-on workshop, attendees will build a machine learning application with TensorFlow Lite Micro on Arm Cortex-M devices, then optimize their solution to unleash the full power of Arm microcontrollers.
6:00 pm to 9:00 pm
For Summit Speakers, Panelists, Tutorial Instructors, Sponsors and Committee Members
7:00 am to 8:00 am
Registration and Breakfast
8:00 am to 8:15 am
Welcome and Opening Remarks
Session Chair: Evgeni GOUSEV, Senior Director, Qualcomm Research
8:15 am to 10:20 am
Session 1: Advancing the Frontier of Deep Learning Algorithms at the Edge – Part I
Session Moderator: Kurt KEUTZER, Full Professor, University of California, Berkeley
Hardware-Aware Neural Architecture Search and Compression for Efficient Deep Learning
Han CAI, PhD Student, MIT
Efficient deep learning computing requires algorithm and hardware co-design to enable specialization. However, the extra degree of freedom creates a much larger design space, which human engineers can hardly exhaust by heuristics. We propose AutoML techniques to architect efficient neural networks. We investigate automatically designing small and fast models (ProxylessNAS), automatic channel pruning (AMC), and automatic mixed-precision quantization (HAQ). We demonstrate that such learning-based, automated design achieves superior performance and efficiency compared to rule-based human design. Moreover, we shorten the design cycle by 200× compared to previous work, so that we can afford to design specialized neural network models for different hardware platforms. Finally, we accelerate computation-intensive AI applications, including TSM for efficient video recognition and PVCNN for efficient 3D recognition on point clouds. Bio: Song Han is an assistant professor at MIT EECS. Dr. Han received his Ph.D. in Electrical Engineering from Stanford, advised by Prof. Bill Dally. His research focuses on efficient deep learning computing. He proposed “Deep Compression” and the “EIE Accelerator,” which impacted the industry. His work received best paper awards at ICLR’16 and FPGA’17. He was the co-founder and chief scientist of DeePhi Tech, which was acquired by Xilinx.
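To give a flavor of the channel pruning mentioned in this abstract, the sketch below implements simple magnitude-based (L1-norm) channel selection in NumPy. Note this is only a simplified, hand-rolled stand-in: AMC learns per-layer pruning ratios with a reinforcement learning agent rather than using a fixed heuristic, and the function name and interface here are illustrative assumptions, not part of any of the cited tools.

```python
import numpy as np

def prune_channels(weight, keep_ratio):
    """Magnitude-based channel pruning for a conv weight tensor of shape
    (out_ch, in_ch, kh, kw): keep the output channels with the largest
    L1 norms. A simplified stand-in for a learned pruning policy."""
    out_ch = weight.shape[0]
    n_keep = max(1, int(round(out_ch * keep_ratio)))
    # L1 norm of each output channel's weights
    scores = np.abs(weight).reshape(out_ch, -1).sum(axis=1)
    # Indices of the n_keep strongest channels, in ascending order
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return weight[keep], keep
```

In practice the kept-channel indices would also be used to slice the following layer's input channels so the pruned network stays consistent.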
Resource Efficient ML in a Few KBs of RAM
Prateek JAIN, Sr. Principal Researcher, Microsoft
Several critical applications require ML inference on resource-constrained devices, especially in Internet of Things domains such as smart cities and smart homes. Furthermore, many of these problems reduce to time-series classification. Unfortunately, existing techniques for time-series classification, such as recurrent neural networks, are very difficult to deploy on tiny devices due to computation and memory bottlenecks. In this talk, we will discuss two new methods, FastGRNN and SRNN, that can enable time-series inference on devices as small as the Arduino Uno, which has 2 KB of RAM. Our methods can provide as much as 70× speed-up and compression over state-of-the-art methods like LSTM and GRU, while also providing strong theoretical guarantees.
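For readers curious why FastGRNN is so small, a single recurrent step can be sketched as below, following the published FastGRNN formulation: one shared affine transform feeds both the gate and the candidate state, and two trainable scalars (ζ, ν) gate a residual connection. This is only an inference-time sketch; the low-rank, sparse, and quantized parameterizations that give the method its full compression are omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fastgrnn_cell(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
    """One FastGRNN step. The same pre-activation W@x_t + U@h_prev is reused
    for both the update gate z_t and the candidate state h_tilde, roughly
    halving the parameter count relative to a GRU."""
    pre = W @ x_t + U @ h_prev        # shared affine transform
    z = sigmoid(pre + b_z)            # update gate, in (0, 1)
    h_tilde = np.tanh(pre + b_h)      # candidate state, in (-1, 1)
    # Trainable scalars zeta and nu control the gated residual mix.
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev
```

With ζ = 1 and ν = 0 the update reduces to a convex combination of the candidate and the previous state, which keeps the hidden state bounded over long sequences.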
TinyML Audio Algorithms
Shih-Chii LIU, Professor, Institute of Neuroinformatics, University of Zurich
Development of audio TinyML algorithms and their implementation onto embedded hardware have enabled pre-ASIC study of the power-latency tradeoff of deep network architectures and feature representations. This talk presents algorithmic studies of TinyML deep networks on audio tasks including keyword spotting and speaker verification. We will compare the use of input features generated from the asynchronous events of a spiking audio sensor and input features generated from sampled audio. We will compare the throughput and energy efficiency of the FPGA implementations of recurrent neural network architectures used for different TinyML audio tasks. We will also discuss the system cost projection of implementing these networks onto an ASIC recurrent network accelerator.
MobileNets on the Edge
Andrew HOWARD, Staff Software Engineer, Google AI
MobileNets are used widely across industry and academia for mobile and embedded vision applications. In this talk I will present the background and technical details that led to the design choices for MobileNet models, including what makes them extremely efficient and well matched to the mobile use case. I will then present the newest MobileNetV3 models, which combine neural architecture search with new network design elements, and their application to classification, object detection, and semantic segmentation. I will conclude with results from adapting MobileNets to microcontrollers for IoT applications on the edge.
Efficient Deep Learning on the Edge
Bichen WU, Research Scientist, Facebook
10:20 am to 11:00 am
11:00 am to 11:15 am
11:15 am to 12:00 pm
How to Build a tinyML Company
Moderator: Chris ROWEN, VP of Engineering, Cisco
Rajeev MADHAVAN, Founder and General Partner, Clear Ventures
Mike PINELIS, President and CEO, Microtech Ventures, Inc.
Vidya RAMAN, Early-stage Investor, Enterprise Infrastructure and Cybersecurity software
Albert WANG, Investment Director, Qualcomm Ventures & AI Fund
When TinyML methods and use cases push on power, they inherently break new ground on the cost, volume, and ubiquity of smart systems. Innovations that democratize and proliferate intelligent behavior are consistent hallmarks of disruptive technology. We’ve invited a remarkable group of experienced technology investors to discuss the opportunities and pitfalls facing new ventures building products and business models around TinyML.
12:00 pm to 12:30 pm
Day 1 Poster Pitches
12:30 pm to 1:30 pm
2:15 pm to 4:00 pm
Session 2: tinyML Systems and Applications
Session Chair: Hoi-Jun YOO, Professor, KAIST
Next Generation Machine Learning for Mobile and Embedded Platforms
Sang WON LEE, Co-founder and CEO, Qeexo
Modern-day machine learning systems typically involve large networks/models that require huge cloud servers for compute capacity, which has significant privacy implications and results in high costs. Small, microcontroller-powered devices are often overlooked for ML applications because achieving high model accuracy within their memory and latency constraints is a challenge. With the right set of tools and algorithms, this need not be the case. The technology presented here offers ways to leverage these devices, which occupy our world in the billions, and make them intelligent with lightweight machine learning. A platform has been developed that enables the development of models optimized for these low-power devices. This talk will discuss some exemplary commercial uses of this technology, along with the processes and suite of tools used to deliver new interactive experiences to hundreds of millions of end users.
Some Micro Robots that Need ML
Kris PISTER, Professor, University of California, Berkeley
All of the components necessary for making swarms of autonomous micro robots, or silicon insects, now exist. Like their biological inspiration, they will be able to scavenge and store energy, lift many times their own weight, sense and interact with their environment, and collaborate to achieve goals beyond the abilities of an individual. Two characteristics distinguish them from their biological cousins: communication, and intelligence. Silicon insects will use RF mesh networks which will enable them to communicate over distances and at bit rates orders of magnitude higher than real insects, and indeed to reach back to cloud-based resources unavailable to their chitinous friends. But their native intelligence will pale in comparison to the lowliest ant or laziest bee. Here we have an opportunity for TinyML.
Voice Separation With tinyML on the Edge
Niels PONTOPPIDAN, Research Manager, Augmented Hearing Science at Eriksholm Research Center
With recent advances in many areas of tiny ML, several use cases where tiny ML is an absolute requirement have emerged. Hearing devices are one such area, where tiny ML holds the potential to radically transform the functionality of a 1 mW always-on device. People with hearing problems can benefit from hearing-device processing that separates competing voices into individual channels, followed by resynthesis of the auditory scene with spatial augmentation. Achieving the first successful segregation enhancement of competing voices required deep neural networks, as only they provided enough separation for the spatial augmentation to improve segregation. It is furthermore a requirement that the processing latency stay below 20 ms, and preferably much less; thus the processing must take place at ear level, without uplink and downlink latencies. For voice separation to work on the ears of people with hearing problems, tiny ML is therefore a necessity.
Perception Needs for Extended Reality Headsets
Ashwin SWAMINATHAN, Senior Director, Perception at Magic Leap
This talk presents the importance of computer vision and deep learning techniques for spatial computing, and specifically for making an extended reality headset an effective spatial computing platform. The four fundamental perception modalities are introduced: head pose tracking, world reconstruction, eye tracking, and hand tracking, with emphasis on the two main general themes: understanding the world (spatial localization, environment mapping) and understanding the user’s intent (eye, gaze, and hands). The talk will provide a deep dive into the main modalities along with key challenges, compute needs, and open problems.
4:00 pm to 5:00 pm
Poster and Demo Presentations/Networking
5:15 pm to 6:15 pm
6:30 pm to 8:00 pm
7:00 am to 8:00 am
8:00 am to 8:30 am
8:30 am to 10:15 am
Session 3: tinyML Hardware
Session Chair: Edith BEIGNÉ, Silicon Research Director, Facebook
Thinking Big with Tiny ML: Low Power High Performance DNN Accelerators for Mobile and IoT Applications
Hoi-Jun YOO, Professor, KAIST
The artificial intelligence (AI) revolution is spreading widely, even to the IoT, with the help of 5G wireless communication. Compared to cloud-based or edge-based AI applications, Internet-of-Things (IoT) applications require more autonomous, adaptive, and cooperative operation with extremely limited power, computing, and memory resources, and without stable communication channels. AI, especially the deep neural network (DNN), is the key technology for supporting such autonomy and adaptivity of IoT machines in an unpredictable environment with limited available information. IoT machines should contain not only inference but also training capabilities, so they can adapt to environmental changes based on their experiences. Therefore, software and hardware co-optimization for DNN training is necessary for low-power, high-speed accelerators, in the same way it brought a dramatic increase in the performance of DNN inference accelerators. In addition, deep reinforcement learning (DRL) accelerators will be an essential part of this tide, offering significant benefits for making sequential decisions in an unknown environment where labeled data is difficult to acquire.
Energy-efficient On-device Processing for Next-generation Endpoint ML
Tomas EDSÖ, Senior Principal Design Engineer, Arm
This talk will show how Arm IP, including future processors based on Arm Helium technology, along with comprehensive software libraries and a widely supported ecosystem, provides new performance levels for system designers creating the ML solutions of tomorrow. Combining signal processing and neural network acceleration brings multi-fold efficiency gains compared to existing microcontroller systems. Coupled with enabling software, Arm IP enables mass deployment of next-generation AI platforms for IoT everywhere, within reach of every developer. We will discuss real-world examples and benchmarks to demonstrate the scalable performance of Arm system-on-chip (SoC) technologies and to help you choose the right IP for your application. Attendees will walk away with an understanding of how to accelerate software development on a single Arm Cortex-M toolchain with optimized software libraries and the choice offered by the Arm ecosystem. The result is much more efficient, more secure solutions for AI in IoT that are easier to develop, deploy, and maintain.
A ½ mWatt, 128-MAC Sparsity-Aware Neural Processing Unit for Classification and Semantic Segmentation
Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor
This presentation describes an energy-efficient neural processing unit for battery-operated devices. The architecture utilizes three forms of parallelism for computing convolutional and fully connected layers to achieve object detection at the far edge. We will present the underlying technology of co-designing neural net models and neural net accelerators to achieve the right tradeoff for the highest energy efficiency. This 128-MAC structure is capable of running a low-precision, modified 2-bit Group-Net network that performs image classification and accurate semantic segmentation at 23 frames per second while operating at one-half of a milliwatt.
10:15 am to 10:45 am
10:45 am to 11:30 am
The Role of NVM, Emerging Memories and In-Memory Compute for Edge AI
Boris MURMANN, Professor of Electrical Engineering, Stanford University
Geoffrey BURR, Distinguished Research Staff Member, IBM Almaden Research Center
Manar EL-CHAMMAS, VP of Engineering, Omni Design Technologies, Inc.
Jae-sun SEO, Associate Professor, Arizona State University
Joseph WANG, Sr Director of Engineering, Qualcomm
The design of energy-efficient hardware for edge AI revolves primarily around data movement and memory access. Especially for small models, it is not only feasible to have all memory on chip, but also to reduce data movement further using in-memory compute cores. The community is currently investigating a variety of options in this space using compute-SRAM, Flash, RRAM, MRAM, phase change, and other emerging memory technologies. Will these ideas lead to a breakthrough in hardware efficiency? Will they deliver the programmability needed for practical deployment and real-world applications? A panel of experts from industry, academia, and research institutes will debate these questions and spell out their predictions for our audience.
11:30 am to 12:00 pm
Day 2 Poster Pitches
12:00 pm to 1:00 pm
1:00 pm to 2:15 pm
Poster and Demo Presentations/Networking
2:15 pm to 4:00 pm
Session 4: Advancing the Frontier of Deep Learning Algorithms at the Edge – Part II
Session Chair: Ofer DEKEL, Partner Research Area Manager, Microsoft Research
Deep Model Compression and Acceleration Towards On-Sensor AI
Changkyu CHOI, Senior Vice President, Samsung Advanced Institute of Technology
ML inference on embedded hardware has been attracting attention for many applications. Limitations on computing power, memory usage, and power consumption are major bottlenecks to deploying deep neural networks on resource-constrained devices. Existing techniques to mitigate these limitations often involve a power-latency tradeoff and yield degraded accuracy. This talk presents algorithmic studies of deep model compression and computational acceleration on facial recognition tasks, including face detection and anti-spoofing. A trainable quantizer is proposed that learns the intervals used to quantize activations and weights. This quantization-interval-learning allows the quantized networks to maintain the accuracy of the full-precision (32-bit) networks with activation and weight bit-widths as low as 4 bits. The effectiveness of our trainable quantizer on the ImageNet dataset will be demonstrated with various network architectures such as AlexNet, VGG-16, ResNet-50, and Inception-V3. Furthermore, this talk proposes an ‘On-sensor AI’ computing architecture that exploits the advantages of 4-bit (or lower) operations on activations and weights. A 4-bit MAC operation can be implemented using AND and COUNT operations only. The computation speed of the AND/COUNT implementation is similar to that of a multiplier/adder implementation, but becomes significantly faster as the bit-width drops to 3 bits or less. Moreover, replacing the multiplier and adder with logical AND and COUNT operators can reduce the transistor count by over 40×. This ‘On-sensor AI’ approach is challenging but promising, and will pave the way to deploying highly accurate tiny ML models in everyday things.
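To illustrate how a low-bit MAC can reduce to AND and COUNT, the sketch below computes a dot product of unsigned 4-bit values by decomposing each operand into bit-planes: each pair of bit-planes is combined with AND, the set bits are counted, and the count is shifted by the combined bit position. This is a generic bit-serial illustration of the idea, not the specific circuit described in the talk.

```python
def bitplane_dot(a, b, bits=4):
    """Dot product of two equal-length vectors of unsigned `bits`-bit
    integers using only AND and bit-counting (COUNT), decomposed over
    bit-planes. Cost grows with bits**2 plane pairs, which is why the
    trick pays off only at very low bit-widths."""
    acc = 0
    for i in range(bits):
        plane_a = [(x >> i) & 1 for x in a]       # i-th bit-plane of a
        for j in range(bits):
            plane_b = [(y >> j) & 1 for y in b]   # j-th bit-plane of b
            # AND the planes elementwise, then COUNT the set bits
            count = sum(pa & pb for pa, pb in zip(plane_a, plane_b))
            acc += count << (i + j)               # weight by bit position
    return acc
```

Because a 4-bit operand has 4 bit-planes, each multiply-accumulate costs 16 AND/COUNT plane pairs; at 2 or 3 bits the pair count shrinks quadratically, matching the abstract's observation that the approach gets markedly faster below 4 bits.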
Making Optimizing and Deploying Tiny Machine Learning on STM32 Microcontrollers Easy
Matthieu DURERIN, AI Applications Manager, STMicroelectronics
Providing machine-learning-based algorithms that run efficiently on tiny devices like microcontrollers is already a challenge. Making it easy and affordable for data scientists and embedded software experts is a key step toward market adoption of this technology. STMicroelectronics has developed the STM32Cube.AI tool so that customers have an easy path to enabling neural networks on any device across the broad STM32 microcontroller portfolio. The tool maps and runs pre-trained neural networks on STM32 microcontrollers and supports a wide range of popular deep-learning training tools, such as Keras, TensorFlow Lite, and Caffe, as well as the ONNX format. It also takes advantage of quantization by supporting both post-training quantization and quantization-aware trained models. The presentation will include the latest features of the STM32Cube.AI tool. To complement STM32Cube.AI, STMicroelectronics has developed software packages for quick and easy prototyping, with end-to-end audio, motion, and vision examples. Audio and motion use cases, such as human activity recognition and audio-scene classification, run on STM32L4 ultra-low-power microcontrollers. Computer vision examples, like food classification, run on STM32H7 microcontrollers. The examples cover a wide range of options, such as quantized or float models and different memory configurations.
Optimizing Inference Efficiency for Tiny DNNs
Harris TEAGUE, Principal Engineer, Qualcomm, Inc.
In this talk, I will explore some of the ways we are working to improve model inference efficiency for tiny devices, where power, area, memory, and compute resources are limited. I will present results for a few of these: compute scheduling optimization, model compression, quantized inference, and in-memory computing. Finally, I will discuss our plans for the next research steps to further understand and develop the technology.
tinyMLPerf: Benchmarking Ultra-low Power Machine Learning Systems
Vijay JANAPA REDDI, Associate Professor, Harvard University
Tiny machine learning (ML) is poised to drive enormous growth within the IoT hardware and software industry. Measuring the performance of these rapidly proliferating systems and comparing them in a meaningful way presents a considerable challenge; the complexity and dynamism of the field obscure the measurement of progress and make embedded ML application and system design and deployment intractable. To foster more systematic development while enabling innovation, a fair, replicable, and robust method of evaluating tinyML systems is required: a reliable and widely accepted tinyML benchmark. To fulfill this need, tinyMLPerf is a community-driven effort to extend the scope of the existing MLPerf benchmark suite (mlperf.org) to include tinyML systems. With the broad support of over 75 member organizations, the tinyMLPerf group has begun the process of creating a benchmarking suite for tinyML systems. The talk presents the goals, objectives, and lessons learned (thus far), and welcomes others to join and contribute to tinyMLPerf.
Using ML for ML to Span the Gamut of TinyML Hardware
Jason KNIGHT, Co-founder and CPO, OctoML
4:00 pm to 5:00 pm
Poster and Demo Presentations/Networking
Schedule subject to change without notice.