About
The tinyML Summit will be held virtually the week of March 22, 2021. We are in the process of re-envisioning our flagship event as a highly interactive online experience.
In conjunction with the Summit, we are also pleased to announce that we have added a new event for 2021: the tinyML Research Symposium.
The tinyML Summit is the premier annual gathering of senior-level technical experts and decision makers representing the fast-growing global tinyML community. This diverse ecosystem is composed of professionals from industry, academia, start-ups, and government labs worldwide working on leading-edge ultra-low power machine learning technologies for end-to-end (hardware – system – software applications full stack) solutions.
News
tinyML Awards 2021 Finalists
The tinyML Summit committee is pleased to announce the Finalists for the Best Product of the Year, and for the Best Innovation of the Year! Please join us on Monday March 22, 2021 at noon Pacific time to hear presentations from the finalists. (Please see the schedule below for details.)
tinyML Summit 2021 Breakout Sessions
New for 2021, tinyML is developing a series of breakout sessions. Breakouts bring focused topics directly to the audience that needs them. They will be practical and interactive discussions to foster better understanding of design and application issues, best practices, tools, and funding opportunities to accelerate the deploymen
Breaking News On Disruptive Products And Tools
An exciting new session at the tinyML Summit will allow a small number of time slots to companies / experts / academia to share the very latest substantial and disruptive developments and upcoming products of significance in the field of tiny machine learning. Submit your Breaking News On Disruptive Products And Tools today!
Schedule
Pacific Daylight Time / UTC-7
8:00 am to 8:15 am
Open / Welcome
Evgeni GOUSEV, Senior Director, Qualcomm Research
Marian VERHELST, Associate Professor, KU Leuven
8:15 am to 9:45 am
Tutorial: Training a Magic Wand
Pete WARDEN, Technical Lead, Google
This tutorial will show how to gather data, train, and deploy an IMU-based model for recognizing gestures on an Arduino Nano 33 BLE Sense. It will use the Arduino IDE and Colab scripts to develop the model and will explain the feature generation needed to go from raw accelerometer and gyroscope data to input suitable for a neural network. Using TensorFlow Lite Micro and Arm’s CMSIS-NN library, you will learn how to create a practical application from scratch. It is recommended that you purchase the Arduino TinyML Kit to be able to follow along virtually.
Tutorial: Image sensors for low power applications
Song CHEN, Research Scientist, Facebook Reality Labs Research
Image sensors are the front end of many computer-vision-based input modalities. These human-machine input modalities usually need to run on a mobile platform that has stringent power requirements. This tutorial will cover both low-power image sensor design from a designer’s perspective and some useful practices to save sensor power from a user’s perspective, especially in ML applications. We will start by laying out the basics, including the operating principles of pixels, the readout chain, and other common blocks in an image sensor. Then, the trade-off between power consumption and general sensor performance will be discussed. Following the discussion, the effectiveness of power reduction techniques such as subsampling and low frame rates, and their impact on subsequent ML processing stages, will be evaluated with examples. Finally, an ultra-low power global-shutter digital pixel sensor developed at Facebook Reality Labs Research will be introduced.
9:45 am to 10:00 am
Break
10:00 am to 11:30 am
Tutorial: Advanced network quantization and compression through the AI Model Efficiency Toolkit (AIMET)
Abhijit KHOBARE, Director of Software Engineering, Qualcomm Technologies, Inc. (QTI)
Chirag PATEL, Principal Engr./Mgr, Qualcomm Technologies, Inc. (QTI)
AI is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within a tight power and thermal budget. Quantization and compression help address these issues. In this tutorial, we’ll discuss:
- The existing quantization and compression challenges
- Our research in novel quantization and compression techniques to overcome these challenges
- How developers and researchers can implement these techniques through the AI Model Efficiency Toolkit
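As a rough illustration of the basic idea behind quantization mentioned above, the following minimal Python sketch applies uniform 8-bit affine quantization to a handful of weights. It does not use or reproduce the AIMET API; the function names `quantize` and `dequantize` and the sample weights are assumptions made for this example only.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a uniform affine scheme."""
    qmax = (1 << num_bits) - 1
    # Extend the range so that 0.0 is always exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

For in-range values the round-trip error is bounded by about half a quantization step (`scale / 2`), which is why narrower bit widths trade accuracy for memory and energy.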
Tutorial: Build Industrial-Grade tinyML applications with Edge Impulse!
Jan JONGBOOM, CTO, Edge Impulse
Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse
In this free live workshop, you will build a full tinyML application, end-to-end, using the latest best practices in embedded machine learning. You will learn how to collect a dataset, design and train a tiny (but accurate) model, evaluate its performance, optimize it for embedded use, and integrate it into a real embedded application running on a genuine MCU.
Anyone registered for the Summit may join. To build an application yourself, you may purchase the Thunderboard kit; otherwise, you are welcome to simply observe. If you order the kit, you may also download the open-source firmware for the Thunderboard companion development board, which is hosted on GitHub.
12:00 pm to 1:30 pm
tinyML Awards 2021
Awards Finalist Presentations
Moderator: Wei XIONG
François de ROCHEBOUËT, Founder CTO, Cartesiam
Jeff HENCKELS, Director, Product Management & Business Development, Qualcomm
Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor
Sean MCGREGOR, Member of Technical Staff, Syntiant
Brandon RUMBERG, Founder and CTO, Aspinity
Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse
Best Product of the Year Finalists
(in alphabetical order)
- Cartesiam
- Qualcomm Always-On Vision
- Samsung Exynos 2100 NPU
- Syntiant NDP120
Best Innovation of the Year Finalists
(in alphabetical order)
- Aspinity AnalogML
- Edge Impulse EON Compiler
Pacific Daylight Time / UTC-7
8:00 am to 8:15 am
Open / Welcome
Evgeni GOUSEV, Senior Director, Qualcomm Research
Marian VERHELST, Associate Professor, KU Leuven
8:15 am to 9:00 am
Keynote: Putting AI on a Diet: TinyML and Efficient Deep Learning
Song HAN, Assistant Professor, MIT EECS
Abstract (English)
Deep learning is computation-hungry and data-hungry. We aim to improve the computation efficiency and data efficiency of deep learning. First, I’ll present MCUNet[1], which brings deep learning to IoT devices. MCUNet is a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on IoT devices that have only 1MB of Flash. Next, I will talk about TinyTL[2], which enables on-device transfer learning, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation[3], which enables data-efficient GAN training, generating photo-realistic images using only 100 images, a task that used to require tens of thousands of images. We hope that such TinyML techniques can make AI greener, faster, and more sustainable.
[1] MCUNet: Tiny Deep Learning on IoT Devices, NeurIPS’20, spotlight.
[2] TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning, NeurIPS’20
[3] Differentiable Augmentation for Data-Efficient GAN Training, NeurIPS’20
9:00 am to 9:45 am
Keynote: Many shades of acceleration – an Open TinyML Platform Perspective
Luca BENINI, Chair of digital Circuits and systems | Full Professor, ETHZ | University of Bologna
Abstract (English)
The next wave of “Extreme Edge AI” pushes signal processing and machine learning aggressively towards sensors and actuators, with sub-mW (TinyML) power budgets, while at the same time raising the bar in terms of accuracy and flexibility. To succeed in this balancing act, we need principled ways to walk the line between general-purpose and highly specialized architectures. In this talk I will detail how to walk the line, drawing from the experience of 40+ chip tape-outs of the open PULP (parallel ultra-low power) platform, based on RISC-V processors coupled with domain-specific acceleration engines.
9:45 am to 10:00 am
Break
10:00 am to 10:15 am
Today’s Breakout Pitches
10:15 am to 10:45 am
tiny Talks
Compute-in-Memory Hardware Accelerator for Always-On TinyML
Sameer WADHWA, Senior Director, Qualcomm
Abstract (English)
Always-ON tiny-ML use-cases rely on an efficient hardware accelerator to maximize battery run-time while enabling increasingly complex models.
The energy efficiency limitations of Von-Neumann architectures while executing memory bandwidth intensive compute use-cases presented by DNN are well understood. It has also been shown that a large subset of DNN models can function with little or no accuracy degradation down to 8-bit or even lower quantization levels for activations and weights.
Compute-In-Memory (CIM) is an active research area in academia and industry to achieve a significant compute energy efficiency improvement by reducing the memory bandwidth requirements when executing DNN models and taking advantage of analog compute to improve MAC computation efficiency.
This work details a CIM-based stand-alone DNN hardware accelerator chip that is particularly well-suited to executing always-ON tiny-ML models. It supports convolution, fully-connected, pooling, ReLU, Sigmoid, and Tanh layers with 1/2/4/8-bit quantized activations and weights.
Multiple CIM cores on the chip can operate in parallel to support always-ON voice keyword detection and human-detect computer-vision models while consuming very low power.
The chip comes with a tool flow to support quantizing, training, compiling off-the-shelf models to efficiently map them on the hardware. Both voice UI and CV use cases are used to demonstrate the chip’s low power performance.
Supporting TensorFlow Lite MCU in tiny low power FPGAs
Hoon CHOI, Fellow, Lattice Semiconductor
Abstract (English)
The arena of cost-optimized, high-performance Edge accelerators is growing increasingly competitive, with a variety of architectures to choose from when implementing an AI-capable system. As a new generation of Edge applications emerges, designers are increasingly pressed to develop solutions that combine low power and low latency, and they require easy-to-use and flexible accelerators.
Lattice’s FPGAs are uniquely positioned to address the rapidly changing world for Edge devices. This class of FPGAs possesses the parallel processing capabilities inherent in FPGAs to accelerate neural network performance and is HW programmable to keep up with the fast pace of changing ML algorithms. They are designed with more on-chip memory, optimized DSP blocks, and compute resources distributed through the fabric for workload acceleration, resulting in a low-power system implementation.
To provide a software-programmable solution that is easy to use, support for TF Lite with a soft RISC-V core was implemented on the FPGA fabric. This creates the best of both worlds: a programmable device with flexible acceleration blocks running in HW, enabling developers with or without FPGA expertise to build their systems more quickly. Comparing a TF Lite implementation on an ARM M4-based CPU with an FPGA of comparable size/cost, the FPGA runs 2-10x faster than the MCU at comparable power consumption.
In this presentation, we cover the details of the accelerators we designed, the limitations we faced that hindered further optimizations in accelerators, and possible solutions to the limitations.
10:45 am to 12:00 pm
Panel Discussion
Opportunities at the Edge: Venture and tinyML
Moderator: Kurt KEUTZER, Full Professor, University of California, Berkeley
Chris ROWEN, VP of Engineering, Cisco
Bill COUGHRAN, Founders' Coach and Partner, Sequoia Capital
Samir KUMAR, Managing Director, M12
Luis CEZE, Co-founder and CEO, OctoML
Eileen TANGHAL, Senior Partner, In-Q-Tel
Pushing machine learning into ultra-low-power applications at the edge isn’t just an academically compelling idea; it is a potentially disruptive shift in mass-market technology. In this panel we’ve gathered four distinguished venture capitalists to look at tinyML opportunities through an entrepreneurial lens. In particular we have asked them to consider:
- What makes you interested in investment opportunities for machine learning at the edge?
- What is your general advice to tech entrepreneurs: build a horizontal platform for broad application or target a particular vertical?
- What are some of the particular near-term opportunities for venture investment at the edge that you find especially exciting?
- How might VCs value a tinyML startup – how much is it driven by the target market, the team, the technology, or the data?
- And the $6.4B question: what is the future killer app at the edge?
12:00 pm to 1:00 pm
tinyTalks Hardware Optimization
Performing Inference on Binarized Neural Networks with xcore.ai
Laszlo KINDRAT, Senior Technologist, XMOS
Andrew STANFORD-JASON, Engineer, XMOS
Adam HILLIER, Deep Learning Scientist, Plumerai
Abstract (English)
Ultra-low bitwidth neural networks have been a hot topic in the tinyML community, both in terms of novel hardware accelerators and software solutions for training and deployment. In particular, binarized neural networks (BNNs) show large potential due to their simple hardware requirements. xcore.ai (a fast, economical crossover processor from XMOS) has a vector unit with specialized instructions for performing inference on BNNs, which to the best of our knowledge makes it the first MCU-class chip with a BNN accelerator in mass production. In this talk we describe these instructions in detail, and how they enable a theoretical maximum of 286 GOps/s when executing binarized neural networks. Secondly, we give an overview of our machine learning model deployment toolchain that seamlessly integrates with Larq, a popular open-source framework for training binarized neural networks. Finally, we present performance benchmarks on image classification models with various combinations of binarized and 8-bit quantized layers.
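To make the "simple hardware requirements" of BNNs concrete, here is a minimal Python sketch of the generic XNOR-and-popcount formulation of a binarized dot product. It is not the xcore.ai instruction set, which this abstract does not spell out; the helper names `pack_bits` and `binary_dot` and the sample vectors are assumptions for illustration.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR marks positions where both signs agree (each contributes +1).
    matches = bin(~(a_word ^ b_word) & mask).count("1")
    # Agreements contribute +1, disagreements -1: matches - (n - matches).
    return 2 * matches - n


# Hypothetical binarized activations and weights.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Because the multiply-accumulate collapses into bitwise logic and a bit count, a single wide vector instruction can process many weights at once.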
Ultra-low Power and Scalable Compute-In-Memory AI Accelerator for Next Generation Edge Inference
Behdad YOUSSEFI, Founder and CEO, Areanna AI
Abstract (English)
Edge AI hardware accelerators are either deployed on Edge servers where sophisticated AI models run on a power budget between 1-10 Watts or on Edge devices where simple AI models run at milliwatts of power. But implementing more sophisticated AI models on Edge devices requires further development of ultra-low power architectures.
Research has shown that power consumption is dominated by data communication between memory and processor. To minimize data movement, the Compute-In-Memory (CIM) architecture has been explored by companies and academics. CIM is inherently a mixed-signal architecture and hence requires data converters to interface between layers of the network. However, data converters have proven to be the Achilles’ heel of this architecture, as they take up to 98% of overall chip area and account for more than 85% of overall power consumption, defeating the whole purpose of the CIM architecture. CIM also suffers from analog nonidealities, which can degrade AI performance. Furthermore, the extra processing steps needed to fabricate the memory array in CIM limit the scalability of this architecture.
Areanna’s architecture addresses these issues using our proprietary Compute-and-Quantize-In-Memory (CQIM) architecture, where SRAM bit-cells are repurposed to construct data converters, improving power/area efficiency by over an order of magnitude. Using logic gates as its building blocks, CQIM is inherently a digital architecture and scales well with the latest process nodes. The high power efficiency and scalability of this architecture bring the deployment of sophisticated real-time AI models on a mW power budget within reach. A CQIM prototype has been implemented and taped out in a standard CMOS process.
CUTIE: Multi-PetaOP/s/W Ternary DNN inference Engine for TinyML
Moritz SCHERER, PhD Student, ETH Zürich
Abstract (English)
With the surge in demand for deeply embedded deep learning on increasingly power-constrained devices, neural network inference engines must continue to improve in terms of energy efficiency. In recent years especially, accelerators for networks with binary and ternary weights and activations have been addressing this demand, achieving energy efficiencies that are orders of magnitude higher than byte-precision accelerators. We address the main bottlenecks for energy efficiency in binary and ternary neural network accelerators and present CUTIE, the Completely Unrolled Ternary Inference Engine.
The design of CUTIE is focused on minimizing non-computational energy and switching activity so that dynamic power spent on storing intermediate results is minimized. We achieve this by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) exploiting an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity.
We demonstrate that our architecture achieves better-than-binary inference accuracy at dramatically higher energy efficiency. We present power simulation data showing an average energy efficiency of 2.1 POp/s/W, while achieving 88% inference accuracy on CIFAR-10 at an energy cost of 520 nJ, outperforming the state-of-the-art, including compute-in-memory (CIM) approaches, by a factor of 4.8x.
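The link between ternary weights, sparsity, and reduced activity can be sketched in a few lines of Python. This is only an illustration of the general idea, assuming a simple magnitude threshold; it is not the CUTIE data path or its training method, and `ternarize`, `sparse_ternary_dot`, and the sample values are hypothetical.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1; small magnitudes become 0."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def sparse_ternary_dot(t_weights, activations):
    """Accumulate only where the ternary weight is non-zero."""
    total, ops = 0, 0
    for w, a in zip(t_weights, activations):
        if w != 0:
            # A ternary multiply is just an add or a subtract.
            total += a if w > 0 else -a
            ops += 1
    return total, ops


# Hypothetical full-precision weights and ternary activations.
weights = [0.8, -0.05, 0.02, -0.6, 0.1, 0.9]
activations = [1, -1, 0, 1, 1, -1]
t = ternarize(weights, threshold=0.3)
total, ops = sparse_ternary_dot(t, activations)
sparsity = t.count(0) / len(t)
```

In this toy case half of the weights are zero, so half of the accumulations are skipped; in hardware, zero weights correspond to silenced logic rather than skipped loop iterations.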
Hardware aware Dynamic Inference Technology
Urmish THAKKER, Principal Engineer, SambaNova Systems Inc
Abstract (English)
There has been a recent surge in research in dynamic inference technologies to reduce the cost of inference without sacrificing the accuracy of the model. These models are based on the assumption that not all parts of the output feature map (OFM) are equally important for all inputs. The parts of the output feature maps that are deemed unimportant for a certain input can be skipped entirely or computed at a lower precision, leading to a reduced number of computations. This can enable faster inference of a large network, leading to high accuracy. However, we show that the two popular methods that optimize different aspects of the OFM (channel and spatial) lead to sparse matrix multiplication during inference on a CPU, which can lead to poor run-time characteristics in spite of the reduced number of MAC operations. We show a way to make these techniques SIMD vector-length aware, leading to block-sparse matrices that can run more efficiently on hardware with vector compute units. Our technique allows these models to create blocks of vector length 2, 4, and 8 with minimal loss in accuracy, beating traditional pruning methods by a large margin on image classification tasks.
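One way to picture vector-length-aware skipping is to make keep/skip decisions per block of consecutive elements rather than per element, so that the surviving work fills whole SIMD lanes. The Python sketch below is a simplified, static illustration under that assumption; it is not the authors' method, and `block_mask`, the importance scores, and the block budget are hypothetical.

```python
def block_mask(scores, vector_length, keep_blocks):
    """Keep the `keep_blocks` highest-scoring blocks; skip the rest entirely."""
    assert len(scores) % vector_length == 0
    blocks = [scores[i:i + vector_length]
              for i in range(0, len(scores), vector_length)]
    sums = [sum(b) for b in blocks]
    # Rank blocks by total importance, highest first.
    ranked = sorted(range(len(blocks)), key=lambda i: sums[i], reverse=True)
    kept = set(ranked[:keep_blocks])
    # Every element inherits the decision made for its block.
    return [1 if (i // vector_length) in kept else 0
            for i in range(len(scores))]


# Hypothetical per-element importance scores for one row of an OFM.
scores = [0.9, 0.8, 0.1, 0.0, 0.2, 0.1, 0.7, 0.6]
mask = block_mask(scores, vector_length=2, keep_blocks=2)
```

The resulting mask is `[1, 1, 0, 0, 0, 0, 1, 1]`: zeros and ones always come in aligned groups of the vector length, so no vector operation is issued for a partially empty block.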
Partner Sessions - Tools & Algorithms
These sessions will be an opportunity to hear from commercial companies in the tinyML ecosystem on market and technology trends they are addressing to enable the exponential growth of tinyML solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points.
Innovative and Convolutional-Friendly tinyML Architecture for Small-Silicon, Low-Power Devices
Moshe HAIUT, CTO Staff, DSP Group
Abstract (English)
The concept of Neural Networks (NNs) has evolved from the basic perceptron to a fully connected (FC) NN layer. FC layers are based on a simple math operation – a vector multiplied by a matrix. Most existing DSPs have embedded multiply-accumulate (MAC) logic to handle these tasks. However, FC layers require a large number of parameters, which challenges the tinyML solutions with their limited silicon footprint. In contrast, convolutional layers use fewer parameters, making them a more compelling approach for memory-constrained applications, such as tinyML edge-based solutions. However, 2D convolutional layers add complexity to the computation algorithm, especially when there are multiple channels and when padding, dilation, and stride operations are applied.
To address this problem, what’s needed is a way to reduce the number of cycles and power consumption required to compute 2D convolutional and ConvTranspose layers.
This tinyML talk will show how to do this using the nNetLite, an ultra-low-power programmable NN processor that was developed within DSP Group to solve many of the issues associated with edge processing. The discussion will focus on a certain part of the processor – the Address Generation Unit (AGU) – that was designed specifically to accelerate the computation of complicated convolutional layers in memory and power-constrained ICs. The AGU is also capable of merging convolution and consecutive MaxPooling layers into a single layer which further conserves valuable memory space in tinyML hardware.
Attendees will understand how to use approaches such as the convolutional-friendly AGU hardware architecture to minimize the total number of cycles in convolutional-heavy NNs to design ultra-low-power AI devices that consume only microwatts of power.
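To show why padding, dilation, and stride complicate addressing in 2D convolutions, the following Python sketch computes the input addresses touched by a single output pixel. It is a software illustration of the address arithmetic only, not the nNetLite AGU design; the function `conv_input_addresses`, the row-major layout, and the single-channel input are assumptions.

```python
def conv_input_addresses(out_y, out_x, width, height, kernel,
                         stride=1, dilation=1, padding=0):
    """Return flat row-major input addresses for one output pixel.

    Taps that land in the zero-padding region are reported as None.
    """
    addresses = []
    for ky in range(kernel):
        for kx in range(kernel):
            y = out_y * stride - padding + ky * dilation
            x = out_x * stride - padding + kx * dilation
            inside = 0 <= y < height and 0 <= x < width
            addresses.append(y * width + x if inside else None)
    return addresses


# 3x3 kernel over a 4x4 single-channel input with one pixel of padding.
corner = conv_input_addresses(0, 0, width=4, height=4, kernel=3, padding=1)
center = conv_input_addresses(1, 1, width=4, height=4, kernel=3, padding=1)
```

For the corner pixel, five of the nine taps fall into the padding region, while the interior pixel reads nine valid addresses. Computing this per-tap arithmetic in software costs cycles on every MAC, which is the kind of overhead a dedicated address generator is meant to remove.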
Tree Ensemble Model Compression for Embedded Machine Learning Applications
Leslie SCHRADIN, Principal Machine Learning Engineer, Qeexo, Co.
Abstract (English)
Embedded machine learning models need to have a low memory footprint without compromising classification performance. Tree-based ensemble models are very effective for sensor data machine learning. Depending on the application, they are often superior to neural-network-based models in terms of embedded metrics such as memory footprint, latency, and model performance, and often need less data to reach the same level of accuracy. In this webinar we will discuss generating tree-based ensemble models using well-known algorithms and then performing intelligent pruning and quantization particularly suitable for tinyML applications. Qeexo’s patent-pending algorithms first perform ensemble model compression by selecting the best candidate boosters; these boosters reduce the model size by almost 80% and still capture the classification ability of the full ensemble model. The compression is followed by 16-bit/8-bit quantization to further reduce the memory footprint. Using these techniques, Qeexo AutoML has compressed and quantized Gradient Boosting Machine (GBM), Random Forest (RF), Isolation Forest (IF), eXtreme Gradient Boosting (XGBoost), and Decision Tree (DT) models, making them much easier to fit into embedded targets. As a result, models generated by Qeexo AutoML have best-in-class latency and memory footprint without sacrificing performance.
A VM/Containerized Approach for Scaling TinyML Applications
Kartik THAKORE, Founder, HOTG
Abstract (English)
Although deep neural networks are typically computationally expensive to use, technological advances in the design of both hardware platforms and neural network architectures have made it possible to use powerful models on edge devices. To enable widespread adoption of edge-based machine learning, we introduce a set of open-source tools that make it easy to deploy, update, and monitor machine learning models on a wide variety of edge devices. Our tools bring the concept of containerization to the TinyML world. We propose to package ML and application logic as containers called Runes to deploy onto edge devices. The containerization allows us to target a fragmented Internet-of-Things (IoT) ecosystem by providing a common platform for Runes to run across devices.
tinyML is not tiny anymore
Mallik P. MOTURI, VP Product and Business Development, Syntiant
Abstract (English)
As always-on tinyML devices proliferate they are having an outsize impact on the world. From always-on voice in low cost phones, earphones, AR/VR glasses to always-on sensing of the environment in devices, machines, automobiles, we are seeing them in the “things” we are used to in our daily life at home, office, factories or outside. We will see more than 10 billion such TinyML devices being introduced into the market every year by 2023. TinyML is not tiny anymore.
Partner Sessions - Edge Applications
A real application of TinyML in Intelligent Building Sensors
Martin CROOME, Vice President Marketing, GreenWaves
Abstract (English)
In this session, we will look at a real-life example of using TinyML in smart building sensors. Up until now building sensors counting people have been expensive and difficult to install or have lacked accuracy. TinyML makes reliable, battery operated people counting possible with sensor battery lives of over five years in typical scenarios. This allows for inexpensive, easy to install products that bring a lot of value to facility managers. The applications enabled by sophisticated image and sound analysis in building sensors don’t just stop at counting. We will also look at some of the other use cases that are possible and how they are being enabled by GAP processors, now.
Leveraging sparsity to drive fast response times at the edge
Orlando MOREIRA, Fellow and Chief Architect, GrAI Matter Labs
Abstract (English)
Sparsity is the idea that changes in the real world don’t happen everywhere, or all at once. NeuronFlow is a novel multi-core processor architecture that exploits all forms of sparsity to deliver a scalable dataflow processing engine for AI applications at the Edge. In this presentation, we will discuss:
- The importance of fast responses or low latency in Edge AI applications
- Metrics for latency and how they map to the Edge AI application performance
- How the unique sparsity-exploitation characteristics of NeuronFlow enable real-time live AI applications where fast response times are essential
Gesture-controlled in-ear headphones, presentation and live demo
Johan MALM, AI Engineer, imagimob
Abstract (English)
Imagimob is a pioneer in tinyML with experience from 25+ tinyML customer projects, including projects with Scania, Husqvarna, Autoliv, Veoneer, Flir, and many others. The first commercial product using an Imagimob tinyML application was launched in 2018. This presentation is a case study in which we demonstrate how we use Imagimob AI, our end-to-end toolchain, to develop an advanced audio application.
TinyML journey: from face detection demo to real-life commercial deployment
Elad BARAM, VP Products, Emza Visual Sense
Abstract (English)
This talk describes our experience in driving TinyML from the POC level to a design win that is planned to be deployed in millions of notebooks. This is probably the first widely deployed commercial consumer case study.
One of the main topics we intend to cover, beyond the application itself, is the gap between available demos and benchmarks and what it takes to accommodate real-life use cases: addressing different object distances, robustness to lighting conditions, etc.
While TinyML holds the potential to be extremely successful, through its inherent advantage of using low cost MCUs, bridging the technology gap is what will convert the demos to real business.
Market Opportunities for Edge AI
Michael AZOFF, Chief Analyst, Kisaco Research
Lee CARTER, Principal, Momenta Ventures
Abstract (English)
In this session we will explore market opportunities for edge ML with Kisaco’s chief analyst Michael Azoff, Momenta VC partner Lee Carter and co-founder and CEO of Edge Impulse, Zach Shelby. The speakers will explore growth opportunities in the market, what this will look like in 10 years’ time and what technologies look promising.
tinyML for Good — Conservation & Climate
Moderator: Kate KALLOT, Head of Emerging Areas, NVIDIA
Talia SPEAKER, Program Officer, WILDLABS
Christopher B. ROGERS, CEO, SensiML Corp
Abstract (English)
TinyML has the potential to have a big impact on climate change and nature conservation work. In this session led by Kate Kallot, Head of Emerging Areas at NVIDIA and with guest Talia Speaker, Program Officer from WILDLABS and the WWF, we will hear about leading applications for tinyML that are having a real impact, and how you can get involved to support solutions in this space.
1:00 pm to 1:30 pm
Partner Hangouts
The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem on market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points. Expect to hear how problems and gaps are being solved and what still needs to be done and why.
Please see the schedule in the virtual event platform to see each day’s room assignments.
Pacific Daylight Time / UTC-7
8:00 am to 8:15 am
Opening and Award Announcements
8:15 am to 9:00 am
Keynote: milliJoules for 1000 Inferences: Machine Learning Systems “on the Cheap”
Diana MARCULESCU, Professor and Department Chair, The University of Texas at Austin
Abstract (English)
Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. While the holy grail for judging the quality of an ML model has largely been accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk uncovers the need for designing efficient convolutional neural networks (CNNs) for deep learning mobile applications that operate under stringent energy and latency constraints. We show that, while CNN model quantization and pruning are effective tools in bringing down the model size and resulting energy cost by up to 1000x while maintaining baseline accuracy, the interplay between bitwidth, channel count, and CNN memory footprint uncovers a non-trivial trade-off. Surprisingly, our results show that when the channel count is allowed to change, a single weight bitwidth can be sufficient for model compression, which greatly reduces the software and hardware optimization costs for CNN-based ML systems.
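The interplay between bitwidth, channel count, and memory footprint can be illustrated with simple arithmetic. The Python sketch below compares the weight storage of two hypothetical 3x3 convolution layers; the layer shapes and the helper `conv_weight_bytes` are assumptions chosen for illustration and are not taken from the talk.

```python
def conv_weight_bytes(kernel, c_in, c_out, bits):
    """Weight storage of a kernel x kernel convolution layer, in bytes."""
    return kernel * kernel * c_in * c_out * bits / 8


# Twice the input channels at 4 bits versus half the channels at 8 bits.
wide_low_precision = conv_weight_bytes(3, 64, 64, bits=4)
narrow_high_precision = conv_weight_bytes(3, 32, 64, bits=8)
```

Both configurations occupy 18432 bytes, so a fixed memory budget can be spent either on precision or on channels. Which choice gives better accuracy is an empirical question that this arithmetic alone cannot answer.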
9:00 am to 9:45 am
Keynote: Adaptive Neural Networks for Agile TinyML
Sek CHAI, Co-founder and CTO, Latent AI
Abstract (English)
We present a new way to run your neural network that dynamically minimizes the working footprint for both memory and compute horsepower. Such a formulation requires retraining the network in a way that offers runtime flexibility during inference. Ultimately, the dynamic neural network is highly agile and can self-regulate to minimize computational needs.
9:45 am to 10:00 am
Break
10:00 am to 10:15 am
Today’s Breakout Pitches
10:15 am to 10:45 am
tiny Talks
Using Neural Architecture Search for Speech Recognition on the Edge
Vikrant TOMAR, Founder and CTO, Fluent.ai
Abstract (English)
Despite recent developments in machine learning, finding an optimal solution for a given task remains a challenging and time-consuming task often requiring significant efforts in designing and tuning the neural architectures by an expert instead. This problem is more pronounced for TinyML solutions, where, due to limited computational resources, specific models are needed for a given task. To this end, we present a two-step solution. The first step employs GNASIL[2], a novel automated machine learning solution, for discovering an optimal neural architecture within a predefined limit of device specifications in FLOPS. The second step compresses the discovered architecture and make it even smaller.
GNASIL trains a soft actor-critic [2] reinforcement learning agent that expedites the discovery process by extending learning with planning options based upon past experiences and imitation learning through available expert-designed architectures on similar tasks. The architectures discovered by GNASIL are then compressed with automatic model compression (AMC)[3]. AMC uses DDPG [4] to learn the ratio of pruning for each layer. Reward is a function of accuracy and FLOPS. Optimal pruning is achieved in a way that has minimal effect on accuracy of the model despite often reducing the overall model footprint.
Our experiments on a series of on-device speech recognition tasks demonstrate that GNASIL can design neural models with competitive performance in terms of both discovery speed and the accuracy of the discovered architectures, all within the predefined FLOPS restrictions. Further, AMC is able to reduce the size of the model up to 40% without compromising accuracy.
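As an illustration of a reward that depends on both accuracy and FLOPS, the AMC paper [3] reports a reward of the form R = -Error · log(FLOPs). The abstract does not state which reward is used in this work, so the sketch below is an assumption; the function name and the sample numbers are hypothetical.

```python
import math


def amc_style_reward(accuracy: float, flops: float) -> float:
    """Reward of the form -error * log(FLOPs); higher (less negative) is better.

    Assumed form, following the AMC paper [3]; not confirmed by the abstract.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if flops <= 1.0:
        raise ValueError("flops must be greater than 1 so that log(flops) > 0")
    error = 1.0 - accuracy
    return -error * math.log(flops)


# Hypothetical pruned candidates: same accuracy, different compute cost.
baseline = amc_style_reward(accuracy=0.95, flops=10e6)
pruned = amc_style_reward(accuracy=0.95, flops=5e6)
print(pruned > baseline)  # the cheaper model earns the higher reward
```

With this form, an agent is rewarded both for lowering the error and for lowering the FLOPS, so it cannot improve one while ignoring the other.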
References:
[1] Farzaneh S Fard, Arash Rad, and Vikrant Singh Tomar. NASIL: Neural architecture search with imitation learning. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[2] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1856–1865, 2018.
[3] Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018.
[4] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
Person Detection under Extreme Constraints: Lessons from the Field
Koen HELWEGEN, Deep Learning Scientist, Plumerai
Abstract (English)
We present various computer vision applications on microcontrollers that are enabled by Binarized Neural Networks (BNNs). This includes state-of-the-art models on the Arm Cortex-M4 architecture for the Visual Wake Words benchmark task (84.5% accuracy with under 170ms latency on an STM32F407VG) and person detection with bounding boxes. Moving beyond artificial benchmarks, we demonstrate the performance in real-world settings by deploying on an off-the-shelf Arm Cortex-M4 microcontroller with an inexpensive, low-power OV2680 camera. These applications are built using our integrated stack for training and inference of BNNs as well as through the collection, labeling and monitoring of custom designed datasets for TinyML. This combination results in highly accurate and highly efficient BNN models for cheap, low-power microcontrollers. We discuss practical tips for developing demanding computer vision applications on microcontrollers and highlight some of the lessons we learnt while developing BNNs for the real world, such as our emphasis on high-quality, richly annotated data and powerful, hardware-based neural architecture search.
10:45 am to 12:00 pm
Panel Discussion
tinyML inference SW – where do we go from here?
Moderator: Ian BRATT, Distinguished Engineer & Fellow, Arm
Moderator: Ofer DEKEL, Partner Research Area Manager, Microsoft Research
Chris LATTNER, President, Engineering and Product, SiFive
Tianqi CHEN, CTO, OctoML
Raziel ALVAREZ, Technical Lead, PyTorch, Facebook
Pete WARDEN, Technical Lead, Google
Join a collection of industry experts as we discuss the current state and potential future of tinyML inference SW. What is missing today, what new technologies will impact tinyML inference SW, and how do we go forward as a community?
12:00 pm to 1:00 pm
tinyTalks Algorithms and Tools
Session Moderator: Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor
Neutrino: A BlackBox Framework for Constrained Deep Learning Model Optimization
Davis SAWYER, Co-Founder & Chief Product Officer, Deeplite
Abstract (English)
Designing modern deep learning-based solutions requires deeper models with a greater number of layers. While a larger, deeper model can provide competitive accuracy, it creates several logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons why deep learning models are not widely used in production environments, especially on tinyML devices. There is an immediate need to optimize and compress these deep learning models to enable on-device intelligence. In this research, we introduce Neutrino, a black-box framework for production-ready optimization of deep learning models. The framework provides an easy mechanism for users to specify constraints, such as a tolerable drop in accuracy or a target size for the optimized model, to guide the optimization process. The framework is easy to include in an existing production pipeline and is available as a Python package or Docker image, supporting the PyTorch and TensorFlow libraries. The optimization performance of the framework is shown across multiple benchmark datasets and popular deep learning models, providing a 3-30x reduction in model size (pre-quantization). Furthermore, we will share how the framework is currently used in production and summarize results from several tinyML applications such as visual wake words.
Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware
Chris ELIASMITH, Co-CEO, Applied Brain Research
Abstract (English)
Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars. As KWS systems are typically ‘always on’, maximizing both accuracy and power efficiency is central to their utility. In this work we use hardware aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows the neural network to run efficiently on standard hardware (212 µW). We also characterize the power requirements of custom designed accelerator hardware that achieves SotA power efficiency of 8.79 µW, beating general purpose low power hardware (a microcontroller) by 24x and special purpose ASICs by 16x.
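The reported ratios can be checked with a few lines of arithmetic. This sketch assumes the 24x figure compares the 212 µW general-purpose result with the 8.79 µW accelerator. The ASIC power is not stated in the abstract, so the value computed for it is only what the 16x figure would imply.

```python
# Figures taken from the abstract, in microwatts.
GENERAL_PURPOSE_UW = 212.0
ACCELERATOR_UW = 8.79

# Ratio between the general-purpose hardware and the custom accelerator.
ratio = GENERAL_PURPOSE_UW / ACCELERATOR_UW
print(f"general purpose / accelerator = {ratio:.1f}x")  # 24.1x

# Power a special-purpose ASIC would draw if the accelerator beats it by 16x
# (an implied value, not a number reported in the abstract).
implied_asic_uw = 16 * ACCELERATOR_UW
print(f"implied ASIC power = {implied_asic_uw:.1f} uW")  # 140.6 uW
```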
Low-precision Winograd Convolution over Residue Number System
Zhi-Gang LIU, Research Engineer, Arm
Abstract (English)
Low-precision (8-bit or sub-8-bit) convolutional neural networks consume a fraction of the memory footprint and power of high-precision models running on mobile or embedded devices. The classical fast Winograd convolution algorithm requires high-precision floating-point operations and thus fails to accelerate low-precision CNNs. As a result, the current state-of-the-art low-precision convolution is a GEMM-based approach that relies on im2col or im2row transformations to convert the convolution into a GEMM operation, where each output demands 9 MAC operations for the popular 3×3 filter and 25 for a 5×5 filter. This work extends the Winograd algorithm to modular arithmetic and explores the optimized implementation of fast low-precision convolution for ultra-low power machine learning (ML) at the edge. The new approach achieves an arithmetic reduction of up to 6.8x, corresponding to 16×16 transformation tiles, and relies only on int8 or int16 operations, which are well supported by commodity edge devices. We evaluated the performance of the proposal with sub-8-bit VGG16 and ResNet50v1 models on the ImageNet dataset using an Arm Cortex-A53 CPU and a Cortex-M7 MCU, and observed a convolution latency reduction of more than 2x.
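For readers unfamiliar with the classical algorithm, the following minimal sketch shows the one-dimensional Winograd F(2,3) case, which produces two outputs of a 3-tap filter with 4 elementwise multiplications instead of 6. It is the textbook transform, not the residue-number-system variant presented in the talk, and the function names are chosen here for illustration. The factors of 1/2 in the filter transform are one place where the classical form leaves integer arithmetic.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3) (4 elementwise multiplications)."""
    half = Fraction(1, 2)
    # Filter transform: can be precomputed once per filter. Note the 1/2 factors.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) * half
    u2 = (g[0] - g[1] + g[2]) * half
    u3 = g[2]
    # Data transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Elementwise products: the only multiplications on the data path.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


data = [1, 2, 3, 4]    # four input samples
taps = [1, 0, -1]      # 3-tap filter
print(direct_f23(data, taps) == winograd_f23(data, taps))
```

Exact `Fraction` arithmetic is used so the comparison with the direct result is not affected by rounding.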
An Introduction to an Open-Source Fixed-Point Inference Framework – NNoM
Jianjia MA, Research Fellow, University of Southampton
Abstract (English)
In recent years, the optimization of neural network (NN) structures (such as Inception and ResNet) has effectively reduced the overall computational complexity of NN models, which opens up more potential tinyML applications. However, the complexity of deploying NNs has also increased due to the larger number of arguments in deeper models, complex structure management, and memory management. Neural Network on Microcontroller (NNoM) is a high-level inference framework that aims to provide an easy-to-use interface for developers to deploy complex NN models, while the framework manages the layer structure, content-related arguments, and memory. Using NNoM minimizes the deployment effort, so developers can focus on optimizing the structure to improve model efficiency.
NNoM is written in C (ISO/IEC 9899:1999) for compatibility with different tinyML development environments. We provide a set of Python scripts that parse, calibrate, and quantize a Keras model and write it into a single C header for inference. NNoM has a unique compiling process to minimize the memory cost and inter-layer switching time. Two backends are supported: a local C backend and the optimized CMSIS-NN provided by Arm.
A typical NNoM footprint for a VGG-type model is 8.9k ROM, excluding weights. NNoM supports 30+ different layers, including convolutional layers, fully connected layers, recurrent layers, activations, and others. Because the quantized model must be evaluated, NNoM provides many evaluation APIs for measuring performance and accuracy on the target platform. In addition, NNoM comes with many examples, including Speech Keyword Spotting and Speech Noise Suppression. Source code is available at https://github.com/majianjia/nnom.
Partner Sessions - Edge Hardware
TinyML is more than Model Building
Stuart FEFFER, Co-founder and CEO, Reality AI
Abstract (English)
The “data science”-driven approach to Tiny ML starts with the data. It’s about finding a machine learning model that gives the most accurate predictions based on the data for a target footprint size. But an “engineering”-driven approach to Tiny ML understands that data is a product of instrumentation, and for many applications, particularly non-visual, non-voice ones, the right approach iterates on the instrumentation and hardware, informed by the accuracy of the ML models. In this talk we’ll discuss “engineering-driven TinyML”, show how to use TinyML model performance to improve your hardware design, and demonstrate TinyML tool support for hardware design and sensor optimization.
Low-power vision processing and signal processing for IoT and edge devices
Dylan MUIR, Director for Algorithms and Applications, SynSense
Abstract (English)
New NN accelerator hardware based on binarised asynchronous communication — sometimes referred to as spiking neural networks (SNNs) — promises to deliver complex sensory processing, including vision processing, for energy-constrained devices. In this session we look at several low-power edge use cases for audio and vision processing, with an overview of training and optimisation approaches, and example deployment on accelerator hardware.
It’s an SNN future: Are you ready for it? Converting CNN’s to SNN’s
Kristofor CARLSON, Manager of Applied Research, BrainChip Inc.
Abstract (English)
CNNs take too much time and consume too much power and area for today’s neural networks. With event-based Spiking Neural Networks (SNNs), one can address deficiencies that the current CNN architecture cannot. Converting CNNs to SNNs will enable today’s designers to achieve tomorrow’s efficient and effective technology solutions.
The intersection of hardware and software and the shift left of algorithm development
Karl FEZER, AI Ecosystem Evangelist, Arm
Abstract (English)
The future of tinyML relies on a strong collaboration between hardware and software developers. Algorithm development adds a layer of complexity and optimization that needs to keep pace with the rapidly advancing field of ML. Arm will lead a discussion on the opportunities we have as an industry to bring these two communities together and build innovative tinyML applications of the future, including increased access to hardware virtualization earlier in the life cycle of target platforms, where algorithms and hardware are developed congruently.
Partner Sessions - Edge Applications
Session Moderator: Steve WHALLEY, CEO, Strategic World Ventures
How Adaptive AI Solves Big Challenges for tinyML
Jags KANDASAMY, Co-Founder & Chief Executive Officer, Latent AI Inc.
Abstract (English)
Edge AI is already powering billions of smart devices generating zettabytes of data. This market dynamic presents tremendous opportunities as well as significant challenges, requiring new Edge AI solutions, especially for enterprise. Come and learn how Adaptive AI can help build and deploy tinyML models.
Low power computer vision with Eta Compute AI Vision board
Semir HADDAD, Senior Director Product Marketing, Eta Compute
Abstract (English)
In this session, we will explain how to design low-power computer vision applications with Eta Compute’s ECM3532 AI Vision board. We will show some examples and demos of low power vision in action and advise how to get started to build your own battery-operated computer vision solution.
Pushing the AI Envelope at Cisco
Chris ROWEN, VP of Engineering, Cisco
Abstract (English)
The life-blood of machine learning is data, so it’s natural that a data-obsessed company like Cisco would be ripe with potential AI applications. Come hear about some of the latest initiatives in video and speech AI, especially for large scale collaboration. We’ll dive into applications and methods in speech-based assistants, speech enhancement, gesture recognition, and video segmentation, especially in complex edge + cloud systems. We’ll wrap up with discussion of emerging principles for responsible AI development, especially as real concerns on fairness and privacy are shaping the environment.
Edge ML hardware for every application
Moderator: Kevin KREWELL, Principal Analyst, TIRIAS Research
Karl FEZER, AI Ecosystem Evangelist, Arm
Mallik P. MOTURI, VP Product and Business Development, Syntiant
Abstract (English)
There is a huge range of workloads in tinyML, from sensor-based anomaly detection to image object detection. In this session industry analyst Kevin Krewell will lead a discussion on tinyML hardware and how it will evolve to support a range of advanced applications and lower power.
tinyML vision challenge
Kwabena AGYEMAN, President & Co-Founder, OpenMV, LLC
Zach SHELBY, Co-founder and CEO, Edge Impulse
Abstract (English)
Introducing the upcoming tinyML computer vision challenge. Learn how to get involved and create inspiring new applications using tinyML on computer vision and win up to $6k in prizes and recognition from the tinyML Foundation!
1:00 pm to 1:30 pm
Partner Hangouts
The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem on market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points. Expect to hear how problems and gaps are being solved and what still needs to be done and why.
Please see the schedule in the virtual event platform to see each day’s room assignments.
Pacific Daylight Time / UTC-7
8:00 am to 8:15 am
Opening and Awards Ceremony
8:15 am to 9:00 am
Keynote: Efficient Audio-Visual Understanding on AR Devices
Vikas CHANDRA, Senior Director, Meta Reality Labs
Abstract (English)
Augmented reality (AR) is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context aware user interface delivered through a socially acceptable form factor such as eyeglasses. The majority of these novel experiences in AR systems will be powered by AI because of their superior ability to handle in-the-wild scenarios. A key AR use case is a personalized, proactive and context-aware Assistant that can understand the user’s activity and their environment using audio-visual understanding models. In this presentation, we will discuss the challenges and opportunities in both training and deployment of efficient audio-visual understanding on AR glasses. We will discuss enabling always-on experiences within a constrained power budget using cascaded multimodal models, and co-designing them with the target hardware platforms. We will present our early work to demonstrate the benefits and potential of such a co-design approach and discuss open research areas that are promising for the research community to explore.
9:00 am to 9:45 am
Keynote: Data-Free Model Compression
Mohammad RASTEGARI, Senior AI/ML Technical Leader, Apple
Abstract (English)
Efficiently compressing a trained neural network without using any data is very challenging. Our data-free method requires 14x-450x fewer FLOPs than comparable state-of-the-art methods. We break the problem of data-free network compression into a number of independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. We show state-of-the-art performance on MobileNetV1 for data-free low-bit-width quantization. We also show state-of-the-art performance on data-free pruning of EfficientNet B0 when combining our method with end-to-end generative methods.
10:00 am to 10:15 am
Today’s Breakout Pitches
10:15 am to 10:45 am
tiny Talks
TinyML Software Runtime for Hybrid Multicore Architecture
Nilanjan ROYCHOWDHURY, Principal Software Architect, Eta Compute
Abstract (English)
A lot of emphasis in tinyML has been on designing the best neural network and optimizing it to reduce the number of operations and memory needs. Yet, training a very efficient neural network is only one piece of the equation for TinyML. The other piece is how to run it on actual embedded hardware.
Indeed, tinyML hardware is very often complex, including many cores for the sake of efficiency. Moreover, because sensor processing requires a combination of signal processing, procedural computing and neural network acceleration, the hybrid multicore architecture is becoming popular for edge AI hardware with a combination of heterogeneous cores: CPU, DSP and NPU.
To run efficiently on these hybrid multicore systems, there must be a runtime that allocates resources, core and memory, in the most optimized way, while minimizing processing overhead and memory transfers.
In this presentation we will review the various ways the industry is addressing this challenge and how Eta Compute solved it with the TENSAI Flow runtime and executors.
Insights from a Multi-Purpose Self-Learning Smart Sensor
Kaustubh GANDHI, Senior Product Manager Software, Bosch Sensortec
Abstract (English)
Edge-AI devices need to ensure context-sensitive adaptation and real-time personalization for end-users. In this talk, we introduce some insights gained while designing Bosch’s novel self-learning sensor.
The sensor’s self-learning function enables the device to learn new motion patterns in-use directly from the end-user, to personalize built-in patterns directly for an end-user and automatically classify and count the movement types in real-time, all within the sensor itself.
In spite of delivering an AI experience, the function runs on the sensor’s co-processor with ca. 300 µA and memory under 50 KB, while still delivering over 90% accuracy for personalized home workouts. This is a significant improvement for learning at the edge on wrist and in-ear wearables.
Secondly, as the sensor is capable of switching to a different function at run-time, the sensor’s purpose can change depending on the user’s context, such as orientation and position tracking during running, style classification during swimming or personalization during fitness workouts.
Thirdly, the design allows the self-learning feature to utilize an expandable list of virtual sensors from sensor data fusion (e.g. quaternions) and peripherals (e.g. magnetometer, pressure sensors).
This enables faster and more robust pattern detection from an expandable list of input sources, chosen according to the target application, as opposed to pre-programmed AI solutions with fixed inputs.
In summary, in order to realize the true potential of edge-AI, it is important to design the software with capabilities to learn and adapt to the end-user while maintaining scalability for diverse applications.
10:45 am to 12:00 pm
Breaking News on Disruptive Products and Tools
Sean MCGREGOR, Member of Technical Staff, Syntiant
Ravishankar SIVALINGAM, Sr. Staff Engineer/Manager, Qualcomm
Jan JONGBOOM, CTO, Edge Impulse
Meng LI, Senior AI Research Scientist, Facebook Inc.
Harsha VISWANATH, Principal AI Technical Leader - Azure Edge Device, Platform and Solutions Group, Microsoft
Abstract (English)
Syntiant:
“TinyML Solution Power without Tiny Models: the NDP120”
Syntiant made international headlines in January 2021 by announcing that its Syntiant® Core 2™ neural network inference engine can process multiple heterogeneous networks concurrently while consuming less than 1 mW of power.
Embedded in the Syntiant® NDP120™, the company’s newest generation deep learning processor for audio and sensor applications in edge devices, the Syntiant Core 2 delivers 25x the tensor throughput of the Syntiant Core 1™ found in the Syntiant® NDP100™ and Syntiant® NDP101™ devices, which achieve 100x efficiency and 10x the throughput over traditional CPUs and DSPs.
From far-field speech processing applications such as audio filtering and echo cancellation to multi-modal sensor fusion and infrared detection, data scientists at Syntiant can 1) explain how a purpose-built compute engine for neural inference solves the power consumption problem; and 2) illustrate how edge AI is creating opportunities to better connect people through “smarter” devices, free from the cloud, and with minimal drain on the battery.
Qualcomm Technologies Inc
“New tinyML use-case: Ultra-low Power Eyetracking with Qualcomm QCC112”
In resource-constrained AR/VR applications, accurate pupil detection enables downstream applications such as eyetracking and iris recognition, by drastically reducing the input image size and allowing for focused compute on relevant regions of interest. Qualcomm QCC112 is an ultra-low power computer vision sensor capable of running real-time object detection at ~1 mW system power. We showcase pupil detection operating at 60-100 fps on this hardware, with hardware accelerated object detection on qqVGA resolution (160 x 120 pixels) grayscale input images. The pupil detection model is approximately 40 kB and is robust to low light, which allows for accurate detection at less than 1 millisecond exposure with infrared LED illumination. The model is able to precisely localize the pupil despite occlusions by eyelids/eye corners. We also provide training tools for users to train custom models on their own datasets if so desired.
Edge Impulse
EON Tuner: Find the best model with your device constraints in mind, from signal processing parameters to ML architecture
Around the date of the tinyML Summit we’ll be releasing EON Tuner, an AutoML pipeline specifically for sensor data on constrained devices. Rather than just finding the right hyperparameters within a neural network, it has a search space that includes signal processing algorithms and parameters to preprocess the data, and it can consider both classic ML and neural networks for classification. To avoid finding models that won’t fit your use case, you set device constraints at the start (e.g. needs to run 5x a second on a Cortex-M0+ @ 48MHz in max. 20K RAM).
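As a rough illustration of how up-front device constraints can prune a search space, the sketch below turns the example constraint into a cycle and RAM budget and filters a list of candidates. It is not the EON Tuner API; the `Candidate` type, its fields, the candidate names, and all cycle and RAM estimates are hypothetical, and "20K" is assumed to mean 20 × 1024 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    est_cycles: int       # estimated CPU cycles per run, preprocessing included
    est_ram_bytes: int    # estimated peak RAM use


# Constraint from the example: 5 runs per second on a 48 MHz Cortex-M0+, 20K RAM.
CLOCK_HZ = 48_000_000
RUNS_PER_SECOND = 5
MAX_RAM_BYTES = 20 * 1024                    # assumption: K = 1024 bytes
CYCLE_BUDGET = CLOCK_HZ // RUNS_PER_SECOND   # 9_600_000 cycles per run


def fits(candidate: Candidate) -> bool:
    """True if the candidate meets both the latency and the RAM constraint."""
    return (candidate.est_cycles <= CYCLE_BUDGET
            and candidate.est_ram_bytes <= MAX_RAM_BYTES)


# Hypothetical search-space entries: DSP front end plus classifier.
candidates = [
    Candidate("mfcc+small-cnn", 7_500_000, 18_000),
    Candidate("mfe+large-cnn", 30_000_000, 16_000),     # too slow
    Candidate("spectrogram+cnn", 9_000_000, 48_000),    # too much RAM
    Candidate("fft+random-forest", 1_200_000, 6_000),
]

feasible = [c.name for c in candidates if fits(c)]
print(feasible)
```

Filtering before training means no time is spent tuning pipelines that could never be deployed on the target.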
Facebook Inc
Improve weight-sharing NAS with better search space and better supernet training
Weight-sharing neural architecture search (NAS) is an effective way to automate efficient model design. Weight-sharing NAS builds a supernet that assembles all the architectures as sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily depends on 1) the search space design, 2) the sub-network sampling strategy, and 3) in-place distillation. Though important, these key factors are not well studied in prior works. In this presentation, we introduce our recent works on improving weight-sharing NAS. We introduce a multi-scale search space to better capture the scale variance prominent in the image inputs. We then introduce a new sampling strategy that focuses supernet training on sub-networks on the Pareto front. We further propose a generalized alpha divergence for distillation to guide the supernet optimization. The discovered model family achieves SOTA results on various visual tasks, including image classification (on ImageNet) and bottom-up pose estimation (on COCO and CrowdPose).
Azure Percept & TinyML
While most developers and organizations can stand behind the benefits of edge AI, they often face costly and time-consuming challenges when it comes to end-to-end development, deployment, and management. These potential roadblocks include training AI models, creating low power yet high-performance hardware, seamless provisioning of workloads, management and updating of devices and applications, integrating with existing applications, and helping to ensure the data and models are secured. That is why we’re introducing Azure Percept, the most comprehensive, easy-to-use platform with added security for creating edge AI solutions. We will introduce Azure Percept and its capabilities with respect to ML, and we will also talk about how we can leverage this to scale down to a smaller footprint device which can potentially run on AA batteries. Here we will address the requirements and challenges in terms of not only the ML models required to run in this environment but also the overall system requirements.
12:00 pm to 1:00 pm
tiny Talks & Partner Sessions – Edge Applications
Environmental Noise Classification on Microcontrollers
Jon NORDBY, CTO, Soundsensing
Abstract (English)
Noise is a growing problem in urban areas, and according to the WHO is the second-largest environmental cause of health problems in Europe.
Noise monitoring using Wireless Sensor Networks is being applied in order to understand and help mitigate these noise problems. It is desirable that these sensor systems, in addition to logging the sound level, can indicate what the likely sound source is. Performing such Environmental Noise Classification directly in the sensor is desirable in order to avoid sending audio data to the cloud, which may have negative impacts on data transfer amounts, battery lifetime and privacy.
In this talk we will explain how we tested several different Convolutional Neural Networks for this task on the STM32L476 low-power microcontroller, and the results we were able to achieve on the UrbanSound8K dataset. Several techniques, such as depthwise-separable convolutions, striding for downsampling, and reducing input dimensionality, were tested in order to make the CNN models as efficient as possible, and these will likely also be useful for other audio or image tasks.
The research was initially carried out as part of a master’s thesis at the Norwegian University of Life Sciences (NMBU). Since then, we have continued to work on this topic at Soundsensing, and we will share some of the progress and challenges in bringing this kind of research to market.
Real-World Performance Analysis of Visual Wake Words
Luke BERNDT, Senior Director, In-Q-Tel
Abstract (English)
The Google Visual Wake Words paper (Chowdhery et al., 2019) proposes techniques for creating object recognition models appropriate for microcontrollers. The paper demonstrates an accurate person detection model trained using the Microsoft Common Objects in Context (COCO) dataset. Because the COCO dataset is built on photographs found on internet photography sites and because these images are composed by a photographer, the COCO dataset, we hypothesize, may be ill-suited for tinyML visual sensors. Typical visual sensors often have unusual perspectives of an object, which can result in poor object recognition.
We therefore investigated model performance on classes other than persons, evaluated performance by deploying the model on hardware in the wild, and then built a novel dataset for real world testing. In certain real-world environments, we found a decrease in accuracy of over 50%. Additionally, we investigated transfer learning and techniques for identifying blind spots in models to better target the augmentation of objects in the dataset. We find that extra care is needed when using general-purpose image datasets, like COCO, to train models for tinyML based visual sensors.
Always watching, sensing and listening by Himax WE-I Plus at the edge
Mark CHEN, Vice President, Himax Technologies
Abstract (English)
Himax’s WE-I Plus, an ultra-low power AI accelerator-embedded processor, is designed to accommodate a wide selection of TinyML neural network models, with a programmable DSP running at up to 400 MHz and 2 MB of internal SRAM. WE-I Plus supports the TensorFlow Lite for Microcontrollers framework and can run inferences such as the open-source Google examples available on Google’s GitHub. Facilitated by an ultra-low power always-on image sensor and an ultra-low power AI processor with built-in support for the TensorFlow Lite for Microcontrollers framework, WE-I Plus has been proven to make AI algorithm development easier than ever.
Production Worthy Tools for Creating AI at the IoT Edge
Christopher B. ROGERS, CEO, SensiML Corp
Partner Sessions - Processing Engines
Tiny and Flexible ML with Lattice FPGA
Sreepada. V. HEGADE, Senior Manager, Lattice Semiconductor
Abstract (English)
The inference of neural networks on resource-constrained devices, which is fueling the growth of ML at the edge, is one part of an entire solution that involves other essential components such as data aggregation, augmentation, and post-processing of inference output. In addition, new network topologies are being introduced at a rapid pace to meet the ever-growing demand for accuracy and performance, which requires that solutions supporting “Tiny ML” be flexible. The engine that performs network inference also needs to be tuned for different types of network topology. For example, MobileNet, introduced to implement neural networks efficiently on resource-constrained devices, cannot be implemented efficiently with NN engines designed for normal convolution.
The configurable nature of FPGA devices allows for quick adoption of emerging neural network topologies. The flexible I/O also helps to implement data aggregation and other peripheral operations. The soft core implemented on Lattice FPGAs can be changed and/or optimized depending on the target network topology. In this talk we discuss how we optimize network topologies and the software compiler to get the best out of the FPGA for end applications.
Machine Learning in Wireless IoT Applications
Peter SCHULMEYER, Senior Director, Silicon Labs
Abstract (English)
Over the next few years, artificial intelligence and machine learning are expected to become ubiquitous for devices that are part of the Internet of Things (IoT).
Silicon Labs is addressing this market by making it possible to run machine learning models on small, low-power wireless SoCs. We will cover market trends and how to identify IoT applications where single-chip solutions that integrate machine learning and wireless connectivity make sense.
System Level Energy Considerations for Battery Powered AI
Afshin NIKTASH, Senior Principal Software Engineer, Maxim Integrated
Abstract (English)
To bring complex AI inferencing to battery-powered applications, several approaches are available to reduce the energy consumption of convolutional neural network (CNN) computations. This presentation will give an overview of the MAX78000 AI microcontroller and outline a number of system (not just ML) factors that can be considered to achieve audio and visual AI inferencing at battery power levels. Specific examples of face identification and keyword spotting will be highlighted.
TinyML: The power / cost conundrum
Mark LIPPET, CEO and President, XMOS
Abstract (English)
When we think about cost, we have our eye firmly on the customer: what are the products and features they need now and in the future, and can we develop compelling solutions at a price point that’s attractive for both of us? When it comes to power, needs and attitudes are much more nuanced. There’s a clear focus on battery life and sustainability, but power is often viewed at a component level rather than a system level.
Focusing on low power at the level of a single component can lead customers to overlook the energy consumption and cost of the system as a whole (as well as the long-term cost and value it brings to the end user).
In this session, we will explore a more representative way of measuring energy consumption in TinyML processors, and the system trade-offs that must be made to minimise energy consumption and lower cost.
Always-on AI vision: The path to disruptive, high-scale applications
Moderator: Jeff HENCKELS, Director, Product Management & Business Development, Qualcomm
Peter BERNARD, Sr. Director, Silicon and Telecom, Azure Edge Devices, Platform & Services, Microsoft
Lian Jye SU, Principal Analyst, ABI Research
Edwin PARK, Principal Engineer, QUALCOMM Inc
Evan PETRIDIS, Chief Product Officer, EVP of Systems Engineering, Eta Compute
Tony CHIANG, Sr. Director of Marketing, Himax Imaging
Abstract (English)
Vision is the most challenging AI/ML task to tackle in power- and resource-constrained, battery-operated devices. This panel will focus on the state of the art and the innovation roadmap ahead, discussing which R&D breakthroughs will enable disruptive, high-scale use cases and applications in the future, and how and when they will do so.
• Which use cases/applications are driving always-on AI vision? Both today and in the future?
• What are the biggest gaps in achieving the long-term potential of always-on AI vision, and how is the industry addressing them?
• What does the innovation roadmap look like? When and how will technology advances open up new applications and drive scale, adoption, and new investments?
ML in Smart Homes and Buildings
Stacey HIGGINBOTHAM, Founder, Stacey on IoT
Zach SHELBY, Co-founder and CEO, Edge Impulse
Abstract (English)
We have seen a huge impact in smart homes and buildings from IoT technology and, more recently, from voice assistants. Stacey Higginbotham, founder of Stacey on IoT, will lead a discussion on what applications we might expect to see next thanks to tinyML in the home and office.
1:00 pm to 1:30 pm
Partner Hangouts
The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem on market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points. Expect to hear how problems and gaps are being solved and what still needs to be done and why.
Please see the schedule in the virtual event platform for each day’s room assignments.
Schedule subject to change without notice.
Committee
Marian VERHELST
Technical Program Chair
KU Leuven
Peter VAJDA
Technical Program Vice-Chair
Edith BEIGNÉ
Ian BRATT
Arm
Ofer DEKEL
Microsoft Research
Ira FELDMAN
tinyML Foundation
Adam FUKS
NXP
Evgeni GOUSEV
General Chair
Qualcomm Research, USA
Joseph HASSOUN
Samsung Semiconductor
Kurt KEUTZER
University of California, Berkeley
Boris MURMANN
Stanford University
Chris ROWEN
Cisco
Moritz SCHERER
ETH Zürich
Zach SHELBY
Edge Impulse
Steve WHALLEY
Strategic World Ventures
Wei XIONG
Hoi-Jun YOO
KAIST
Huichu LIU
Meta
Speakers
Kwabena AGYEMAN
OpenMV, LLC
Raziel ALVAREZ
Panelist
Michael AZOFF
Kisaco Research
Elad BARAM
Emza Visual Sense
Luca BENINI
ETHZ | University of Bologna
Peter BERNARD
Microsoft
Luke BERNDT
In-Q-Tel
Kristofor CARLSON
BrainChip Inc.
Lee CARTER
Momenta Ventures
Luis CEZE
Panelist
OctoML
Sek CHAI
Latent AI
Vikas CHANDRA
Meta Reality Labs
Mark CHEN
Himax Technologies
Song CHEN
Tutorial
Facebook Reality Labs Research
Tianqi CHEN
Panelist
OctoML
Hoon CHOI
Lattice Semiconductor
Bill COUGHRAN
Panelist
Sequoia Capital
Martin CROOME
GreenWaves
Chris ELIASMITH
Applied Brain Research
Stuart FEFFER
Reality AI
Karl FEZER
Arm
Kaustubh GANDHI
Bosch Sensortec
Semir HADDAD
Eta Compute
Song HAN
MIT EECS
Sreepada V. HEGADE
Lattice Semiconductor
Koen HELWEGEN
Plumerai
Jeff HENCKELS
Qualcomm
Stacey HIGGINBOTHAM
Stacey on IoT
Jan JONGBOOM
Tutorial
Edge Impulse
Kate KALLOT
NVIDIA
Jags KANDASAMY
Latent AI Inc.
Kurt KEUTZER
University of California, Berkeley
Laszlo KINDRAT
XMOS
Abhijit KHOBARE
Tutorial
Qualcomm Technologies, Inc. (QTI)
Kevin KREWELL
TIRIAS Research
Samir KUMAR
Panelist
M12
Chris LATTNER
Panelist
SiFive
Meng LI
Facebook Inc.
Mark LIPPET
XMOS
Zhi-Gang LIU
Arm
Jianjia MA
University of Southampton
Johan MALM
imagimob
Diana MARCULESCU
The University of Texas at Austin
Sean MCGREGOR
Syntiant
Orlando MOREIRA
GrAI Matter Labs
Moshe HAIUT
DSP Group
Mallik P. MOTURI
Syntiant
Dylan MUIR
SynSense
Afshin NIKTASH
Maxim Integrated
Jon NORDBY
Soundsensing
Edwin PARK
QUALCOMM Inc
Chirag PATEL
Tutorial
Qualcomm Technologies, Inc. (QTI)
Mohammad RASTEGARI
Apple
Christopher B. ROGERS
SensiML Corp
Nilanjan ROYCHOWDHURY
Eta Compute
Chris ROWEN
Cisco
Brandon RUMBERG
Aspinity
Davis SAWYER
Deeplite
Moritz SCHERER
ETH Zürich
Leslie SCHRADIN
Qeexo, Co.
Peter SCHULMEYER
Silicon Labs
Zach SHELBY
Edge Impulse
Daniel SITUNAYAKE
Tutorial
Edge Impulse
Ravishankar SIVALINGAM
Qualcomm
Thalia SPEAKER
WILDLABS
Lian Jye SU
ABI Research
Eileen TANGHAL
Panelist
In-Q-Tel
Urmish THAKKER
SambaNova Systems Inc
Vikrant TOMAR
Fluent.ai
Harsha VISWANATH
Platform and Solutions Group, Microsoft
Sameer WADHWA
Qualcomm
Pete WARDEN
Tutorial
Behdad YOUSSEFI
Areanna AI