tinyML Summit 2021

Presentation slides are posted below in the schedule. Recordings will also be linked as they become available.

March 22-26, 2021

About

The tinyML Summit will be held virtually the week of March 22, 2021. We are in the process of re-envisioning our flagship event as a highly interactive online experience.

In conjunction with the Summit, we are also pleased to announce that we have added a new event for 2021: the tinyML Research Symposium.

The tinyML Summit is the premier annual gathering of senior-level technical experts and decision makers representing the fast-growing global tinyML community. This diverse ecosystem is composed of professionals from industry, academia, start-ups, and government labs worldwide working on leading-edge ultra-low power machine learning technologies for end-to-end (hardware, system, software, applications full stack) solutions.

Venue

Virtual - online

Contact us

Bette COOPER

News

March 21, 2021

tinyML Awards 2021 Finalists

The tinyML Summit committee is pleased to announce the Finalists for the Best Product of the Year, and for the Best Innovation of the Year! Please join us on Monday March 22, 2021 at noon Pacific time to hear presentations from the finalists. (Please see the schedule below for details.)

February 12, 2021

tinyML Summit 2021 Breakout Sessions

New for 2021, tinyML is developing a series of breakout sessions. Breakouts bring focused topics directly to the audience that needs them. They will be practical and interactive discussions to foster better understanding of design and application issues, best practices, tools, and funding opportunities to accelerate the deploymen

February 12, 2021

Breaking News On Disruptive Products And Tools

An exciting new session at the tinyML Summit will allow a small number of time slots to companies / experts / academia to share the very latest substantial and disruptive developments and upcoming products of significance in the field of tiny machine learning. Submit your Breaking News On Disruptive Products And Tools today!

Schedule

Pacific Daylight Time / UTC-7

8:00 am to 8:15 am

Open / Welcome

Evgeni GOUSEV, Senior Director, Qualcomm Research

Marian VERHELST, Associate Professor, KU Leuven

8:15 am to 9:45 am

Room 1

Tutorial: Training a Magic Wand

Pete WARDEN, Technical Lead, Google

This tutorial will show how to gather data, train, and deploy an IMU-based model for recognizing gestures on an Arduino Nano 33 BLE Sense. It will use the Arduino IDE and Colab scripts to develop the model and will explain the feature generation needed to go from raw accelerometer and gyroscope data to input suitable for a neural network. Using TensorFlow Lite Micro and Arm’s CMSIS-NN library, you will learn how to create a practical application from scratch. It is recommended that you purchase the Arduino TinyML Kit to be able to follow along virtually.

Room 2

Tutorial: Image sensors for low power applications

Song CHEN, Research Scientist, Facebook Reality Labs Research

Image sensors are the front end of many computer-vision based input modalities. These human-machine input modalities usually need to run on a mobile platform, which has stringent power requirements. This tutorial will cover both low power image sensor design from a designer’s perspective and some useful practices to save sensor power from a user’s perspective, especially in ML applications. We will start by laying out the basics, including the operating principle of pixels, the readout chain, and other common blocks in an image sensor. Then, the trade-off between power consumption and general sensor performance will be discussed. Following the discussion, the effectiveness of power reduction techniques such as subsampling and low frame rates, and their impact on the subsequent ML processing stages, will be evaluated with examples. Finally, an ultra-low power global-shutter digital pixel sensor developed at Facebook Reality Labs Research will be introduced.

  • YouTube

9:45 am to 10:00 am

Break

10:00 am to 11:30 am

Room 1

Tutorial: Advanced network quantization and compression through the AI Model Efficiency Toolkit (AIMET)

Abhijit KHOBARE, Director of Software Engineering, Qualcomm Technologies, Inc. (QTI)

Chirag PATEL, Principal Engr./Mgr. in Corp. R&D AI Research team, Qualcomm Technologies, Inc. (QTI)

AI is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within a tight power and thermal budget. Quantization and compression help address these issues. In this tutorial, we’ll discuss:

  • The existing quantization and compression challenges
  • Our research in novel quantization and compression techniques to overcome these challenges
  • How developers and researchers can implement these techniques through the AI Model Efficiency Toolkit
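
As a rough illustration of the kind of quantization this tutorial discusses, the following is a minimal Python sketch of uniform 8-bit affine quantization. It is a generic textbook scheme, not AIMET's API; the function names `quantize_affine` and `dequantize_affine` and the sample weights are hypothetical.

```python
def quantize_affine(values, num_bits=8):
    """Map floats to unsigned integers using one shared scale and zero point."""
    qmin, qmax = 0, (1 << num_bits) - 1
    # The represented range is widened to include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point))  # round, shift, clamp
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize_affine(quantized, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [scale * (q - zero_point) for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_affine(weights)
restored = dequantize_affine(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zp, max_error)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of the step size `scale`.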

Room 2

Tutorial: Build Industrial-Grade tinyML applications with Edge Impulse!

Jan JONGBOOM, CTO, Edge Impulse

Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse

In this free live workshop, you will build a full tinyML application, end-to-end, using the latest best practices in embedded machine learning. You will learn how to collect a dataset, design and train a tiny (but accurate) model, evaluate its performance, optimize it for embedded use, and integrate it into a real embedded application running on a genuine MCU.

Anyone registered for the Summit may join. To build an application yourself, you may purchase the Thunderboard kit; otherwise you can simply observe. If you order the kit, you may also download the open-source firmware for the Thunderboard companion development board, which is hosted on GitHub.

  • YouTube

12:00 pm to 1:30 pm

Room 1

tinyML Awards 2021

Awards Finalist Presentations

Moderator: Wei XIONG

François de ROCHEBOUËT, Founder CTO, Cartesiam

Jeff HENCKELS, Director, Product Management & Business Development, Qualcomm

Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor

Sean MCGREGOR, Member of Technical Staff, Syntiant

Brandon RUMBERG, Founder and CTO, Aspinity

Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse

Best Product of the Year Finalists

(in alphabetical order)

  • Cartesiam
  • Qualcomm Always-On Vision
  • Samsung Exynos 2100 NPU
  • Syntiant NDP120

Best Innovation of the Year Finalists

(in alphabetical order)

  • Aspinity AnalogML
  • Edge Impulse EON Compiler

Pacific Daylight Time / UTC-7

8:00 am to 8:15 am

Open / Welcome

Evgeni GOUSEV, Senior Director, Qualcomm Research

Marian VERHELST, Associate Professor, KU Leuven

8:15 am to 9:00 am

Keynote: Putting AI on a Diet: TinyML and Efficient Deep Learning

Song HAN, Assistant Professor, MIT EECS

Abstract (English)

Deep learning is computation-hungry and data-hungry. We aim to improve the computation efficiency and data efficiency of deep learning. First, I’ll present MCUNet[1], which brings deep learning to IoT devices. MCUNet is a framework that jointly designs the efficient neural architecture (TinyNAS) and the light-weight inference engine (TinyEngine), enabling ImageNet-scale inference on IoT devices that have only 1MB of Flash. Next, I will talk about TinyTL[2], which enables on-device transfer learning, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation[3], which enables data-efficient GAN training, generating photo-realistic images from only 100 images, a task that used to require tens of thousands of images. We hope that such TinyML techniques can make AI greener, faster, and more sustainable.

[1] MCUNet: Tiny Deep Learning on IoT Devices, NeurIPS’20, spotlight.
[2] TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning, NeurIPS’20
[3] Differentiable Augmentation for Data-Efficient GAN Training, NeurIPS’20

9:00 am to 9:45 am

Keynote: Many shades of acceleration – an Open TinyML Platform Perspective

Luca BENINI, Chair of digital Circuits and systems | Full Professor, ETHZ | University of Bologna

Abstract (English)

The next wave of “Extreme Edge AI” pushes signal processing and machine learning aggressively towards sensors and actuators, with sub-mW (TinyML) power budgets, while at the same time raising the bar in terms of accuracy and flexibility. To succeed in this balancing act, we need principled ways to walk the line between general-purpose and highly specialized architectures. In this talk I will detail how to walk the line, drawing from the 40+ chip tape-out experience of the open PULP (parallel ultra-low power) platform, based on RISC-V processors coupled with domain-specific acceleration engines.

9:45 am to 10:00 am

Break

10:00 am to 10:15 am

Today’s Breakout Pitches

10:15 am to 10:45 am

tiny Talks

Compute-in-Memory Hardware Accelerator for Always-On TinyML

Sameer WADHWA, Senior Director, Qualcomm

Abstract (English)

Always-ON tiny-ML use-cases rely on an efficient hardware accelerator to maximize battery run-time while enabling increasingly complex models.
The energy efficiency limitations of Von-Neumann architectures while executing memory bandwidth intensive compute use-cases presented by DNN are well understood. It has also been shown that a large subset of DNN models can function with little or no accuracy degradation down to 8-bit or even lower quantization levels for activations and weights.
Compute-In-Memory (CIM) is an active research area in academia and industry to achieve a significant compute energy efficiency improvement by reducing the memory bandwidth requirements when executing DNN models and taking advantage of analog compute to improve MAC computation efficiency.
This work details a CIM-based stand-alone DNN hardware accelerator chip that is particularly well-suited to executing always-ON tiny-ML models. It supports convolution, fully-connected, pooling, ReLU, Sigmoid, and Tanh layers with 1/2/4/8-bit quantized activations and weights.
Multiple CIM cores on the chip can operate in parallel to support always-ON voice keyword detection and human-detect computer-vision models while consuming very low power.
The chip comes with a tool flow to support quantizing, training, compiling off-the-shelf models to efficiently map them on the hardware. Both voice UI and CV use cases are used to demonstrate the chip’s low power performance.

Supporting Tensorflow Lite MCU in tiny low power FPGAs

Hoon CHOI, Fellow, Lattice Semiconductor

Abstract (English)

The arena of cost-optimized, high-performance Edge accelerators is growing increasingly competitive, with a variety of architectures to choose from when implementing an AI-capable system. As a new generation of Edge applications emerges, designers are increasingly pressed to develop solutions that combine low power and low latency, and they require easy-to-use, flexible accelerators.

Lattice’s FPGAs are uniquely positioned to address the rapidly changing world of Edge devices. This class of FPGAs possesses the parallel processing capabilities inherent in FPGAs to accelerate neural network performance and is HW programmable to keep up with the fast pace of changing ML algorithms. They are designed with more on-chip memory, optimized DSP blocks, and compute resources distributed through the fabric for workload acceleration, resulting in a low power system implementation.

To provide a software-programmable solution that is easy to use, support for TF Lite with a soft RISC-V core was implemented on the FPGA fabric. This creates the best of both worlds: a programmable device with flexible acceleration blocks running in HW, which enables developers with or without FPGA expertise to build their systems more quickly. Comparing a TF Lite implementation on an ARM M4-based CPU with one on an FPGA of comparable size/cost, the FPGA runs 2x to 10x faster than the MCU at comparable power consumption.
In this presentation, we cover the details of the accelerators we designed, the limitations we faced that hindered further optimizations in accelerators, and possible solutions to the limitations.

  • YouTube

10:45 am to 12:00 pm

Panel Discussion

Opportunities at the Edge: Venture and tinyML

Moderator: Kurt KEUTZER, Full Professor, University of California, Berkeley

Chris ROWEN, VP of Engineering, Cisco

Bill COUGHRAN, Founders' Coach and Partner, Sequoia Capital

Samir KUMAR, Managing Director, M12

Luis CEZE, Co-founder and CEO, OctoML

Eileen TANGHAL, Senior Partner, In-Q-Tel

Pushing machine learning into ultra-low-power applications at the edge isn’t just an academically compelling idea; it is a potentially disruptive shift in mass-market technology. In this panel we’ve gathered four distinguished venture capitalists to look at tinyML opportunities through an entrepreneurial lens. In particular we have asked them to consider:

  • What makes you interested in investment opportunities for machine learning at the edge?
  • What is your general advice to tech entrepreneurs: build a horizontal platform for broad application or target a particular vertical?
  • What are some of the particular near-term opportunities for venture investment at the edge that you find especially exciting?
  • How might VCs value a tinyML startup: how much is it driven by the target market, the team, the technology, or the data?
  • And the $6.4B question: what is the future killer app at the edge?

12:00 pm to 1:00 pm

Room 1

tinyTalks Hardware Optimization

Performing Inference on Binarized Neural Networks with xcore.ai

Laszlo KINDRAT, Senior Technologist, XMOS

Andrew STANFORD-JASON, Engineer, XMOS

Adam HILLIER, Deep Learning Scientist, Plumerai

Abstract (English)

Ultra-low bitwidth neural networks have been a hot topic in the tinyML community, both in terms of novel hardware accelerators and software solutions for training and deployment. In particular, binarized neural networks (BNNs) show large potential due to their simple hardware requirements. xcore.ai (a fast, economical crossover processor from XMOS) has a vector unit with specialized instructions for performing inference on BNNs, which to the best of our knowledge makes it the first MCU-class chip with a BNN accelerator in mass production. In this talk we describe these instructions in detail, and how they enable a theoretical maximum of 286 GOps/s when executing binarized neural networks. Secondly, we give an overview of our machine learning model deployment toolchain that seamlessly integrates with Larq, a popular open-source framework for training binarized neural networks. Finally, we present performance benchmarks on image classification models with various combinations of binarized and 8-bit quantized layers.
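
To make the "simple hardware requirements" of BNNs concrete, here is a minimal Python sketch of the widely used XNOR-and-popcount formulation of a binary dot product. It illustrates the general technique only and does not describe the xcore.ai instructions; the helper names `pack_signs` and `binary_dot` and the sample vectors are hypothetical.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an integer, one bit per value (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: a bit is set where the signs match
    matches = bin(agree).count("1")     # popcount of the matching positions
    return 2 * matches - n              # matches minus mismatches


# Hypothetical weight and activation vectors with values in {+1, -1}.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(reference, result)
```

The multiply-accumulate loop collapses into one XNOR and one bit count per machine word, which is why BNN inference maps well onto small vector units.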

Ultra-low Power and Scalable Compute-In-Memory AI Accelerator for Next Generation Edge Inference

Behdad YOUSSEFI, Founder and CEO, Areanna AI

Abstract (English)

Edge AI hardware accelerators are either deployed on Edge servers where sophisticated AI models run on a power budget between 1-10 Watts or on Edge devices where simple AI models run at milliwatts of power. But implementing more sophisticated AI models on Edge devices requires further development of ultra-low power architectures.

Research has shown that power consumption is dominated by data communication between memory and processor. To minimize data movement, the Compute-In-Memory (CIM) architecture has been explored by companies and academics. CIM is inherently a mixed-signal architecture and hence requires data converters to interface between layers of the network. However, data converters have proven to be the Achilles’ heel of this architecture, as they take up to 98% of overall chip area and account for more than 85% of overall power consumption, defeating the whole purpose of the CIM architecture. CIM also suffers from analog nonidealities, which can degrade AI performance. Furthermore, the extra processing steps needed to fabricate the memory array in CIM limit the scalability of this architecture.

Areanna’s architecture addresses these issues using our proprietary Compute-and-Quantize-In-Memory (CQIM) architecture, where SRAM bit-cells are repurposed to construct data converters, improving power/area efficiency by over an order of magnitude. Using logic gates as its building blocks, CQIM is inherently a digital architecture and scales well with the latest process nodes. The high power efficiency and scalability of this architecture bring deployment of sophisticated real-time AI models with mW power budgets within reach. A CQIM prototype has been implemented and taped out in a standard CMOS process.

  • YouTube

CUTIE: Multi-PetaOP/s/W Ternary DNN inference Engine for TinyML

Moritz SCHERER, PhD Student, ETH Zürich

Abstract (English)

With the surge in demand for deeply embedded deep learning on increasingly power-constrained devices, neural network inference engines must continue to improve in terms of energy efficiency. In recent years especially, accelerators for networks with binary and ternary weights and activations have been addressing this demand, achieving energy efficiencies that are orders of magnitude higher than byte-precision accelerators. We address the main bottlenecks for energy efficiency in binary and ternary neural network accelerators and present CUTIE, the Completely Unrolled Ternary Inference Engine.

The design of CUTIE is focused on minimizing non-computational energy and switching activity so that dynamic power spent on storing intermediate results is minimized. We achieve this by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) exploiting an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity.

We demonstrate that our architecture achieves better-than-binary inference accuracy at dramatically higher energy efficiency. We present power simulation data showing an average energy efficiency of 2.1 POp/s/W, while achieving 88% inference accuracy on CIFAR-10 at an energy cost of 520 nJ, outperforming the state-of-the-art, including compute-in-memory (CIM) approaches, by a factor of 4.8x.
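
The point about ternary weights allowing sparsity can be illustrated with a small Python sketch. This is a software analogy under stated assumptions, not a model of the CUTIE data path or its training method: the fixed magnitude threshold, the function names `ternarize` and `ternary_dot`, and the sample values are all hypothetical.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1; magnitudes at or below the threshold become 0."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, activations):
    """Accumulate only where the weight is non-zero; returns (sum, active_terms)."""
    total, active = 0, 0
    for w, x in zip(ternary_weights, activations):
        if w == 0:
            continue  # a zero weight contributes nothing, so no add is needed
        total += x if w > 0 else -x
        active += 1
    return total, active


# Hypothetical full-precision weights and ternary activations.
weights = [0.9, -0.05, 0.02, -0.7, 0.1, 0.6, -0.01, -0.8]
activations = [1, -1, 0, 1, 1, -1, 1, 0]
t = ternarize(weights, threshold=0.3)
total, active = ternary_dot(t, activations)
sparsity = t.count(0) / len(t)
print(t, total, active, sparsity)
```

With half of the weights at zero, only four of the eight positions trigger an accumulation. In hardware, the analogous effect is that zero weights leave the corresponding logic idle, which is the reduction in switching activity the abstract refers to.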

  • YouTube

Hardware aware Dynamic Inference Technology

Urmish THAKKER, Principal Engineer, SambaNova Systems Inc

Abstract (English)

There has been a recent surge in research on dynamic inference technologies to reduce the cost of inference without sacrificing the accuracy of the model. These models are based on the assumption that not all parts of the output feature map (OFM) are equally important for all inputs. The parts of the output feature maps that are deemed unimportant for a certain input can be skipped entirely or computed at a lower precision, leading to a reduced number of computations. This can enable faster inference of a large network, leading to high accuracy. However, we show that the two popular methods that optimize different aspects of the OFM (channel and spatial) lead to sparse matrix multiplication during inference on a CPU, which can lead to poor run-time characteristics in spite of the reduced number of MAC operations. We show a way to make these techniques SIMD vector-length aware, leading to block-sparse matrices that can run more efficiently on hardware with vector compute units. Our technique allows these models to create blocks of vector length 2, 4, and 8 with minimal loss in accuracy, beating traditional pruning methods by a large margin on image classification tasks.
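
To show what "vector-length aware" block sparsity means in practice, here is a minimal Python sketch. It uses plain static magnitude pruning in aligned blocks, which is a simpler stand-in for the dynamic inference method described in the talk; the function names, the scoring rule (sum of absolute values per block), and the sample data are assumptions made for illustration.

```python
def prune_in_blocks(row, vector_length, keep_blocks):
    """Zero all but the `keep_blocks` highest-magnitude aligned blocks of a weight row."""
    assert len(row) % vector_length == 0
    blocks = [row[i:i + vector_length] for i in range(0, len(row), vector_length)]
    scores = [sum(abs(v) for v in block) for block in blocks]
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:keep_blocks])
    pruned = []
    for i, block in enumerate(blocks):
        pruned.extend(block if i in keep else [0.0] * vector_length)
    return pruned, sorted(keep)


def block_sparse_dot(pruned_row, kept_blocks, x, vector_length):
    """Dot product that visits only the kept blocks, one vector-sized chunk at a time."""
    total = 0.0
    for b in kept_blocks:
        start = b * vector_length
        chunk_w = pruned_row[start:start + vector_length]
        chunk_x = x[start:start + vector_length]
        total += sum(w * v for w, v in zip(chunk_w, chunk_x))
    return total


# Hypothetical weight row of 16 values, pruned to 2 of its 4 blocks of length 4.
row = [0.1, -0.2, 0.0, 0.1, 0.9, -0.8, 0.7, 0.6,
       0.05, 0.0, -0.1, 0.0, -0.5, 0.4, 0.3, -0.6]
x = [1.0] * 16
pruned, kept = prune_in_blocks(row, vector_length=4, keep_blocks=2)
dense = sum(w * v for w, v in zip(pruned, x))
sparse = block_sparse_dot(pruned, kept, x, vector_length=4)
print(kept, dense, sparse)
```

Because every surviving non-zero sits in a full, aligned block, each kept block corresponds to one vector load and one vector multiply-accumulate, whereas unstructured sparsity would scatter non-zeros across partially filled vectors.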

  • YouTube

Room 2

Partner Sessions - Tools & Algorithms

These sessions will be an opportunity to hear from commercial companies in the tinyML ecosystem on market and technology trends they are addressing to enable the exponential growth of tinyML solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points.

Innovative and Convolutional-Friendly tinyML Architecture for Small-Silicon, Low-Power Devices

Moshe HAIUT, CTO Staff, DSP Group

Abstract (English)

The concept of Neural Networks (NNs) has evolved from the basic perceptron to a fully connected (FC) NN layer. FC layers are based on a simple math operation – a vector multiplied by a matrix. Most existing DSPs have embedded multiply-accumulate (MAC) logic to handle these tasks. However, FC layers require a large number of parameters, which challenges the tinyML solutions with their limited silicon footprint. In contrast, convolutional layers use fewer parameters, making them a more compelling approach for memory-constrained applications, such as tinyML edge-based solutions. However, 2D convolutional layers add complexity to the computation algorithm, especially when there are multiple channels and when padding, dilation, and stride operations are applied.

To address this problem, what’s needed is a way to reduce the number of cycles and power consumption required to compute 2D convolutional and ConvTranspose layers.

This tinyML talk will show how to do this using the nNetLite, an ultra-low-power programmable NN processor that was developed within DSP Group to solve many of the issues associated with edge processing. The discussion will focus on a certain part of the processor – the Address Generation Unit (AGU) – that was designed specifically to accelerate the computation of complicated convolutional layers in memory and power-constrained ICs. The AGU is also capable of merging convolution and consecutive MaxPooling layers into a single layer which further conserves valuable memory space in tinyML hardware.

Attendees will understand how to use approaches such as the convolutional-friendly AGU hardware architecture to minimize the total number of cycles in convolutional-heavy NNs to design ultra-low-power AI devices that consume only microwatts of power.

  • YouTube

Tree Ensemble Model Compression for Embedded Machine Learning Applications

Leslie SCHRADIN, Principal Machine Learning Engineer, Qeexo, Co.

Abstract (English)

Embedded machine learning models need to have a low memory footprint without compromising classification performance. Tree-based ensemble models are very effective for sensor data machine learning. Depending on the application, they are often superior to neural-network-based models in terms of embedded metrics such as memory footprint, latency, and model performance, and often need less data to reach the same level of accuracy. In this webinar we will discuss generating tree-based ensemble models using well-known algorithms and then performing intelligent pruning and quantization particularly suitable for tinyML applications. Qeexo’s patent-pending algorithms first perform ensemble model compression by selecting the best candidate boosters; these boosters reduce the model size by almost 80% and still capture the classify-ability of the full ensemble model. The compression is followed by 16-bit/8-bit quantization to further reduce the memory footprint. Using these techniques, Qeexo AutoML has compressed and quantized Gradient Boosting Machine (GBM), Random Forest (RF), Isolation Forest (IF), eXtreme Gradient Boosting (XGBoost), and Decision Trees (DT), making them much easier to fit into embedded targets. As a result, models generated by Qeexo AutoML have best-in-class latency and memory footprint without sacrificing performance.

  • YouTube

A VM/Containerized Approach for Scaling TinyML Applications

Kartik THAKORE, Founder, HOTG

Abstract (English)

Although deep neural networks are typically computationally expensive to use, technological advances in the design of both hardware platforms and neural network architectures have made it possible to use powerful models on edge devices. To enable widespread adoption of edge-based machine learning, we introduce a set of open-source tools that make it easy to deploy, update, and monitor machine learning models on a wide variety of edge devices. Our tools bring the concept of containerization to the TinyML world. We propose to package ML and application logic as containers called Runes to deploy onto edge devices. The containerization allows us to target a fragmented Internet-of-Things (IoT) ecosystem by providing a common platform for Runes to run across devices.

  • YouTube

tinyML is not tiny anymore

Mallik P. MOTURI, VP Product and Business Development, Syntiant

Abstract (English)

As always-on tinyML devices proliferate they are having an outsize impact on the world. From always-on voice in low cost phones, earphones, AR/VR glasses to always-on sensing of the environment in devices, machines, automobiles, we are seeing them in the “things” we are used to in our daily life at home, office, factories or outside. We will see more than 10 billion such TinyML devices being introduced into the market every year by 2023. TinyML is not tiny anymore.

  • YouTube

Room 3

Partner Sessions - Edge Applications

A real application of TinyML in Intelligent Building Sensors

Martin CROOME, Vice President Marketing, GreenWaves

Abstract (English)

In this session, we will look at a real-life example of using TinyML in smart building sensors. Up until now building sensors counting people have been expensive and difficult to install or have lacked accuracy. TinyML makes reliable, battery operated people counting possible with sensor battery lives of over five years in typical scenarios. This allows for inexpensive, easy to install products that bring a lot of value to facility managers. The applications enabled by sophisticated image and sound analysis in building sensors don’t just stop at counting. We will also look at some of the other use cases that are possible and how they are being enabled by GAP processors, now.

  • YouTube

Leveraging sparsity to drive fast response times at the edge

Orlando MOREIRA, Fellow and Chief Architect, GrAI Matter Labs

Abstract (English)

Sparsity is the idea that changes in the real world don’t happen everywhere, or all at once. NeuronFlow is a novel multi-core processor architecture that exploits all forms of sparsity to deliver a scalable dataflow processing engine for AI applications at the Edge. In this presentation, we will discuss:

  • The importance of fast responses or low latency in Edge AI applications
  • Metrics for latency and how they map to the Edge AI application performance
  • How the unique sparsity-exploitation characteristics of NeuronFlow enable real-time live AI applications where fast response times are essential

  • YouTube

Gesture-controlled in-ear headphones, presentation and live demo

Johan MALM, AI Engineer, imagimob

Abstract (English)

Imagimob is a pioneer in tinyML with experience from 25+ tinyML customer projects, including projects with Scania, Husqvarna, Autoliv, Veoneer, Flir and many others. The first commercial product using an Imagimob tinyML application was launched in 2018. This presentation is a case study in which we demonstrate how we use Imagimob AI, our end-to-end toolchain, to develop an advanced audio application.

  • YouTube

TinyML journey: from face detection demo to real-life commercial deployment

Elad BARAM, VP Products, Emza Visual Sense

Abstract (English)

This talk describes our experience in driving TinyML from proof-of-concept (POC) level to a design win that is planned for deployment in millions of notebooks. This is probably the first widely deployed commercial consumer case study.
One of the main topics we intend to cover, beyond the application itself, is the gap between available demos and benchmarks and what it takes to accommodate real-life use cases: handling objects at different distances, robustness to lighting conditions, and so on.

While TinyML holds the potential to be extremely successful, through its inherent advantage of using low cost MCUs, bridging the technology gap is what will convert the demos to real business.

  • YouTube

Room 4

Market Opportunities for Edge AI

Michael AZOFF, Chief Analyst, Kisaco Research

Lee CARTER, Principal, Momenta Ventures

Abstract (English)

In this session we will explore market opportunities for edge ML with Kisaco’s chief analyst Michael Azoff, Momenta VC partner Lee Carter and co-founder and CEO of Edge Impulse, Zach Shelby. The speakers will explore growth opportunities in the market, what this will look like in 10 years’ time and what technologies look promising.

  • YouTube

Room 5

tinyML for Good — Conservation & Climate

Moderator: Kate KALLOT, Head of Emerging Areas, NVIDIA

Talia SPEAKER, Program Officer, WILDLABS

Christopher B. ROGERS, CEO, SensiML Corp

Abstract (English)

TinyML has the potential to have a big impact on climate change and nature conservation work. In this session led by Kate Kallot, Head of Emerging Areas at NVIDIA and with guest Talia Speaker, Program Officer from WILDLABS and the WWF, we will hear about leading applications for tinyML that are having a real impact, and how you can get involved to support solutions in this space.

  • YouTube

1:00 pm to 1:30 pm

Partner Hangouts

The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem on market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions.  These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points.  Expect to hear how problems and gaps are being solved and what still needs to be done and why.

Please see the schedule in the virtual event platform to see each day’s room assignments.

Pacific Daylight Time / UTC-7

8:00 am to 8:15 am

Opening and Award Announcements

8:15 am to 9:00 am

Keynote: miliJoules for 1000 Inferences: Machine Learning Systems “on the Cheap”

Diana MARCULESCU, Professor and Department Chair, The University of Texas at Austin

Abstract (English)

Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. While the holy grail for judging the quality of an ML model has largely been accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk uncovers the need for designing efficient convolutional neural networks (CNNs) for deep learning mobile applications that operate under stringent energy and latency constraints. We show that, while CNN model quantization and pruning are effective tools in bringing down the model size and resulting energy cost by up to 1000x while maintaining baseline accuracy, the interplay between bitwidth, channel count, and CNN memory footprint uncovers a non-trivial trade-off. Surprisingly, our results show that when the channel count is allowed to change, a single weight bitwidth can be sufficient for model compression, which greatly reduces the software and hardware optimization costs for CNN-based ML systems.
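
The interplay between bitwidth, channel count, and memory footprint can be seen with a little arithmetic. The Python sketch below is a back-of-the-envelope illustration, not the speaker's method: it counts only the weights of a single 3x3 convolution, ignores biases and packing overhead, and the layer sizes are hypothetical.

```python
def conv_weight_bytes(kernel, c_in, c_out, bits):
    """Bytes needed to store the weights of one kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out * bits / 8


base = conv_weight_bytes(3, 64, 64, 8)    # 8-bit weights, 64 channels in and out
narrow = conv_weight_bytes(3, 64, 64, 4)  # same layer with 4-bit weights
wide4 = conv_weight_bytes(3, 90, 90, 4)   # 4-bit weights, about 1.4x the channels

print(base, narrow, wide4)  # 36864.0 18432.0 36450.0
```

Halving the bitwidth halves the footprint at a fixed width, so the same memory budget can instead hold a layer with roughly 1.4x as many input and output channels. Which of the two options gives better accuracy is an empirical question; the sketch only shows that the memory budget alone does not decide it.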

9:00 am to 9:45 am

Keynote: Adaptive Neural Networks for Agile TinyML

Sek CHAI, Co-founder and CTO, Latent AI

Abstract (English)

We present a new way to run your neural network that dynamically minimizes the working footprint for both memory and compute horsepower. Such a formulation requires retraining the network in a way that offers runtime flexibility during inference. Ultimately, the dynamic neural network is highly agile and can self-regulate to minimize computational needs.

9:45 am to 10:00 am

Break

10:00 am to 10:15 am

Today’s Breakout Pitches

10:15 am to 10:45 am

tiny Talks

Using Neural Architecture Search for Speech Recognition on the Edge

Vikrant TOMAR, Founder and CTO, Fluent.ai

Abstract (English)

Despite recent developments in machine learning, finding an optimal solution for a given task remains challenging and time-consuming, often requiring significant expert effort in designing and tuning neural architectures. This problem is more pronounced for TinyML solutions, where, due to limited computational resources, specific models are needed for a given task. To this end, we present a two-step solution. The first step employs GNASIL [1], a novel automated machine learning solution, for discovering an optimal neural architecture within a predefined limit of device specifications in FLOPS. The second step compresses the discovered architecture and makes it even smaller.
GNASIL trains a soft actor-critic [2] reinforcement learning agent that expedites the discovery process by extending learning with planning options based upon past experiences and imitation learning through available expert-designed architectures on similar tasks. The architectures discovered by GNASIL are then compressed with automatic model compression (AMC) [3]. AMC uses DDPG [4] to learn the pruning ratio for each layer. The reward is a function of accuracy and FLOPS. Optimal pruning is achieved in a way that has minimal effect on the accuracy of the model while reducing the overall model footprint.
Our experiments on a series of on-device speech recognition tasks demonstrate that GNASIL can design neural models with competitive performance in terms of both discovery speed and the accuracy of the discovered architectures, all within the predefined FLOPS restrictions. Further, AMC is able to reduce the size of the model by up to 40% without compromising accuracy.
References:

[1] Farzaneh S Fard, Arash Rad, and Vikrant Singh Tomar. NASIL: Neural architecture search with imitation learning. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[2] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1856–1865, 2018.
[3] Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018.
[4] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.

Person Detection under Extreme Constraints: Lessons from the Field

Koen HELWEGEN, Deep Learning Scientist, Plumerai

Abstract (English)

We present various computer vision applications on microcontrollers that are enabled by Binarized Neural Networks (BNNs). This includes state-of-the-art models on the Arm Cortex-M4 architecture for the Visual Wake Words benchmark task (84.5% accuracy with under 170 ms latency on an STM32F407VG) and person detection with bounding boxes. Moving beyond artificial benchmarks, we demonstrate the performance in real-world settings by deploying on an off-the-shelf Arm Cortex-M4 microcontroller with an inexpensive, low-power OV2680 camera. These applications are built using our integrated stack for training and inference of BNNs as well as through the collection, labeling and monitoring of custom-designed datasets for TinyML. This combination results in highly accurate and highly efficient BNN models for cheap, low-power microcontrollers. We discuss practical tips for developing demanding computer vision applications on microcontrollers and highlight some of the lessons we learnt while developing BNNs for the real world, such as our emphasis on high-quality, richly annotated data and powerful, hardware-based neural architecture search.
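As background for why binarized networks suit microcontrollers, the sketch below shows the generic XNOR-and-popcount trick for a dot product of two {-1, +1} vectors. It is an illustration only and is not taken from the speaker's stack; the function names `pack_signs` and `binary_dot` are hypothetical.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed sign bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]
result = binary_dot(pack_signs(activations), pack_signs(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
print(result, reference)
```

On real hardware the packed words would typically be 32 bits wide, so a single XOR plus a popcount replaces 32 multiply-accumulate operations.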

  • YouTube

10:45 am to 12:00 pm

Panel Discussion

tinyML inference SW – where do we go from here?

Moderator: Ian BRATT, Distinguished Engineer & Fellow, Arm

Moderator: Ofer DEKEL, Partner Research Area Manager, Microsoft Research

Chris LATTNER, President, Engineering and Product, SiFive

Tianqi CHEN, CTO, OctoML

Raziel ALVAREZ, Technical Lead, PyTorch, Facebook

Pete WARDEN, Technical Lead, Google

Join a collection of industry experts as we discuss the current state and potential future of tinyML inference SW. What is missing today, what new technologies will impact tinyML inference SW, and how do we go forward as a community?

12:00 pm to 1:00 pm

Room 1

tinyTalks Algorithms and Tools

Session Moderator: Joseph HASSOUN, Sr. Director Neural Processor Architecture, Samsung Semiconductor

Neutrino: A BlackBox Framework for Constrained Deep Learning Model Optimization

Davis SAWYER, Co-Founder & Chief Product Officer, Deeplite

Abstract (English)

Designing modern deep learning-based solutions requires deeper models with a greater number of layers. While a larger, deeper model can provide competitive accuracy, it creates several logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons why deep learning models are not extensively used in various production environments, especially on tinyML devices. There is an immediate requirement for optimizing and compressing these deep learning models to enable on-device intelligence. In this research, we introduce a black-box framework, Neutrino, for production-ready optimization of deep learning models. The framework provides an easy mechanism for users to provide constraints, such as a tolerable drop in accuracy or a target size for the optimized models, to guide the optimization process. The framework is easy to include in an existing production pipeline and is available as a Python package or Docker image, supporting the PyTorch and TensorFlow libraries. The optimization performance of the framework is shown across multiple benchmark datasets and popular deep learning models, providing a 3-30x reduction in model size (pre-quantization). Furthermore, we will share how the framework is currently used in production and summarize results from several tinyML applications such as visual wake words.

Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware

Chris ELIASMITH, Co-CEO, Applied Brain Research

Abstract (English)

Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars. As KWS systems are typically ‘always on’, maximizing both accuracy and power efficiency is central to their utility. In this work we use hardware aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows the neural network to run efficiently on standard hardware (212 µW). We also characterize the power requirements of custom-designed accelerator hardware that achieves SotA power efficiency of 8.79 µW, beating general purpose low power hardware (a microcontroller) by 24x and special purpose ASICs by 16x.

  • YouTube

Low-precision Winograd Convolution over Residue Number System

Zhi-Gang LIU, Research Engineer, Arm

Abstract (English)

Low-precision (8-bit or sub-8-bit) convolutional neural networks consume a fraction of the memory footprint and power of high-precision models running on mobile or embedded devices. The classical fast Winograd convolution algorithm requires high-precision floating-point operations and thus fails to accelerate low-precision CNNs. As a result, the current state-of-the-art low-precision convolution is a GEMM-based approach that relies on im2col or im2row transformations to convert the convolution into a GEMM operation, where each output demands 9 MAC operations for the popular 3×3 filter and 25 for a 5×5 filter. This work extends the Winograd algorithm to modular arithmetic and explores an optimized implementation of fast low-precision convolution for ultra-low power machine learning (ML) at the edge. The new approach achieves an arithmetic reduction of up to 6.8x, corresponding to 16×16 transformation tiles, and relies only on int8 or int16 operations, which are well supported by commodity edge devices. We evaluated the performance of the proposal with sub-8-bit VGG16 and ResNet50v1 models on the ImageNet dataset using an Arm Cortex-A53 CPU and a Cortex-M7 MCU and observed more than a 2x reduction in convolution latency.
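To make the idea concrete, here is a minimal sketch of the smallest 1D Winograd kernel, F(2,3), evaluated in a residue number system. It is not the speaker's implementation: the moduli, the tile size, and the function names are assumptions chosen for illustration. The division by 2 in the filter transform becomes a multiplication by the modular inverse of 2, so every step stays in small integers, and the Chinese Remainder Theorem recovers the signed result.

```python
from math import prod

# Assumed for illustration: three pairwise coprime odd moduli that each fit in 8 bits.
MODULI = (251, 241, 239)


def winograd_f23_mod(d, g, m):
    """Two outputs of a 3-tap filter using 4 multiplications, all arithmetic mod m (m odd)."""
    inv2 = pow(2, -1, m)  # modular inverse replaces the division by 2
    # Filter transform (can be precomputed once per filter).
    u = (g[0] % m,
         (g[0] + g[1] + g[2]) * inv2 % m,
         (g[0] - g[1] + g[2]) * inv2 % m,
         g[2] % m)
    # Input transform.
    v = ((d[0] - d[2]) % m, (d[1] + d[2]) % m, (d[2] - d[1]) % m, (d[1] - d[3]) % m)
    # Element-wise products: the only multiplications that depend on the input data.
    m1, m2, m3, m4 = (a * b % m for a, b in zip(u, v))
    return (m1 + m2 + m3) % m, (m2 - m3 - m4) % m


def crt_signed(residues, moduli):
    """Recombine residues into a signed integer in (-M/2, M/2] via the CRT."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        x += r * partial * pow(partial, -1, m)
    x %= big_m
    return x - big_m if x > big_m // 2 else x


def winograd_f23_rns(d, g):
    per_modulus = [winograd_f23_mod(d, g, m) for m in MODULI]
    return tuple(crt_signed([p[i] for p in per_modulus], MODULI) for i in range(2))


def direct_conv(d, g):
    """Reference: 6 multiplications for the same two outputs."""
    return tuple(sum(d[i + k] * g[k] for k in range(3)) for i in range(2))


d = [-128, 127, -5, 64]   # int8 input tile
g = [3, -128, 127]        # int8 filter taps
print(winograd_f23_rns(d, g), direct_conv(d, g))
```

This tile uses 4 data-dependent multiplications instead of 6 (a 1.5x reduction); larger 2D tiles give larger reductions. The product of the moduli must exceed twice the largest possible output magnitude, which the three moduli above satisfy for int8 inputs and a 3-tap filter.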

  • YouTube

An Introduction to an Open-Source Fixed-Point Inference Framework – NNoM

Jianjia MA, Research Fellow, University of Southampton

Abstract (English)

In recent years, the optimization of neural network (NN) structures (such as Inception and ResNet) has effectively reduced the overall computational complexity of NN models, which opens up more potential tinyML applications. However, the complexity of deploying NNs has also increased due to more arguments in deeper models, complex structure management and memory management. Neural Network on Microcontroller (NNoM) is a high-level inference framework that aims to provide an easy-to-use interface for developers to deploy complex NN models, while the framework manages the layer structure, context-related arguments and memory. The use of NNoM minimizes the effort of deployment so that developers can focus on optimizing the structure to improve model efficiency.
NNoM is written in C (ISO/IEC 9899:1999) for compatibility with different tinyML development environments. We provide a set of Python scripts that parse, calibrate and quantize a Keras model and write it into a single C header for inference. NNoM has a unique compiling process to minimize the memory cost and inter-layer switching time. Two backends are supported: a local C backend and the optimized CMSIS-NN provided by Arm.
A typical NNoM footprint for a VGG-type model is 8.9k ROM excluding weights. NNoM supports 30+ different layers, including convolutional layers, fully connected layers, recurrent layers, activations, and others. Because evaluation of the quantized model is necessary, NNoM provides many evaluation APIs for evaluating performance and accuracy on the targeted platform. In addition, NNoM comes with many examples, including Speech Keyword Spotting and Speech Noise Suppression. Source code is available at https://github.com/majianjia/nnom.
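For readers unfamiliar with fixed-point inference, the sketch below shows a generic power-of-two (Q-format) quantization of weights to 8-bit integers. It is an assumption-laden illustration of the general technique, not NNoM's actual conversion scripts; the function names are hypothetical.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits so the largest magnitude fits a signed word."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))  # bits needed left of the binary point
    return (total_bits - 1) - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Map integers back to real values to inspect the quantization error."""
    return [q / 2.0 ** frac_bits for q in q_values]


weights = [0.5, -0.25, 0.9, -1.3]
frac_bits = choose_frac_bits(weights)
q_weights = quantize(weights, frac_bits)
print(frac_bits, q_weights, dequantize(q_weights, frac_bits))
```

Because the scale is a power of two, rescaling between layers reduces to bit shifts, which is cheap on microcontrollers without a floating-point unit.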

  • YouTube

Room 2

Partner Sessions - Edge Hardware

TinyML is more than Model Building

Stuart FEFFER, Co-founder and CEO, Reality AI

Abstract (English)

The “data science”-driven approach to TinyML starts with the data. It’s about finding a machine learning model that gives the most accurate predictions based on the data for a target footprint size. But an “engineering”-driven approach to TinyML understands that data is a product of instrumentation, and for many applications, particularly non-visual, non-voice ones, the right approach iterates on the instrumentation and hardware, informed by the accuracy of the ML models. In this talk we’ll discuss “engineering-driven TinyML”, show how to use TinyML model performance to improve your hardware design, and demonstrate TinyML tool support for hardware design and sensor optimization.

  • YouTube

Low-power vision processing and signal processing for IoT and edge devices

Dylan MUIR, Director for Algorithms and Applications, SynSense

Abstract (English)

New NN accelerator hardware based on binarised asynchronous communication — sometimes referred to as spiking neural networks (SNNs) — promises to deliver complex sensory processing, including vision processing, for energy-constrained devices. In this session we look at several low-power edge use cases for audio and vision processing, with an overview of training and optimisation approaches, and example deployment on accelerator hardware.

  • YouTube

It’s an SNN future: Are you ready for it? Converting CNN’s to SNN’s

Kristofor CARLSON, Manager of Applied Research, BrainChip Inc.

Abstract (English)

CNNs take too much time and consume too much power and area for today’s neural networks. Event-based Spiking Neural Networks (SNNs) can address deficiencies that the current CNN architecture cannot. Converting CNNs to SNNs will enable today’s designers to achieve tomorrow’s efficient and effective technology solutions.

  • YouTube

The intersection of hardware and software and the shift left of algorithm development

Karl FEZER, AI Ecosystem Evangelist, Arm

Abstract (English)

The future of tinyML relies on a strong collaboration between hardware and software developers. Algorithm development adds a layer of complexity and optimization that needs to keep pace with the rapidly advancing field of ML. Arm will lead a discussion on the opportunities we have as an industry to bring these two communities together and build innovative tinyML applications of the future, including increased access to hardware virtualization earlier in the life cycle of target platforms, where algorithms and hardware are developed congruently.

  • YouTube

Room 3

Partner Sessions - Edge Applications

Session Moderator: Steve WHALLEY, CEO, Strategic World Ventures

How Adaptive AI Solves Big Challenges for tinyML

Jags KANDASAMY, Co-Founder & Chief Executive Officer, Latent AI Inc.

Abstract (English)

Edge AI is already powering billions of smart devices generating zettabytes of data. This market dynamic presents tremendous opportunities as well as significant challenges, requiring new Edge AI solutions, especially for enterprise. Come and learn how Adaptive AI can help build and deploy tinyML models.

  • YouTube

Low power computer vision with Eta Compute AI Vision board

Semir HADDAD, Senior Director Product Marketing, Eta Compute

Abstract (English)

In this session, we will explain how to design low-power computer vision applications with Eta Compute’s ECM3532 AI Vision board. We will show some examples and demos of low power vision in action and advise how to get started to build your own battery-operated computer vision solution.

  • YouTube

Pushing the AI Envelope at Cisco

Chris ROWEN, VP of Engineering, Cisco

Abstract (English)

The life-blood of machine learning is data, so it’s natural that a data-obsessed company like Cisco would be ripe with potential AI applications. Come hear about some of the latest initiatives in video and speech AI, especially for large scale collaboration. We’ll dive into applications and methods in speech-based assistants, speech enhancement, gesture recognition, and video segmentation, especially in complex edge + cloud systems. We’ll wrap up with discussion of emerging principles for responsible AI development, especially as real concerns on fairness and privacy are shaping the environment.

  • YouTube

Room 4

Edge ML hardware for every application

Moderator: Kevin KREWELL, Principal Analyst, TIRIAS Research

Karl FEZER, AI Ecosystem Evangelist, Arm

Mallik P. MOTURI, VP Product and Business Development, Syntiant

Abstract (English)

There is a huge range of workloads in tinyML, from sensor-based anomaly detection to image object detection. In this session industry analyst Kevin Krewell will lead a discussion on tinyML hardware and how it will evolve to support a range of advanced applications and lower power.

Room 5

tinyML vision challenge

Kwabena AGYEMAN, President & Co-Founder, OpenMV, LLC

Zach SHELBY, Co-founder and CEO, Edge Impulse

Abstract (English)

Introducing the upcoming tinyML computer vision challenge. Learn how to get involved and create inspiring new applications using tinyML on computer vision and win up to $6k in prizes and recognition from the tinyML Foundation!

  • YouTube

1:00 pm to 1:30 pm

Partner Hangouts

The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem on market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions. These will not be detailed company product or marketing talks but more interesting discussions on what these companies see happening given their particular vantage points. Expect to hear how problems and gaps are being solved and what still needs to be done and why.

Please see the schedule in the virtual event platform to see each day’s room assignments.

Pacific Daylight Time / UTC-7

8:00 am to 8:15 am

Opening and Awards Ceremony

8:15 am to 9:00 am

Keynote: Efficient Audio-Visual Understanding on AR Devices

Vikas CHANDRA, Director, AI, Facebook Reality Labs

Abstract (English)

Augmented reality (AR) is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context-aware user interface delivered through a socially acceptable form factor such as eyeglasses. The majority of these novel experiences in AR systems will be powered by AI because of its superior ability to handle in-the-wild scenarios. A key AR use case is a personalized, proactive and context-aware Assistant that can understand the user’s activity and their environment using audio-visual understanding models. In this presentation, we will discuss the challenges and opportunities in both training and deployment of efficient audio-visual understanding on AR glasses. We will discuss enabling always-on experiences within a constrained power budget using cascaded multimodal models, and co-designing them with the target hardware platforms. We will present our early work to demonstrate the benefits and potential of such a co-design approach and discuss open research areas that are promising for the research community to explore.

9:00 am to 9:45 am

Keynote: Data-Free Model Compression

Mohammad RASTEGARI, Senior AI/ML Technical Leader, Apple

Abstract (English)

Efficiently compressing a trained neural network without using any data is very challenging. Our data-free method requires 14x-450x fewer FLOPs than comparable state-of-the-art methods. We break the problem of data-free network compression into a number of independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. We show state-of-the-art performance on MobileNetV1 for data-free low-bit-width quantization. We also show state-of-the-art performance on data-free pruning of EfficientNet B0 when combining our method with end-to-end generative methods.

10:00 am to 10:15 am

Today’s Breakout Pitches

10:15 am to 10:45 am

tiny Talks

TinyML Software Runtime for Hybrid Multicore Architecture

Nilanjan ROYCHOWDHURY, Principal Software Architect, Eta Compute

Abstract (English)

A lot of emphasis in tinyML has been on designing the best neural network and optimizing it to reduce the number of operations and memory needs. Yet, training a very efficient neural network is only one piece of the equation for TinyML. The other piece is how to run it on actual embedded hardware.
Indeed, tinyML hardware is very often complex, including many cores for the sake of efficiency. Moreover, because sensor processing requires a combination of signal processing, procedural computing and neural network acceleration, the hybrid multicore architecture is becoming popular for edge AI hardware with a combination of heterogeneous cores: CPU, DSP and NPU.
To run efficiently on these hybrid multicore systems, there must be a runtime that allocates resources, core and memory, in the most optimized way, while minimizing processing overhead and memory transfers.
In this presentation we will review the various ways the industry is addressing this challenge and how Eta Compute solved it with the TENSAI Flow runtime and executors.

Insights from a Multi-Purpose Self-Learning Smart Sensor

Kaustubh GANDHI, Senior Product Manager Software, Bosch Sensortec

Abstract (English)

Edge-AI devices need to ensure context-sensitive adaptation and real-time personalization for end-users. In this talk, we introduce some insights gained while designing Bosch’s novel self-learning sensor.

The sensor’s self-learning function enables the device to learn new motion patterns in-use directly from the end-user, to personalize built-in patterns directly for an end-user and automatically classify and count the movement types in real-time, all within the sensor itself.

While delivering an AI experience, the function runs on the sensor’s co-processor at ca. 300 µA with under 50 KB of memory, yet achieves over 90% accuracy for personalized home workouts. This is a significant improvement for learning at the edge on wrist and in-ear wearables.

Secondly, as the sensor is capable of switching to a different function at run time, the sensor’s purpose can change depending on the user’s context, such as orientation and position tracking during running, style classification during swimming or personalization during fitness workouts.

Thirdly, the design allows the self-learning feature to utilize an expandable list of virtual sensors from sensor data fusion (e.g. quaternions) and peripherals (e.g. magnetometer, pressure sensors).
This enables faster and more robust pattern detection from an expandable list of input sources, chosen according to the target application, as opposed to pre-programmed AI solutions with fixed inputs.

In summary, in order to realize the true potential of edge-AI, it is important to design the software with capabilities to learn and adapt to the end-user while maintaining scalability for diverse applications.

  • YouTube

10:45 am to 12:00 pm

Breaking News on Disruptive Products and Tools

Sean MCGREGOR, Member of Technical Staff, Syntiant

Ravishankar SIVALINGAM, Sr. Staff Engineer/Manager, Qualcomm

Jan JONGBOOM, CTO, Edge Impulse

Meng LI, Senior AI Research Scientist, Facebook Inc.

Harsha VISWANATH, Principal AI Technical Leader - Azure Edge Device, Platform and Solutions Group, Microsoft

Abstract (English)

Syntiant:
“TinyML Solution Power without Tiny Models: the NDP120”
Syntiant made international headlines in January 2021 by announcing that its Syntiant® Core 2™ neural network inference engine can process multiple heterogeneous networks concurrently while drawing less than 1 mW.

Embedded in the Syntiant® NDP120™, the company’s newest generation deep learning processor for audio and sensor applications in edge devices, the Syntiant Core 2, delivers 25x the tensor throughput of the Syntiant Core 1™ found in the Syntiant® NDP100™ and Syntiant® NDP101™ devices, which achieve 100x efficiency and 10x the throughput over traditional CPUs and DSPs.

Whether running far-field speech processing applications such as audio filtering and echo cancelation, or multi-modal sensor fusion and infrared detection, data scientists at Syntiant can 1) explain how a purpose-built compute engine for neural inference solves the power consumption problem; and 2) illustrate how edge AI is creating opportunities to better connect people through “smarter” devices, free from the cloud, and with minimal drain on the battery.

Qualcomm Technologies Inc
“New tinyML use-case: Ultra-low Power Eyetracking with Qualcomm QCC112”

In resource-constrained AR/VR applications, accurate pupil detection enables downstream applications such as eyetracking and iris recognition, by drastically reducing the input image size and allowing for focused compute on relevant regions of interest. Qualcomm QCC112 is an ultra-low power computer vision sensor capable of running real-time object detection at ~1 mW system power. We showcase pupil detection operating at 60-100 fps on this hardware, with hardware accelerated object detection on QQVGA resolution (160 x 120 pixels) grayscale input images. The pupil detection model is approximately 40 kB and is robust to low light, which allows for accurate detection at less than 1 millisecond exposure with infrared LED illumination. The model is able to precisely localize the pupil despite occlusions by eyelids/eye corners. We also provide training tools for users to train custom models on their own datasets if so desired.

Edge Impulse
EON Tuner: Find the best model with your device constraints in mind, from signal processing parameters to ML architecture

Around the date of the TinyML Summit we’ll be releasing EON Tuner, an AutoML pipeline specifically for sensor data on constrained devices. Rather than just finding the right hyperparameters within a neural network, it has a search space that includes signal processing algorithms and parameters to preprocess the data, and can consider both classic ML and neural networks for classification. And to prevent finding models that won’t fit your use case, you set device constraints at the start (e.g. needs to run 5x a second on a Cortex-M0+ @ 48MHz in max. 20K RAM).
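The example constraint above translates into a simple budget: running 5 times per second at 48 MHz leaves 9.6 million cycles per inference, including signal processing. The sketch below filters candidates against such a budget. It does not use or represent the EON Tuner API; the `Candidate` class, the `fits` function, and all candidate numbers are made-up placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cycles_per_inference: int  # DSP + model, hypothetical estimates
    peak_ram_bytes: int


def fits(candidate, clock_hz, runs_per_second, ram_limit_bytes):
    """Check a candidate against the latency and RAM constraints."""
    cycle_budget = clock_hz // runs_per_second
    return (candidate.cycles_per_inference <= cycle_budget
            and candidate.peak_ram_bytes <= ram_limit_bytes)


# Placeholder values, not measurements.
candidates = [
    Candidate("mfcc + small cnn", 7_200_000, 18_000),
    Candidate("spectrogram + cnn", 14_000_000, 16_000),
    Candidate("mfe + dense", 3_000_000, 26_000),
]

# Constraint from the text: 5x a second on a 48 MHz core in at most 20K RAM.
feasible = [c.name for c in candidates if fits(c, 48_000_000, 5, 20 * 1024)]
print(feasible)
```

Applying the constraints before the search, rather than after, avoids spending search time on pipelines that could never be deployed on the target.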

Facebook Inc
Improve weight-sharing NAS with better search space and better supernet training

Weight-sharing neural architecture search (NAS) is effective for automating efficient model design. Weight-sharing NAS builds a supernet that assembles all the architectures as sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily depends on 1) the search space design, 2) the sub-network sampling strategy, and 3) in-place distillation. Though important, these key factors are not well studied in prior works. In this presentation, we introduce our recent works on improving weight-sharing NAS. We introduce a multi-scale search space to better capture the scale variance prominent in the image inputs. We then introduce a new sampling strategy that focuses supernet training on sub-networks on the Pareto front. We further propose a generalized alpha divergence for distillation to guide the supernet optimization. The discovered model family achieves SOTA results on various visual tasks, including image classification (on ImageNet) and bottom-up pose estimation (on COCO and CrowdPose).
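For readers new to the term, the sketch below shows what it means for a sub-network to lie on the Pareto front of a cost versus accuracy trade-off. It only illustrates the concept; how the speakers estimate the front and sample from it during supernet training is not described in the abstract, and the candidate values are placeholders.

```python
def pareto_front(points):
    """Return names of candidates not dominated by any other candidate.

    points: list of (name, cost, accuracy); lower cost and higher accuracy are better.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in points
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder (name, MFLOPs, top-1 accuracy) values, not results from the talk.
subnets = [
    ("A", 100, 0.70),
    ("B", 200, 0.75),
    ("C", 200, 0.72),
    ("D", 300, 0.74),
    ("E", 400, 0.80),
]
front = pareto_front(subnets)
print(front)
```

Candidates C and D are dominated by B, which is at least as cheap and more accurate, so a Pareto-focused sampler would spend its training budget on A, B and E.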

Azure Percept & TinyML

While most developers and organizations can stand behind the benefits of edge AI, they often face costly and time-consuming challenges when it comes to end-to-end development, deployment, and management. These potential roadblocks include training AI models, creating low power yet high-performance hardware, seamless provisioning of workloads, management and updating of devices and applications, integrating with existing applications, and helping to ensure the data and models are secured. That is why we’re introducing Azure Percept, the most comprehensive, easy-to-use platform with added security for creating edge AI solutions. We will introduce Azure Percept and its capabilities with respect to ML, and we will also talk about how we can leverage this to scale down to a smaller footprint device which can potentially run using AA batteries. Here we will address the requirements and challenges in terms of not only the ML models required to run in this environment but also the overall system requirements.

12:00 pm to 1:00 pm

Room 1

tiny Talks & Partner Sessions – Edge Applications

Environmental Noise Classification on Microcontrollers

Jon NORDBY, CTO, Soundsensing

Abstract (English)

Noise is a growing problem in urban areas, and according to the WHO it is the second-largest environmental cause of health problems in Europe.
Noise monitoring using Wireless Sensor Networks is being applied in order to understand and help mitigate these noise problems. It is desirable that these sensor systems, in addition to logging the sound level, can indicate what the likely sound source is. Performing such Environmental Noise Classification directly in the sensor is desirable in order to avoid sending audio data to the cloud, which may have negative impacts on data transfer amounts, battery lifetime and privacy.

In this talk we will explain how we tested several different Convolutional Neural Networks for this task on the STM32L476 low-power microcontroller, and the results we were able to achieve on the Urbansound8k dataset. Several techniques, such as depthwise-separable convolutions, striding for downsampling, and reducing input dimensionality, were tested in order to make the CNN models as efficient as possible, and these will likely also be useful for other audio or image tasks.

The research was initially carried out as part of a master’s thesis at the Norwegian University of Life Sciences (NMBU). Since then, we have continued to work on this topic at Soundsensing, and we will share some of the progress and challenges in bringing this kind of research to market.
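As a rough illustration of why depthwise-separable convolutions help on a microcontroller, the sketch below compares weight counts for a standard convolution and its depthwise-separable replacement. The layer sizes are arbitrary examples, not the configurations tested in the talk, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Arbitrary example layer: 3x3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(std / sep, 2))
```

The multiply-accumulate count per output position shrinks by the same ratio, since both variants are applied at every spatial location.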

Real-World Performance Analysis of Visual Wake Words

Luke BERNDT, Senior Director, In-Q-Tel

Abstract (English)

The Google Visual Wake Words paper (Chowdhery et al., 2019) proposes techniques for creating object recognition models appropriate for microcontrollers. The paper demonstrates an accurate person detection model trained using the Microsoft Common Objects in Context (COCO) dataset. Because the COCO dataset is built on photographs found on internet photography sites and because these images are composed by a photographer, we hypothesize that the COCO dataset may be ill-suited for tinyML visual sensors. Typical visual sensors often have unusual perspectives of an object, which can result in poor object recognition.
We therefore investigated model performance on classes other than persons, evaluated performance by deploying the model on hardware in the wild, and then built a novel dataset for real-world testing. In certain real-world environments, we found a decrease in accuracy of over 50%. Additionally, we investigated transfer learning and techniques for identifying blind spots in models to better target the augmentation of objects in the dataset. We find that extra care is needed when using general-purpose image datasets, like COCO, to train models for tinyML-based visual sensors.

  • YouTube

Always watching, sensing and listening by Himax WE-I Plus at the edge

Mark CHEN, Vice President, Himax Technologies

Abstract (English)

Himax’s WE-I Plus, an ultra-low power AI accelerator-embedded processor, is designed to accommodate a wide selection of TinyML neural network models, with a programmable DSP running at up to a 400 MHz clock and 2 MB of internal SRAM. WE-I Plus supports the TensorFlow Lite for Microcontrollers framework and is able to run inferences such as the open-source Google examples that are available on Google’s GitHub. Facilitated by an ultra-low power always-on image sensor and an ultra-low power AI processor with built-in support for the TensorFlow Lite for Microcontrollers framework, WE-I Plus has been proven to make AI algorithm development easier than ever.

  • YouTube

Production Worthy Tools for Creating AI at the IoT Edge

Christopher B. ROGERS, CEO, SensiML Corp

  • YouTube

Room 2

Partner Sessions - Processing Engines

Tiny and Flexible ML with Lattice FPGA

Sreepada. V. HEGADE, Senior Manager, Lattice Semiconductor

Abstract (English)

Neural network inference on resource-constrained devices, which is fueling the growth of ML at the edge, is only one part of a complete solution that also involves other essential components such as data aggregation, augmentation, and post-processing of the inference output. In addition, new network topologies are being introduced at a rapid pace to meet the ever-growing demand for accuracy and performance, which requires solutions that support “Tiny ML” to be flexible. The engine that performs network inference also needs to be tuned for different types of network topology. For example, MobileNet, introduced to efficiently implement neural networks on resource-constrained devices, cannot be efficiently implemented on NN engines designed for normal convolution.
The configurable nature of FPGA devices allows for quick adoption of emerging neural network topologies. The flexible I/O also helps to implement data aggregation and other peripheral operations. The soft core implemented on Lattice FPGAs can be changed and/or optimized depending on the target network topology. In this talk we discuss how we optimize network topologies and the software compiler to get the best out of the FPGA for end applications.

  • YouTube

Machine Learning in Wireless IoT Applications

Peter SCHULMEYER, Senior Director, Silicon Labs

Abstract (English)

Over the next few years, artificial intelligence and machine learning are expected to become ubiquitous for devices that are part of the Internet of Things (IoT).

Silicon Labs is addressing the challenges of this market by making it possible to run machine learning models on small, low-power wireless SoCs. We will cover market trends and how to identify IoT applications where single-chip solutions that integrate machine learning and wireless connectivity make sense.

  • YouTube

System Level Energy Considerations for Battery Powered AI

Afshin NIKTASH, Senior Principal Software Engineer, Maxim Integrated

Abstract (English)

To bring complex AI inferencing to battery-powered applications, several approaches are available to reduce the energy consumption of convolutional neural network (CNN) computations. This presentation will give an overview of the MAX78000 AI microcontroller and outline a number of system (not just ML) factors that can be considered to achieve audio and visual AI inferencing at battery power levels. Specific examples of face identification and keyword spotting will be highlighted.

  • YouTube

TinyML: The power / cost conundrum

Mark LIPPET, CEO and President, XMOS

Abstract (English)

When we think about cost, we have our eye firmly on the customer: what are the products and features they need now and in the future, and can we develop compelling solutions at a price point that’s attractive for both of us? When it comes to power, needs and attitudes are much more nuanced. There’s a clear focus on battery life and sustainability, but power is often viewed at a component level rather than a system level.

Focusing on low power at a single system component level can lead customers to overlook the energy consumption / cost of the system as a whole (as well as the long-term cost and value it brings to the end user).

In this session, we will explore a more representative way of measuring energy consumption in TinyML processors, and the system trade-offs that must be made to minimise energy consumption and lower cost.

  • YouTube
Room 3

Always-on AI vision: The path to disruptive, high-scale applications

Moderator: Jeff HENCKELS, Director, Product Management & Business Development, Qualcomm

Peter BERNARD, Sr. Director, Silicon and Telecom, Azure Edge Devices, Platform & Services, Microsoft

Lian Jye SU, Principal Analyst, ABI Research

Edwin PARK, Principal Engineer, QUALCOMM Inc

Evan PETRIDIS, Chief Product Officer, EVP of Systems Engineering, Eta Compute

Tony CHIANG, Sr. Director of Marketing, Himax Imaging

Abstract (English)

Vision is the most challenging AI/ML task to tackle in power and resource-constrained battery-operated devices. This panel will focus on the state-of-the-art and the innovation roadmap ahead, discussing which/how/when specific R&D breakthroughs will enable disruptive, high-scale use cases and applications in the future.
• Which use cases/applications are driving always-on AI vision? Both today and in the future?
• What are the biggest gaps in achieving the long-term potential of always-on AI vision, and how is the industry addressing them?
• What does the innovation roadmap look like? When and how will technology advances open up new applications and drive scale, adoption, and new investments?

  • YouTube
Room 4

ML in Smart Homes and Buildings

Stacey HIGGINBOTHAM, Founder, Stacey on IoT

Zach SHELBY, Co-founder and CEO, Edge Impulse

Abstract (English)

We have seen a huge impact in smart homes and buildings from IoT technology and, more recently, from voice assistants. Stacey Higginbotham, founder of Stacey on IoT, will lead a discussion on what applications we might expect to see next thanks to tinyML in the home and office.

  • YouTube

1:00 pm to 1:30 pm

Partner Hangouts

The Partner Hangout sessions will be an opportunity to hear from commercial companies in the tiny machine learning ecosystem about the market and technology trends they are addressing to enable the exponential growth of tiny machine learning solutions. These will not be detailed company product or marketing talks, but rather discussions of what these companies see happening from their particular vantage points. Expect to hear how problems and gaps are being solved, what still needs to be done, and why.

Please see the schedule in the virtual event platform to see each day’s room assignments.

8:00 am to 1:20 pm

tinyML Research Symposium

Schedule for the inaugural tinyML Research Symposium.

Schedule subject to change without notice.

Committee

Marian VERHELST

Technical Program Chair

KU Leuven

Peter VAJDA

Technical Program Vice-Chair

Facebook

Edith BEIGNÉ

Facebook

Ian BRATT

Arm

Ofer DEKEL

Microsoft Research

Ira FELDMAN

tinyML Foundation

Adam FUKS

NXP

Evgeni GOUSEV

General Chair

Qualcomm Research

Joseph HASSOUN

Samsung Semiconductor

Kurt KEUTZER

University of California, Berkeley

Boris MURMANN

Stanford University

Chris ROWEN

Cisco

Moritz SCHERER

ETH Zürich

Zach SHELBY

Edge Impulse

Steve WHALLEY

Strategic World Ventures

Wei XIONG

Hoi-Jun YOO

KAIST

Huichu LIU

Facebook Inc.

Speakers

Kwabena AGYEMAN

OpenMV, LLC

Raziel ALVAREZ

Panelist

Facebook

Michael AZOFF

Kisaco Research

Elad BARAM

Emza Visual Sense

Luca BENINI

ETHZ | University of Bologna

Peter BERNARD

Microsoft

Luke BERNDT

In-Q-Tel

Kristofor CARLSON

BrainChip Inc.

Lee CARTER

Momenta Ventures

Luis CEZE

Panelist

OctoML

Sek CHAI

Latent AI

Vikas CHANDRA

Facebook Reality Labs

Mark CHEN

Himax Technologies

Song CHEN

Tutorial

Facebook Reality Labs Research

Tianqi CHEN

Panelist

OctoML

Hoon CHOI

Lattice Semiconductor

Bill COUGHRAN

Panelist

Sequoia Capital

Martin CROOME

GreenWaves

Chris ELIASMITH

Applied Brain Research

Stuart FEFFER

Reality AI

Karl FEZER

Arm

Kaustubh GANDHI

Bosch Sensortec

Semir HADDAD

Eta Compute

Song HAN

MIT EECS

Sreepada. V. HEGADE

Lattice Semiconductor

Koen HELWEGEN

Plumerai

Jeff HENCKELS

Qualcomm

Stacey HIGGINBOTHAM

Stacey on IoT

Jan JONGBOOM

Tutorial

Edge Impulse

Kate KALLOT

NVIDIA

Jags KANDASAMY

Latent AI Inc.

Kurt KEUTZER

University of California, Berkeley

Laszlo KINDRAT

XMOS

Abhijit KHOBARE

Tutorial

Qualcomm Technologies, Inc. (QTI)

Kevin KREWELL

TIRIAS Research

Samir KUMAR

Panelist

M12

Chris LATTNER

Panelist

SiFive

Meng LI

Facebook Inc.

Mark LIPPET

XMOS

Zhi-Gang LIU

Arm

Jianjia MA

University of Southampton

Johan MALM

imagimob

Diana MARCULESCU

The University of Texas at Austin

Sean MCGREGOR

Syntiant

Orlando MOREIRA

GrAI Matter Labs

Moshe HAIUT

DSP Group

Mallik P. MOTURI

Syntiant

Dylan MUIR

SynSense

Afshin NIKTASH

Maxim Integrated

Jon NORDBY

Soundsensing

Edwin PARK

QUALCOMM Inc

Chirag PATEL

Tutorial

Qualcomm Technologies, Inc. (QTI)

Mohammad RASTEGARI

Apple

Christopher B. ROGERS

SensiML Corp

Nilanjan ROYCHOWDHURY

Eta Compute

Chris ROWEN

Cisco

Brandon RUMBERG

Aspinity

Davis SAWYER

Deeplite

Moritz SCHERER

ETH Zürich

Leslie SCHRADIN

Qeexo, Co.

Peter SCHULMEYER

Silicon Labs

Zach SHELBY

Edge Impulse

Daniel SITUNAYAKE

Tutorial

Edge Impulse

Ravishankar SIVALINGAM

Qualcomm

Thalia SPEAKER

WILDLABS

Lian Jye SU

ABI Research

Eileen TANGHAL

Panelist

In-Q-Tel

Urmish THAKKER

SambaNova Systems Inc

Vikrant TOMAR

Fluent.ai

Harsha VISWANATH

Platform and Solutions Group, Microsoft

Sameer WADHWA

Qualcomm

Pete WARDEN

Tutorial

Google

Behdad YOUSSEFI

Areanna AI

Sponsors
