To date, most ultra-low-power machine learning (ML) applications at the edge are trained “off device” (typically in the cloud, where virtually unlimited computing resources are available), while the edge devices only perform inference. Many successful applications have been deployed this way, as demonstrated by the rapid growth of the tinyML community and strong support from industry.
It’s time to move to the next milestone: On-Device Learning (ODL). The ambition is to replace off-device training with localized training and adaptive “intelligence”. Industry and academic experts are actively exploring how to better fit edge devices and applications to the time-varying environments in which they are expected to operate for long periods.
To support and further accelerate this groundbreaking evolution, the tinyML Foundation created the On-Device Learning (ODL) working group, which has enthusiastically begun its activities. The ODL working group is very excited to invite everyone to join the first-ever virtual event to learn from experts about their current research and state-of-the-art solutions for ODL!
Never before has ML been characterized by such innovative waves of technology, and the tinyML Foundation is accelerating the growth of this vibrant ecosystem of skills and technology, resulting in new applications and end uses.
8:00 am to 11:00 am
On-Device Learning Under 256KB Memory
Song HAN, Assistant Professor, MIT EECS
On-device learning enables the model to adapt to new data collected from the sensors. However, the memory consumption of training is prohibitive for IoT devices with tiny memory resources. We propose an algorithm-system co-design framework that makes fine-tuning neural networks possible with only 256KB of memory. On-device learning faces two unique challenges: the quantized graphs of neural networks are hard to optimize due to mixed bit-precision and the lack of normalization, and the limited hardware resources (memory and computation) do not allow full backward computation. To cope with the optimization difficulty, we propose quantization-aware scaling to calibrate the gradient scales and stabilize quantized training. To reduce the memory footprint, we propose sparse update to skip the gradient computation of less important layers and sub-tensors. The algorithmic innovation is implemented by a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads the runtime auto-differentiation to compile time. Our framework is the first practical solution for on-device transfer learning of visual recognition on tiny IoT devices (e.g., a microcontroller with only 256KB SRAM), using less than 1/100 of the memory of existing frameworks while matching the accuracy of cloud training + edge deployment for the tinyML application Visual Wake Words (VWW). Our study suggests that tiny IoT devices can not only perform inference but also continuously adapt to new data for lifelong learning.
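To make the sparse-update idea concrete, here is a minimal pure-Python sketch, not the Tiny Training Engine: a two-layer linear model in which the first-layer weights are frozen, so the backward pass never computes their gradient and never needs the layer's input after the forward pass. The class name, layer sizes, learning rate, and the fixed choice of which tensors to train are assumptions for illustration; the talk's method selects layers and sub-tensors by importance, which is not reproduced here.

```python
# Illustrative sketch of a "sparse update": only some tensors are trained.
# Class and parameter names are hypothetical, not Tiny Training Engine APIs.
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SparseUpdateMLP:
    """y = W2 @ (W1 @ x + b1) + b2 with W1 frozen; b1, W2 and b2 are trained."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = rand_matrix(n_hidden, n_in, rng)  # frozen, e.g. pre-trained
        self.b1 = [0.0] * n_hidden
        self.W2 = rand_matrix(n_out, n_hidden, rng)
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [a + b for a, b in zip(matvec(self.W1, x), self.b1)]
        y = [a + b for a, b in zip(matvec(self.W2, h), self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One SGD step on L = 0.5 * sum((y - t)^2); returns the pre-update loss."""
        h, y = self.forward(x)
        dy = [yi - ti for yi, ti in zip(y, target)]            # dL/dy
        # dL/dh uses the current (not yet updated) W2
        dh = [sum(self.W2[o][j] * dy[o] for o in range(len(dy)))
              for j in range(len(h))]
        for o in range(len(dy)):
            for j in range(len(h)):
                self.W2[o][j] -= lr * dy[o] * h[j]              # needs stored h
            self.b2[o] -= lr * dy[o]
        # Bias-only update for layer 1 (dL/db1 = dL/dh): dL/dW1 is never
        # computed, so the input x does not have to be kept for backward.
        for j in range(len(h)):
            self.b1[j] -= lr * dh[j]
        return 0.5 * sum(d * d for d in dy)


model = SparseUpdateMLP(n_in=3, n_hidden=4, n_out=1)
sample, label = [1.0, -0.5, 0.25], [0.8]
losses = [model.train_step(sample, label) for _ in range(50)]
```

The memory saving comes from what the backward pass no longer needs: with `W1` frozen, neither its gradient buffer nor the activation feeding it has to be kept.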
Neural Network ODL for Wireless Sensor Nodes
Hiroki MATSUTANI, Professor, Keio University
In real-world TinyML applications, accuracy is often affected by various environmental factors, such as noise, the location and calibration of sensors, and time-related changes. This talk will introduce a neural-network-based on-device learning (ODL) approach that addresses this issue without resorting to deep networks. Our approach differs from de facto backpropagation-based training and is tailored for low-end IoT devices. We will introduce its algorithm and its implementation on wireless sensor nodes consisting of a Raspberry Pi Pico, a low-power wireless (LoRa) module, a small battery, a magnet, and sensors. We will show some case studies and demonstrate that retraining with the ODL algorithm significantly improves anomaly detection accuracy in a noisy environment while saving computation and communication costs to keep power consumption low.
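The abstract does not spell out the algorithm. One well-known backpropagation-free way to train a shallow network on a small device is the online sequential extreme learning machine (OS-ELM): the hidden layer is random and fixed, and only the output weights are updated per sample with a recursive least-squares (RLS) step. The sketch below assumes that technique, a tanh hidden layer, a single output, and a P = I/λ initialization; it may differ from the speaker's method, and the `OSELM` class and its parameters are hypothetical.

```python
# Illustrative OS-ELM-style learner (assumed technique, hypothetical names).
import math
import random


class OSELM:
    """Fixed random hidden layer; output weights trained one sample at a
    time by recursive least squares, with no backpropagation."""

    def __init__(self, n_in, n_hidden, lam=1e-2, seed=0):
        rng = random.Random(seed)
        self.A = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden  # output weights (single output)
        # P tracks (H^T H + lam * I)^-1 for all hidden vectors H seen so far
        self.P = [[(1.0 / lam if i == j else 0.0) for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def hidden(self, x):
        return [math.tanh(sum(a * xi for a, xi in zip(row, x)) + bi)
                for row, bi in zip(self.A, self.b)]

    def predict(self, x):
        h = self.hidden(x)
        return sum(hi * bi for hi, bi in zip(h, self.beta))

    def update(self, x, t):
        """One RLS step on a single labeled sample (x, t)."""
        h = self.hidden(x)
        n = len(h)
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(hi * phi for hi, phi in zip(h, Ph))
        # P <- P - (P h)(P h)^T / (1 + h^T P h); valid because P is symmetric
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= Ph[i] * Ph[j] / denom
        err = t - sum(hi * bi for hi, bi in zip(h, self.beta))
        # beta <- beta + P_new h * err, where P_new h equals P_old h / denom
        for i in range(n):
            self.beta[i] += Ph[i] / denom * err


# Stream samples of y = sin(3x) one at a time, as a sensor node would
rng = random.Random(1)
model = OSELM(n_in=1, n_hidden=16)
for _ in range(300):
    x = rng.uniform(-1.0, 1.0)
    model.update([x], math.sin(3.0 * x))
```

Each update costs O(n_hidden²) operations and memory, independent of how many samples have been seen, which is why this family of methods is popular for low-end IoT nodes.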
Scalable, Heterogeneity-Aware and Trustworthy Federated Learning
Yiran CHEN, Professor, Duke University
Federated learning has become a popular distributed machine learning paradigm for developing on-device AI applications. However, the data residing on the devices is intrinsically statistically heterogeneous (i.e., it has a non-IID distribution), and mobile devices usually have limited communication bandwidth for transferring local updates. Such statistical heterogeneity and communication limitations are two major bottlenecks that hinder applying federated learning in practice. In addition, recent works have demonstrated that sharing model updates makes federated learning vulnerable to inference attacks and model poisoning attacks. In this talk, we will present our recent work on novel federated learning frameworks that address the scalability and heterogeneity issues simultaneously. We will also reveal the fundamental causes of privacy leakage and model poisoning attacks in federated learning procedures and present corresponding defense mechanisms toward trustworthy federated learning.
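For readers new to federated learning, the basic aggregation step that many frameworks build on is Federated Averaging (FedAvg): the server averages the clients' locally trained parameters, weighted by each client's number of local samples. The sketch below shows only this standard baseline, not the speaker's frameworks, which additionally address heterogeneity, communication cost, and security; representing each model as a flat list of floats is a simplifying assumption.

```python
# Minimal FedAvg aggregation sketch (standard baseline, flat parameter lists).
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors.

    client_weights[k] is client k's flattened model after local training;
    client_sizes[k] is the number of local samples it trained on.
    The raw training data itself stays on each device.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must share the same model shape")
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w


# Two clients with unequal (non-IID) data sizes: the larger one dominates.
print(fedavg([[1.0, 2.0], [3.0, 6.0]], [30, 10]))  # [1.5, 3.0]
```

Weighting by sample count is exactly what makes plain FedAvg sensitive to non-IID data and to poisoned updates, the two issues the talk targets.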
On-Device Learning For Natural Language Processing with BERT
Warren J. GROSS, Professor, McGill University
Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training allows models to adapt quickly to new scenarios. With the increasing size of deep neural networks, as seen with BERT and other natural language processing models, come increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource-intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. In this talk, we discuss techniques for fine-tuning BERT models that reduce fine-tuning time and optimize memory accesses on mobile GPUs while maintaining the accuracy of the deep neural networks.
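A back-of-the-envelope calculation shows why training is far more demanding than inference. The sketch assumes BERT-base's roughly 110 million parameters, 32-bit floats, and an Adam-style optimizer that keeps two extra state values per parameter; it deliberately ignores activations, which add substantially more during training. The function name is hypothetical.

```python
# Rough parameter-related memory estimate (activations excluded on purpose).


def training_memory_bytes(n_params: int, bytes_per_value: int = 4,
                          optimizer_states: int = 2) -> dict:
    """Memory for weights at inference vs. weights + gradients + optimizer state.

    optimizer_states=2 assumes Adam-style first and second moment estimates.
    """
    weights = n_params * bytes_per_value
    gradients = n_params * bytes_per_value
    states = n_params * bytes_per_value * optimizer_states
    return {"inference": weights, "training": weights + gradients + states}


mem = training_memory_bytes(110_000_000)          # assumed BERT-base size, FP32
print(mem["inference"] / 1e6, "MB for inference weights")  # 440.0
print(mem["training"] / 1e6, "MB for training")            # 1760.0
```

Even before counting activations, fine-tuning needs about four times the memory of inference under these assumptions, which is why reducing fine-tuning memory is the lever the talk focuses on.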
Is on-device learning the next “big thing” in TinyML?
Manuel ROVERI, Associate Professor, Politecnico di Milano
On-device tiny machine learning represents one of the most challenging and relevant research directions in Tiny Machine Learning (TinyML), with a strong impact from both the theoretical and the technological perspective. Indeed, on-device tiny machine learning will enable the design of smart objects and devices that can learn TinyML models during their operational life, and hence adapt to evolving data-generating processes (e.g., due to periodicity or seasonality effects, faults or malfunctions affecting sensors or actuators, or changes in users’ behavior), a common situation in real-world application scenarios.
The aim of this talk is to explore on-device TinyML and to introduce an on-device TinyML learning algorithm that supports the incremental learning of TinyML models and their adaptation to evolving data-generating processes directly on the device. Experimental results on two application scenarios and two off-the-shelf hardware platforms show the feasibility and effectiveness of the proposed solution.
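As a generic illustration of incremental adaptation to an evolving data-generating process (not the algorithm presented in the talk, which is not described here), the sketch below keeps one prototype per class and updates it with an exponential moving average, so old samples are gradually forgotten and the classifier follows a drifting operating point. The class name and the forgetting factor `alpha` are hypothetical choices.

```python
# Illustrative incremental nearest-class-mean classifier with forgetting.
import math
from typing import Dict, List, Sequence


class DriftingNearestMean:
    """Learns one labeled sample at a time; memory is one vector per class."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha  # forgetting factor in (0, 1]; higher adapts faster
        self.prototypes: Dict[str, List[float]] = {}

    def predict(self, x: Sequence[float]) -> str:
        if not self.prototypes:
            raise ValueError("no classes learned yet")
        return min(self.prototypes,
                   key=lambda c: math.dist(x, self.prototypes[c]))

    def learn(self, x: Sequence[float], label: str) -> None:
        proto = self.prototypes.get(label)
        if proto is None:
            self.prototypes[label] = list(x)  # first sample founds the class
            return
        # Exponential moving average: recent samples outweigh old ones
        for i, xi in enumerate(x):
            proto[i] += self.alpha * (xi - proto[i])


clf = DriftingNearestMean(alpha=0.2)
clf.learn([0.0, 0.0], "normal")
clf.learn([5.0, 5.0], "fault")
print(clf.predict([3.0, 3.0]))  # fault
# The "normal" operating point drifts toward (2, 2), e.g. through sensor aging
for _ in range(30):
    clf.learn([2.0, 2.0], "normal")
print(clf.predict([3.0, 3.0]))  # normal
```

The constant memory per class is the design choice that matters on a microcontroller; the forgetting factor trades stability on stationary data against speed of adaptation after a change.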
ODL Professors Panel
In this part of the agenda, the professors will be asked to share their views and recommendations from which industry and developers can benefit. Topics of discussion will include how to approach ODL methodologically with tools, frameworks, and libraries; their efforts to educate a new generation of TinyML engineers; the fundamental skills the industry workforce needs in order to innovate in ODL; the kind of impact ODL will have on the ML ecosystem and on applications; and the research perspectives and fundamental questions worth investing in for further advances.
8:00 am to 11:00 am
TinyML ODL in industrial IoT
Haoyu REN, PhD Student, Technical University of Munich
Tiny machine learning (TinyML) has gained widespread popularity in industry, where machine learning (ML) is democratized on ubiquitous Internet of Things (IoT) devices that process sensor data everywhere in real time. Despite the constraints on power, memory, and computation, TinyML has advanced significantly in the last few years. However, most current TinyML solutions are based on a batch/offline setting and support only ML inference on IoT devices. Moreover, the TinyML ecosystem is fragmented. To deploy TinyML in industry, where mass deployment happens, we must consider hardware and software constraints ranging from available onboard sensors and memory size to ML model architectures and runtime platforms. To address these issues, this talk introduces our research efforts at Siemens, including the TinyOL (TinyML with Online-Learning) system, which enables incremental on-device training; the synergy of TinyML and complex event processing (CEP), which adapts on-device ML models and CEP reasoning rules flexibly on the fly; and a semantic management system that facilitates the joint management of TinyML models and IoT devices at scale.
NeuroMem®: wearable, hardwired, sub-milliwatt, real-time, fully explainable machine learning with wholly parallel access to “neuron memories”
Guy PAILLET, Co-founder, General Vision
NeuroMem is a fully hardwired, instant, incremental “learn and recognize” ML technology whose responses can be identified (one or many categories), uncertain, or unknown (e.g., for anomaly detection, among other uses), with software-free incremental learning of the unknown. NeuroMem fulfills DARPA’s request for explainable AI (the capability of “behavior justification”). The ZISC (Zero Instruction Set Computer) was born in 1993 at IBM’s labs in Paris and was co-patented by IBM Corp. and Guy Paillet (https://en.wikipedia.org/wiki/No_instruction_set_computing).
This TinyML technology has followed the evolution of semiconductor geometry, from 36 neurons at 1 micron (ZISC36) to the upcoming ANM5500 neurons (55 nm, by TSMC), with more to come. Thousands of NeuroMem chips are in operation at customer sites around the world, most since 2010 but some since 2000 (including some at sea on fishing vessels, trained by sailors).
Real-time learning and recognition can be embedded inside miniature devices such as a MEMS microphone, a ball bearing, or an image sensor (the patented MIPD, Monolithic Image Perception Device). Accurate learning and recognition can be achieved at 25 MHz, but the technology can also operate efficiently at a few dozen hertz. NeuroStamp, featuring a small low-power FPGA and 4,032 neurons, is a prime example of the smallest device capable of anomaly detection, which can be generalized to factories, automotive, and more. Depending on the operation, power consumption, for example on the NM500 (on the NeuroStamp), can range from 50 microwatts to 3 milliwatts; since no software is running, operation can be fully asynchronous.
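The “identified / uncertain / unknown” behavior described above resembles a restricted Coulomb energy (RCE)-style classifier, in which each neuron stores a prototype, a category, and an influence field. The pure-Python sketch below is a simplified software model written under that assumption; it is not General Vision’s hardware logic or API, and the L1 distance, the `max_field` value, and all class and function names are illustrative.

```python
# Simplified RCE-style neuron network (assumed model, hypothetical names).
from typing import List, Sequence, Tuple


def l1(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(abs(x - y) for x, y in zip(a, b))


class Neuron:
    def __init__(self, prototype: Sequence[int], category: str, field: int):
        self.prototype = list(prototype)
        self.category = category
        self.field = field  # influence field: fires if distance < field


class RCENetwork:
    def __init__(self, max_field: int = 100):
        self.max_field = max_field
        self.neurons: List[Neuron] = []

    def recognize(self, x: Sequence[int]) -> Tuple[str, List[str]]:
        """Return ('identified' | 'uncertain' | 'unknown', firing categories)."""
        firing = sorted({n.category for n in self.neurons
                         if l1(x, n.prototype) < n.field})
        if not firing:
            return "unknown", []
        return ("identified" if len(firing) == 1 else "uncertain"), firing

    def learn(self, x: Sequence[int], category: str) -> None:
        """Incremental learning: shrink wrong fields, commit a neuron if needed."""
        correct_fires = False
        for n in self.neurons:
            d = l1(x, n.prototype)
            if d < n.field:
                if n.category == category:
                    correct_fires = True
                else:
                    n.field = d  # shrink so x is no longer covered
        if not correct_fires:
            # New neuron's field stops at the nearest prototype of another class
            others = [l1(x, n.prototype) for n in self.neurons
                      if n.category != category]
            self.neurons.append(Neuron(x, category, min([self.max_field] + others)))


net = RCENetwork(max_field=50)
net.learn([10, 10], "A")
net.learn([40, 40], "B")
print(net.recognize([12, 11]))    # ('identified', ['A'])
print(net.recognize([25, 25]))    # ('uncertain', ['A', 'B'])
print(net.recognize([200, 200]))  # ('unknown', [])
```

Each decision can be traced back to the specific stored prototypes that fired, which is the sense in which this kind of model is explainable; in hardware, all distances are computed in parallel rather than in a Python loop.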
Using Coral Dev Board Micro for ODL innovations
Bill LUAN, Senior Program Manager, Google
Coral, the edge AI platform from Google, just released a unique microcontroller board, the Coral Dev Board Micro, which leverages TFLite and TF Micro ML models for low-power, dual-mode on-device ML operations. It further expands the capabilities and flexibility of the Coral product line, enabling developers and businesses to create more edge AI innovations. This session will introduce the product details and give a brief overview of the Coral product suite.
Platform for Next Generation Analog AI Hardware Acceleration
Kaoutar EL MAGHRAOUI, Principal Research Scientist, IBM T.J. Watson Research Center
The next step in the evolution of specialized hardware for AI is rooted in addressing the loss of performance efficiency caused by data movement between computational units and memory. This can be achieved through analog in-memory computing, which eliminates the von Neumann bottleneck and allows highly parallel computations directly in memory using memristive crossbar arrays. Although memristive crossbar arrays are a promising analog technology for accelerating AI workloads, their inherent noise and non-idealities demand improved algorithmic solutions.
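The in-memory computation described above can be sketched numerically: weights are stored as conductances, input voltages drive one set of lines, and by Ohm’s and Kirchhoff’s laws each output line’s current is a dot product, so a whole matrix-vector product happens in one step. This toy model assumes a differential pair of conductances per signed weight and represents programming non-idealities as multiplicative Gaussian noise; it is not how the IBM Analog Hardware Acceleration Kit models devices, and the function names are hypothetical.

```python
# Toy crossbar model: differential conductance pairs + assumed Gaussian noise.
import random
from typing import List, Sequence


def program_crossbar(W: Sequence[Sequence[float]], g_max: float = 1.0,
                     noise: float = 0.0, seed: int = 0):
    """Map signed weights W[out][in] to conductance pairs (G_plus, G_minus).

    Each weight is encoded as G_plus - G_minus, scaled so the largest |w|
    uses g_max. 'noise' is an assumed relative std of programming error.
    """
    rng = random.Random(seed)
    w_max = max(abs(w) for row in W for w in row) or 1.0
    scale = g_max / w_max

    def device(g: float) -> float:
        return max(0.0, g * (1.0 + rng.gauss(0.0, noise)))  # conductance >= 0

    g_plus = [[device(max(w, 0.0) * scale) for w in row] for row in W]
    g_minus = [[device(max(-w, 0.0) * scale) for w in row] for row in W]
    return g_plus, g_minus, scale


def analog_mvm(g_plus, g_minus, scale: float, v: Sequence[float]) -> List[float]:
    """Line currents I_o = sum_i (G+_oi - G-_oi) * V_i, rescaled to weight units."""
    return [sum((gp - gm) * vi for gp, gm, vi in zip(rp, rm, v)) / scale
            for rp, rm in zip(g_plus, g_minus)]


W = [[0.5, -1.0], [2.0, 0.25]]
x = [1.0, 2.0]
ideal = analog_mvm(*program_crossbar(W), x)              # [-1.5, 2.5] = W @ x
noisy = analog_mvm(*program_crossbar(W, noise=0.05), x)  # close to, not equal to, W @ x
```

The gap between `ideal` and `noisy` is the kind of accuracy impact that hardware-aware training and simulation toolkits are meant to quantify and compensate for.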
We introduce the IBM Analog Hardware Acceleration Kit, a first-of-its-kind open-source toolkit that simulates crossbar arrays from within PyTorch to conveniently estimate the impact of material properties and non-idealities on the accuracy of arbitrary artificial neural networks (ANNs) (freely available at https://github.com/IBM/aihwkit). This platform allows users to understand, evaluate, and experiment with emerging analog AI accelerators. Our roadmap and capabilities include algorithmic innovations from IBM Research around hardware-aware training, mixed-precision training, and advanced analog training optimizers using parallel rank updates in analog; support for inference on real phase-change memory (PCM)-based analog AI research chip prototypes; and the ability for the research community to extend the toolkit with new devices, analog presets, algorithms, etc.
We will show an interactive demo of how the toolkit can be used online through our web front end, the cloud composer. The composer provides a set of templates and a no-code experience to introduce the concepts of analog AI, configure experiments, and simulate training and inference experiments on memristive hardware.
Enabling on-device learning at scale
Joseph SORIAGA, Sr. Director of Technology, Qualcomm
The need for intelligent, personalized experiences powered by AI is ever-growing. Our devices are producing more and more data that could help improve our AI experiences. How do we learn and efficiently process all this data from edge devices while maintaining privacy? On-device learning rather than cloud training can address these challenges. In this webinar, we’ll discuss:
- Why on-device learning is crucial for providing intelligent, personalized experiences without sacrificing privacy
- Our latest research in on-device learning, including few-shot learning, continuous learning, and federated learning
- How we are solving system and feasibility challenges to move from research to commercialization
Training models on tiny edge devices
Valeria TOMASELLI, Senior Engineer, STMicroelectronics
The current approach for deploying Machine Learning (ML) on tiny edge devices mainly assumes a cloud-based paradigm, in which ML models are trained in the cloud, where resources are nearly unlimited, and then deployed to edge devices (e.g., MCUs, sensors) for inference only.
However, this paradigm also has some drawbacks that will become more pronounced as Machine Learning becomes a ubiquitous aspect of our lives: data privacy is not preserved, a stable connection is required, and the trained models are not tailored to the specific device. For these reasons, empowering edge devices with training capabilities is emerging as an important challenge.
Different approaches are possible: adapting cloud-based strategies to cope with the constrained memory and computational resources of edge devices, or exploiting new paradigms.
In this talk, we first briefly review some of the research activities under way at STMicroelectronics in the context of on-device learning: personalization of a pre-trained model by means of backpropagation, and model updates by means of partial training on multiple tiny edge devices. We then focus on training from scratch on the tiny edge device, introducing an approach based on Echo State Networks (ESNs) and ridge regression that enables on-device training directly on MCUs of the STM32 family.
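To illustrate why an ESN suits training from scratch on an MCU: the recurrent reservoir is random and fixed, only the linear readout is learned, and ridge regression solves for it in closed form without backpropagation. The pure-Python sketch below assumes a 20-unit tanh reservoir, a one-step-ahead sine prediction task, and a recurrent matrix rescaled so that its largest absolute row sum is 0.9 (a conservative contraction condition used here instead of the more common spectral-radius scaling). It is not STMicroelectronics’ implementation, and all names and sizes are hypothetical.

```python
# Illustrative echo state network (ESN) with a ridge-regression readout.
# Names, sizes and the sine task are hypothetical choices for this sketch.
import math
import random
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class EchoStateNetwork:
    def __init__(self, n_res=20, target_norm=0.9, in_scale=0.5, seed=0):
        rng = random.Random(seed)
        self.w_in = [rng.uniform(-in_scale, in_scale) for _ in range(n_res)]
        W = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
        norm = max(sum(abs(w) for w in row) for row in W)
        # Fixed random reservoir, rescaled so max row sum of |W| = target_norm < 1
        self.W = [[w * target_norm / norm for w in row] for row in W]
        self.w_out: List[float] = []

    def features(self, inputs: Sequence[float]) -> List[List[float]]:
        """Drive the reservoir; the readout sees [1, u(t), x(t)]."""
        x = [0.0] * len(self.w_in)
        feats = []
        for u in inputs:
            x = [math.tanh(wi * u + sum(w * xj for w, xj in zip(row, x)))
                 for wi, row in zip(self.w_in, self.W)]
            feats.append([1.0, u] + x)
        return feats

    def fit(self, inputs, targets, washout=50, ridge=1e-4):
        """Only the readout is trained: w = (F^T F + ridge*I)^-1 F^T y."""
        F = self.features(inputs)[washout:]
        y = targets[washout:]
        d = len(F[0])
        A = [[sum(f[i] * f[j] for f in F) + (ridge if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        b = [sum(f[i] * yi for f, yi in zip(F, y)) for i in range(d)]
        self.w_out = solve(A, b)

    def predict(self, inputs, washout=50) -> List[float]:
        return [sum(w * fi for w, fi in zip(self.w_out, f))
                for f in self.features(inputs)[washout:]]


# One-step-ahead prediction of a sine wave (error measured on the training data)
series = [math.sin(0.2 * t) for t in range(400)]
u, y = series[:-1], series[1:]
esn = EchoStateNetwork()
esn.fit(u, y)
pred = esn.predict(u)
mse = sum((p - t) ** 2 for p, t in zip(pred, y[50:])) / len(pred)
```

Training reduces to accumulating a small Gram matrix and one linear solve (22 × 22 in this sketch), with no gradients or stored activation history, which is what makes training from scratch plausible within MCU memory budgets.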
Schedule subject to change without notice.
Christopher B. ROGERS
Qualcomm Research, USA
NSF AI Institute for Edge Computing Leveraging the Next-generation Networks