Papers are posted below in the schedule. Recordings will also be linked as they become available. Please subscribe to our tinyML YouTube channel to get notifications when the recordings are up.
Tiny machine learning (tinyML) is a fast-growing field of machine learning technologies enabling on-device sensor data analytics at extremely low power, typically in the milliwatt range and below. The tinyML ecosystem is fueled by (i) emerging commercial applications and new systems concepts on the horizon; (ii) significant progress on algorithms, networks, and models down to 100 kB and below; and (iii) current low-power applications in vision and audio that are already becoming mainstream and commercially available. There is growing momentum demonstrated by technical progress and ecosystem development in all of these areas. The tinyML Research Symposium serves as a flagship venue for related research at the intersection of machine learning applications, algorithms, software, and hardware in deeply embedded machine learning systems.
The tinyML Research Symposium is held in conjunction with the tinyML Summit, the premier annual gathering of senior-level technical experts and decision makers representing the fast-growing global tinyML community.
Hyatt Regency San Francisco Airport
1333 Bayshore Highway, Burlingame, CA 94010
9:00 am to 9:30 am
Welcome and Opening Statement
9:30 am to 10:50 am
tinyML Applications and Systems
Session Chair: Zain ASGAR, Adjunct Professor of Computer Science, Stanford University
Session Chair: Wolfgang FURTNER, Distinguished Engineer System Architecture, Infineon Technologies
Millimeter-Scale Ultra-Low-Power Imaging System for Intelligent Edge Monitoring
Andrea BEJARNO-CARBO, PhD Student, University of Michigan, Ann Arbor MI
Millimeter-scale embedded sensing systems have unique advantages over larger devices as they are able to capture, analyze, store, and transmit data at the source while being unobtrusive and covert. However, area-constrained systems pose several challenges, including a tight energy budget and peak power, limited data storage, costly wireless communication, and physical integration at a miniature scale. This paper proposes a novel 6.7×7×5 mm imaging system with deep-learning and image-processing capabilities for intelligent edge applications and demonstrates it in a home-surveillance scenario. The system is implemented by vertically stacking custom ultra-low-power (ULP) ICs and uses techniques such as dynamic behavior-specific power management, hierarchical event detection, and a combination of data compression methods. It demonstrates a new image-correcting neural network that compensates for non-idealities caused by a mm-scale lens and ULP front-end. The system can store 74 frames or offload data wirelessly, consuming 49.6 μW on average for an expected battery lifetime of 7 days.
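As a rough sanity check on the figures quoted in the abstract above, the average power and lifetime together imply a battery energy budget via E = P × t. The sketch below is only a back-of-the-envelope calculation; it assumes a constant average draw and ignores self-discharge and conversion losses, neither of which the abstract specifies. The function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption (not from the paper): constant average draw, no self-discharge
# or conversion losses.
AVG_POWER_W = 49.6e-6   # 49.6 uW average power (from the abstract)
LIFETIME_DAYS = 7       # expected battery lifetime (from the abstract)


def implied_energy_joules(avg_power_w: float, days: float) -> float:
    """Energy drawn over the lifetime: E = P * t, with t in seconds."""
    return avg_power_w * days * 24 * 3600


def joules_to_mwh(joules: float) -> float:
    """Convert joules to milliwatt-hours (1 mWh = 3.6 J)."""
    return joules / 3.6


energy_j = implied_energy_joules(AVG_POWER_W, LIFETIME_DAYS)
energy_mwh = joules_to_mwh(energy_j)
print(f"{energy_j:.1f} J, about {energy_mwh:.2f} mWh")
```

Under these assumptions the implied usable energy is roughly 30 J (about 8.3 mWh), which gives a sense of how small the battery in a system of this size can be.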
How to Manage Tiny Machine Learning at Scale – An Industrial Perspective
Haoyu REN, PhD Student, Technical University of Munich
Tiny machine learning (TinyML) has gained widespread popularity where machine learning (ML) is democratized on ubiquitous microcontrollers, processing sensor data everywhere in real-time. To manage TinyML in the industry, where mass deployment happens, we consider the hardware and software constraints, ranging from available onboard sensors and memory size to ML-model architectures and runtime platforms. However, Internet of Things (IoT) devices are typically tailored to specific tasks and are subject to heterogeneity and limited resources. Moreover, TinyML models have been developed with different structures and are often distributed without a clear understanding of their working principles, leading to a fragmented ecosystem. Considering these challenges, we propose a framework using Semantic Web technologies to enable the joint management of TinyML models and IoT devices at scale, from modeling information to discovering possible combinations and benchmarking, and eventually facilitate TinyML component exchange and reuse. We present an ontology (semantic schema) for neural network models aligned with the World Wide Web Consortium (W3C) Thing Description, which semantically describes IoT devices. Furthermore, a Knowledge Graph of 23 publicly available ML models and six IoT devices was used to demonstrate our concept in three case studies, and we share the code and examples to enhance reproducibility.
Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation
Jorge GOMEZ, Research Scientist, Reality Labs, Meta
Augmented Reality/Virtual Reality (AR/VR) glasses are widely foreseen as the next generation computing platform. AR/VR glasses are a complex “system of systems” which must satisfy stringent form factor, computing, power, and thermal requirements. In this paper, we will show that a novel distributed on-sensor compute architecture, coupled with new semiconductor technologies (such as dense 3D-IC interconnects and Spin-Transfer Torque Magneto Random Access Memory, STT-MRAM) and, most importantly, a full hardware-software co-optimization are the solutions to achieve attractive and socially acceptable AR/VR glasses. To this end, we developed a semi-analytical simulation framework to estimate the power consumption of novel AR/VR distributed on-sensor computing architectures. The model allows the optimization of the main technological features of the system modules, as well as the computer-vision algorithm partition strategy across the distributed compute architecture. We show that, in the case of the compute-intensive machine learning based Hand Tracking algorithm, the distributed on-sensor compute architecture can reduce the system power consumption compared to a centralized system, with additional benefits in terms of latency and privacy.
IMU Preintegrated Features for Efficient Deep Inertial Odometry
Rooholla Khorrambakht, Robotics Researcher, K.N.Toosi University of Technology
MEMS Inertial Measurement Units (IMUs) as ubiquitous proprioceptive motion measurement devices are available on various everyday gadgets and robotic platforms. Nevertheless, the direct inference of geometrical transformations or odometry based on these data alone is a challenging task. This is due to the hard-to-model imperfections and high noise characteristics of the sensor, which has motivated research in formulating the system as an end-to-end learning problem, where the motion patterns of the agent are exploited to facilitate better odometry estimates. However, this benefit comes at the cost of high computation and memory requirements, which makes deep inertial odometry unsuitable for low-power and edge applications. This paper attempts to address this conflict by proposing the IMU preintegrated features as a replacement for the raw IMU data in deep inertial odometry. Exploiting the manifold structure of the IMU motion model, these features provide a temporally compressed motion representation that preserves important geometrical information. We demonstrate the effectiveness and efficiency of this approach for the task of inertial odometry on two applications of pedestrian motion estimation and autonomous vehicles. We show a performance improvement compared to raw inputs while reducing the computational burdens. Additionally, we demonstrate the efficiency of this approach through an embedded implementation on a resource-constrained microcontroller.
10:50 am to 11:10 am
11:10 am to 11:50 am
Session 1 - Short papers
Session Chair: Amey Kulkarni, Senior Software Engineer, NVIDIA
A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators
Bingqian Lu, UC Riverside
Shaolei Ren, UC Riverside
Yiyu Shi, Notre Dame
Zheyu Yan, Notre Dame
Combinatorial-Randomness-Based Power Amplifier Datasets with RF Fingerprint Classification
Jiachen Xu, Carnegie Mellon University
Ethan Chen, Carnegie Mellon University
Vanessa Chen, Carnegie Mellon University
Yuyi Shen, Carnegie Mellon University
Jinho Yi, Carnegie Mellon University
Combinatorial RL-based Scheduling for Pipelined Edge TPUs
Jiaqi Yin, University of Utah
Yingjie Li, University of Utah
Cunxi Yu, University of Utah
Qiwei Yuan, University of Utah
A Fast Network Exploration Strategy to Profile Low Energy Consumption for Keyword Spotting
Arnab Neelim Mazumder, University of Maryland, Baltimore County
Tinoosh Mohsenin, University of Maryland, Baltimore County
11:50 am to 12:50 pm
12:50 pm to 2:30 pm
Session Chair: Yingyan Lin, Assistant Professor, Rice University
Session Chair: Laura Galindez, Postdoctoral Researcher, University of California - Berkeley
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference
Zhongzhi Yu, Ph.D. student, Rice University
Low precision deep neural network (DNN) training is one of the most effective techniques for boosting DNNs’ training efficiency, as it trims down the training cost from the finest bit level. While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts. However, existing dynamic low precision training methods rely on manually designed precision schedules to achieve advantageous efficiency and accuracy trade-offs, limiting their more comprehensive practical applications and achievable performance. To this end, we propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training towards optimal accuracy and efficiency trade-offs. It is worth noting that LDP-trained DNNs are by nature efficient during inference. Furthermore, we visualize the resulting temporal and spatial precision schedule and distribution of LDP trained DNNs on different tasks to better understand the corresponding DNNs’ characteristics at different training stages and DNN layers both during and after training, drawing insights for promoting further innovations. Extensive experiments and ablation studies (seven networks, five datasets, and three tasks) show that the proposed LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs. For example, in addition to having the advantage of being automated, our LDP achieves a 0.31% higher accuracy with a 39.1% lower computational cost when training ResNet-20 on CIFAR-10 as compared with the best SOTA method.
PocketNN: Integer-only Training and Inference of Neural Networks via Direct Feedback Alignment and Pocket Activations in Pure C++
Jaewoo SONG, PhD Student, Hong Kong University of Science and Technology (HKUST)
Standard deep learning algorithms are implemented using floating-point real numbers. This presents an obstacle for implementing them on low-end devices which may not have dedicated floating-point units (FPUs). As a result, researchers in tinyML have considered machine learning algorithms that can train and run a deep neural network (DNN) on a low-end device using integer operations only. In this paper we propose PocketNN, a light and self-contained proof-of-concept framework in pure C++ for the training and inference of DNNs using only integers. Unlike other approaches, PocketNN directly operates on integers without requiring any explicit quantization algorithms or customized fixed-point formats. This was made possible by pocket activations, which are a family of activation functions devised for integer-only DNNs, and an emerging DNN training algorithm called direct feedback alignment (DFA). Unlike the standard backpropagation (BP), DFA trains each layer independently, thus avoiding integer overflow which is a key problem when using BP with integer-only operations. We used PocketNN to train some DNNs on two well-known datasets, MNIST and Fashion-MNIST. Our experiments show that the DNNs trained with our PocketNN achieved 96.98% and 87.7% accuracies on MNIST and Fashion-MNIST datasets, respectively. The accuracies are very close to the equivalent DNNs trained using BP with floating-point real number operations, such that accuracy degradations were just 1.02%p and 2.09%p, respectively. Finally, our PocketNN has high compatibility and portability for low-end devices as it is open source and implemented in pure C++ without any dependencies.
tinyMAN: Lightweight Energy Manager using Reinforcement Learning for Energy Harvesting Wearable IoT Devices
Toygun BASAKLAR, PhD Student, University of Wisconsin - Madison
Advances in low-power electronics and machine learning techniques lead to many novel wearable IoT devices. These devices have limited battery capacity and computational power. Thus, energy harvesting from ambient sources is a promising solution to power these low-energy wearable devices. They need to manage the harvested energy optimally to achieve energy-neutral operation, which eliminates recharging requirements. Optimal energy management is a challenging task due to the dynamic nature of the harvested energy and the battery energy constraints of the target device. To address this challenge, we present a reinforcement learning-based energy management framework, tinyMAN, for resource-constrained wearable IoT devices. The framework maximizes the utilization of the target device under dynamic energy harvesting patterns and battery constraints. Moreover, tinyMAN does not rely on forecasts of the harvested energy, which makes it a prediction-free approach. We deployed tinyMAN on a wearable device prototype using TensorFlow Lite for Microcontrollers thanks to its small memory footprint of less than 100 KB. Our evaluations show that tinyMAN achieves less than 2.36 ms and 27.75 μJ while maintaining up to 45% higher utility compared to prior approaches.
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana JELČICOVÁ, Industrial PhD student, Oticon
Multi-head self-attention forms the core of Transformer networks. However, their quadratically growing complexity with respect to the input sequence length impedes their deployment on resource-constrained edge devices. We address this challenge by proposing a dynamic pruning method, which exploits the temporal stability of data across tokens to reduce inference cost. The threshold-based method only retains significant differences between the subsequent tokens, effectively reducing the number of multiply-accumulates, as well as the internal tensor data sizes. The approach is evaluated on the Google Speech Commands Dataset for keyword spotting, and the performance is compared against the baseline Keyword Transformer. Our experiments show that we can reduce ∼80% of operations while maintaining the original 98.4% accuracy. Moreover, a reduction of ∼87-94% operations can be achieved when only degrading the accuracy by 1-4%, speeding up the MHSA inference by a factor of ∼7.5-16.
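The idea of retaining only significant differences between subsequent tokens can be illustrated with a generic delta-thresholding sketch. This is not the authors' implementation: the function name `delta_prune`, the running-reference update rule, and the threshold value are assumptions chosen for illustration. Zeroed deltas are the entries for which a downstream multiply-accumulate could be skipped, since for a linear layer W·x_t = W·x_(t-1) + W·Δ.

```python
from typing import List, Tuple


def delta_prune(tokens: List[List[float]], threshold: float) -> Tuple[List[List[float]], float]:
    """Threshold the element-wise change between consecutive tokens.

    The first token is passed through in full. For later tokens, each element
    is compared with a running reference; changes smaller than `threshold`
    are zeroed, and the reference is updated only for elements that are kept.
    Returns the (sparse) deltas and the fraction of elements retained.
    """
    reference = list(tokens[0])
    deltas = [list(tokens[0])]
    kept = total = len(reference)
    for token in tokens[1:]:
        row = []
        for i, value in enumerate(token):
            diff = value - reference[i]
            if abs(diff) >= threshold:
                row.append(diff)
                reference[i] = value  # track the last transmitted value
                kept += 1
            else:
                row.append(0.0)       # insignificant change: operation can be skipped
            total += 1
        deltas.append(row)
    return deltas, kept / total


# Hypothetical three-token sequence with slowly varying features.
tokens = [[1.0, 2.0, 3.0], [1.05, 2.0, 3.5], [1.1, 2.6, 3.5]]
deltas, fraction = delta_prune(tokens, threshold=0.2)
print(deltas, fraction)
```

Updating the reference only for retained elements keeps small changes from being silently dropped forever: they accumulate against the reference until they cross the threshold.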
Toward Compact Deep Neural Networks via Energy-Aware Pruning
Seul-Ki Yeom, Machine Learning, nota.ai
Despite their remarkable performance, modern deep neural networks are inevitably accompanied by a significant computational cost for learning and deployment, which may be incompatible with their usage on edge devices. Recent efforts to reduce these overheads involve pruning and decomposing the parameters of various layers without performance deterioration. Inspired by several decomposition studies, in this paper, we propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN). The proposed energy-aware pruning leads to state-of-the-art performance for Top-1 accuracy, FLOPs, and parameter reduction across a wide range of scenarios with multiple network architectures on CIFAR-10 and ImageNet after fine-grained classification tasks. In a toy experiment, without fine-tuning, we can visually observe that NN-based pruning causes only a minute change in decision boundaries across classes and outperforms the previous popular criteria. We achieve competitive results, with 40.4/49.8% FLOPs reduction and 45.9/52.9% parameter reduction at 94.13/94.61% Top-1 accuracy with ResNet-56/110 on CIFAR-10, respectively. In addition, our observations are consistent across a variety of pruning settings in terms of data size as well as data quality, which underlines the stability of the acceleration and compression with negligible accuracy loss.
2:30 pm to 2:50 pm
2:50 pm to 4:30 pm
Session Chair: Priyanka RAINA, Assistant Professor, Stanford University
Improving the Energy Efficiency and Robustness of tinyML Computer Vision using Log-Gradient Input Images
Qianyun LU, PhD Student, Stanford University
This paper studies the merits of applying log-gradient input images to convolutional neural networks (CNNs) for tinyML computer vision (CV). We show that log gradients enable: (i) aggressive 1.5-bit quantization of first-layer inputs, (ii) potential CNN resource reductions, and (iii) inherent robustness to illumination changes (1.7% accuracy loss across 1/32…8 brightness variation vs. up to 10% for JPEG). We establish these results using the PASCAL RAW image data set and through a combination of experiments using neural architecture search and a fixed three-layer network. The latter reveal that training on log-gradient images leads to higher filter similarity, making the CNN more prunable. The combined benefits of aggressive first-layer quantization, CNN resource reductions, and operation without tight exposure control and image signal processing (ISP) are helpful for pushing tinyML CV toward its ultimate efficiency limits.
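The illumination robustness claimed above follows from a simple property of logarithms: scaling every pixel by a brightness factor a leaves differences of log intensities unchanged, because log(a·p2) − log(a·p1) = log(p2) − log(p1). The sketch below illustrates this on a single image row. It is not the paper's pipeline; it assumes that "1.5-bit" refers to a three-level (ternary) code, and the helper names, pixel values, and threshold are hypothetical.

```python
import math
from typing import List


def log_gradient_row(pixels: List[float]) -> List[float]:
    """Horizontal log gradients log(p[i+1]) - log(p[i]) for positive intensities."""
    return [math.log(b) - math.log(a) for a, b in zip(pixels, pixels[1:])]


def ternarize(values: List[float], threshold: float) -> List[int]:
    """Map each gradient to -1, 0, or +1 (three levels, log2(3) is about 1.58 bits)."""
    return [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in values]


# Hypothetical row of linear (RAW-like) pixel intensities.
row = [10.0, 12.0, 40.0, 38.0, 5.0]
dim = [p / 32 for p in row]      # same scene at 1/32 brightness
bright = [p * 8 for p in row]    # same scene at 8x brightness

base = ternarize(log_gradient_row(row), threshold=0.5)
print(base)
print(ternarize(log_gradient_row(dim), threshold=0.5) == base)
print(ternarize(log_gradient_row(bright), threshold=0.5) == base)
```

In this idealized setting the ternary codes are identical across the whole brightness range; real sensors add noise and clipping, which is presumably why the abstract reports a small residual accuracy loss rather than none.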
An Empirical Study of Low Precision Quantization for TinyML
Shaojie Zhuo, Staff Engineer, Qualcomm Canada ULC
Tiny machine learning (tinyML) has emerged during the past few years aiming to deploy machine learning models to embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both memory consumption and computation cost of model inference. In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision with only a small set of calibration data and benchmark them on different tinyML use cases. To achieve a fair comparison, we build a simulated quantization framework to investigate recent PTQ algorithms. Furthermore, we break down those algorithms into essential components and re-assemble a generic PTQ pipeline. With an ablation study on different alternatives of components in the pipeline, we reveal key design choices when performing low precision quantization. We hope this work can provide useful data points and shed light on future research into low precision quantization.
Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks
Dominika Przewlocka-Rus, Researcher, Meta Reality Lab Research
Deploying Deep Neural Networks in low-power embedded devices for real time-constrained applications requires optimization of memory and computational complexity of the networks, usually by quantizing the weights. Most of the existing works employ linear quantization which causes considerable degradation in accuracy for weight bit widths lower than 8. Since the distribution of weights is usually non-uniform (with most weights concentrated around zero), other methods, such as logarithmic quantization, are more suitable as they are able to preserve the shape of the weight distribution more precisely. Moreover, using base-2 logarithmic representation allows optimizing the multiplication by replacing it with bit shifting. In this paper, we explore non-linear quantization techniques for exploiting lower bit precision and identify favorable hardware implementation options. We developed the Quantization Aware Training (QAT) algorithm that allowed training of low bit width Power-of-Two (PoT) networks and achieved accuracies on par with state-of-the-art floating point models for different tasks. We explored PoT weight encoding techniques and investigated hardware designs of MAC units for three different quantization schemes – uniform, PoT and Additive-PoT (APoT) – to show the increased efficiency when using the proposed approach. Eventually, the experiments showed that for low bit width precision, non-uniform quantization performs better than uniform, and at the same time, PoT quantization vastly reduces the computational complexity of the neural network.
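The claim that a base-2 logarithmic representation lets multiplication be replaced by bit shifting can be made concrete with a small sketch. It is a generic illustration of power-of-two weights, not the QAT algorithm or the encoding from the paper: the exponent range (`min_exp`, `max_exp`), the choice of rounding in the log domain, and both function names are assumptions.

```python
import math
from typing import Tuple


def quantize_pot(weight: float, min_exp: int = -7, max_exp: int = 0) -> Tuple[int, int]:
    """Approximate a weight as sign * 2**exp and return (sign, exp).

    Rounding is done in the log2 domain and the exponent is clipped to
    [min_exp, max_exp]. A zero weight is encoded with sign 0.
    """
    if weight == 0.0:
        return 0, min_exp
    sign = 1 if weight > 0 else -1
    exp = round(math.log2(abs(weight)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp


def shift_multiply(activation: int, sign: int, exp: int) -> int:
    """Multiply an integer activation by sign * 2**exp using shifts only."""
    if exp >= 0:
        magnitude = activation << exp
    else:
        magnitude = activation >> -exp  # right shift floors the result
    return sign * magnitude


sign, exp = quantize_pot(0.23)          # nearest power of two is 2**-2 = 0.25
print(sign, exp, shift_multiply(64, sign, exp))
```

With an integer activation of 64, the weight 0.23 is represented as 2^-2 and the product becomes a right shift by two, giving 16 (compared with 14.72 for the unquantized weight). The gap between those two numbers is the quantization error that training has to absorb.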
A Brain-Inspired Low-Dimensional Computing Classifier for Inference on Tiny Devices
Shijin Duan, Ph.D. Student, Northeastern University
By mimicking brain-like cognition and exploiting parallelism, hyperdimensional computing (HDC) classifiers have been emerging as a lightweight framework to achieve efficient on-device inference. Nonetheless, they have two fundamental drawbacks, heuristic training process and ultra-high dimension, which result in sub-optimal inference accuracy and large model sizes beyond the capability of tiny devices with stringent resource constraints. In this paper, we address these fundamental drawbacks and propose a low-dimensional computing (LDC) alternative. Specifically, by mapping our LDC classifier into an equivalent neural network, we optimize our model using a principled training approach. Most importantly, we can improve the inference accuracy while successfully reducing the ultra-high dimension of existing HDC models by orders of magnitude (e.g., 8000 vs. 4/64). We run experiments to evaluate our LDC classifier by considering different datasets for inference on tiny devices, and also implement different models on an FPGA platform for acceleration. The results highlight that our LDC classifier offers an overwhelming advantage over the existing brain-inspired HDC models and is particularly suitable for inference on tiny devices.
TinyM^2Net: A Flexible System Algorithm Co-designed Multimodal Learning Framework for Tiny Devices
Hasib-Al-RASHID, PhD Student, University of Maryland Baltimore
With the emergence of Artificial Intelligence (AI), new attention has been given to implementing AI algorithms on resource-constrained tiny devices to expand the application domain of IoT. Multimodal learning has recently become very popular for classification tasks due to its impressive performance in both image and audio event classification. This paper presents TinyM2Net, a flexible system-algorithm co-designed multimodal learning framework for resource-constrained tiny devices. The framework was designed to be evaluated on two different case studies: COVID-19 detection from multimodal audio recordings and battlefield object detection from multimodal images and audio. In order to compress the model for implementation on tiny devices, substantial network architecture optimization and mixed-precision quantization (mixed 8-bit and 4-bit) were performed. TinyM2Net shows that even a tiny multimodal learning model can improve classification performance over that of any unimodal framework. The most compressed TinyM2Net achieves 88.4% COVID-19 detection accuracy (14.5% improvement from the unimodal base model) and 96.8% battlefield object detection accuracy (3.9% improvement from the unimodal base model). Finally, we test our TinyM2Net models on a Raspberry Pi 4 to see how they perform when deployed to a resource-constrained tiny device.
4:30 pm to 4:50 pm
4:50 pm to 5:30 pm
Session 2 - Short papers
Session Chair: Houman Homayoun, Professor, UC Davis
L3U-Net: Low-Latency Lightweight U-Net Based Image Segmentation Model for Parallel CNN Processors
Erman Okman, Analog Devices
Mehmet Gorkem Ulkar, Analog Devices
Gulnur Selda Uyanik, Analog Devices
Neural Architecture Search for Energy Efficient Always-on Audio Models
Simon Carlile, X, The Moonshot Factory
Karolis Misiunas, Google Research
Sagi Perel, Google Research
Malcolm Slaney, Google Research
Daniel T. Speckhard, X, The Moonshot Factory
Tenghui Zhu, Google Research
Towards Agile Design of Neural Processing Units with Chisel
Binyi Wu, Technische Universitaet Dresden
Wolfgang Furtner, Infineon Technologies AG
Christian Georg Mayr, Technische Universitaet Dresden
Bernd Waschneck, Infineon Technologies AG
Neural Architecture Search for Low-Precision Neural Networks
Christian Georg Mayr, Technische Universitaet Dresden
Bernd Waschneck, Infineon Technologies AG
Binyi Wu, Technische Universitaet Dresden
Schedule subject to change without notice.
Vijay JANAPA REDDI
Syed Shakib SARWAR