Inaugural tinyML EMEA Technical Forum
tinyML events are going “global”, virtually. After postponing the in-person event that was to be held in Cyprus, we will be going online the week of June 7, 2021. Even though it is virtual it will still be a “regional” event, with speakers and participants showcasing technology from the Europe, Middle East, and Africa (EMEA) region.
tinyML is a fast-growing branch of machine learning technologies, architectures, and approaches dealing with machine intelligence at the very edge. It is broadly defined as integrated, “full-stack” (HW-SYS-SW-apps) ML architectures, techniques, tools, and approaches capable of performing on-device analytics for a variety of sensing modalities (vision, audio, motion, environmental, human health monitoring, etc.) at extreme energy efficiency, typically in the single-mW (and below) power range, enabling machine intelligence right at the boundary of the physical and digital worlds.
Central European Summer Time / UTC +2
4:00 pm to 4:15 pm
Open / Welcome
4:15 pm to 5:00 pm
Tutorial: Context Awareness Function Pack (FP)
Lisa TROLLO, Artificial Intelligence Strategy, STMicroelectronics
Federico IACCARINO, Product Marketing, STMicroelectronics
Carlo PARATA, System Engineer, STMicroelectronics
This live tutorial will feature experts from STMicroelectronics covering the joint use of STM32 and in-sensor computing with the machine learning core. The agenda is as follows:
introduction of ST products for edge AI, for both STM32 and sensors
description of the ML/AI ecosystem in terms of the tools used for the FP
description and usage of the FP for ASC and HAR on the boards
The tutorial is not hands-on; rather, it is a “How to Get Started” session and will include the information attendees need to buy the boards and install and run the FP after the tutorial.
5:05 pm to 5:50 pm
Tutorial: Bio-inspired neuromorphic circuit architectures
Giacomo INDIVERI, Professor, ETH Zurich
Artificial Intelligence (AI) and deep learning algorithms are revolutionizing our computing landscape, and have demonstrated impressive results in a wide range of applications. However, they still have serious shortcomings for use cases that require closed-loop interactions with the real world.
Current AI systems are still not able to compete with biological ones in tasks that involve real-time processing of sensory data and decision making in complex and noisy settings.
Neuromorphic Intelligence (NI) aims to fill this gap by developing ultra-low power electronic circuits and radically different brain-inspired in-memory computing architectures.
NI hardware systems implement the principles of computation observed in the nervous system by exploiting the physics of their electronic devices to directly emulate the biophysics of real neurons and synapses.
This tutorial will present strategies derived from neuroscience for carrying out robust and low-latency computation using electronic neural computing elements that share the same (analog, slow, and noisy) properties of their biological counterparts. I will present examples of NI circuits, and demonstrate applications of NI processing systems to extreme-edge use cases that require low-power, local processing of the sensed data and that cannot afford to connect to the cloud for running AI algorithms.
5:55 pm to 7:30 pm
TinyMLPerf: Development of a Benchmark Suite for TinyML Systems
Csaba KIRALY, Internet of Things Engineer, Digital Catapult UK
Tiny machine learning (tinyML) is driving enormous growth within the IoT industry, enabling data-driven development and previously unseen levels of machine intelligence and autonomy of operation at the far edge.
Evaluating the performance of low-power solutions in such a fast-evolving space is already difficult, given the large design space offering various performance-energy tradeoffs even for a single application. Providing benchmarks that allow the comparison of different solutions is even more challenging due to the wide range of targeted applications, power budgets, model-specific optimizations, innovative HW and SW designs, and toolchains. Yet, to foster innovation, it is necessary to provide a benchmark that is fair, replicable, robust, and enjoys the support of the wider community; a global community in which several EMEA players also contributed to the development of the first version of such a benchmark.
This work presents the first version of tinyMLPerf, a suite of benchmarks developed by the tinyML community to compare tinyML hardware and software systems. The talk gives insight into the development process behind the benchmark suite, describing the benchmark selection process, some of the design choices made, and the benchmarks selected for this first iteration, consisting of four ML tasks: small-vocabulary keyword spotting, binary image classification, small image classification, and anomaly detection using machine operating sounds. It will present the benchmark framework developed in collaboration between MLCommons and EEMBC, the development of reference implementations on an ST platform to help submitters, the use of the benchmark to evaluate some performance-energy tradeoffs of a single solution, and some of the lessons learned during the process.
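To make the performance side of such a benchmark concrete, here is a minimal median-latency harness in the spirit of tinyMLPerf's performance mode. This is only an illustrative sketch, not the actual MLCommons/EEMBC runner; the function names and parameters are hypothetical.

```python
import time
import statistics

def benchmark(inference_fn, warmup=10, runs=100):
    """Median-latency harness: warm up, then time many single inferences."""
    for _ in range(warmup):                  # warm-up runs are discarded
        inference_fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        inference_fn()
        latencies.append(time.perf_counter() - start)
    return {
        "median_ms": statistics.median(latencies) * 1e3,
        "throughput_ips": runs / sum(latencies),   # inferences per second
    }

def fake_inference():
    # Stand-in for a real model invocation (e.g. keyword spotting).
    sum(i * i for i in range(1000))

result = benchmark(fake_inference)
print(sorted(result))  # keys: median_ms, throughput_ips
```

Reporting the median rather than the mean makes the number robust to occasional scheduling hiccups, which matters when comparing very different hardware and software stacks.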
tinyML design for environmental sensing applications
Jianyu ZHAO, Algorithm and Modeling Engineer, Infineon Technologies AG
The deployment of large numbers of sensors to monitor various environmental parameters (such as temperature, pressure, noise, pollutants, etc.) and the resulting availability of a large amount of data are motivating the use of machine learning (ML) algorithms, including neural networks, on small devices, with the goal of making the sensors “smarter” and thus enabling “intelligence at the edge”.
ML techniques allow for more accurate analysis of complex sensor behaviors and interdependencies and can help quickly identify dangerous situations, such as the presence of poisonous gases in an indoor or outdoor environment. As the use of more complex algorithms spreads, a growing interest is observed in the scientific community toward a joint optimization of algorithms, software, and dedicated hardware for on-sensor data analysis (inference) on battery-operated low-power devices.
For the specific gas sensing application we address in the present contribution, a small Gated Recurrent Unit (GRU) is used to estimate gas concentrations in the air. It can exploit the time properties of the sensor signals while keeping the memory footprint within the budget. The algorithmic model is first designed and trained on a computer cluster and then deployed on a Cypress PSoC® interface board, which is later used for signal measurement, heater control, real-time concentration estimation, and communication.
The hardware platform is equipped with an ARM Cortex-M0+ processor, with 32 kB Flash and 4 kB SRAM. Its limited memory and computational resources hinder the use of ready-made deployment toolchains, such as TensorFlow Lite, which are normally image-oriented and require at least several hundred kB of memory. To circumvent these issues, we developed our own Python and C library dedicated to extremely small and low-cost smart sensor applications. The deployment workflow is guided by a Jupyter Notebook and can be divided into 4 steps: network quantization, C code generation, performance evaluation, and verification on the embedded target. With the dedicated test bench, it’s possible to visualize the simulated output, to flexibly adjust quantization setups (such as the position of the binary point), and thus to find the best trade-off between algorithm performance and memory footprint with little effort. As a result, we managed to migrate the best performing algorithm, which comes with signal processing functions and a GRU regressor (25 time steps and 20 hidden units), from the computer cluster to the target hardware without significant loss of accuracy.
The deployment library can be applied to similar sensor applications concerned with small neural networks with dense and GRU layers. In the future, we plan to extend the support also for convolutional layers.
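The binary-point adjustment mentioned above can be sketched in a few lines. This is a generic fixed-point (Q-format) illustration, not the library described in the talk; the helper and values are hypothetical. Moving the binary point trades dynamic range against precision:

```python
def quantize_q(x, int_bits, total_bits=8):
    """Quantize a float to signed fixed-point with `int_bits` integer bits.

    The binary point sits after the integer bits; one bit is the sign.
    Returns (raw integer code, dequantized value).
    """
    frac_bits = total_bits - 1 - int_bits   # 1 sign bit
    scale = 1 << frac_bits
    q = round(x * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))                 # saturate on overflow
    return q, q / scale

weights = [0.8, -1.3, 0.02, 3.1]
# Moving the binary point right (more integer bits) widens the range
# but coarsens the step size; a test bench sweeps this trade-off.
for int_bits in (1, 2):
    deq = [quantize_q(w, int_bits)[1] for w in weights]
    err = max(abs(a - b) for a, b in zip(weights, deq))
    print(int_bits, deq, round(err, 4))
```

With one integer bit the value 3.1 saturates badly; with two it fits, at the cost of one fractional bit of precision everywhere else — exactly the kind of trade-off the simulated-output comparison makes visible.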
Building Heterogeneous TinyML Pipelines
Christopher KNOROWSKI, CTO, SensiML Corp
When the complexity and/or the number of classes in the data increases, creating a successful model becomes more challenging on constrained embedded devices. To overcome this challenge, we can iteratively combine the classes into similar groups and optimize each new group using different classifiers and/or features. This creates a hierarchical model: a combination of several simple, high-performing models. Hierarchical models often provide a better resource-vs.-accuracy trade-off than a single large model.
In this talk we will describe an embedded SDK architecture that makes it possible to combine mixed classifier machine learning pipelines efficiently into a single library. The SDK allows for the creation of hierarchical and multi-model machine learning pipelines while reducing the overall memory footprint.
Image-based target identification on a tiny RISC-V multi-core application processor
Manuele RUSCI, Embedded Machine Learning Engineer, Greenwaves
In this talk, we would like to explain the new technology we used to design a new-generation hearable platform based on a RISC-V multi-core processor. GAP9 can deliver exceptional audio quality with ultra-low latency, which allows you to implement new-generation algorithms like ANC, noise reduction, and spatial sound.
Lessons learned from building a TinyML-powered artificial nose
Benjamin CABE, Principal Program Manager, Microsoft
It was a long weekend in May 2020, early in the pandemic, and I started to explore how a simple Cortex-M4 coupled with a gas sensor could maybe help me figure out when my sourdough starter had reached the perfect level of fermentation.
Join this session to hear about my journey from not being able to grasp what a neural network even really was in the first place, to building a complete open source & open hardware DIY artificial nose that unlocks a wide variety of applications, in addition to providing a great framework for introducing (embedded) developers to TinyML. I will also touch on the intersection of TinyML and IoT, and how to make TinyML applications first-class citizens in a connected world.
7:30 pm to 8:10 pm
Lightning Talks enable the audience to review as many potentially exciting ideas as possible in a short space of time.
Innovative Minimization of Parameter Memory Space in Small-Silicon, Low-Power Devices
Moshe HAIUT, CTO Staff, DSP Group
NN models normally require tens, and sometimes hundreds, of megabytes for their parameter (weight) storage, which poses a big challenge for tinyML edge-based solutions. In tinyML chips, the storage space allocated for the weights is below 1 MB, in order to meet the requirements of reasonable silicon area and power consumption below 1 mW.
Fortunately, the memory space for weights storage can be reduced dramatically by using a combination of three techniques: Quantization, Pruning, and Lossless Compression.
This tinyML talk will show how the weight storage can be reduced to a minimum when using the DSPG nNetLite h/w engine to run inference of small-to-medium NN models on the DBM10L chip. The combination of dedicated h/w and an efficient compilation and simulation toolchain results in a high compression ratio while still maintaining inference accuracy with minimum latency. The nNetLite compiler provides the user a way of controlling the number of bits allocated for weight quantization, as well as a way of reducing the number of weights by applying smart post-training pruning. The nNetLite bit-exact simulator provides the user a means of analyzing the final inference accuracy under the different quantization and pruning constraints selected in the compilation process. This integrated toolchain enables a trial-and-error approach to reach a final optimized solution in which the compressed weights fit into a pre-determined memory space with minimum degradation in performance.
The final piece of the DSPG nNetLite solution is the h/w part: this IP incorporates a module called the Weight Extraction Unit (WEU), which is responsible for performing weight decompression and dequantization in real time in order to prepare for the math operations. This way, the Multiply & Accumulate (MAC) unit enjoys access to a narrow sliding window of plain parameters from the WEU’s zero-wait-state, tightly-coupled cache memory.
Attendees will see the process of shrinking the weights memory space in a specific example of a NN model that is based on a known popular database. The process will demonstrate the power of the DSPG nNetLite compiler and simulator toolchain.
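A toy sketch can illustrate the three-stage shrink (quantization, pruning, lossless compression) described above. This is not the nNetLite toolchain: zlib stands in for whatever lossless coder the real engine uses, and the weight tensor, thresholds, and sizes are purely illustrative.

```python
import random
import struct
import zlib

random.seed(0)
# Toy "weight tensor": 4096 Gaussian floats, like a small dense layer.
weights = [random.gauss(0, 0.5) for _ in range(4096)]

def quantize8(ws):
    """Symmetric 8-bit quantization: floats -> signed integer codes."""
    scale = max(abs(w) for w in ws) / 127
    return [round(w / scale) for w in ws], scale

def prune(qs, threshold=32):
    """Magnitude pruning: zero out small quantized weights."""
    return [0 if abs(q) < threshold else q for q in qs]

q, scale = quantize8(weights)
p = prune(q)

raw = struct.pack(f"{len(weights)}f", *weights)   # float32 baseline
packed = bytes((v & 0xFF) for v in p)             # 8-bit storage (4x smaller)
compressed = zlib.compress(packed, 9)             # lossless stage

print(len(raw), len(packed), len(compressed))
```

The pruning stage is what makes the lossless stage effective: the long runs of zeros it introduces are exactly what a general-purpose coder compresses well, which mirrors why the three techniques are applied in combination rather than alone.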
Perfect coffee roasting with TinyML sound sensing
Jon NORDBY, CTO, Soundsensing
Great coffee requires not just high quality coffee beans, but also a roasting process that consistently brings out the desired flavor and aroma. During the roasting the coffee beans will pop like popcorn (“cracking”), and the sound of these cracks is a good indicator of the development stage of the coffee beans.
By integrating MEMS microphones and on-edge analysis using machine learning (TinyML), ROEST coffee roasters can use sound to automatically keep track of the roasting process. This technology has been developed in a collaboration between ROEST and Soundsensing, and has been shipping on the ROEST sample roaster since August 2020.
In this talk you will hear about this fun and practical application of TinyML, and some of the challenges and solutions we found when deploying on-edge machine learning in professional grade electronics products.
A Low-Power and High-Performance Artificial Intelligence Inference Approach for Embedded Data Processing
Mandar HARSHE, Senior Developer, Klepsydra Technologies GmbH
Machine learning techniques, when used in the automotive, space, and IoT fields, require a large number of high-quality sensors and correspondingly high computational power. New sensors available on the market produce this high-quality data at the desired high rates, while new processors allow a substantial increase in available computation power. Current mobile phones are also equipped with cameras having a higher resolution than cameras used traditionally in the automotive or space sectors. This combination of increased computational power coupled with better high-quality sensors allows for the consideration of advanced embedded artificial intelligence algorithms.
However, the use of advanced AI algorithms with increased sensor data, and increased processor power brings new challenges with it: low determinism, excessive power consumption, large amounts of potentially redundant data, parallel data processing, and cumbersome software development. Current approaches to address these challenges and to increase the throughput of data processed by AI algorithms are to consider using FPGAs or GPUs. However, these solutions present other technical problems including increased power consumption and programming complexity and, in case of space applications, radiation hardening limitations.
We present a novel approach to AI that can produce deterministic and optimal code for scenarios presented above and which can work on a CPU. The approach uses advanced lock-free programming techniques coupled with high-performance event loops to optimize the data flow through neural networks during AI inference. Lock-free techniques reduce context switching caused by traditional approaches, reducing CPU usage. They are also suitable for higher throughput when used with fast event loops. The approach presented uses these ideas to stream data between layers of deep neural networks and allows for an increased throughput in data processed. This approach allows the use of state of the art AI algorithms to process data onboard, thus allowing the target application to work with limited connection to the cloud and reduces the costs associated with storing and transferring redundant or low-value data.
The approach is also highly configurable to different target scenarios and can be tuned easily to address different constraints like throughput, latency, CPU consumption, or memory requirements. The solution presented is used in aerospace and robotics applications to produce data and process it at rates substantially higher than other available products. The experimental setup tests the AlexNet and MobileNet V1 architectures on an Intel machine and a Raspberry Pi running Ubuntu. The presented results show that our implementation is not only high-performance and scalable but also has substantially low power consumption, which makes it suitable for a variety of applications and allows targeting different constraints as per the application.
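The inter-layer streaming idea above can be sketched with a single-producer/single-consumer ring buffer, where each index is written by exactly one thread so no lock is needed. Python (with its GIL) cannot demonstrate real lock-free performance, so this is only a conceptual sketch, not Klepsydra's implementation; the "layers" are stand-ins.

```python
import threading

class SPSCRing:
    """Single-producer/single-consumer ring buffer.

    `tail` is written only by the producer and `head` only by the
    consumer, so no lock is required -- the idea behind the lock-free
    queues used to stream activations between network layers.
    """
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.cap = capacity
        self.head = 0   # consumer index
        self.tail = 0   # producer index

    def push(self, item):                    # producer side, spins when full
        while self.tail - self.head == self.cap:
            pass
        self.buf[self.tail % self.cap] = item
        self.tail += 1

    def pop(self):                           # consumer side, spins when empty
        while self.tail == self.head:
            pass
        item = self.buf[self.head % self.cap]
        self.head += 1
        return item

# Two "layers" streaming activations through the ring.
ring = SPSCRing(8)
out = []

def layer1():                                # e.g. a convolution stage
    for x in range(100):
        ring.push(x * x)
    ring.push(None)                          # end-of-stream marker

def layer2():                                # e.g. a classifier stage
    while (v := ring.pop()) is not None:
        out.append(v + 1)

t1 = threading.Thread(target=layer1)
t2 = threading.Thread(target=layer2)
t1.start(); t2.start(); t1.join(); t2.join()
print(len(out))
```

Because neither side ever blocks on a mutex, there is no context switch forced by lock contention, which is the property the talk credits for the reduced CPU usage.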
Location specific TinyML model calibration
Sam LEROUX, PhD Student, Ghent University
TinyML refers to the use of machine learning models on resource-constrained edge devices. Crucially, TinyML allows for intelligent data analysis close to the sensor. We argue that there is a lot of potential to exploit this locality principle to train smaller, more accurate models. Since the ML model will only process data from a specific sensor instance, it only needs to perform well for this specific sensor and does not need to generalize to data from other locations, making it possible to use models that are smaller than a general-purpose model.

Intelligent surveillance cameras, for example, will always monitor the same scene. It is wasteful to use a general object detection model that can recognize a wide variety of objects if the camera is overlooking a highway where it only needs to detect cars and trucks. In addition, it will always observe the objects from a fixed angle, so it should only detect, for example, features that describe a car from the front, as the model will never see a car from the back. Another model that observes objects from the side will need to detect other features to make accurate predictions, but crucially, no model needs to be able to detect both types of features.

Anomaly detection is another use case that could benefit greatly from location-specific models. In a factory, for example, we could use microphones located near machines to detect failures or accidents. Different locations in the factory have different soundscapes, and what is an anomaly at one location might be normal at another.
A disadvantage of these location-specific models is that we now have to train and keep track of a large number of independent models. In this talk we show how we can use Rune and Hammer, two tools developed by Hotg.ai, to easily train, manage, and deploy location-specific models. Rune is our container technology that wraps the ML model and all code for pre- or post-processing of the data into a portable WebAssembly format. Similar to a Docker image, a Rune is configured using a configuration file that describes what input sensor is needed, what preprocessing steps are required, what ML model is used, and how the predictions should be used. Because all this functionality is compiled into WebAssembly, Rune images can be deployed on servers, desktops, smartphones, and other edge devices. Hammer is our service which can manage a large number of Runes. Hammer can upload Runes to different devices and can monitor the state and performance of the models. Hammer can collect data from different sources and use it to update the models, which are then packaged as a Rune and sent over to the device. The Runes can be swapped on the fly without any interruption in operation. To keep track of the different versions of a model, Rune images can be tagged with a name and version number.
We will show this system in action using a toy example where we trained a speech recognition model to classify voice commands (up, down, left, right) as input for a game of 2048. The model is first trained on English commands. We then show how Hammer can be used to manage different models to deal with different dialects or even different languages.
Predicting Faults in a Water Pump and its Pipeline using TinyML
Mayank MATHUR, Senior Solutions Architect, NA
Monitoring and maintenance of rotating machinery like water pumps, wind turbines, and electric motors tends to be laborious, time-consuming, and costly; especially monitoring units installed in remote locations by utility companies for power generation, water supply and distribution, or at oil mines. Breakdowns not only cost the business heavily, but the resulting disruption also causes inconvenience to consumers. While some IoT sensors today can monitor the condition of equipment, their major drawback is that they rely on cloud-based analytics, for which they need to be connected to the internet most of the time, resulting in significant power consumption. Such sensors are not at all suitable for installation in remote locations with limited power and network availability.
Relevance to tinyML
Installing small and cheap sensors that can not only monitor but also analyse the vibrations generated by these machines can help predict many faults well in advance, before they become significant enough to cause a breakdown. A TinyML model built for the purpose and deployed on these sensors reduces the dependency on the cloud, making it more suitable for installations in remote areas. An ML model deployed on the device itself also has multiple other benefits, like low latency, low network bandwidth consumption, and improved data security.
The solution to the stated problem is to design a sensor that is capable of capturing the vibrations generated by a machine and predicting faults in near real time. Instead of continuously sending the vibration data, the sensor only sends the results of the inference to the cloud. To validate whether the proposed solution of deploying an ML model to a small microcontroller for PdM of machinery would practically work, a sensor prototype with an STM32F411 microcontroller and an LIS3DH MEMS accelerometer was built. A separate setup with a small water pump was also created to test the sensor by installing it on top of the pump.
The novelty of the technical approach is in creating a setup where the faults can be manually and repeatedly generated to demonstrate and validate the reliability of the TinyML model running on a microcontroller. The sensor, although installed on top of the pump, is able to predict faults in the water pipeline.
Results and their significance to the tinyML community
The result is a highly reliable model with 96.8% accuracy. This solution helps establish how TinyML can be leveraged to design better and more efficient solutions for remote monitoring in Industrial IoT.
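A minimal sketch can show the kind of compact spectral feature a vibration-based fault classifier might consume. The sampling rate, tone frequencies, and amplitudes below are hypothetical, not the actual pump setup; a naive DFT stands in for the on-device feature extraction.

```python
import cmath
import math

def dominant_freq(samples, fs):
    """Dominant frequency (Hz) via a naive DFT over half the spectrum."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):                       # skip the DC bin
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * fs / n

fs = 1000                     # Hz, hypothetical accelerometer sampling rate
t = [i / fs for i in range(256)]
healthy = [math.sin(2 * math.pi * 50 * ti) for ti in t]   # normal pump hum
faulty = [x + 1.5 * math.sin(2 * math.pi * 120 * ti)      # added fault tone
          for x, ti in zip(healthy, t)]

print(round(dominant_freq(healthy, fs)), round(dominant_freq(faulty, fs)))
```

A shift of the dominant frequency (or new energy in specific bands) is a classic vibration signature of bearing or impeller faults; a tiny classifier would take a handful of such spectral features rather than the raw stream, which is what keeps the cloud traffic down to inference results only.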
A deeply embedded radar based hand gesture recognition application
Stephan SCHOENFELDT, Lead Principle System Architect, Infineon Technologies
Contactless interaction with machines (elevators, vending machines, ticket machines, information terminals, etc.) is an effective way to avoid the spread of COVID-19 and other viruses via machines regularly used by many different people. The demand for such solutions is already there and will persist. Regular disinfection of the interfaces can help in this situation, but is not practical, as it has to happen regularly and the number of machines is just too large.
Radar sensors are well suited to detect and classify different gestures. Compared to solutions using RGB cameras, infrared, or ultrasound sensors, they have advantages when it comes to overall sensitivity, maximum range, and robustness against disturbers. Additionally, they are superior in various aspects of industrial design, as product designers can place them behind different types of material and they do not require openings in the housing. This makes radar sensors very robust against dust and vapor. Last but not least, radar sensors provide intrinsic privacy protection.
This talk is about how we implemented a radar-based hand gesture recognition application on a Cortex-M4 microcontroller running at 150 MHz with a total RAM footprint below 300 kB (Cypress PSoC 6). I cover the required preprocessing/feature extraction algorithm, the neural network design, and the training strategy. Furthermore, I discuss the approach to network quantization and how we use TensorFlow Lite Micro as an inference runtime.
I elaborate on execution timing and resource consumption on the embedded platform. Finally, I show a video of the final application running on the microcontroller.
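The classic preprocessing step for such radar pipelines is a range-Doppler map: a DFT along fast time (within each chirp) resolves range, and a second DFT along slow time (across chirps) resolves velocity. The sketch below is generic, not the talk's actual pipeline; the frame dimensions and the synthetic target are illustrative.

```python
import cmath
import math

def dft(xs):
    """Naive DFT of a complex sequence."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(xs)) for k in range(n)]

def range_doppler(frame):
    """DFT along fast time (per chirp), then along slow time (across
    chirps); the magnitude map is what feeds the neural network."""
    ranges = [dft(chirp) for chirp in frame]         # fast-time pass
    cols = list(zip(*ranges))                        # transpose: per range bin
    doppler = [dft(list(col)) for col in cols]       # slow-time pass
    return [[abs(v) for v in row] for row in doppler]

# Synthetic IQ frame: 8 chirps x 16 samples, one target sitting at
# range bin 3 whose phase advances by 2*pi/8 per chirp (Doppler bin 1).
frame = [[cmath.exp(2j * math.pi * (3 * s / 16 + c / 8))
          for s in range(16)] for c in range(8)]

rd = range_doppler(frame)
peak = max((val, r, d) for r, row in enumerate(rd)
           for d, val in enumerate(row))
print(peak[1], peak[2])  # range bin and Doppler bin of the strongest cell
```

A moving hand traces a characteristic trajectory through such maps over successive frames, which is why a sequence of range-Doppler magnitudes makes a natural input for a small gesture-classification network.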
Remote Birding with TensorFlow Lite and Raspberry Pi
Rob LAUER, Developer Relations Lead, Blues Wireless
Like many of us, too much of my day is spent in front of a screen. The pleasure I get from watching birds flock to my bird feeder is unfortunately outweighed by the realities of life.
However, I’m not one to give up on this lazy dream of bird watching! Instead of sitting on my porch waiting for birds to come in, why not be *actively notified* when birds are at the feeder (and which birds are showing up)? While slightly dystopian, it’s still a great idea – let’s build it!
In this talk, we will walk through a real-world solution that ties together Machine Learning concepts, remotely-powered IoT deployments, and cloud connectivity in an easy-to-understand project. By using TensorFlow Lite on a Raspberry Pi (with a handy Twilio integration for MMS), we will see how easy it can be to develop robust Machine Learning solutions with readily available hardware and open source software.
Central European Summer Time / UTC +2
4:00 pm to 4:15 pm
Open / Welcome
4:15 pm to 5:00 pm
Keynote: A novel approach to building exceptionally tiny, predictive and explainable models for non-data scientists
Blair NEWMAN, CTO, Neuton.ai
Performing compute and inference on the edge solves most issues with privacy, latency and reliability, but how do we address the remaining obstacles:
many parties interested in AI/ML, including those who work with microcontrollers, do not have a background in machine learning or software development
the difficulty of embedding large ML models into small compute devices
the challenge of evaluating the quality of a model, and whether it has interpretable, explainable and reliable output
Inference on edge devices will move toward mass adoption only if machine learning becomes available to non-data scientists. We will show how, already today, non-ML users can build, with just a few clicks and no code, compact models that are up to 1000 times smaller than those built with TensorFlow and similar frameworks (and without a reduction in accuracy). We will demonstrate why models built with those frameworks are not optimal in size and accuracy, and share how to overcome those obstacles and build quality compact models with excellent generalization capability.
We will explain and show examples of how Neuton’s working tiny models can be embedded into microcontrollers and will compare the results with those built with TensorFlow Lite. We will also demonstrate how users can evaluate model quality at every stage and identify the logic behind the model analysis, therefore clarifying why certain predictions have been made.
5:00 pm to 5:45 pm
Keynote: The model efficiency pipeline, enabling deep learning inference at the edge
Bert MOONS, Research Scientist, Qualcomm
Today, most deep learning and AI applications are developed on and for high-performance computing systems in the cloud. In order to make them suitable for real-time deployment on low-power edge devices and wearable platforms, they have to be specifically optimized. This talk is an overview of a model-efficiency pipeline that achieves this goal: automatically optimizing deep learning applications through Hardware-Aware Neural Architecture Search, compressing and pruning redundant layers and subsequently converting them to low-bitwidth integer representations with state-of-the-art data-free and training-based quantization tools. Finally, we take a sneak peek at what’s next in efficient deep learning at the edge: mixed-precision hardware-aware neural architecture search and conditional processing.
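The low-bitwidth integer conversion mentioned above is commonly an affine mapping with a scale and zero-point. The sketch below shows that generic scheme, not Qualcomm's specific tooling; the activation values are illustrative.

```python
def quant_params(xs, bits=8):
    """Affine quantization parameters (scale, zero_point) mapping the
    observed float range onto unsigned `bits`-bit integers."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)   # range must include 0
    qmax = (1 << bits) - 1
    scale = (hi - lo) / qmax or 1.0                 # avoid zero scale
    zero_point = round(-lo / scale)                 # integer that maps to 0.0
    return scale, zero_point

def quantize(xs, scale, zp, bits=8):
    qmax = (1 << bits) - 1
    return [min(qmax, max(0, round(x / scale) + zp)) for x in xs]

def dequantize(qs, scale, zp):
    return [(q - zp) * scale for q in qs]

# Example activations observed during calibration.
acts = [-0.4, 0.0, 0.7, 1.2, -0.1]
scale, zp = quant_params(acts)
q = quantize(acts, scale, zp)
roundtrip = dequantize(q, scale, zp)
err = max(abs(a - b) for a, b in zip(acts, roundtrip))
print(q, round(err, 5))
```

Data-free and training-based quantization tools differ mainly in how they choose these parameters per tensor or per channel (from weight statistics alone versus from calibration or fine-tuning), but the integer representation they produce has this same form.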
5:45 pm to 5:55 pm
5:55 pm to 6:40 pm
A Battery-Free Long-Range Wireless Smart Camera for Face Detection: An accurate benchmark of novel Edge AI platforms and milliwatt microcontrollers
Michele MAGNO, Head of the Project-based learning Center, ETH Zurich, D-ITET
An emerging class of IoT devices hosts low-power image sensors to perform surveillance, monitoring, and control. Miniaturized camera devices are today a commercial reality, with several products on the market for a wide range of applications, from industrial to entertainment and autonomous navigation. On the other hand, those tiny camera systems are usually supplied by small energy storage, limiting their lifetime to the range of a few hours. Moreover, most of those miniaturized IoT “smart” cameras limit their intelligence, or even only acquire/store the images and send them wirelessly to a smartphone or a more intelligent device, or download them offline.
A class of ML that is becoming more and more attractive and challenging is edge ML, or tiny machine learning, where ML algorithms are compressed to run on resource-constrained microcontrollers. To enable effective tinyML systems, on one side, hardware specialists are designing novel hardware architectures to deal with the demand for large computational and storage capability. On the other side, software and algorithm specialists, including Google, are proposing less complex models and sophisticated training tools. However, bringing tinyML to a resource-constrained processor is still a very challenging task due to the limited memory and computational capabilities available in low-power processors. Typical processors for low-power sensors are microcontrollers, most popularly the ARM Cortex-M and RISC-V families, which can count on only a few hundred million operations per second (MOPS) with a power consumption of 10-100 mW, compatible with the goal of low-power, long-lasting intelligent devices. The recent trend is designing mW-power microcontrollers with parallel architectures, for instance the PULP processor or its commercial version GAP8 from GreenWaves, or novel architectures with hardware accelerators, such as the XMOS xCORE.ai and the MAX78000 from Maxim, to get more operations per clock in the same mW power envelope.
On the other hand, another big obstacle for intelligent devices, and in general for IoT devices, to become truly pervasive is their need for a long-term reliable power source. The use of batteries is the most direct way of powering wireless devices, but regular battery replacement is vital to ensure continuous operation. Such a requirement is unappealing as it implies high maintenance costs, especially in remote areas or if environmental issues related to battery disposal are of concern. Energy harvesting (EH), the technology to convert energy from environmental sources, is the most promising technology to achieve perpetually powered sensors for the IoT, with zero battery replacements over their mission lifetime. EH is already a mature technology for both commercial and residential settings. However, many challenges are still open for tiny-form factor harvesters, needed for the majority of unobtrusive smart sensors.
This work presents a battery-less video sensor node for continuous image processing, and also performs an accurate benchmark of novel edge AI platforms under identical conditions. It proposes a tinyML algorithm for challenging face identification with high accuracy, with five faces to recognize, targeting low-power microcontrollers. The designed sensor node can run from a cold start in less than one minute from only 350 lux, thanks to the low-power design and the high-efficiency energy harvesting circuit, which can host both thermal and solar energy harvesters. After the cold start, the node achieves perpetual operation in the presence of the same or higher luminosity. Moreover, the node can also cold-start with a very low luminosity of only 250 lux. The specific contributions of this work are as follows:
• Design and implementation of tinyML face recognition with neural-network-based inference, optimized for low-power microcontrollers.
• An accurate evaluation of novel microcontrollers. The benchmark includes the implementation of the tiny face-detection convolutional neural network on seven different microcontrollers: ATMEL SAMD51, Ambiq Apollo3, Sony Spresense, PULP/GAP8 from ETH Zurich/GreenWaves, Arm Cortex-M55, XMOS xcore.ai, and the novel MAX78000.
• The design and development of a battery-less wireless video sensor node that can host both thermal and solar energy harvesting.
• Experimental results demonstrating the functionality and benefits of low-power hardware/software co-design combined with solar power to achieve battery-less, perpetual operation.
Extra support material.
Main reference of a previous work:
 Giordano, Marco, Philipp Mayer, and Michele Magno. “A Battery-Free Long-Range Wireless Smart Camera for Face Detection.” In Proceedings of the 8th International Workshop on Energy Harvesting and Energy-Neutral Sensing Systems, pp. 29-35. 2020.
Video of the presented battery-less camera, designed with an ARM Cortex-M4 microcontroller.
Preliminary results of the work we will present at EMEA 2021 (MAX78000 results still missing; they will arrive soon).
Modelling and Simulation of Edge AI inference for instantaneous Offline Prediction of Cholera in Rural Communal Water Taps
Marvin OGORE, Student, The University of Rwanda
Africa accounts for 54% of the world's disease burden due to lack of access to safe drinking water, with most of the population in rural areas or endemic zones accessing water via potentially unsafe communal water taps or faucets. Unfortunately, the expensive laboratory processes and resources used in water processing centers to detect water-borne diseases like cholera cannot be massively deployed on all those taps to guarantee safe water for everyone, anywhere, at any time. Thanks to the Internet of Things (IoT) and Artificial Intelligence (AI), water-borne cholera can be predicted by monitoring the water's physicochemical patterns such as pH, turbidity, conductivity, temperature, and salinity. However, related state-of-the-art IoT/AI solutions are designed around a cloud-centric architecture, with dumb edge sensors required to send the collected water-property data to the cloud for inferencing. Unfortunately, always-on internet connectivity is not guaranteed in rural areas, and low-latency detection is a strong requirement for warning tap users before they consume potentially unsafe water. My Master's thesis research explores the rapid prototyping of an offline edge AI cholera-detector kit, pluggable into existing taps in a non-invasive way to instantaneously infer water safety. This strategy will lower the cost of massive deployment across several thousand rural water taps in developing countries. Our idea builds on the latest advances in tinyML frameworks and open-source managed services that produce edge AI model libraries optimized for a specific target embedded processor.
Technical approach and its novelty
The first step in our work was to identify an existing dataset linking water physicochemical patterns with cholera. We then set up a rapid simulation pipeline integrating (1) Edge Impulse, an edge AI/ML platform that takes the above dataset as input and generates the corresponding tinyML library; (2) STM32CubeIDE, to integrate the generated library into our application; and (3) Proteus VSM, an embedded software/hardware simulation tool. As embedded targets, we considered STM32 devices. Using Proteus, this setup lets us explore embedded challenges before deployment on the real board. The main challenge encountered so far has been finding a rich dataset with cholera cell counts in water; we are currently exploring synthetic data generation as a way to overcome it.
Results and their significance to the tinyML community
The above simulation pipeline is in place and has been validated on the small waterborne-cholera dataset, with the results produced at the simulation level classified correctly. This is a fundamental milestone, as it provides a clear path toward a foundation for evaluating IoT and embedded ML. The next steps involve an advanced analysis of shallow ML algorithms, such as Logistic Regression and Support Vector Machines, which are better suited to small datasets, as well as data augmentation via synthetic data generation.
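As a sketch of the kind of shallow classifier the next step targets, here is a dependency-free logistic regression trained by plain gradient descent. The feature names and data below are purely illustrative assumptions, not the thesis dataset:

```python
import math
import random

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic-gradient-descent logistic regression (no ML libraries)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Synthetic, illustrative samples: [pH deviation, turbidity, conductivity]
# scaled to [0, 1]; label 1 = "potentially unsafe". Not real measurements.
random.seed(0)
safe   = [[random.uniform(0.0, 0.3) for _ in range(3)] for _ in range(40)]
unsafe = [[random.uniform(0.6, 1.0) for _ in range(3)] for _ in range(40)]
X = safe + unsafe
y = [0] * 40 + [1] * 40
w, b = train_logistic(X, y)
acc = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

A model this small (four parameters) trivially fits an MCU, which is one reason shallow methods are attractive for small, tabular sensor datasets.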
Manivannan S, Senior Software Developer, ZF WABCO
Existing IoT medical devices send bulk ECG data to a mobile device or server, where analysis is done on a powerful processor or in a mobile application, so all ECG-analyzing devices depend on internet connectivity or high-performance computers. The proposed tinyML application, built with the Edge Impulse software, is a mini-diagnosis ECG analyzer that fits in a pocket and can diagnose heart diseases independently, without cloud connectivity. The proposed ECG analyzer can detect atrial fibrillation, AV block 1, and AV block 2 with more than 90% accuracy.
The application converts human observations into datasets, which is how the accuracy of the tinyML model is increased. The novelty of the device lies in decoding the raw ECG data into three different waveforms: filtered ECG, R-R interval, and P-R interval. This approach clearly differentiates the ECGs of different heart conditions.
The research work will be demonstrated using an Arduino Nano 33 BLE and the AD8232 ECG sensor. The proposed work uses a 3-lead ECG system, and the generated tinyML model has a size of less than 15 kB of ROM.
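The abstract does not describe how the R-R waveform is derived; as a minimal illustration of the idea, the sketch below runs a naive threshold-based R-peak detector over a synthetic ECG-like trace (the sampling rate, threshold, and waveform are all assumptions for the example):

```python
def detect_r_peaks(samples, threshold):
    """Return indices of local maxima above threshold (naive R-peak detector)."""
    peaks = []
    for i in range(1, len(samples) - 1):
        if samples[i] > threshold and samples[i] >= samples[i - 1] and samples[i] > samples[i + 1]:
            peaks.append(i)
    return peaks

def rr_intervals_ms(peaks, fs):
    """Successive R-R intervals in milliseconds."""
    return [(b - a) * 1000.0 / fs for a, b in zip(peaks, peaks[1:])]

# Synthetic trace: flat baseline with a spike every 200 samples.
# At fs = 250 Hz that is an 800 ms R-R interval, i.e. 75 bpm.
fs = 250
samples = [0.1] * 1000
for k in range(0, 1000, 200):
    samples[k + 5] = 1.0  # simulated R peak
peaks = detect_r_peaks(samples, threshold=0.5)
print(rr_intervals_ms(peaks, fs))  # → [800.0, 800.0, 800.0, 800.0]
```

A real device would first band-pass filter the signal (the "filtered ECG" waveform) before peak detection; that step is omitted here for brevity.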
6:45 pm to 7:15 pm
Efficient video perception through AI
Fatih PORIKLI, Senior Director, Qualcomm
Video data is abundant and being generated at ever increasing rates. Analyzing video with AI can provide valuable insights and capabilities for many applications ranging from autonomous driving and smart cameras to smartphones, extended reality, and IoT. However, as video resolution and frame rates increase while AI video perception models become more complex, running these workloads in real time is becoming more challenging. This presentation will explore the latest research that is enabling efficient video perception while maintaining neural network model accuracy. You’ll learn about:
How video perception is crucial for understanding the world and making devices smarter
The challenges of on-device real-time video perception at high resolution through AI
Qualcomm AI Research’s latest research and techniques for efficient video perception.
Avoiding Loss of Quality while in Pursuit of a Tiny Model
Blair NEWMAN, CTO, Neuton.ai
Today the entire tinyML community is focused on solving the model-shrinking problem. We are confident that assessing a model's quality and explainability is equally relevant to the community, and we will share how we approach this challenge.
In this talk, we’ll show how incredibly compact models can be created without losing focus on precision. During the talk, we plan to provide answers to the following questions, particularly relevant to the tinyML community at this time:
How, in the pursuit of a small model, can we avoid loss of quality?
Is there a choice between model accuracy and size, today?
How can the quality of a model be evaluated, at all stages, without need for a data scientist?
How can the logic of decision making by a model be identified and understood, if you are dealing with a neural network?
How can available training data be evaluated, and the most important statistics in the context of a single variable, overall data, interconnections, and in relation to the target variable in a training dataset be clearly understood?
How can the reason why this particular model made this or that decision be identified? How can a model’s output be interpreted? Do models built with Neural Networks have explainability potential?
Do all parameters from a data source sensor need to be collected to build a model and obtain meaningful insights? What parameters are enough to build a tiny model?
How can the influence and relative importance of every parameter be understood, on the output?
Can the input parameters be emulated to see how output changes and why?
How can the quality of a tiny model be evaluated?
How can model decay, and need for retraining, be automatically identified?
How can the quality of every single prediction be thoroughly evaluated? How can credibility of each prediction be understood and measured, and how can the level of confidence in each prediction be evaluated?
7:15 pm to 8:15 pm
Central European Summer Time / UTC +2
4:00 pm to 4:15 pm
Open / Welcome
4:15 pm to 5:00 pm
Keynote: Bottom-up and top-down neural processing systems design: unveiling the road toward neuromorphic intelligence
Charlotte FRENKEL, Postdoctoral Researcher, Institute of Neuroinformatics
While Moore’s law has driven exponential expectations of computing power, its nearing end calls for new roads to embedded cognition. The field of neuromorphic computing aims at a paradigm shift compared to conventional von Neumann computers, both for the architecture (i.e., in-memory computing) and for the data representation (i.e., spike-based event-driven encoding). In this talk, we will show how best to exploit a bottom-up (neuroscience-driven) approach and a top-down (application-driven) one toward embedded cognition and neuromorphic intelligence. The talk is thus divided into two parts.
In the first part, we focus on the bottom-up approach. From the building-block level to silicon integration, we design two digital time-multiplexed spiking neural network processors: ODIN and MorphIC. Furthermore, we explore the design of neuromorphic processors that use mixed-signal analog/digital circuits and temporal dynamics matched to those of their input signals, without having to resort to time-multiplexing. We demonstrate with silicon measurement results that hardware-aware neuroscience model design and selection allows optimizing the tradeoff between biophysical versatility, neuron and synapse density, and power consumption.
In the second part of this talk we will follow a top-down approach. By starting from the applicative problem of adaptive edge computing, we derive a learning algorithm optimized for low-cost on-chip learning: the Direct Random Target Projection (DRTP) algorithm. With silicon measurement results of a top-down DRTP-enabled neuromorphic processor codenamed SPOON, we demonstrate that combining event-driven and frame-based processing with weight-transport-free update-unlocked training supports low-cost adaptive edge computing with spike-based sensors.
Therefore, each of these two design approaches can act as a guide to address the shortcomings of the other. We compare them and discuss their tradeoffs for different potential use cases in edge computing.
5:00 pm to 5:45 pm
Keynote: tinyML Beyond Audio and Vision
Wolfgang FURTNER, Distinguished Engineer System Architecture, Infineon Technologies
This talk introduces a few sensors beyond microphones and cameras, and the embedded ML applications that are interesting for them; in particular, radar and environmental sensors are addressed. It illustrates the specific challenges of these sensors and how advances in AI/ML can also be leveraged for their applications. The presentation discusses the processing needs of these sensors and gives implementation examples on small microcontrollers. It provides guidelines for the choice of processing architectures and compares their performance. Finally, it concludes with an outlook on future embedded hardware and software.
5:45 pm to 5:55 pm
5:55 pm to 6:40 pm
ZigZag: An Architecture-Mapping Design Space Exploration (DSE) Framework for Deep Learning Accelerators
Linyan MEI, PhD Student, KU Leuven
Building efficient embedded deep learning systems requires a tight co-design between DNN algorithms, hardware, and algorithm-to-hardware mapping, a.k.a. dataflow. However, owing to the large joint design space, finding an optimal solution through physical implementation becomes infeasible.
This talk introduces ZigZag, a rapid DSE framework for DNN accelerator architecture and mapping.
ZigZag consists of three key components: 1) an analytical energy-performance-area Hardware Cost Estimator, 2) two Mapping Search Engines that support spatial and temporal even/uneven mapping search, and 3) an Architecture Generator that auto-explores the wide memory hierarchy design space. It takes in a DNN model topology, hardware constraints, and technology parameters and produces optimal hardware architectures, mappings, and corresponding hardware cost (energy, latency, area) estimates. ZigZag uses an enhanced nested-for-loop format as a uniform representation to integrate algorithm, accelerator, and algorithm-to-accelerator mapping descriptions.
ZigZag extends the common DSE frameworks with uneven mapping opportunities and smart mapping search strategies for accelerated search. Uneven mapping decouples the memory hierarchy and mappings (temporal / spatial) of the different operands (W/I/O), opening up a whole new space for DSE, and thus better design points are found.
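The nested-for-loop view of a mapping can be made concrete with a toy example. Below, a 1-D convolution is written as an explicit loop nest whose outer loop tiles the output dimension; a DSE framework like ZigZag reasons about orderings and tilings of exactly such loops, though the code is only an illustration and not ZigZag's actual input format:

```python
def conv1d_tiled(inputs, weights, tile=4):
    """Toy 1-D convolution expressed as an explicit, tiled loop nest.
    The (ox_outer, ox_inner, k) ordering is one possible temporal mapping."""
    K = len(weights)
    OX = len(inputs) - K + 1
    out = [0] * OX
    for ox_outer in range(0, OX, tile):                    # tile over outputs
        for ox_inner in range(min(tile, OX - ox_outer)):   # within one tile
            ox = ox_outer + ox_inner
            for k in range(K):                             # reduction over kernel
                out[ox] += inputs[ox + k] * weights[k]
    return out

x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]
print(conv1d_tiled(x, w))  # → [-2, -2, -2, -2]
```

Reordering or re-tiling these loops changes which operands are reused in which memory level, which is precisely the cost tradeoff the mapping search engines explore.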
This talk will describe ZigZag and show benchmarking experiments against published works, an in-house accelerator, and existing DSE frameworks, together with three case studies, to demonstrate the reliability and capability of ZigZag. Up to 64% more energy-efficient solutions are found compared to other state-of-the-art DSE frameworks, thanks to ZigZag's uneven mapping capabilities.
The talk will end with the newest research outcomes of the ZigZag team at KU Leuven, such as applying ZigZag to analog in-memory computing (AiMC) architectures and the new fast and flexible temporal mapping search method, Loop-Order-based Memory Allocation (LOMA).
ZigZag is published in IEEE Transactions on Computers, 2021.
ZigZag is open-source on: https://github.com/ZigZag-Project/zigzag.
Energy-efficient TCN-Extensions for a TNN accelerator
Tim FISCHER, PhD Student, ETH Zürich
In recent years, the traditional approach of cloud computing for extremely power constrained IoT devices has been increasingly challenged by the emerging paradigm of edge computing. With the surging demand for intelligence on the edge, highly quantized neural networks have become essential for many embedded applications.
Merging the advantages of both highly quantized and temporal neural networks, this work presents novel extensions to an existing ternary neural network (TNN) accelerator architecture (CUTIE) supporting energy-efficient processing of sequential data.
This talk discusses 1) the hardware implementation of a ternary temporal convolutional network (TCN) accelerator for modelling sequential data, and 2) a hardware-friendly mapping of TCN layers that exploits the highly parallel nature of the CUTIE architecture.
Leveraging the hardware and software extensions for TCNs, the TNN accelerator is able to process and classify sequential data on the edge. Exploiting its highly unrolled architecture, the accelerator achieves a peak performance of 962 TOp/s/W in a GF 22nm implementation.
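Ternary networks constrain every weight to {-1, 0, +1}, which is what makes multiplier-free, highly unrolled datapaths like CUTIE's possible. The sketch below shows one common ternarization heuristic (a TWN-style magnitude threshold); CUTIE's actual training scheme may differ, so treat this as illustrative only:

```python
def ternarize(weights, delta_ratio=0.7):
    """Map float weights to {-1, 0, +1}. The threshold delta is a fraction
    of the mean absolute weight (a common heuristic, e.g. in Ternary
    Weight Networks); values within +-delta are zeroed."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_ratio * mean_abs
    return [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]

w = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]
print(ternarize(w))  # → [1, 0, 1, -1, 0, -1]
```

With weights in {-1, 0, +1}, each multiply-accumulate reduces to an add, subtract, or skip, which is where the energy efficiency of TNN accelerators comes from.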
TinyML journey: from face detection demo to real-life commercial deployment
Elad BARAM, VP Products, Emza Visual Sense
We will share our experience driving tinyML from the proof-of-concept level to a design win that is planned to be deployed in millions of notebooks. This is probably the first widely deployed commercial consumer case study.
One of the main topics we intend to cover, beyond the application itself, is the gap between available demos and benchmarks and what it takes to accommodate real-life use cases: addressing different object distances, robustness to lighting conditions, and so on.
While tinyML holds the potential to be extremely successful, thanks to its inherent advantage of using low-cost MCUs, bridging this technology gap is what will convert demos into real business.
6:40 pm to 6:45 pm
6:45 pm to 8:15 pm
EON Tuner: An AutoML pipeline for real-world embedded devices
Jan JONGBOOM, CTO, Edge Impulse
Finding the best ML model for analyzing sensor data is not easy. What preprocessing steps yield the best results? What signal processing parameters should you pick? And what ML model should you use? This is even more true when the resulting model needs to run on a microcontroller, where you have latency, memory, and power constraints. AutoML tools can help, but typically look at just the neural network, disregarding the big role preprocessing and signal processing play in tinyML. In this hands-on session we’ll look at the EON Tuner, a new tool available to all Edge Impulse developers, and how to use it to pick the best model within the constraints of your device.
Panel Discussion: tinyML for Good
Moderator: Fran BAKER, Global Social Innovation Lead, Arm
Tomas EDSÖ, Senior Principal Design Engineer, Arm
Gian Marco IODICE, Team and Tech Lead in the Machine Learning Group, Arm
Jan JONGBOOM, CTO, Edge Impulse
Rosemary NALWANGA, Postgraduate Student, ACEIoT
Marvin OGORE, Student, The University of Rwanda
Samson Otieno OOKO, Postgraduate Student, University of Rwanda
Ciira MAINA, Senior Lecturer, Dedan Kimathi University of Technology
During this panel discussion entitled tinyML for Good, several Arm experts will be outlining how they are using tinyML applications that are focused on the environment and sustainability. They will talk about what has worked and more importantly, what has not worked for them. Prepare to listen and participate in a lively chat about tinyML for Good.
Central European Summer Time / UTC +2
4:00 pm to 4:15 pm
Open / Welcome
4:15 pm to 5:00 pm
Keynote: Shaping the digital future of Europe
Colette MALONEY, Head of Unit "Microelectronics and Photonics Industry", European Commission - DG Communications Networks, Content and Technology
The European Commission has put forward a number of new strategic initiatives in digital technologies with the ambition to shape our digital future and foster European competitiveness. Important parts of Horizon Europe are dedicated to R&D in digital technologies; the new Digital Europe Programme will ensure that Europe drives the digital transformation of the economy and society and brings its benefits to all citizens and businesses; and the new Joint Undertaking in Key Digital Technologies will provide support for R&D in microelectronics and photonics. At the same time, the joint declaration of member states on processors and semiconductors, the Digital Compass Communication with specific targets for advanced semiconductors, and the forthcoming Important Project of Common European Interest in microelectronics provide new momentum. The talk will provide an overview of the new European initiatives in digital technologies, from the bigger policy context to specific funding opportunities.
5:00 pm to 6:00 pm
Growing the tinyML Community in the EMEA Region
Moderator: Tijmen BLANKEVOORT, Senior Staff Engineer Deep Learning, Qualcomm
Gian Marco IODICE, Team and Tech Lead in the Machine Learning Group, Arm
Loic LIETAR, CEO, GreenWaves Technologies SAS
Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco
Abbas RAHIMI, Research Staff Member, IBM Research-Zürich
Patricia SCANLON, Founder and Executive Chair, SoapBox Labs
In the EMEA region, we have a strong presence in both research and industry, covering digital, analog, and mixed-signal hardware, sensors, and software for IoT and automotive, among many other areas. While there are many good initiatives for collaboration, some publicly funded and often at the local level, it is often a challenge to align across geographies and across academia and industry. In this panel, we have gathered five tinyML experts from academia and industry to discuss the unique strengths of EMEA in enabling machine learning at the edge. In addition, we will ask them: What are we missing? What are the key challenges to tackle? How can we improve collaboration on the topic in the region?
6:00 pm to 6:15 pm
6:15 pm to 6:30 pm
6:30 pm to 8:00 pm
As part of the tinyML EMEA Technical Forum, we invited students to submit poster abstracts highlighting their work in the area of embedded edge machine learning.
Each poster will be accompanied by a 3-5 minute live talk in which the student presents current results and future research. The video posters and live Q&As will be recorded and posted on the tinyML YouTube channel after the Forum.
Enabling autonomous navigation on UAVs with an onboard MCU-based camera running tinyML models at the edge
Andrea ALBANESE, Fellow Researcher, University of Trento
Drones are becoming essential in everyday life; they can assist experts in different fields with challenging tasks, demonstrating feasibility and success. However, drones that can act autonomously in their surrounding environment would be more efficient and reliable and would permit scalable, cheaper solutions that are not human-assisted. On-board cameras can be used with deep learning models for increasingly complex tasks such as monitoring, autonomous navigation, rescue, and aerial surveillance. Many drone applications are based on the cloud computing approach, which is limited because data transmission introduces latency (too high to satisfy real-time requirements) and considerable energy consumption. Drone flight time is a bottleneck; thus, an edge computing approach that executes all vision and control algorithms on the vehicle is preferable to improve reactiveness and efficiency, even in small UAVs. Recent papers have shown the possibility of using micro-computers for edge AI computing; however, their power consumption is still on the order of watts. MCU-based cameras with tiny inference on board are also being investigated for smaller drones. Common DL algorithms for detection and classification cannot be used as-is because they exceed the maximum memory available in MCU-based cameras, so pruning and quantization are essential to fit complex models on MCU devices.
We present an approach for assisting autonomous navigation. The idea is to use arrows or other symbols drawn on the ground to suggest high-level targets and actions (example in Figure 1). The drone finds the “written messages”, classifies and decodes the action, and elaborates the path for its autonomous navigation toward the detected target using ad-hoc CNNs. In this way, drones do not need the supervision of an expert pilot, and more vehicles can navigate in a shared space by interpreting indications on the ground. Human operators will no longer be in a real-time control loop but will provide only high-level goals by writing on the floor or with gestures.
We used the OpenMV Cam H7 Plus, which consumes only 240 mA at 3.3 V in active mode. It is based on the STM32H7 ARM Cortex-M7 processor running at 480 MHz, with 1 MB of internal SRAM and 2 MB of internal flash. The arrow direction is predicted with a custom DL algorithm, a classifier over eight classes: 0° (north), 45° (north-east), 90° (east), 135° (south-east), 180° (south), 225° (south-west), 270° (west), and 315° (north-west). The dataset was generated semi-automatically, starting from raw videos acquired offline showing arrows of different fonts and sizes, in order to ensure the network's generalization capability. The videos were processed, and the resulting dataset consists of 26,600 training images and 6,700 test images, equally split among the 8 classes. The DL model was trained and tested with a 64 x 64 input image size and with three different CNN architectures, comparing small (SqueezeNet), medium (MobileNetV2), and big (LeNet-5) structure complexity. The resulting models were then analyzed and optimized to fit the camera's memory constraint of about 500 KB, imposed by the footprint of the main firmware and other libraries. In particular, model optimization techniques were assessed both before and after training. Before training, the model architectures were optimized by reducing their convolutional layers, using a trial-and-error approach to find the best trade-off between model size and accuracy. After training, cascaded pruning and float-fallback quantization were also applied. The three architectures are compared in terms of memory optimization and the accuracy loss across their intermediate representations.
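Post-training quantization, mentioned above, replaces float weights with 8-bit integers plus a scale and zero point. The sketch below shows only the core affine-quantization arithmetic; real toolchains (TFLite, X-CUBE-AI) add calibration, per-channel scales, and the float-fallback logic for unsupported operators:

```python
def quantize_int8(values):
    """Affine (asymmetric) int8 quantization of a list of floats.
    Returns the quantized values plus the (scale, zero_point) pair
    needed to dequantize. Core arithmetic only - no calibration."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [-0.5, 0.0, 0.25, 1.0]
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, f"max reconstruction error = {err:.4f}")  # worst case is about half a step
```

Storing weights as int8 cuts memory by 4x versus float32, which is how a model is squeezed under a budget like the 500 KB mentioned above.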
Neural gradients are near-lognormal: improved quantized and sparse training
Brian CHMIEL, Deep Learning Researcher, Intel
While training can mostly be accelerated by reducing the time needed to propagate neural gradients (loss gradients with respect to the intermediate neural layer outputs) back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. Distinguished from weights and activations, we find that the distribution of neural gradients is approximately lognormal. Considering this, we suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. The first method optimizes the floating-point format and scale of the gradients. The second method accurately sets sparsity thresholds for gradient pruning. Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity — in each case without accuracy degradation.
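The closed-form thresholding idea can be illustrated under the paper's lognormal assumption: if ln|g| is normal, the quantile function gives the pruning threshold for any target sparsity directly. The distribution parameters below (mu = -8, sigma = 2) are made-up example values, not statistics from the paper:

```python
import math
import random
from statistics import NormalDist

def sparsity_threshold(mu, sigma, target_sparsity):
    """If ln|g| ~ N(mu, sigma), the fraction of gradient magnitudes below t
    is Phi((ln t - mu) / sigma); inverting gives the threshold that zeroes
    a `target_sparsity` fraction of the gradients."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(target_sparsity))

mu, sigma = -8.0, 2.0          # illustrative lognormal parameters
t = sparsity_threshold(mu, sigma, 0.85)

# Empirical check against samples drawn from the same lognormal.
random.seed(1)
grads = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]
frac = sum(g < t for g in grads) / len(grads)
print(f"threshold = {t:.3e}, empirical sparsity = {frac:.3f}")
```

In practice mu and sigma would be estimated from the observed gradient tensor each step; the appeal of the closed form is that no sorting or iterative threshold search is needed.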
Processor Architecture Optimization for Spatially Dynamic Neural Networks
Steven COLLEMAN, PhD Student, KU Leuven
Spatially dynamic neural networks (SDNNs) are a new and promising type of neural network for image-related machine learning tasks. SDNNs adjust network execution based on the input data, saving computations by skipping non-important image regions. However, the computation saved by SDNNs is hard to translate into real speedup on hardware platforms like GPUs, because GPUs do not support these spatially dynamic execution patterns.
Our research investigates hardware constraints preventing such speedup and proposes and compares novel processor architectures and dataflows enabling latency improvements due to the dynamic execution with minimal loss of utilization. We propose two hardware architectures that flexibly support spatial execution of a broad range of convolutional layers. The first architecture has only 1 PE array which is used to map all the workloads. The second architecture has 2 differently configured PE arrays that can work in parallel. Our flexible architectures can handle both standard convolutional layers and depthwise layers. For the derived architectures, the spatial unrolling for each layer type is optimized and validated making use of the ZigZag design space exploration framework where appropriate.
This allows us to benchmark and compare the hardware architectures on NNs for classification and human pose estimation, increasing throughput by up to 1.9x and 2.3x over their static executions, respectively, and outperforming a GPU. SDNNs can bring the same order-of-magnitude speedup as other dynamic execution methods if the hardware is designed wisely, and SDNNs can be combined with other dynamic execution methods for even larger hardware benefits. The architecture with 2 PE arrays is better for networks with depthwise layers; the architecture with 1 PE array is better for networks with only standard convolutional layers.
Framework for dataset construction including fused data from Human and Remotely Operated Vehicles (ROVs)
Rafaella ELIA, PhD Student, University of Cyprus
Remotely Operated Vehicles (ROVs) are used in many safety critical applications such as search and rescue missions, etc. In such cases, human operators operate the ROV in harsh conditions, causing stress and fatigue. This can lead to involuntary commands during the mission. We propose a shared-control mechanism for jointly monitoring the operator and the ROV. We monitor the operator by reading physiological signals and the ROV via its Inertial Measurement Unit (IMU) sensor data. The resulting fused dataset is used to design a resource-constrained neural inference controller placed on the ROV, responsible for detecting normal vs. abnormal operator commands.
Constrained neural networks for thermal vision
Massimiliano GALANTI, Graduating Student, Politecnico di Milano
Thermal vision is of increasing interest for multiple applications; person detection for security and pandemic control is one of the most topical. Leveraging low-cost, low-power microcontrollers, a new breed of small, smart, redundant, and self-contained vision- and AI-enabled devices can be designed. Using such resource-constrained architectures poses new and interesting challenges to AI engineers. Presented here is a case study in designing a suitable convolutional neural network, validating it with the X-CUBE-AI toolchain, and profiling and optimizing its design for an STM32 microcontroller-based application, with and without TensorFlow post-training quantization, to preserve accuracy while minimizing memory footprint, MCU load, and execution time. Results demonstrate that a 75% reduction in flash memory, a 30% reduction in RAM, and a 2x speedup can be obtained.
Low-Power License Plate Detection and Recognition on a RISC-V Multi-Core MCU-Based Vision System
Lorenzo LAMBERTI, PhD Student, University of Bologna, Italy
Battery-powered smart devices have the potential to revolutionize the Internet of Things (IoT) world. Due to their limited power envelope, microcontroller units (MCUs) represent the ideal platform for IoT sensing nodes. However, the limited power severely constrains the onboard processing capability of these devices, relegating computation-intensive vision tasks to high-end, powerful MCUs only. We present the first low-power MCU-based edge device for automatic license plate recognition. The design leverages a 9-core parallel ultra-low-power RISC-V processor, achieving a throughput of 1.09 FPS when running a multi-model deep learning pipeline. Our solution is the first MCU-class device embedding such a level of network complexity (687 MMAC), at a power cost of only 117 mW.
Transprecise Object Detection to Maximise Real-Time Accuracy on the Edge
JunKyu LEE, Research Fellow, Queen’s University Belfast
Real-time video analytics on the edge is challenging, as the computationally constrained resources typically cannot analyse video streams at full fidelity and frame rate, resulting in a loss of accuracy. This paper proposes a Transprecise Object Detector (TOD), which maximises real-time object detection accuracy on an edge device by selecting an appropriate Deep Neural Network (DNN) on the fly, with negligible computational overhead. TOD makes two key contributions over the state of the art: (1) it leverages characteristics of the video stream, such as object size and speed of movement, to identify networks with high prediction accuracy for the current frames; (2) it selects the best-performing network based on projected accuracy and computational demand using an effective, low-overhead decision mechanism. Experimental evaluation on a Jetson Nano demonstrates that TOD improves average object detection precision by 34.7% over the YOLOv4-tiny-288 model on the MOT17Det dataset. On the MOT17-05 test dataset, TOD utilises only 45.1% of GPU resources and 62.7% of GPU board power, without losing accuracy, compared to the YOLOv4-416 model.
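The selection step can be sketched as "pick the most accurate candidate that fits the compute budget". The candidate names, latencies, and projected accuracies below are hypothetical placeholders, and TOD's real decision mechanism projects accuracy from frame characteristics rather than using fixed numbers:

```python
# Hypothetical candidate networks: (name, latency_ms, projected_accuracy).
CANDIDATES = [
    ("yolov4-tiny-288", 18.0, 0.55),
    ("yolov4-tiny-416", 30.0, 0.63),
    ("yolov4-416",      95.0, 0.74),
]

def select_network(latency_budget_ms):
    """Return the most accurate network whose latency fits the budget;
    fall back to the cheapest model when nothing fits."""
    feasible = [c for c in CANDIDATES if c[1] <= latency_budget_ms]
    if not feasible:
        return CANDIDATES[0][0]
    return max(feasible, key=lambda c: c[2])[0]

print(select_network(33.3))   # 30 fps budget
print(select_network(100.0))  # 10 fps budget
```

The point of the transprecise approach is that this decision is cheap (a handful of comparisons per frame) compared to the inference it steers.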
Comparing Industry Frameworks and Zoo for Deeply Quantized Neural Networks
Francesco LORO, Intern, STMicroelectronics
The TinyML community has given increasing attention to frameworks for Deeply Quantized Neural Networks (DQNNs), which enable developments that meet challenging power consumption targets below a milliwatt.
Two alternative DQNN frameworks exist to date: QKeras and Larq. Starting from a comparative analysis of their features, we tested their performance in terms of accuracy and inference time.
The use cases presented are Human Activity Recognition and Anomaly Detection networks, with an accuracy of 98.6% and a PSNR of up to 111.2, respectively.
Moreover, we coded an initial set of DQNNs for imaging use cases with QKeras, such as BinaryAlexNet and BinaryResNet, and compared them with Larq.
D. Pau, M. Lattuada, F. Loro, A. D. Vita, and G. D. Licciardo, "Comparing industry frameworks with deeply quantized neural networks on microcontrollers," in 2021 IEEE Eighth International Conference on Communications and Electronics (ICCE), 2021.
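The core operation behind DQNN frameworks such as QKeras and Larq can be illustrated without either library. The sketch below shows generic 1-bit weight binarization with plain NumPy; it is not the API of either framework, just the underlying idea.

```python
import numpy as np

# Minimal sketch of 1-bit weight quantization as used in DQNNs: weights
# are binarized to {-1, +1} in the forward pass (during training, a
# straight-through estimator passes gradients through unchanged).
def binarize(w):
    return np.where(w >= 0, 1.0, -1.0)

w = np.array([[0.3, -0.7], [-0.1, 0.9]])
wb = binarize(w)

# A binary dense layer then reduces to additions and subtractions,
# which is what makes sub-mW inference plausible on tiny hardware.
x = np.array([1.0, 2.0])
y = x @ wb
```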
Simulating Edge AI Inference to early predict Chronic obstructive pulmonary disease from Exhaled Breath Fingerprint
Samson Otieno OOKO, Postgraduate Student, University of Rwanda
According to the World Health Organization (WHO), every year over 4 million people die prematurely from different types of respiratory disease. The high mortality rate results from late diagnosis, which mainly occurs once a patient starts to experience symptoms and goes to a healthcare facility; diagnosis requires expensive resources, such as equipment and healthcare professionals, making regular preventive check-ups for whole populations hard to envision. Affordable, free-to-use, non-invasive early prediction solutions for home use would prompt on-time medical consultation and therefore treatment, and at the same time grow the datasets of biomarkers required to develop improved respiratory disease prediction analytics.
Existing portable non-invasive breath analysis studies and commercial solutions integrating the Internet of Things (IoT) and Machine Learning (ML) use a cloud-centered architecture, meaning that the breath biomarkers collected by IoT sensors are sent to the cloud for inferencing. Considering bandwidth-constrained scenarios and the privacy-preserving requirements of healthcare services, such a cloud-centered solution is not viable, making an offline breath analysis solution a necessity. Our Master's thesis research fits in this context and explores rapid prototyping of an edge Artificial Intelligence (AI) embedded prototype that predicts respiratory disease from exhaled breath fingerprints, with objectives to (1) enable offline, free-to-use, regular check-ups in home settings and (2) collect large curated datasets to enable the development of analytics achieving clinical-grade accuracy.
Up to now, we have set up a rapid development and simulation pipeline integrating Edge Impulse, STM32CubeIDE and Proteus VSM, used respectively for (1) training on the dataset and generating a TinyML library, (2) integrating the generated library into our application and compiling an executable for our target board, and (3) simulating the edge AI inference process in an embedded context similar to the targeted board. We are currently evaluating this set-up on an open-source Chronic Obstructive Pulmonary Disease (COPD) time-series dataset, but our design process scales to other diseases as long as corresponding datasets are available. This simulation setup around Proteus VSM is particularly interesting for further exploring the impact of non-idealities in embedded hardware as a way to design robust edge AI models. The simulated inference results on Proteus match those on the remote Edge Impulse platform well, with a slight variation in classification accuracy; we are currently investigating the origin of this delta.
TinyML Platform for On-Device Continual Learning with Quantized Latent Replays
Leonardo RAVAGLIA, PhD Student, University of Bologna, Italy
In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices – in a word, TinyML – has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle, but so far they have still been too computation- and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frontend and of the Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves almost lossless compared to the full-precision baseline implementation while requiring 4× less memory, and 7-bit compression can also be used with minimal accuracy degradation. We also introduce optimized primitives for forward and backward propagation on the PULP processor, together with data tiling strategies to fully exploit its memory hierarchy while maximizing efficiency. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64 MB of memory – an amount compatible with embedding in TinyML devices. On an advanced 22 nm prototype of our platform, called VEGA, the proposed solution performs on average 42× faster than a low-power STM32 L4 microcontroller, while being 22× more energy efficient – enough for a lifetime of 317 h when learning a new mini-batch of data once every minute.
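The 4× memory saving from 8-bit latent replays follows directly from storing float32 activations as one byte each. The sketch below illustrates this with a generic affine quantize/dequantize pair in NumPy; it is an illustration of the principle, not the paper's exact quantization scheme.

```python
import numpy as np

# Sketch of quantized latent replays: activations of the frozen frontend
# are stored in the replay buffer at 8 bits instead of float32, cutting
# the buffer's memory cost by 4x with bounded reconstruction error.
def quantize_u8(x):
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0      # avoid div-by-zero on constants
    q = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return q, lo, scale

def dequantize_u8(q, lo, scale):
    return q.astype(np.float32) * scale + lo

acts = np.random.RandomState(0).randn(64, 128).astype(np.float32)
q, lo, scale = quantize_u8(acts)
recon = dequantize_u8(q, lo, scale)
ratio = acts.nbytes / q.nbytes            # float32 -> uint8: 4x smaller
```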
Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks
Matteo RISSO, PhD Student, Politecnico di Torino, Italy
Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks. One key feature of TCNs is time-dilated convolution, whose optimization requires extensive experimentation. We propose an automatic dilation optimizer which tackles the problem as weight pruning on the time axis and learns dilation factors together with weights in a single training. Our method reduces the model size and inference latency on a real SoC hardware target by up to 7.4× and 3×, respectively, with no accuracy drop compared to a network without dilation. It also yields a rich set of Pareto-optimal TCNs starting from a single model, outperforming hand-designed solutions in both size and accuracy.
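The equivalence PIT exploits can be made concrete: a dilated temporal kernel is a dense kernel with all but every d-th tap pruned, so learning which taps survive on the time axis amounts to learning the dilation. This is an illustrative sketch of that view, not the paper's training algorithm.

```python
import numpy as np

# A kernel of length k with dilation d covers a receptive field of
# (k-1)*d + 1 samples; it equals a dense kernel of that length with
# only every d-th tap kept. Pruning taps on the time axis therefore
# selects the dilation factor.
def dilation_mask(dense_len, d):
    mask = np.zeros(dense_len)
    mask[::d] = 1.0  # keep every d-th tap
    return mask

dense_kernel = np.arange(1.0, 8.0)      # dense kernel of length 7
mask = dilation_mask(7, 3)              # taps 0, 3, 6 survive
dilated_kernel = dense_kernel * mask    # equivalent to k=3, dilation=3
```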
Mini-NAS: A Neural Architecture Search Framework for Small Scale Image Classification Applications
Shahid SIDDIQUI, PhD Student, KIOS Center of Excellence
Neural architecture search (NAS) approaches have shown promising results on benchmark image classification datasets. However, the considerable size of ImageNet or even CIFAR-10 adds to increased search costs, so much of the effort has been spent on making NAS computationally feasible. Interestingly, many real-world tinyML applications (1) may come with datasets much smaller than even CIFAR, and (2) demand small memory-footprint networks for edge deployment. Since every dataset is inherently unique, NAS should help discover an exclusive architecture for each, and is therefore as needed for such datasets as it is for ImageNet. In this work, we first present a suite of 30 image classification datasets that mimics possible real-world use cases. Next, we present a powerful yet minimal global search space that contains all vital ingredients to create structurally diverse yet parameter-efficient networks. Lastly, we propose an algorithm that can efficiently navigate a huge discrete search space and is specifically tailored for discovering high-accuracy, low-complexity tiny convolutional networks. The proposed NAS system, Mini-NAS, on average discovers 14.7× more parameter-efficient networks for the 30 datasets than MobileNetV2 while achieving on-par accuracy. On CIFAR-10, Mini-NAS discovers a model that is 2.3×, 1.9× and 1.2× smaller than the smallest models discovered by RL, gradient-based and evolutionary NAS methods respectively, while the search cost is only 2.4 days.
Utilizing Static Code Generation in TinyML
Rafael STAHL, PhD Candidate, Technical University of Munich
The deployment of machine learning applications on microcontrollers, known as TinyML, enables advanced, low-power applications. These resource-constrained devices pose major challenges in terms of run time, memory usage and safety. Existing machine learning frameworks provide runtime libraries that are weak in those aspects because they dynamically load and execute models. In this talk, we present deployment flows based on TensorFlow Lite for Microcontrollers and TVM that improve these aspects through static code generation. The presented static code generator flows already provide some features that will ease their use in safety-critical software systems. Yet there are still open challenges in deploying generated code from existing deployment flows according to automotive safety standards, which will be discussed briefly in the talk. The code generators were evaluated on the TinyMLPerf benchmark and on average reduced run time by 3.0× in TVM, and working memory by 1.37× and read-only memory by 1.54× in TFLite Micro.
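The difference between a runtime library and static code generation is that the latter emits the model as compile-time constants and fixed call sequences. The toy generator below is a hypothetical miniature of that idea; real flows such as TVM or TFLite Micro codegen are far more elaborate.

```python
# Toy illustration of static code generation: instead of parsing a model
# file at run time, the deployment flow emits weights (and ultimately the
# inference schedule) as C source, so sizes and layouts are fixed at
# compile time - easing run-time, memory and safety analysis.
def emit_c_weights(name, weights):
    vals = ", ".join(f"{w:.6f}f" for w in weights)
    return f"static const float {name}[{len(weights)}] = {{ {vals} }};"

c_source = emit_c_weights("dense0_w", [0.25, -1.5, 3.0])
```

Because everything is a compile-time constant, the linker can place weights in read-only memory and no model parser needs to ship on the device.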
Squeeze-and-Threshold based quantization for Low-Precision Neural Networks
Binyi WU, PhD Student, Infineon Technologies AG
Problem statement: Deep Convolutional Neural Networks (DCNNs), widely used for image recognition, require a large amount of computation and memory, making them infeasible to run on embedded devices. Among various effective techniques such as quantization and pruning, 8-bit quantization is the most widely used method. However, it is not sufficient for embedded devices with extremely limited hardware resources. Prior work has already demonstrated that lower-precision quantization is feasible, but uses different schemes for 1-bit and multi-bit quantization. In this work, we propose a new quantization method based on an attention mechanism, which unifies the binarization and multi-bit quantization of activations and demonstrates state-of-the-art performance.
Relevance to tinyML: the proposed low-precision (1-, 2-, 3-, 4-bit) quantization method is a neural network optimization method for low-power applications.
Novelty: (1) the first application of an attention mechanism to quantization; (2) a consistent method for both 1-bit and multi-bit quantization.
Figure: the floating-point convolution operation is replaced with a quantized convolution operation.
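The unified 1-bit/multi-bit scheme can be illustrated with a generic uniform activation quantizer. In the sketch below, the paper's novelty (an attention-derived threshold) is abstracted into a given `threshold` parameter; the code only shows how one formula covers both binarization (b = 1) and multi-bit quantization.

```python
import numpy as np

# Generic low-bit activation quantization: clip activations to
# [0, threshold] and map them to 2^b - 1 uniform levels. With bits=1
# this degenerates to plain binarization, which is the sense in which
# a single scheme can cover 1-bit and multi-bit quantization.
def quantize_act(x, threshold, bits):
    levels = 2 ** bits - 1
    xc = np.clip(x / threshold, 0.0, 1.0)
    return np.round(xc * levels) / levels * threshold

x = np.array([-0.2, 0.1, 0.5, 0.9, 1.4])
q1 = quantize_act(x, threshold=1.0, bits=1)   # binary: values in {0, 1}
q2 = quantize_act(x, threshold=1.0, bits=2)   # 4 uniform levels
```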
Runtime DNN Performance Scaling through Resource Management on Heterogeneous Embedded Platforms
Lei XUN, PhD Student, University of Southampton
DNN inference is increasingly being executed locally on embedded platforms, due to clear advantages in latency, privacy and connectivity. Modern SoCs typically execute a combination of different and dynamic workloads concurrently, so it is challenging to consistently meet latency/energy budgets because the local computing resources available to the DNN vary considerably. In this poster, we show how resource management can be applied to optimise the performance of DNN workloads by constantly monitoring and tuning both software and hardware at runtime. This work shows how dynamic DNNs trade off accuracy against latency/energy/power on heterogeneous embedded CPU-GPU platforms.
TinyML meets vibration-based Structural Health Monitoring: solving a binary classification problem at the edge
Federica ZONZINI, PhD Student, University of Bologna, Italy
Structural Health Monitoring (SHM) is a trending discipline that aims to assess the integrity of structures throughout their life cycle and normal operation. Hence, low latency, long-term operation and real-time functionality are three pillar design criteria to be leveraged when designing a resilient and effective monitoring system.
To this end, the Tiny Machine Learning (TinyML) paradigm has very recently pioneered outstanding solutions capable of optimizing both the time and the dimension of the data to be processed and shared across the SHM network. TinyML could indeed bring a radical shift of perspective, moving from cloud-based data analytics, which is usually performed on remote servers in a time- and energy-consuming manner, to sensor-near data inference, delegated to smart sensors in charge of processing information in a streaming fashion.
Within this scenario, damage diagnosis and prognosis are the two main tasks in which TinyML finds its natural application. Accordingly, the contribution of this work is to present the practical embodiment of TinyML architectures on resource-constrained devices, devoted to the health assessment of structures in the dynamic regime. The latter comprises all those application domains which can be thoroughly described by frequency-related quantities, conventionally extracted by means of standard spectral analysis techniques.
In more detail, the structural evaluation process was tackled as a binary classification problem. This approach yielded the design of two different neural networks, which take natural frequencies of vibration as inputs and provide the corresponding damage status (i.e. healthy/unhealthy structure) as output. The network topology was taken from a standard feedforward Autoassociative Neural Network (ANN). However, due to the high dependency of these frequency parameters on environmental and operational factors (EOFs), the neural network models were corrected by including a set of structurally sensitive EOF data (e.g. temperature) as additional inputs. Differently from conventional implementations, the primary aim of this strategy is to make the network self-adaptive in regressing the EOF-to-frequency relationship, without needing any standard regression and/or compensation technique to be performed separately.
The tested ANN models were initially coded in the Python TensorFlow programming environment; then their distilled versions, characterized by a much smaller number of hyperparameters but similar classification performance, were obtained after conversion to TensorFlow Lite. Finally, the resulting models were ported to the Arduino Nano 33 BLE Sense platform and validated with experimental data from the Z24 bridge use case, reaching an average accuracy and precision of 92% and 91%, respectively. The maximum model size was kept below 10 KB and the maximum measured execution time below 1.5 s.
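The input/output layout described above (natural frequencies plus an EOF covariate in, a healthy/unhealthy score out) can be sketched as a tiny feedforward network. The weights below are random placeholders, not the trained Z24 model; the layer sizes are also illustrative.

```python
import numpy as np

# Illustrative forward pass for an SHM-style binary classifier: inputs
# are a few natural frequencies of vibration plus temperature as an
# environmental/operational factor (EOF); output is a damage score.
rng = np.random.RandomState(42)
W1, b1 = rng.randn(5, 8) * 0.1, np.zeros(8)   # 4 freqs + 1 EOF -> 8 hidden
W2, b2 = rng.randn(8, 1) * 0.1, np.zeros(1)   # 8 hidden -> 1 score

def predict(freqs_hz, temperature_c):
    x = np.concatenate([freqs_hz, [temperature_c]])
    h = np.tanh(x @ W1 + b1)
    score = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
    return float(score)  # > 0.5 read as "unhealthy" in this sketch

score = predict(np.array([3.9, 5.0, 9.8, 10.3]), temperature_c=15.0)
```

A network of this size has on the order of a hundred parameters, consistent with the sub-10 KB model budget reported above.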
6:15 pm to 8:15 pm
tinyML EMEA community building: matchmaking and networking session
The Horizon Europe Programme and other national and EMEA regional programs drive research and innovation opportunities beyond industrial initiatives. EMEA institutions have been actively involved in research and innovation activities funded by national and international programs, and participation in such programs requires collaboration between various stakeholders. Realizing this, the tinyML EMEA Forum Organizing Committee is organizing this break-out session, where research and innovation programs and opportunities relevant to the EMEA tinyML community will be presented and discussed. The session provides an ideal networking opportunity for EMEA-based academia, industry and all other interested entities and stakeholders to identify, discuss and explore joint activities: forming consortia and collaborating to take advantage of the funding opportunities. During the session, there will be short talks from policy makers (names to be confirmed), success stories from ongoing projects and national initiatives, and networking activities such as meetup groups. Topics will include:
• Policy Makers
• Success Stories – Projects relevant to tinyML
• National Initiatives and Activities
Marco CECCARELLI, Programme Officer at the Directorate General for Communications Networks, Content and Technology, European Commission
Cecile HUET, Deputy Head of the Robotics and Artificial Intelligence Innovation and Excellence Unit, European Commission
This session will give members of the European Commission the opportunity to present to a large part of the community involved with energy-efficient machine learning. Topics such as tools for collaborative projects that will be available in the Horizon Europe and Digital Europe Programmes will be discussed. These projects will certainly stimulate activity in Europe.
Success Stories – Projects relevant to tinyML
tinyML research under the EU ERC program
Marian VERHELST, Associate Professor, KU Leuven
tinyML research should span the complete spectrum from far-out blue-sky research, through maturing ideas, to tech transfer in close collaboration with industry. This testimony will cover a tinyML project at the start of this research pipeline – funded by the European Research Council (ERC) – and its current and potential outcomes for the rest of the pipeline: Re-SENSE: https://www.kuleuven.be/english/research/EU/p/horizon2020/es/erc/re-sense.
STMicroelectronics’ AI Sensing Platform for early earthquake detection
Danilo PAU, Technical Director, IEEE and ST Fellow, STMicroelectronics
Earthquake detection and warning systems are crucial to ensure the timely evacuation of local populations and to avoid human tragedies.
Use Case 3 (UC3) in SEMIoTICS aims to provide an innovative technology for distributed Artificial Intelligence, allowing low-power IoT field devices to perform data processing at the edge and to address a wide range of applications, including earthquake detection. Based on STMicroelectronics' AI Sensing platform, the UC3 system enables highly scalable distributed intelligence, implementing the highly non-linear approximation capabilities of Artificial Neural Networks, statistical analysis and distributed computation, thereby increasing system scalability, safety and robustness.
Spike-based neuromorphic computing for the extreme edge
Federico CORRADI, Senior Neuromorphic Researcher, IMEC
Neuromorphic sensing is an engineering paradigm for AI at the extreme edge. The neuromorphic key concepts underpinning our research are on-demand computing, sparsity, time-series processing, event-based sensory fusion, and learning. It can meet strict energy and cost constraints for robotic, automotive, and consumer sensing applications (vision, radar, lidar, etc.). IMEC develops compute architectures for event-based sensors, using neurons with co-located memory and processing. This short presentation will discuss our first neuromorphic spike-based architecture (μBrain) and illustrate its potential for various sensing tasks.
National Initiatives and Activities – tinyML EMEA Meetup Groups
Evgeni GOUSEV, Senior Director, Qualcomm Research, USA
A few of our 33 Meetup Groups from 23 countries will introduce the activities that their groups are involved in.
Schedule subject to change without notice.
Technical Program Chair
University of Cyprus
Technical Program Vice-Chair
ETHZ | University of Bologna
Qualcomm Research, USA
Emza Visual Sense
KIOS Research and Innovation Center of Excellence
Chair of Electronic Design Automation
Samson Otieno OOKO
University of Rwanda
Marios M. POLYCARPOU
University of Cyprus
Arizona State University
University of Trento
Emza Visual Sense
University of Cyprus
Institute of Neuroinformatics
Politecnico di Milano
Klepsydra Technologies GmbH
Gian Marco IODICE
Vijay JANAPA REDDI
Digital Catapult UK
KIOS Research and Innovation Center of Excellence
University of Bologna, Italy
Queen’s University Belfast
GreenWaves Technologies SAS
ETH Zurich, D-ITET
Dedan Kimathi University of Technology
European Commission - DG Communications Networks, Content and Technology
Cadi Ayyad University, Morocco
The University of Rwanda
Morten Opprud JAKOBSEN
Samson Otieno OOKO
University of Rwanda
University of Bologna, Italy
Politecnico di Torino, Italy
KIOS Center of Excellence
Technical University of Munich
Infineon Technologies AG
University of Southampton
Infineon Technologies AG
University of Bologna, Italy