The tinyML EMEA Innovation Forum is accelerating the adoption of tiny machine learning across the region by connecting the efforts of the private sector with those of academia in pushing the boundaries of machine learning and artificial intelligence on ultra-low-power devices.
Stadhouderskade 12, Amsterdam, Netherlands, 1054 ES
- Tiny ML in the Real World
This track will cover the latest advancements in small-scale machine learning and their real-world applications, examining the opportunities and challenges of developing such solutions and addressing societal issues and Sustainable Development Goals (SDGs) through them. It will also explore the shift from traditional Digital Signal Processing (DSP) to deep learning-based techniques to handle the increasing complexity and volume of data generated by various devices and sensors.
- Algorithms and Optimization Techniques
This track will focus on recent innovations that bring highly efficient inference models to real devices, optimizing on-device performance and energy use through techniques such as efficient network architectures and model compression. The track will also delve into emerging approaches to benchmarking performance on tiny devices.
- MLOps, development and deployment tools
This track will cover the crucial tools for enabling tinyML technology, including recent advancements in software for developing, optimizing and deploying tinyML solutions. It will also discuss data collection, pre-processing, and curation as an important step before the development phase, and best practices, methodologies, and tools to facilitate the whole process and make tinyML solutions ubiquitous in our lives.
- Hardware and Sensors
This track will focus on innovation and advancements within the tinyML hardware and sensor ecosystem, highlighting emerging trends that will shape the future of tinyML solutions. It will showcase current market-ready solutions as well as what’s on the horizon, exploring novel architectures like NPUs, custom hardware acceleration, and neuromorphic technology. The track will also cover new sensor paradigms and architectures and provide a comprehensive overview of the state of the art in tinyML hardware.
| Track | Title | First name | Last name | Affiliation |
|---|---|---|---|---|
| Algorithms and Optimization Techniques | Search Space Optimization in Hardware-Aware Neural Architecture Search | Dennis | Rieber | Bosch Research |
| Algorithms and Optimization Techniques | Twofold Sparsity: Joint Bit- and Network-level Sparse Deep Neural Network for Energy-efficient RRAM-Based CIM | Foroozan | Karimzadeh | Georgia Institute of Technology |
| Algorithms and Optimization Techniques | Quantization-Aware Neural Architecture Search for Efficient Semantic Segmentation on Edge Devices | Hiram | Rayo Torres Rodriguez | NXP Semiconductors |
| Algorithms and Optimization Techniques | The BitBrain method for learning and inference at the edge | Michael | Hopkins | The University of Manchester |
| Algorithms and Optimization Techniques | Face Recognition with binary networks | Simone | Moro | STMicroelectronics |
| Algorithms and Optimization Techniques | Super Slim and Low Power Radar-based Gesture Recognition | Stephan | Schoenfeldt | Infineon AG |
| Hardware and Sensors | The Role of FPGAs in TinyML Applications | Alexander | Montgomerie-Corcoran | Imperial College London |
| Hardware and Sensors | SENECA: Flexible and Scalable Neuromorphic Processor | Guangzhi | Tang | imec Netherlands |
| MLOps, development, and deployment tools | MicroMind – a toolkit for tinyML | Francesco | Paissan | Fondazione Bruno Kessler |
| Research Poster – tinyML Algorithms | Hardware/Software Co-design for embedded AI with AutoML | Thomas | Elsken | Bosch |
| Research Poster – tinyML Applications | XiNets for Edge Keypoint Detection | Alberto | Ancilotto | Fondazione Bruno Kessler |
| Research Poster – tinyML Applications | Hardware-aware Neural Architecture Search for Medical Imaging Applications | Hadjer | Benmeziane | Université Polytechnique des Hauts-de-France |
| Research Poster – tinyML Applications | Single-Shot Visual Object Detectors on Nano-Drones | Luca | Bompani | Alma Mater Studiorum Università di Bologna |
| Research Poster – tinyML Applications | Evaluation of OBDII data Contribution in Tiny Machine Learning based Driving Behaviour Monitoring | Massimo | Merenda | Unirc |
| Research Poster – tinyML Software | Enabling On-Device Learning on RISC-V Multicore MCUs | Davide | Nadalini | Politecnico di Torino / Università di Bologna |
| Research Poster – tinyML Software | End-to-end evolutionary neural architecture search for microcontroller units | René | Groh | Friedrich-Alexander-Universität Erlangen-Nürnberg |
| tinyML in the Real World | tinyML for Crime Prevention: Detecting Violent Conversations | Amna | Anwar | Nottingham Trent University |
| tinyML in the Real World | Tiny Neural Deep Clustering: An Unsupervised Approach for Continual Machine Learning on the Edge | Andrea | Albanese | University of Trento |
| tinyML in the Real World | The Impact and Challenges of Livestock Tracking in Internetless Environments with tinyML | Bradley | Patrick | Nottingham Trent University |
| tinyML in the Real World | Advancing Micromobility safety by deploying compressed Lane Recognition CNN model on low-spec Microcontroller Unit | Chinmaya | Kaundanya | Dublin City University |
| tinyML in the Real World | All On-Device anomaly detection in NanoEdge AI Studio | He | HUANG | STMicroelectronics |
| tinyML in the Real World | μLightDigit: TinyML-Empowered Contactless Digit Recognition with Light | Jie | Jang | TU Delft |
| tinyML in the Real World | Combining Multiple tinyML Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | Kieran | Woodward | Nottingham Trent University |
| tinyML in the Real World | Realizing the Power of Edge Intelligence: Addressing the Challenges in AI and tinyML Applications for Edge Computing | Michael | Gibbs | Nottingham Trent University |
| tinyML in the Real World | Estimating Lubrication conditions in Ball Bearings using a low-cost MEMS Microphone | Morten | Opprud | Aarhus University |
| tinyML in the Real World | TinyDigitalExposome: The Opportunities of Multimodal Urban Environmental Data and Mental Wellbeing on Constrained Microcontrollers | Thomas | Johnson | Nottingham Trent University |
8:00 am to 9:00 am
9:00 am to 9:15 am
Session Moderator: Alessandro GRANDE, Head of Product, Edge Impulse
9:15 am to 10:00 am
Keynote by Steve Furber, ICL Professor of Computer Engineering, The University of Manchester
A Novel Mechanism for Edge ML
Steve FURBER, ICL Professor of Computer Engineering, The University of Manchester
10:00 am to 10:30 am
Break & Networking
10:30 am to 11:40 am
Tiny ML in the real world intro and Session
Session Moderator: Eiman KANJO, Professor of Pervasive Sensing and the head of the Smart Sensing Lab, Nottingham Trent University
From the cloud to the edge: reverse-engineering and downsizing a black box ML algorithm
Jan Jongboom, Co-founder and CTO, Edge Impulse
We talk a lot about new model types, new silicon, and greenfield applications – but what if a customer already has an ML model that they want to improve on and scale down? And what if this model is a complete black box, licensed from a third party; and the customer doesn’t own any of the data or labels? And, to make matters even more complex, the model was trained on clinically validated data – costing at least a million dollars to replicate.
Join us for a tale of reverse-engineering a black box sleep stage prediction algorithm, doing complex data collection on a budget and how to move compute from the cloud to the edge – here to enable wake-up on light sleep, a new feature that’s impossible to do with their black box algorithm – adding new functionality to a device that’s already on the market.
An Embedded EOG-based BCI System for Robotic Control
Valeria TOMASELLI, Senior Engineer, STMicroelectronics
A Brain–Computer Interface (BCI), or Human–Computer Interface (HCI), is an emerging technology that establishes a direct communication link between the human brain and an external device; it was mainly conceived to assist people with severe motor disabilities, helping them to reestablish communicative and environmental control abilities.
Patients suffering from disabilities such as locked-in syndrome (LIS) often retain the ability to control their eye movements. Electrooculography (EOG) signals contain highly recognizable information about eyelid movements such as blinks and winks, which can be clearly recorded with low-cost devices. This makes them exploitable as a valuable source of information, especially for control applications; as a result, EOG can enable impaired people to autonomously move around by controlling Electrically Powered Wheelchairs (EPW), interact with their domestic smart environment or even communicate with others through the use of virtual spellers.
The development of alternative ways to control external devices, without relying on language or body motions, is important for both motor-impaired and healthy subjects. Generally, BCI systems in this field have some drawbacks, which can be summarized in the following points: few control dimensions, low classification accuracy, the need to execute commands synchronously with an external stimulus, and the requirement of extensive training for the subjects to be able to control the system. In order to address some of the most limiting disadvantages of BCI technology, we realized a wearable BCI system that runs all the necessary steps, from acquisition to the final inference and command transmission, on a small, battery-powered SoC built around an MCU (MicroController Unit). TinyML is well-suited for these systems due to the limited resources available on many BCI devices. By using small and efficient tinyML models, it is possible to run the necessary algorithms on resource-constrained devices, such as MCUs.
First, we treat EOG signals as a source of control commands, since these signals are highly distinguishable from the rest of the brain activity and because they can be easily generated by a user without any previous training on the system usage. We collected left and right winks and both voluntary and involuntary blinks on two channels from the Fp1 and Fp2 electrodes, shown in Fig. 1, with the aim of using them as control commands. Voluntary and involuntary eye blinks are distinguished to prevent unwanted inputs.
The whole process, from acquisition to the final inference, is achieved through three processing blocks embedded in the firmware of the device (pre-processing, event detection and event classification), which are the main focus of this work.
First, the signals are pre-processed with digital filters to remove external noise from the data. After that, an event detection algorithm selects portions of the incoming data stream that show clear activity; these are passed as inputs to the classifier, which recognizes the activity. The event detector enables asynchronous operation and makes better use of the microcontroller’s computational resources by avoiding constant classification of the idle state, where there are no useful commands. Finally, the selected signals go through the event classifier, a 1-dimensional (Conv 1D) CNN model that classifies voluntary and involuntary eye blinks and left/right eye winks. The tinyML model was trained with the dataset we collected from eight volunteers using our custom board. After the training phase, the model was converted to an equivalent C architecture and embedded in the firmware. The developed tinyML model is well below the MCU’s computational constraints while achieving an average classification accuracy across the four classes of 99.3%. The proposed BCI system has been used to remotely control a three degrees of freedom (DoF) wheeled robot, using left/right winks as rotation commands and single and double blinks as go and stop commands. Different subjects, both with some and no experience with the BCI system, controlled the robot by following a traced path on the floor without any difficulty, achieving the reported command accuracy in the real-world
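The pre-processing and event detection stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' firmware: the moving-average filters, the amplitude threshold, and every name and parameter below (`preprocess`, `detect_events`, `window`, `refractory`) are assumptions chosen for illustration. The idea shown is that only windows containing clear activity would be handed to a classifier, so the idle state is never classified.

```python
import numpy as np

def moving_average(x, width):
    """Smooth a 1-D signal with a box filter of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def preprocess(signal, smooth_width=5, baseline_width=101):
    """Assumed pre-processing: suppress fast noise, then remove slow baseline drift."""
    smoothed = moving_average(signal, smooth_width)
    baseline = moving_average(smoothed, baseline_width)
    return smoothed - baseline

def detect_events(signal, threshold, window, refractory):
    """Return (start, end) windows around samples whose magnitude exceeds threshold."""
    events = []
    i, n = 0, len(signal)
    while i < n:
        if abs(signal[i]) > threshold:
            start = max(0, i - window // 2)
            end = min(n, start + window)
            events.append((start, end))
            i = end + refractory  # skip ahead so one blink yields one event
        else:
            i += 1
    return events

# Synthetic single-channel recording: flat signal with one blink-like pulse.
raw = np.zeros(1000)
raw[500:520] = 1.0
clean = preprocess(raw)
events = detect_events(clean, threshold=0.5, window=64, refractory=50)
# Each window in `events` would be passed to the classifier; idle samples are skipped.
```

Because the detector only fires on activity, the classifier cost is paid per event rather than per sample, which is the resource saving the abstract attributes to asynchronous operation.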
11:00 am to 11:20 am
Intro to Research Posters
Session Moderator: Tinoosh MOHSENIN, Associate Professor, University of Maryland Baltimore County
Hardware/Software Co-design for embedded AI with AutoML
XiNets for Edge Keypoint Detection
Hardware-aware Neural Architecture Search for Medical Imaging Applications
Single-Shot Visual Object Detectors on Nano-Drones
Evaluation of OBDII data contribution in Tiny Machine Learning based Driving Behaviour Monitoring
Enabling On-Device Learning on RISC-V Multicore MCUs
End-to-end evolutionary neural architecture search for microcontroller units
11:20 am to 12:30 pm
Posters (Tiny ML in the real world) and Demos
tinyML for Crime Prevention: Detecting Violent Conversations
Tiny Neural Deep Clustering: An Unsupervised Approach for Continual Machine Learning on the Edge
The Impact and Challenges of Livestock Tracking in Internetless Environments with tinyML
Advancing Micromobility safety by deploying compressed Lane Recognition CNN model on low-spec Microcontroller Unit
μLightDigit: TinyML-Empowered Contactless Digit Recognition with Light
Combining Multiple tinyML Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers
Realising the Power of Edge Intelligence: Addressing the Challenges in AI and tinyML Applications for Edge Computing
Estimating Lubrication conditions in Ball Bearings using a low-cost MEMS Microphone
TinyDigitalExposome: The Opportunities of Multimodal Urban Environmental Data and Mental Wellbeing on Constrained Microcontrollers
12:30 pm to 2:00 pm
Lunch and Networking
2:00 pm to 2:50 pm
Tiny ML in the real world Session - Part 2
Session Moderator: Thomas BASIKOLO, Programme Officer, ITU
How to build an ML-powered doorbell notifier
Sandeep MISTRY, Principal SW Engineer and Developer Evangelist IoT, Arm
This talk will give an overview of an audio classification tinyML system built on top of an Arm Cortex-M33 based Realtek RTL8721DM SoC.
The presentation will cover:
1) How open audio datasets and transfer learning can be used to train a TensorFlow Lite model with TensorFlow’s signal processing and Keras APIs;
2) How to port the feature extraction pipeline to the SoC using CMSIS-DSP and run ML inferencing using TensorFlow Lite for Microcontrollers with CMSIS-NN accelerated kernels;
3) How the SoC’s built-in Wi-Fi connectivity is used only when the model detects sounds of interest.
You will be able to see how the compute and resources of Realtek’s RTL8721DM SoC leave ample room to explore more complex model architectures for other tinyML audio classification use cases.
All the code used in the project will be made available on GitHub after the presentation for attendees to take a deeper dive.
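As a rough illustration of the kind of feature extraction pipeline that would later be ported to the device, the sketch below computes a log-magnitude spectrogram with NumPy. It is not the speaker's code: the frame length, hop size, 16 kHz sample rate and the function name `log_spectrogram` are assumptions, and an on-device port would replace the NumPy calls with equivalent fixed-size DSP routines.

```python
import numpy as np

def log_spectrogram(audio, frame_len=400, hop=160, eps=1e-6):
    """Split audio into overlapping frames and return log-magnitude spectra."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(frame_len)  # window to reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)

# One second of a 1 kHz test tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
features = log_spectrogram(tone)  # shape: (frames, frequency bins)
```

The resulting 2-D array is the kind of input an audio classification model would consume; the same framing and windowing parameters have to be used at training time and on the device.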
New-gen era of always-on devices: Ultra Tiny ML KWS model and HAR Model embedded into the sensor.
Blair NEWMAN, CTO, Neuton
Modern always-on devices either have limited functionality or discharge very fast.
Developers have to compromise between analysis/functionality complexity and energy capacity. At Neuton.AI, we managed to establish an innovative approach that helps us develop extremely compact neural networks able to recognize complex activities with the minimum energy and memory required.
We step in to enable the next generation of always-on devices, making it possible to build complex functionality that consumes a minimum amount of energy and memory.
The evolution of smart sensors is bringing IIoT devices to the next level. It allows them to interact with the world in more sophisticated ways. Ultra-low power sensors operate at the microwatt level with built-in AI opening up never-before-seen possibilities for intelligent devices letting them sense, process and take actions on their own while running on a single coin battery for years.
We have developed the first Neural Network Framework designed to create ML models of minimal size without loss of accuracy. It is a unique patented machine learning algorithm that forgoes error back propagation and stochastic gradient descent, growing the network structure neuron by neuron. This helps to build neural networks without compromising between size and accuracy. In addition, the resulting models have:
- excellent generalizing capability
- minimal size, often less than 1 Kb
- no loss of accuracy
- no need for compression techniques
Applying our novel approach allows users to:
- Create the most energy-efficient models for always-on connectivity solutions
- Embed models into tiny pieces of hardware, such as ultra-low-power microcontrollers and sensors
- Spend less energy on ML inference, extend device battery life
- Have the smallest footprint for a TinyML solution that leaves more room for your valuable business logic
In this session, we will show you a live demo of the HAR model embedded into the sensor that perfectly illustrates the uniqueness of our approach.
You will also learn how this model can be created automatically without special knowledge of data science.
HAR model embedded into the intelligent sensor: a live demo of Human Activity Recognition with a model of only 0.3 Kb and 98% accuracy. This model recognizes 6 classes of activity with an optimal size-to-accuracy ratio, and its never-before-seen tiny footprint allowed us to embed it into a low-power, resource-constrained intelligent sensor.
We are convinced that advanced approaches to building compact neural networks will eventually open new horizons for modern IoT products. What’s more, always-on devices will get a more complex functionality along with enhanced energy-efficient AI solutions. On the other hand, simpler hardware will let producers optimize their HW costs.
2:50 pm to 3:20 pm
Break & Networking
3:20 pm to 4:10 pm
Tiny ML in the real world Session - Part 3
Session Moderator: Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco
Monitoring of Vital Signs using Embedded AI in wearable devices
Lina WEI, Machine Learning Engineer, 7 Sensing Software
Vital signs are measurements of the body’s basic functions, such as Respiratory Rate, Heart Rate and Blood Pressure. Monitoring vital signs allows us to assess our wellbeing and detect underlying health issues at an early stage. For example, respiratory rate is an important marker of health; elevated respiratory rate values (> 27 breaths per minute) have been shown to be predictive of cardiopulmonary arrest. Respiratory rate is often neglected due to a lack of unobtrusive sensors for objective and convenient measurement. Recent improvements in photoplethysmogram (PPG) sensing and growing interest in wearable devices promote the development of digital health. In this presentation, we will show how the combination of ams OSRAM medical and health sensors and 7 Sensing Software’s embedded AI technology enables unobtrusive, daily monitoring of Respiratory Rate. The deep learning-based solution has overcome the difficulties caused by data complexity and achieved a performance comparable to that of medical-grade devices. The deep learning model is converted and optimized to be compatible with low-end microcontrollers. The presented solution is currently being deployed to smart watches by OEMs for daily respiratory rate monitoring.
Familiar Face Identification on MCUs: A Privacy-Preserving Solution for Personalizing Your Devices
Tim de BRUIN, Deep Learning Researcher, Plumerai
Imagine a TV that shows tailored recommendations and adjusts the volume for each viewer, or a video doorbell that notifies you when a stranger is at the door. A coffee machine that knows exactly what you want so you only have to confirm. A car that adjusts the seat as soon as you get in, because it knows who you are. All of this and more is possible with Familiar Face Identification, a technology that enables devices to recognize their users and personalize their settings accordingly.
Unfortunately, common methods for Familiar Face Identification are either inaccurate or require running expensive models in the cloud, with all of the security and energy compromises that come with cloud computing.
At Plumerai, we are on a mission to make AI tiny. We have recently succeeded in bringing Familiar Face Identification to microcontrollers. This makes it possible to identify users entirely locally, and therefore securely, using very little energy and with very low-cost hardware.
Our solution uses an end-to-end deep learning approach that consists of three neural networks: one for object detection, one for face representation, and one for face matching. We have applied various advanced model compression and training techniques to make these networks fit within the hardware constraints of microcontrollers, while retaining excellent accuracy.
In this talk, we will present the techniques we used to achieve Familiar Face Identification on microcontrollers and demonstrate our product in action by giving a live demo. We will also discuss some of the practical challenges and lessons learned from building this product and how they differ from the academic literature on Familiar Face Identification. We believe our solution opens up new possibilities for user-friendly and privacy-preserving applications on tiny devices.
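To make the final matching stage of such a pipeline concrete, here is a deliberately simplified sketch. The abstract describes a neural network for face matching; the code below swaps in plain cosine similarity between embedding vectors as a stand-in, and the names `identify` and `enrolled`, the 3-dimensional embeddings and the 0.6 threshold are all hypothetical.

```python
import numpy as np

def identify(embedding, enrolled, threshold=0.6):
    """Return (name, score) of the best-matching enrolled user, or (None, score)."""
    query = embedding / np.linalg.norm(embedding)
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        # Cosine similarity between unit-length vectors.
        score = float(query @ (reference / np.linalg.norm(reference)))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # treat as a stranger
    return best_name, best_score

# Hypothetical enrolled users with toy embeddings stored on the device.
enrolled = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
known, known_score = identify(np.array([0.9, 0.1, 0.0]), enrolled)
stranger, stranger_score = identify(np.array([0.0, 0.0, 1.0]), enrolled)
```

Keeping both the enrolled embeddings and the comparison on the device is what allows identification to run without sending images to the cloud.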
4:10 pm to 5:00 pm
Panel - Benchmarking
Session Moderator: Petrut BOGDAN, Neuromorphic Architect, Innatera
5:00 pm to 5:30 pm
Posters & Demos
7:00 pm to 10:00 pm
Social Event - Kaasbar
You need to be registered for the social event
9:00 am to 9:05 am
Session Moderator: Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco
9:05 am to 9:50 am
Keynote by Lisa Trollo - STMicroelectronics
Smart, open and accurate: sensors in the sustainable Onlife era
LISA TROLLO, Artificial Intelligence Strategy, STMicroelectronics
What do we expect from technology today? Evolving at a rapid pace, today’s technology must keep us safe and protect our planet to ensure a sustainable future. Helping improve interactions between humankind and the environment, technology must remain non-invasive while enhancing our creativity, for a human-centric digital transformation. As we enter the Onlife era, with the increasing fusion of technology into our society and our daily lives, sensors are essential in making our world a better place and more sustainable.
ST is enabling this transition to the Onlife era with accurate, smart, and open-source sensing devices designed to optimize edge computing and create ultra-low-power systems that bring innovative solutions to environmental and social challenges.
9:50 am to 10:35 am
MLOps, development and deployment tools
Session Moderator: Dirk STANEKER, Group Leader, Bosch Sensortec GmbH
Change for the Better: Improving Predictions by Automating Drift Detection
Paola Andrea JARAMILLO GARCIA, Technical Manager Application Engineering, The MathWorks
A machine learning solution is only as good as its data. But real-world data does not always stay within the bounds of the training set, posing a significant challenge for the data scientist: how to detect and respond to drifting data? Drifting data poses three problems: detecting and assessing drift-related model performance degradation; generating a more accurate model from the new data; and deploying a new model into an existing machine learning pipeline. Using a real-world predictive maintenance problem, we demonstrate a solution that addresses each of these challenges: data drift detection algorithms periodically evaluate observation variability and model prediction accuracy; high-fidelity physics-based simulation models precisely label new data; and integration with industry-standard machine learning pipelines supports continuous integration and deployment. We reduce the level of expertise required to operate the system by automating both drift detection and data labelling. Process automation reduces costs and increases reliability. The lockdowns and social distancing of the last two years reveal another advantage: minimizing human intervention and interaction to reduce risk while supporting essential social services. As we emerge from the worst of this pandemic, accelerating adoption of machine autonomy increases the demand for the automation of human expertise.
Consider a fleet of electric vehicles used for autonomous package delivery. Their batteries degrade over time, increasing charging time and diminishing vehicle range. The batteries are large and expensive to replace and relying on a statistical estimate of battery lifetime inevitably results in replacing some batteries too soon and some too late. A more cost-effective approach collects battery health and performance data from each vehicle and uses machine learning models to predict the remaining useful lifetime of each battery. But changes in the operating environment may introduce drift into health and performance data. External temperature, for example, affects battery maximum charge and discharge rate. And then the model predictions become less accurate.
Our solution streams battery data through Kafka to production and training subsystems: a MATLAB Production Server-deployed model that predicts each battery’s remaining useful lifetime and a thermodynamically accurate physical Simulink model of the battery that automatically labels the data for use in training new models. Since simulation-based labeling is much slower than model-based prediction, the simulation cannot be used in production. The production subsystem monitors the deployed model and the streaming data to detect drift. Drift-induced model accuracy degradation triggers the training system to create new models from the most current training sets. Newly trained models are uploaded to a model registry where the production system can retrieve and integrate them into the deployed machine learning pipeline.
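The abstract does not say which drift detection algorithm is used. As one possible illustration, the sketch below compares a reference window of a feature (for example, battery temperature seen at training time) with a recent window using the two-sample Kolmogorov-Smirnov statistic. The function names, the window sizes and the 0.2 threshold are assumptions, not part of the presented system.

```python
import numpy as np

def ks_statistic(reference, recent):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    reference = np.sort(reference)
    recent = np.sort(recent)
    grid = np.concatenate([reference, recent])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_new = np.searchsorted(recent, grid, side="right") / len(recent)
    return float(np.max(np.abs(cdf_ref - cdf_new)))

def drift_detected(reference, recent, threshold=0.2):
    """Flag drift when the distributions differ by more than the assumed threshold."""
    return ks_statistic(reference, recent) > threshold

# Synthetic data: training-time readings, fresh readings, and shifted readings.
rng = np.random.default_rng(0)
training_window = rng.normal(0.0, 1.0, 2000)
similar_window = rng.normal(0.0, 1.0, 2000)
shifted_window = rng.normal(1.0, 1.0, 2000)

stable = drift_detected(training_window, similar_window)
drifting = drift_detected(training_window, shifted_window)
```

In a pipeline like the one described, a positive result would be the trigger for relabeling recent data and retraining, rather than an action in itself.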
Creating end-to-end Tiny ML application for the Ethos-U NPU in the cloud
George GEKOV, Application Engineer, Arm
The Arm Ethos-U55 and Ethos-U65 microNPUs are a new class of microprocessors designed specifically for Machine Learning. They provide best-in-class performance per Watt for the ML operators that are commonly used by tinyML neural networks.
In this talk, George will explain the key factors to consider and the pitfalls to avoid when designing ML models for tinyML applications. Furthermore, he will demonstrate how you can create an end-to-end tinyML application targeting Arm’s Cortex-M and Ethos-U today, even if you don’t have access to the latest silicon yet. Arm Virtual Hardware allows you to achieve just that: start software product development early, experiment with ideas and test your application code. When silicon becomes available, you will be in a position to deploy your code in a frictionless manner, significantly reducing the time to market of your product.
10:35 am to 11:05 am
Break & Networking
11:05 am to 12:10 pm
MLOps, development and deployment tools - Part 2
Session Moderator: Valeria TOMASELLI, Senior Engineer, STMicroelectronics
MILEA – An Approach for Small Scale Applications
Kathrin GERHARD, Software developer / project management, Robert Bosch GmbH
Introduction to MILEA
The focus of this contribution is not low-power or very small devices but microcontrollers for engine control, e.g., TriCore®. The big challenges in this domain are limited resources, real-time operation, and compliance with safety requirements. Deployment happens without recompilation, by changing data instead of code. Most of the large number of processes in an engine control unit (about 2700) are already taken up by tasks that are not suitable for AI. As a result, only a small fraction of tasks uses AI today. With these boundary conditions in mind, BOSCH developed MILEA (Machine Intelligence Library for Embedded Applications). It’s small, efficient, flexible and easy to use. Beyond these considerations, the library is designed to meet the strict safety regulations for automotive software.
AI-based tasks focus on time-series or one-dimensional data, e.g., compensating aging effects of sensors and actuators, modelling virtual sensors or plausibility checks of measured data required for diagnosis.
Technical Description and MILEA Workflow
All functions are designed as small interpreters operating on a FlatBuffers description. There are different interpreters for each kind of AI algorithm, and they can be extended on customer demand:
– Neural Net Sequential with different layers (Dense, LSTM, GRU, Conv1D, Add, Activation Functions)
– Tree Algorithms (Binary Decision Tree, Random Forest, Boosting)
– Support Vector Machine (SVM)
– Statistical functions like the Kolmogorov-Smirnov test
– Property function to support online learning of multi-dimensional maps
1. Train a model
The model development is done by the customer and supported by ETAS ASCMO. ETAS ASCMO is a standard tool used in the automotive industry to analyze and process measurement data. Measurements coming from vehicle usage or a test bench can be visualized, analyzed and used for training. The current major usage is to calibrate vehicle parameters such as characteristic maps. Moreover, a Python converter can be used for additional functions that are not yet supported by ETAS ASCMO. In this case the customer provides trained models directly with tools like Keras, scikit or MATLAB.
2. Export a model
MILEA provides an interface that is based on the FlatBuffers technology. This open-source format is easy to use, flexible and ensures consistency in your parametrization when deploying it on the control unit. The generated FlatBuffers format (binary) can be deployed via a calibration file, e.g., cdf or dcm. Hence, the workflow is similar to that for conventional calibration data.
The preferred method to export the model is ETAS ASCMO, see “Train a model”.
3. Run the model on the embedded device
One of the key advantages of MILEA is the easy deployment. To actually run the model, you have to include the MILEA library in the software (C code). There are no hardware dependencies. In a second step, you call the corresponding function in a task of your choice and hand over the calibration data (FlatBuffers) and your model inputs. This is like calling an interpolation routine and handing over a characteristic map and the operating point.
Each MILEA function is implemented in C, is compliant with MISRA 2004 and MISRA 2011, and guarantees automotive safety (ISO 26262). In total, the overall number of functions is small, and they are highly optimized for running these models on a μC-powered device. The whole library is just about 5000 LOC.
MILEA ensures transferable model behavior on the embedded device, so validation can be done mainly on PC.
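The "model as data, code as interpreter" idea behind this workflow can be sketched as follows. This is not MILEA's API or its FlatBuffers schema: a plain Python dictionary stands in for the binary calibration data, and `run_sequential`, the `"dense"` layer type and the field names are hypothetical. The point is that swapping the model only changes the data handed to a fixed interpreter, never the code.

```python
import numpy as np

# Activation functions the interpreter knows about (assumed set).
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "linear": lambda x: x,
}

def run_sequential(model, inputs):
    """Evaluate a sequential model that is described purely as data."""
    x = np.asarray(inputs, dtype=np.float32)
    for layer in model["layers"]:
        if layer["type"] == "dense":
            weights = np.asarray(layer["weights"], dtype=np.float32)
            bias = np.asarray(layer["bias"], dtype=np.float32)
            x = ACTIVATIONS[layer["activation"]](x @ weights + bias)
        else:
            raise ValueError(f"unsupported layer type: {layer['type']}")
    return x

# Stand-in for the calibration data: two dense layers with toy parameters.
model_data = {
    "layers": [
        {"type": "dense", "weights": [[1.0, -1.0], [0.5, 2.0]],
         "bias": [0.0, 1.0], "activation": "relu"},
        {"type": "dense", "weights": [[1.0], [1.0]],
         "bias": [0.5], "activation": "linear"},
    ]
}
output = run_sequential(model_data, [2.0, 1.0])
```

The call mirrors the interpolation analogy in the text: `model_data` plays the role of the characteristic map and the input vector plays the role of the operating point.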
Outlook and summary
With this simple workflow, MILEA speeds up development and gives developers a robust tool. The idea is to extend MILEA according to customer demand. An integer implementation as well as some additional algorithms and methods are planned.
The application possibilities are wide-ranging, from sensor plausibility checks through onboard diagnosis to data reduction for development in the cloud.
To summarize, the library MILEA is small, efficient, flexible and easy to use.
pip install edgeimpulse – A programmatic approach to automate your MLOps Pipelines
Louis MOREAU, Senior DevRel Engineer, Edge Impulse
In this session, attendees will learn how to set up and automate a machine-learning pipeline using Edge Impulse’s newly-released Python SDK.
The Edge Impulse Python SDK is a powerful tool for automating MLOps pipelines, allowing developers to streamline the process of building, training, and deploying machine learning models at the edge or turning existing deep-learning models into embedded-optimized C++. By providing a programmatically accessible interface to Edge Impulse’s cloud-based development platform, the Python SDK simplifies the integration of machine learning into embedded systems and IoT devices.
Tiny-MLOps: orchestrate ML applications at the edge of the network and beyond
Mattia ANTONINI, Researcher, Fondazione Bruno Kessler
Pushing AI capabilities into Internet of Things (IoT) devices has the potential to revolutionize multiple industrial domains in the coming years. Currently, Machine Learning (ML) models are mainly deployed to computing machines that run a fully-fledged Linux distribution. In this category, single board computers (SBCs) offer a good balance between the size of the device and a cost of around a few tens of dollars. However, SBCs impose energy requirements that limit their applicability in the many domains where devices need to be battery-powered. In contrast, devices powered by 32-bit microcontrollers offer a cost-effective and energy-efficient option, but their limited resources and non-Linux-based embedded operating systems hinder hosting ML models and pose strong challenges in orchestrating ML applications and pipelines. In this presentation, we introduce and propose a possible vision of the Tiny-MLOps framework [1] as the natural evolution of ML orchestration practices by including embedded devices running at the far edge of the network. Each phase of the classical ML orchestration loop is tailored to accommodate the resource constraints of typical IoT devices, i.e., a few tens or hundreds of KB of RAM and flash memory. The Tiny-MLOps framework is represented by extending the ML infinite loop, creating a new loop of 8 steps split into the TinyML and Ops circles. Leveraging the Tiny-MLOps loop and also benefiting from the locality principle, we present a real case study [2,3] where the framework has been successfully applied to deploy and orchestrate an anomaly detection pipeline on a resource-constrained IoT sensor.
The sensor is placed inside an underwater pump in a wastewater management plant, which is an extreme industrial environment (device inaccessibility, limited bandwidth, extreme environmental conditions, etc.). The framework helps to manage the model, including model configuration, onboard training, model inference, monitoring, and triggering the next iteration of the framework. Our experiments support the feasibility of designing next-gen IoT applications with the Tiny-MLOps framework, which helps in adapting and evolving to different conditions, scenarios, and environments.
12:10 pm to 1:00 pm
Posters (MLOps) and Demos
Session Moderator: Vitaly KLEBAN, Co-founder and CTO, Everynet
1:00 pm to 2:00 pm
Lunch & Networking
2:00 pm to 3:15 pm
Algorithms and Optimization Techniques
Session Moderator: Andrea DUNBAR, Head of Sector Edge AI and Vision, CSEM
TinyDenoiser: RNN-based Speech Enhancement on a Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization
Marco FARISELLI, Embedded ML Engineer, Greenwaves Technologies
This talk presents an optimized methodology to design and deploy Speech
Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a
state-of-the-art MicroController Unit (MCU) GAP9, with 1+9 general-purpose RISC-V cores
and support for vector 8-bit integer (INT8) and 16-bit floating-point (FP16)
arithmetic. To achieve low-latency execution, we propose a software pipeline interleaving parallel computation of LSTM or GRU recurrent units with manually-managed memory transfers of the model parameters. To ensure minimal accuracy degradation with respect to the full-precision models, we also propose a novel FP16-INT8 Mixed-Precision Post-Training Quantization (PTQ) scheme that compresses the recurrent layers to 8-bit while the bit precision of remaining layers is kept to FP16.
Experiments are conducted on multiple LSTM and GRU based SE models belonging
to the TinyDenoiser family and featuring up to 1.24M parameters. Thanks to the
proposed approach, we speed up the computation by up to 4× with respect to the
lossless FP16 baselines, while showing a low degradation of the PESQ score. Our
design is more than 10× more energy efficient than state-of-the-art SE solutions deployed on single-core MCUs that make use of smaller models and quantization-aware training.
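The mixed-precision idea described above can be illustrated with a minimal sketch. It assumes symmetric per-tensor INT8 quantization for the recurrent layers and a plain FP16 cast for everything else; the actual quantization scheme, the layer names and the weight values below are assumptions for illustration only and are not taken from the talk.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to real values."""
    return [scale * v for v in q]


def mixed_precision_ptq(layers, recurrent_names):
    """Quantize the layers named in recurrent_names to INT8, cast the rest to FP16."""
    out = {}
    for name, weights in layers.items():
        if name in recurrent_names:
            q, scale = quantize_int8(weights)
            out[name] = {"dtype": "int8", "q": q, "scale": scale}
        else:
            out[name] = {"dtype": "fp16", "w": [to_fp16(w) for w in weights]}
    return out


# Toy model: layer names and values are hypothetical.
layers = {
    "gru.weight": [0.50, -1.27, 0.03, 0.90],
    "fc_out.weight": [0.1, -0.2, 0.3],
}
compressed = mixed_precision_ptq(layers, recurrent_names={"gru.weight"})
```

With round-to-nearest, the reconstruction error of each INT8 weight is bounded by half the scale, which is the kind of bound a post-training scheme relies on when no retraining is available.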
Audio-Visual Active Speaker Detection on Embedded Devices
Baptiste POUTHIER, PhD Student, NXP Semiconductors
Active Speaker Detection (ASD) is the task of identifying active speakers in a video by analyzing both visual and audio features. It is a key component in human-robot interactions, for speech enhancement, and for video re-targeting in video-conferencing systems. Over the last decade, advances in machine learning have paved the way for highly reliable ASD methods. However, since both the visual and audio signals must be
processed and analyzed, these methods are extremely computationally demanding and therefore impractical for micro-controllers. For instance, most ASD models have tens of millions of parameters. Moreover, in standard use-cases like video conferencing, the model needs to run in real-time (at least 25 video frames per second) while tracking and processing multiple potential talkers. To meet the challenge, we have developed
a set of state-of-the-art ASD models with a drastic cut of the computational costs. The originality of our approach is to leverage a multi-objective optimization and a novel modality fusion scheme. In particular, we focused on building two models featuring additional architectural and optimization changes to fit two hardware configurations:
– A model that runs on a high-end NXP MPU featuring a quad-core Arm Cortex-A53 processor and a Neural Processing Unit (NPU)
– A tiny model that runs on the dual-core i.MX RT1170 MCU with an Arm Cortex-M7 core at 1 GHz and an Arm Cortex-M4 core at 400 MHz
The models are end-to-end deep learning architectures following the same block diagram. The network is based on a two-branch architecture with each branch processing either the audio or the visual signal. The audio and visual embeddings are finally combined within the “fusion” block that outputs the probability of an individual speaking. This information is used by downstream algorithms, such as speech enhancement and video re-targeting, which are beyond the scope of our presentation.
All the network components are designed for hardware requirements: the fusion block, convolutional layers and temporal sequence modeling are indeed modified to optimize the model performance. The input signals are also processed accordingly: the resolution of the data and the temporal contexts used by the network are adapted to the different hardware capabilities. Our presentation is about the whole optimization and porting process, from the model design changes to the quantization and the integration on NXP devices. For each model, we focus the performance analysis on the trade-off between the computational burden and the system accuracy.
Sensitivity analysis of hyperparameters in deep neural-network pruning
Jonna MATTHIESEN, Deep Learning Researcher /Master thesis student, Embedl
Deep neural-network pruning plays an integral part in deployment to resource-constrained devices. By matching the right pruning strategies to the right hardware, it is possible to significantly reduce the inference latency, memory footprint and energy consumption without affecting the network’s performance considerably. It is particularly useful to have structured and robust pruning methods for dynamically scaling the computational needs of a model when deploying to a plurality of platforms, since manually redesigning the model can be too tedious or even infeasible. It is clear that pruning will continue to play an important role in enabling tinyML, even as the performance of embedded hardware increases, with an ever-increasing need to deploy bigger and better models while keeping the latency, energy, and memory demands within permissible budgets.
Structured pruning, where entire filters/channels or groups of operations are removed, is a proven way to speed up models on hardware. The best methods involve fine-tuning or pruning the model iteratively during training. Pruning is, thus, tightly coupled with the act of training. A key element in training deep neural networks is the choice of hyperparameters. Since structured pruning modifies the actual model architecture, it is unclear how it will affect the choice of hyperparameters. To answer this question, we have investigated the sensitivity of hyperparameters under structured neural network pruning.
First, we use state-of-the-art hyperparameter optimization (HPO) methods, such as Bayesian optimization, and Bayesian optimization together with Hyperband (BOHB), to find the best possible set of hyperparameters for training a variety of models on public datasets. We then span a bigger region of the hyperparameter space by performing a grid search in the vicinity of the optimal hyperparameter set from the Bayesian methods in order to extract an approximate hyperparameter-performance distribution. We then prune the models to various degrees and perform a new grid search on the compressed model. Finally, the sensitivity is captured and quantified in a distance metric for distributions. By observing the shift in the hyperparameter-performance distribution between the original model and pruned model, we are able to identify how sensitive hyperparameters are to pruning and how aggressively models can be pruned before the hyperparameters need to be reconsidered for optimal performance.
From a practical perspective, understanding how pruning affects the choice of hyperparameters is of crucial importance for maximizing the performance of networks running on resource-limited hardware. However, it is also interesting from a more fundamental perspective in understanding how neural networks work and are able to generalize well.
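As a rough illustration of the final step, the shift between two hyperparameter-performance distributions can be measured with a distance between empirical distributions. The abstract does not name the metric used; the sketch below assumes the 1-D Wasserstein (earth mover's) distance, and the accuracy values are made-up toy numbers, one per point of a shared hyperparameter grid.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples.

    For equal-size samples with uniform weights, this equals the mean
    absolute difference between the sorted values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical validation accuracies, one per grid point (toy values only).
dense_acc = [0.90, 0.92, 0.91, 0.88, 0.85, 0.93]
pruned_30 = [0.89, 0.92, 0.90, 0.88, 0.84, 0.92]  # light pruning
pruned_70 = [0.70, 0.86, 0.80, 0.75, 0.60, 0.88]  # aggressive pruning

shift_30 = wasserstein_1d(dense_acc, pruned_30)
shift_70 = wasserstein_1d(dense_acc, pruned_70)
```

A small shift would suggest the hyperparameters found for the dense model can be reused after pruning, while a large shift would suggest they need to be searched again.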
3:15 pm to 3:45 pm
Break & Networking
3:45 pm to 5:00 pm
Algorithms and Optimization Techniques Session - Part 2
Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves
Exploiting forward-forward based algorithm for training on device
Marco LATTUADA, Senior Software Engineer, STMicroelectronics
In recent years, there has been a growing interest in training machine learning models on devices, rather than in the Cloud or on a centralized server. This approach, known as on-device training, has several advantages, including improved privacy and reduced latency. Tiny Machine Learning (TinyML) is becoming a novel way to deliver intelligence into constrained hardware devices, e.g., Microcontroller Units (MCUs), for the realization of low-power tailored applications. Training deep learning models on embedded systems is very challenging, mainly due to their limited memory, energy, and computing power, which significantly limit the task complexity that can be executed and make the use of traditional training algorithms such as backpropagation impossible. To overcome this issue, various techniques have been proposed, such as model compression and quantization, which reduce the size and complexity of a model, and transfer learning, which uses pre-trained models as a starting point. However, these solutions only address the problems related to the deployment and inference steps, and become ineffective when new patterns must be learned in real time.
In such a context, the goal should be the realization of an on-device training/inference system able to learn and generate predictions without the need for external components. Forward-Forward (FF) is a novel training algorithm that has recently been proposed as an alternative to backpropagation when computing power is an issue. Unlike backpropagation, this algorithm splits a neural network architecture into multiple layers which are individually trained, without the need to store activities and gradients, thus reducing the amount of computing power, energy and memory required.
Formally speaking, the FF algorithm is a learning procedure that takes inspiration from Boltzmann machines and noise contrastive estimation. The basic idea of this algorithm is to replace the forward and backward passes of backpropagation with two forward passes having opposite objectives. To do so, FF introduces a new metric called goodness, calculated as the sum of the squared activities of a given layer:
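Written out from the definition in the sentence above, with $y_j$ as an assumed symbol for the activity of unit $j$ in the layer and $G$ as an assumed symbol for the goodness:

```latex
G = \sum_{j} y_j^{2}
```

The two forward passes then push $G$ in opposite directions: up for one kind of input and down for the other.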
muRISCV-NN: Deep-Learning Inference Kernels for Embedded Platforms using the RISC-V Vector and Packed Extensions
Philipp VAN KEMPEN, PhD Student / Chair of Electronic Design Automation, Technical University of Munich
With the rapid adoption of deep learning workloads to resource-constrained edge devices,
efficient and data-parallel computing paradigms are becoming increasingly important. To this end, the RISC-V ISA provides two attractive data parallel extensions: The super-word
parallel Vector V extension and the sub-word parallel Packed P extension. An increasing
number of both academic and commercial RISC-V processors are already implementing
these extensions. They provide powerful data computation capabilities to accelerate deep
learning workloads at the edge. However, the RISC-V ecosystem lacks a lightweight,
open-source, and vendor-agnostic compute library to support these extensions on
embedded and ultra-low-power platforms. This requires every processor designer to
implement and ship a custom compute-library implementation.
We introduce muRISCV-NN, an open-source compute library for embedded and
microcontroller class systems. muRISCV-NN aims to provide an open-source and
vendor-agnostic compute library targeting all RISC-V-compliant platforms, supplying a
HW/SW interface between industry-standard deep learning libraries and emerging
ultra-low-power compute platforms. Forked from ARM’s CMSIS-NN library, muRISCV-NN
provides optimized scalar kernels written in plain C as an efficient and highly portable
baseline. Additionally, we provide hand-optimized vectorized kernels employing either the V or P extensions. muRISCV-NN is designed to be lightweight and modular, and is
implemented as a static library that can be linked to the application software and accessed
through a single header file. Furthermore, muRISCV-NN is bit-accurate to CMSIS-NN and
can, thus, be used as a drop-in replacement with only minor changes to the compilation flow.
This makes its use with higher-level frameworks completely transparent and enables a
seamless transition from ARM-based systems to RISC-V. As a proof of concept, we provide
full integration support with both TensorFlow Lite for Microcontrollers and microTVM. We
demonstrate the effectiveness of muRISCV-NN on the MLPerf Tiny benchmark, observing
up to a 9x speedup and 5x energy-delay product (EDP) reduction compared to the plain C version of CMSIS-NN
across all four benchmarks.
muRISCV-NN supports the latest RISC-V vector v1.0 and packed v0.9.6 specification,
enabling it to run on many open-source and commercial RISC-V processors and simulators.
The instruction-level simulators supported by the library include Spike, OVPsim, and ETISS. RISC-V processor support exists for TU Wien’s Vicuna, with active work being done on supporting ETH’s Ara and Spatz cores. In addition, work is ongoing to backport the library to provide support for commercial cores that were taped out before the introduction of the v1.0 vector specification. In terms of toolchain support, muRISCV-NN can be compiled with both GCC and LLVM.
The muRISCV-NN project is open source and fully available on GitHub.
Advances in quantization for efficient on-device inference
Mart VAN BAALEN, Staff Engineer/Manager, Qualcomm AI Research in Amsterdam
Deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on edge devices within tight power and thermal budgets. Quantization is particularly important because it allows for automated reduction of weights and activations to improve power efficiency and performance while maintaining accuracy. This talk will cover:
- FP8 vs INT8 formats for efficient inference
- Oscillations in quantization-aware-training
- Removing outliers for improved quantization of transformers and LLMs
- New mixed-precision methods
5:00 pm to 5:30 pm
Demos & Posters (Algorithms and Optimization Techniques)
Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves
Super Slim and Low Power Radar-based Gesture Recognition
Face Recognition with binary networks
The BitBrain method for learning and inference at the edge
Quantization-Aware Neural Architecture Search for Efficient Semantic Segmentation on Edge Devices
Twofold Sparsity: Joint Bit- and Network-level Sparse Deep Neural Network for Energy-efficient RRAM Based CIM
Search Space Optimization in Hardware-Aware Neural Architecture Search
6:30 pm to 9:00 pm
VIP Dinner Networking
You need to be registered for the dinner
9:00 am to 9:05 am
Session Moderator: Alessandro GRANDE, Head of Product, Edge Impulse
9:05 am to 9:50 am
Keynote by Marian Verhelst, KU Leuven
Should Tiny ML Processors be Multi-core?
Marian VERHELST, Associate Professor, KU Leuven
The real-time deployment of tinyML algorithms towards the dreams of smart spaces, augmented humans, or personalized healthcare requires responsiveness at affordable energy or power budgets. To this end, many ML-optimized custom processors have been presented over the past decade. In their quest for higher and higher throughput and efficiency, ML accelerators have evolved from small single-core designs, through scaled-up datapath arrays, to multi-core implementations. While this trend towards multi-core processors is still mostly happening in the cloud, one can wonder whether it will penetrate the extreme edge as well. This leads to interesting questions, such as: Do tinyML devices need homogeneous or heterogeneous multi-core designs? How should tinyML workloads be scheduled across multi-core edge SoCs? This talk will dig deeper into the underlying motivations of this multi-core evolution, the accompanying challenges, and exciting future opportunities.
9:50 am to 10:30 am
Demos & Posters
Session Moderator: Sumeet KUMAR, CEO, Innatera
10:30 am to 11:00 am
Break & Networking
11:00 am to 12:15 pm
Hardware and Sensors Session
Session Moderator: Sumeet KUMAR, CEO, Innatera
Powering Machine Learning (ML) applications on Arm with “algae”
Gian Marco IODICE, Team and Tech Lead in the Machine Learning Group, Arm
Paolo BOMBELLI, Research scientist, University of Cambridge
Dr. Paolo Bombelli (Researcher at the University of Cambridge) and Gian Marco Iodice (ML tech lead at Arm) will present a groundbreaking project that involves powering a tinyML application with a very low-power battery made with algae.
The talk will focus on the technical aspects of the project. For example, we will discuss the challenges and solutions Dr. Paolo Bombelli and his team found to build a sustainable and environmentally friendly battery using renewable resources such as algae. We will also demonstrate how this battery can power an Arm Cortex-M0-based microcontroller and how we tailored the ML application to work efficiently with this sustainable and eco-friendly battery.
This talk is for anyone interested in sustainable technology, renewable energy, and tinyML applications for microcontrollers.
Event sensors for embedded edge AI vision applications
Christoph POSCH, CTO, PROPHESEE
Event-based vision is a term naming an emerging paradigm of acquisition and processing of visual information for numerous artificial vision applications in industrial, surveillance, IoT, AR/VR, automotive and more. The highly efficient way of acquiring sparse data and the robustness to uncontrolled lighting conditions are characteristics of the event sensing process that make event-based vision attractive for at-the-edge visual perception systems that are able to cope with limited resources and a high degree of autonomy.
However, the unconventional format of the event data, non-constant data rates, non-standard interfaces and, in general, the way dynamic visual information is encoded inside the data pose challenges to the usage and integration of event sensors in an embedded vision system.
Prophesee has recently developed the first of a new generation of event sensors, designed with the explicit goal of improving the integrability and usability of event sensing technology in embedded at-the-edge vision systems. Particular emphasis has been put on event data pre-processing and formatting, data interface compatibility and low-latency connectivity to various processing platforms, including low-power uCs and neuromorphic processor architectures. Furthermore, the sensor has been optimized for ultra-low power operation, featuring a hierarchy of low-power modes and application-specific modes of operation. On-chip power management and an embedded uC core further improve sensor flexibility and usability at the edge.
Ultra-Low Power Gesture Recognition with pMUT Arrays and Spike-based Beamforming
Emmanuel HARDY, Research Engineer, CEA Leti
Sensor arrays constrain the power budget of battery-powered smart sensors, as the analogue front-end, analogue-to-digital conversion (ADC) and digital signal processing are duplicated for each channel. By converting and processing the relevant information in the spiking domain, the energy consumption can be reduced by several orders of magnitude. We propose the first end-to-end ultra-low power Gesture Recognition system. It comprises an array of emitting and receiving piezoelectric micromachined ultrasonic transducers (pMUT), driving/sensing electronics and a novel spike-based beamforming strategy that extracts the distance and angle information from incoming echoes without conventional ADCs. A Spiking Recurrent Neural Network performs the Gesture Recognition. We experimentally demonstrate a classification accuracy of 86.0% on a dataset of five 3D gestures collected on our experimental setup.
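For readers unfamiliar with how distance and angle can be recovered from echo timing, the textbook geometry is sketched below. This is not the spike-based scheme from the talk: it only assumes that the round-trip time of an echo and the arrival-time difference between two receivers are available (in a spiking system these could plausibly come from spike times), and it uses a far-field approximation. The function names and the 4 mm receiver spacing in the usage lines are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 degrees C


def echo_distance(round_trip_s: float) -> float:
    """Distance to a reflector from the round-trip time of an ultrasonic echo."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


def arrival_angle(delta_t_s: float, spacing_m: float) -> float:
    """Far-field angle of arrival (radians from broadside) from the time
    difference between two receivers separated by spacing_m."""
    s = SPEED_OF_SOUND_M_S * delta_t_s / spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against timing noise
    return math.asin(s)


# Usage with hypothetical timings.
distance_m = echo_distance(1e-3)                  # 1 ms round trip
angle_deg = math.degrees(arrival_angle(2e-6, 0.004))
```

Both quantities depend only on time differences, which is why timing-based (and hence spike-based) processing can in principle avoid sampling the full waveform.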
12:15 pm to 1:00 pm
1:00 pm to 2:00 pm
Lunch & Networking
2:00 pm to 3:15 pm
Hardware and Sensors - Part 2
Session Moderator: Tomas EDSÖ, Senior Principal Design Engineer, Arm
Monostable Multivibrator Networks: extremely low power inference at the edge with timer neurons
Lars KEUNINCKX, Researcher, imec
In our proposed presentation, we will discuss the following:
• MMV introduction and how MMV networks set up and test spike timing conditions,
• the training algorithm, which is responsible for optimizing the excitatory and inhibitory input and recurrent connections of the OR-ing network, as well as the integer periods of the MMVs. Our method is based on the surrogate gradient technique and slow binarization of the connections,
• several use cases on publicly available datasets: Google Soli radar gestures, Heidelberg keyword spotting, IBM DVS-128 gestures and the Yin-Yang symbol segmentation, all with excellent results.
• future work and outlook.
A general tenet of neuromorphic engineering is that taking inspiration from biological reality will naturally lead to the most efficient hardware possible. We argue that an overemphasis on the biological apparatus could become self-limiting for the field since the instrumental biological operating principles, even if understood well enough in detail, may simply not be transferable to the electronic hardware domain. After all, neurons are living cells first, with all their idiosyncrasies, and only then computational units. Thus, we reverse the original neuromorphic question: instead of trying to find ways to efficiently implement and network a biological neuron in electronic hardware, we ask which fundamental electronic building blocks that are easy to implement and connect en masse we already have at our disposal, and what their computational properties are.
As a possible answer to this question, we present networks of monostable multivibrators (MMVs).
MMVs are simple timers that are straightforward to implement using counters in digital hardware.
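A minimal software model of such a counter-based timer is sketched below. It assumes a non-retriggerable monostable (triggers arriving while the output is high are ignored) with an integer period in clock ticks; whether the MMVs in the talk are retriggerable is not stated, and the class and function names are hypothetical.

```python
class MonostableTimer:
    """Counter-based monostable multivibrator (assumed non-retriggerable)."""

    def __init__(self, period: int):
        if period < 1:
            raise ValueError("period must be a positive integer")
        self.period = period
        self.remaining = 0  # clock ticks left in the current output pulse

    def step(self, trigger: bool) -> bool:
        """Advance one clock tick and return the output level for this tick."""
        if self.remaining == 0 and trigger:
            self.remaining = self.period  # start a new pulse
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def run(timer, triggers):
    """Feed a sequence of trigger levels to a timer and collect its outputs."""
    return [timer.step(t) for t in triggers]


# A period-3 timer triggered at tick 0 and again at tick 4.
pulse = run(MonostableTimer(period=3), [True, False, False, False, True, False])
```

The only state is a single down-counter, which illustrates why such units are cheap to replicate in digital hardware.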
Brain Inspired ISFET Arrays – A Tiny ML approach to Lab-on-Chip Diagnostics
Prateek TRIPATHI, PhD Research Student, Imperial College London
Background: Innovation in medical technology plays a significant role in supporting
healthcare. Recent years have witnessed growth in the development of point-of-care devices
that can provide real-time medical diagnostics at the point of need. The COVID-19 pandemic has further highlighted the need for technologies that can provide rapid and accurate diagnosis of infectious diseases without the need for specialized labs. While lateral flow tests have supported mass testing during the pandemic, they suffer from low accuracy and do not allow multiplexed testing for several diseases, which is becoming critical as the pandemic progresses.
Diagnostics can further benefit from the use of artificial intelligence (AI), which has become a popular tool in the field of healthcare, with medical imaging applications ranging from diagnostics to assistive surgery, giving machines the cognitive ability to make informed decisions.
Objective: The aim of this research is to establish new methods for portable and rapid AI-based diagnostics. While most techniques rely on optical methods, electrochemical sensing enables miniaturisation, scalability and robustness. Compatible with CMOS, sensors such as Ion-Sensitive Field-Effect Transistors (ISFETs) can be coupled with novel instrumentation on-chip. We qualify the integration of electrochemical sensing with novel AI algorithms as ‘sensor learning’, leveraging on-chip methodologies to automatically calibrate the sensors and extract accurate diagnostic information. Further to the AI approach, neuromorphic electronics allows the signal to be encoded in the frequency domain and is inherently compatible with spatial processing at low power. We further intend to use TinyML to extend deep learning solutions to point-of-care diagnostic devices for infectious diseases. The use of TinyML will alleviate privacy issues associated with patient data and will also consume significantly less power than cloud-based AI solutions.
Methods: Since both neurons and ISFETs respond to changes in ionic concentration, we evaluated four neuromorphic ISFET array topologies involving spatial compensation, temporal integration, linear weighting, and background inhibition for creating low-power spiking ISFET arrays. The arrays have been implemented in TSMC 180nm and take advantage of the fault-tolerant nature, spatial connectivity, spike-domain processing capabilities, neuron inhibitions and AER compatibility of neuromorphic electronics for creating the next generation of LoC devices. We have also created a novel winner-take-all (WTA) architecture for background inhibition in ISFET neurons that can form Clustered WTA and Distributed WTA architectures while at the same time performing drift compensation using temporal and spatial averaging. Finally, we have implemented novel transforms on data collected for COVID-19 and cancer biomarkers, which we used to train convolutional neural networks to classify nucleic acid amplification on microcontrollers with tflite-micro.
Results: This work presents a spatial correlation between the non-ideal effects to facilitate
inter-pixel processing using neuromorphic ISFETs. We present four novel ISFET neuron
architectures. We begin with neuromorphic ISFET arrays using spike domain encoding and
spatial device compensation. This is followed by a completely autonomous cluster topology of neuron-based pixels based on a multiple-channel Integrate and Fire (I&F) architecture for temporal integration and spatial averaging. The designs have been implemented in TSMC 0.18μm. The proposed ISFET topologies are scalable and operate at an ultra-low power of 171.6nW to 410.9nW, depending on the output spiking frequency. Further to this, we have established a state of the art with our first models for Lab-on-chip platforms that have been trained to identify infectious diseases and cancer biomarkers using tinyML. This has been done using causal machine learning approaches that have allowed the implementation of novel frameworks for Lab-on-chip platforms.
Conclusion: This work presents four novel neuromorphic architectures for electrochemical sensing. The presented architectures were all implemented in TSMC 0.18μm and work at ultra-low power. In addition, it also presents a framework that can accelerate our testing response to future pandemics using AI at the edge.
Smart and Connected Soft Biomedical Stethoscope and Machine Learning for Continuous Real-Time Auscultation and Automated Disease Detection
W. Hong YEO, Associate Professor and Woodruff Faculty Fellow, Georgia Tech
In this work, the computational mechanics study offers a key design guide for
developing a soft wearable system, maintaining mechanical reliability in multiple uses
with bending and stretching. Optimizing a system packaging using biocompatible
elastomers and soft adhesives allows for skin-friendly, robust adhesion to the body while minimizing motion artifacts due to the stress distribution and conformable lamination. The soft device demonstrates a precise detection of high-quality cardiopulmonary sounds even with the subject’s different actions. Compared to commercial digital stethoscopes, the SWS using a wavelet denoising algorithm shows superior performance as validated by the enhanced signal-to-noise ratio. Deep-learning integration with the SWS demonstrates a successful application for a clinical study where the stethoscope is used for continuous, wireless auscultation with multiple patients. The results show automatic detection and diagnosis of four different types of lung diseases, such as crackle, wheeze, stridor, and rhonchi, with about 95% accuracy for five classes. Collectively, this work represents a major shift in how clinicians collect cardiopulmonary sounds for disease diagnosis and health monitoring.
In addition, Dr. Yeo will share how different printing processes are used to
manufacture nano-microscale sensors and circuit interconnects, while discussing the
details of hard-soft materials integration and soft packaging strategies. He will also share other application examples of soft electronic platforms such as portable health monitoring devices, disease diagnostic devices, therapeutic systems, and human-machine interface systems. Finally, more details of sensor design, circuits, manufacturing, system optimization, signal processing, machine learning, and data classification will be shared at high levels.
3:15 pm to 3:45 pm
Break & Networking
3:45 pm to 4:20 pm
Hardware and Sensors - Part 3
Session Moderator: Tomas EDSÖ, Senior Principal Design Engineer, Arm
Ultra-Fast, Energy-Efficient Neuromorphic Edge Processing For Event-Based and Frame-Based Cameras: ColibriUAV and Eye-tracking
Michele MAGNO, Head of the Project-based learning Center, ETH Zurich, D-ITET
Interest in dynamic vision sensors (DVS) for a wide range of applications is rising, especially due to the microsecond-level reaction time of the bio-inspired event sensor, which increases robustness and reduces the latency of perception tasks compared to an RGB camera. This talk presents two embedded platforms: ColibriUAV, a UAV platform with both frame-based and event-based camera interfaces for efficient perception and near-sensor processing, and ColibriEYE. The proposed platforms are designed around Kraken, a novel low-power RISC-V System on Chip with two hardware accelerators targeting spiking neural networks and deep ternary neural networks. Kraken is capable of efficiently processing both event data from a DVS camera and frame data from an RGB camera. A key feature of Kraken is its integrated, dedicated interface with a DVS camera from Inivation. This talk benchmarks the end-to-end latency and power efficiency of the neuromorphic and event-based UAV subsystem, demonstrating state-of-the-art event data readout with a throughput of 7200 frames of events per second and a power consumption of 10.7 mW, which is over 6.6 times faster and consumes a hundred times less power than the widely-used data reading approach through the USB interface. The overall sensing and processing power consumption is below 50 mW, with latency in the milliseconds range, making the platform suitable for low-latency autonomous nano-drones as well as eye tracking. A comparison with other commercial and academic processors may also be presented during the talk.
Data Pre-processing on Sensor Nodes for Predictive Maintenance
Alexander TIMOFEEV, Founder and Chief Executive Officer, Polyn.ai
Vibration-based condition monitoring is a fundamental Predictive Maintenance technique that is used to detect machine health conditions and predict failures. By analyzing vibrations, it is possible to identify a range of mechanical problems such as shaft unbalance and misalignment, bearing failures, gear wear, cracks, looseness, and more. Vibration sensors are typically attached to rotating equipment to measure the vibrations it generates. These sensors have a frequency bandwidth of up to 20 kHz to ensure accurate prediction of mechanical failures. However, the high-frequency signals create especially large amounts of data to be processed by Machine Learning algorithms in continuous condition monitoring applications. Sending all the data collected on the sensor nodes to a central location for analysis would be more burdensome than beneficial. Reducing the amount of transmitted data would improve latency and save transmission infrastructure costs as well as data processing and storage resources.
It is possible to replace high-volume vibration data with small patterns (embeddings) that are transmitted to the cloud instead. Despite the reduced size, the information contained in embeddings is still sufficient for reliable Predictive Maintenance.
To this end, we propose an innovative concept we call a Neuromorphic Front-End located next to the sensor, and a unique Neuromorphic Analog Signal Processing (NASP) technology for the implementation of a trained neural network in a tiny silicon chip made of analog circuitry elements. The role of the Neuromorphic Front-End chip is to extract useful information from raw sensor data, similar to the way biological sensory systems work.
The NASP technology utilizes a unique architecture comprising artificial neurons (nodes responsible for performing computations) and axons (connections between nodes with specific weights). Specifically, operational amplifiers are used to implement neurons, and a mask-programmable resistor layer is used to implement axons and their weights. Such an analog structure performs true parallel data processing without accessing memory or generating other excessive data traffic. This is the key to the unprecedented energy efficiency, low latency, and 100% chip area utilization of NASP solutions.
The NASP approach involves modeling and verifying the trained neural network and converting it into standard chip-structure files that any foundry can use for chip manufacturing. This is different from costly attempts to accommodate a neural network on a general-purpose digital chip. The result is an application-specific analog inference engine tailored for the task and complemented with a fully flexible digital layer responsible for classification.
NASP can process the raw data directly on the sensor node with high precision, extracting vibration embeddings and reducing the data flow by a factor of 1000.
Power consumption is a critical factor in battery-powered Industrial IoT applications, with data transmission accounting for 85-99% of the total consumption in wireless sensors. The thousandfold data reduction by NASP answers this challenge, enabling LPWA (low power wide area) data communication. NASP preserves the sensor node power budget thanks to its ultra-low power consumption of only 100 µW in always-on operation. NASP Neuromorphic Front-End chips support the widespread use of wireless and energy-harvesting solutions. The use of a Neuromorphic Front-End enables deployments in previously inaccessible remote and mobile locations. It also simplifies the entire system and reduces associated operational and capital expenses.
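To see what a thousandfold data reduction could mean for a node's energy budget, here is a back-of-the-envelope sketch. It is not part of the talk and rests on two labeled assumptions: transmission energy scales linearly with the volume of data sent, and the non-transmission share of the budget is unchanged. The front-end's own 100 µW draw is left out because the abstract gives no absolute baseline power to compare it with. The function name is hypothetical.

```python
def remaining_energy_fraction(tx_share, reduction_factor=1000):
    """Fraction of the original node energy still needed after data reduction.

    tx_share: share of total energy spent on transmission (0.85 to 0.99 in the abstract).
    reduction_factor: how many times less data is sent (1000 in the abstract).
    Assumes transmission energy is proportional to data volume (an assumption).
    """
    if not 0.0 <= tx_share <= 1.0:
        raise ValueError("tx_share must be between 0 and 1")
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    # The non-transmission share is kept as-is; the transmission share shrinks.
    return (1.0 - tx_share) + tx_share / reduction_factor


if __name__ == "__main__":
    for share in (0.85, 0.99):
        fraction = remaining_energy_fraction(share)
        print(f"transmission share {share:.0%}: {fraction:.2%} of original energy")
```

Under these assumptions, a node that spends 85% of its energy on transmission would need about 15% of its original budget, and one that spends 99% would need about 1.1%.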
4:20 pm to 4:30 pm
Wrap up and Closing
Session Moderator: Alessandro GRANDE, Head of Product, Edge Impulse
Schedule subject to change without notice.
Cadi Ayyad University, Morocco
University of Bologna, Italy
Delft University of Technology
Qualcomm Research, USA
Nottingham Trent University
Bosch Sensortec GmbH
University of Cyprus
Keynote - Monday
The University of Manchester
Keynote - Tuesday
Keynote - Wednesday
Fondazione Bruno Kessler
University of Cambridge
Tim de BRUIN
Robert Bosch GmbH
Gian Marco IODICE
Paola Andrea JARAMILLO GARCIA
ETH Zurich, D-ITET
Imperial College London
Mart VAN BAALEN
Qualcomm AI Research in Amsterdam
Philipp VAN KEMPEN
Technical University of Munich
7 Sensing Software
W. Hong YEO