June 24-26, 2024

About

The tinyML EMEA Innovation Forum was held in June 2024 and accelerated the adoption of tiny machine learning across the region by connecting the efforts of the private sector with those of academia in pushing the boundaries of machine learning and artificial intelligence on low power platforms.

Join our Discord server and see the published slides and commentary.

Venue

Allianz MiCo • Milano Convention Centre

Piazzale Carlo Magno 1/Gate 16, 20149 Milano MI Italy

Contact us

Rosina Haberl

Phone

Mail

Schedule

Speakers

Committee

Schedule

8:00 am to 9:00 am

Registration

9:00 am to 9:15 am

Welcome

Session Moderator: Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco

9:15 am to 10:00 am

Keynote by Alessandro Cremonesi - STMicroelectronics

EDGE AI for a sustainable future

Alessandro CREMONESI, Chief Innovation Officer & Executive Vice President General Manager System Research and Applications, STMicroelectronics

Abstract (English)

In a digitalized and connected world, where the real and virtual coexist, AI will be pervasive from cloud to edge. The distributed AI processing enables a sustainable deployment of new services and applications. By increasing energy efficiency, privacy, and security, edge AI can ensure the proper scalability to the next level of digital transformation.

The semiconductor industry, vital to this evolution and propelled by AI, is projected to hit $1 trillion by 2030. To sustain this growth, the industry will leverage the full range of technologies from ‘Moore’s law’ to ‘More than Moore’ to ‘heterogeneous integration’. New architectures overcoming Von Neumann limitations will be required as well as new algorithms that better fit edge constraints. AI is accelerating innovation, leading to the emergence of new developer profiles. These developers will rely on the semiconductor industry to provide agile and easy-to-use tools to effectively exploit all technologies.

10:00 am to 10:30 am

Hardware Acceleration and Model Compression

Session Moderator: Manuel ROVERI, Full Professor, Politecnico di Milano

Network Projection for AI Model Compression Applied to Embedded Acoustic Sensing

Antoni WOSS, Senior Software Developer, MathWorks

Abstract (English)

Deploying increasingly large and complex deep learning networks onto resource-constrained devices is a growing challenge facing many AI practitioners, especially within the domain of audio and acoustic applications. Modern deep neural networks, which are integral to advancing state-of-the-art signal processing algorithms, typically require high-performance processors and/or GPUs due to their extensive number of learnable parameters. As these large AI models set new benchmarks for quality and functionality, they simultaneously push the boundaries of what can be embedded into real-time systems and edge devices. Consequently, engineers today are faced with the critical task of reconciling the complexity of these networks with the stringent resource limitations of portable devices and low-power sensors, while ensuring real-time performance without compromising accuracy. Deploying these powerful deep learning models onto edge devices often requires compressing them to reduce runtime memory and inference times, while attempting to retain high accuracy and model expressivity.

In this talk, we introduce a new technique, network projection, and use it to compress and embed a sound-based machine health classification network. Network projection leverages in-distribution data to elicit a neural response from network activations. This approach then analyzes the covariances between these neural responses and modifies layer operations to work within a lower-rank projective space, which, in turn, significantly reduces the quantity of learnable parameters. Despite the operations occurring within a lower-rank projective space, the layers maintain a high level of expressivity due to the preservation of width (that is, the number of neural activations), matching that of the original network architecture. The application of the projection technique differs between recurrent and non-recurrent layers but can be applied to both, e.g., fully connected, convolutional and LSTM layers, and can be used in place of or in addition to pruning and quantization.

In this talk, attendees will learn how to:
• Compress a deep learning model using a novel technique – network projection.
• Integrate and validate the compressed AI model into a larger MATLAB system design.
• Automatically generate embeddable C/C++ implementations for AI-powered designs.
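
The parameter saving that comes from operating in a lower-rank space can be illustrated with simple counting. The sketch below is not the network projection algorithm presented in the talk; it assumes a generic low-rank factorization of one fully connected layer, and the layer sizes and rank are hypothetical.

```python
def dense_params(n_in, n_out):
    """Learnable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


def low_rank_params(n_in, n_out, rank):
    """Parameters when the weight matrix (n_out x n_in) is replaced by two
    factors of shapes (n_out x rank) and (rank x n_in). The bias is unchanged,
    so the layer still produces n_out activations (the width is preserved)."""
    return rank * (n_in + n_out) + n_out


# Hypothetical layer: 256 inputs, 256 outputs, projected to rank 32.
original = dense_params(256, 256)
projected = low_rank_params(256, 256, 32)
compression_ratio = projected / original
```

Under these assumptions the layer keeps its 256 output activations while storing roughly a quarter of the original parameters. The saving only materializes when the rank is well below both layer dimensions.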

10:30 am to 11:00 am

Break & Networking

11:00 am to 11:25 am

Hardware Acceleration and Model Compression - Part II

Session Moderator: Manuel ROVERI, Full Professor, Politecnico di Milano

Hard Contextual Parameter Sharing: A Multi-task Learning Approach for tinyML

Michael GIBBS, PhD student in the Smart Sensing Lab, Nottingham Trent University

Abstract (English)

There is an increasing need to deploy complex models on resource-constrained devices such as microcontrollers for real-world deployment. However, multiple complex networks often exceed the limited memory and compute capabilities of these devices. This work proposes a novel Hard Contextual Parameter Sharing (HCPS) optimisation to enable efficient deployment of multiple neural network models on highly resource-constrained microcontroller devices. The approach deconstructs similar neural networks into shared layers that extract common features, and specialised layers that retain task-specific representations. By reusing network layers across models, the overall parameter count and storage requirements are diminished, enabling the implementation of more complex solutions. Practical demonstrations on an Arduino Portenta H7 microcontroller board yielded a 49.5% reduction in storage burden. As shared layers increase, we find a corresponding increase in loss within the newly constructed model.

The proposed methodology enhances traditional multi-task learning for edge computing by selectively sharing parameters across tasks while maintaining context awareness. HCPS divides a large network into smaller ones, sharing specific parts via hard parameter sharing on early layers to extract universally relevant low-level features. This approach mitigates overfitting risks, enhancing generalisation across tasks, and improves scalability without significantly increasing parameters as tasks are added. The outcome is an ensemble of specialised, deployable models, reducing storage and computational demands compared to a singular heavy model.

An additional advantage of HCPS is the adaptability to deploy new specialised models without retraining the entire network. This contrasts with traditional multi-task learning, allowing shared layers for multiple specialised models without simultaneous training. The proposed approach finds application in diverse scenarios, such as energy management that adapts to seasonal patterns or context-based driver assistance systems that adjust to varying driving conditions.

Despite successful demonstrations, limitations persist, especially in determining which layers to share. Two proposed approaches involve sharing only initial layers for high accuracy or sharing the majority with few specialised layers at the end for maximal storage savings. Assumptions include task-related similarities and non-overlapping classes, ensuring shared layers benefit performance. As machine learning extends to embedded devices, this study emphasises sustainable computing practices, illustrating how careful architecture and parameter sharing balance intelligence with resource constraints, contributing to reduced power consumption and enhanced environmental sustainability.

The contributions of this work are as follows:

• A novel deconstructed, shared neural network approach utilising contextual information to reduce the storage occupancy and computational burden of multiple deep learning models.

• Improved efficiency by reducing redundancy through selective neural network layer sharing, enabling deployment on ultra-low powered microcontrollers.

• Enhanced adaptability compared to traditional multi-task learning, as specialised sub-networks can be added without retraining the entire network, providing flexible expandability.

• A practical tinyML implementation of the proposed approach using two widely recognised datasets illustrating the viability of the proposed method.

This modular ensemble model balances performance, adaptability, and sustainability under tight memory and power limits. The method is evaluated on an Arduino Portenta H7 microcontroller using MNIST and Fashion-MNIST datasets for digit and clothing classification tasks. The shared model combines common convolutional filters to reduce redundancy between networks by 49.5%, while final task-specific dense layers preserve 97.9% and 74.4% accuracy respectively. HCPS advances beyond conventional multi-task learning for edge devices by decomposing models into selective shared and specialised components to maximise on-device intelligence under constrained resources.
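
The storage effect of sharing early layers across task-specific models can be sketched with parameter counting. This is an illustration under stated assumptions, not the authors' implementation: the function names and the layer sizes are hypothetical, and every task is assumed to use an identical feature extractor.

```python
def separate_storage(feature_params, head_params):
    """Parameters stored when every task keeps its own full network."""
    return sum(feature_params + head for head in head_params)


def shared_storage(feature_params, head_params):
    """Parameters stored when one feature extractor serves all task heads."""
    return feature_params + sum(head_params)


def storage_reduction(feature_params, head_params):
    """Fraction of storage saved by sharing the feature extractor."""
    before = separate_storage(feature_params, head_params)
    after = shared_storage(feature_params, head_params)
    return 1.0 - after / before


# Hypothetical sizes: 10,000 shared convolutional parameters and two
# task-specific dense heads of 500 parameters each.
reduction = storage_reduction(10_000, [500, 500])
```

With two tasks the saving in this simplified model is always below 50%, and it approaches that bound as the task-specific heads shrink relative to the shared layers.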

11:25 am to 12:25 pm

Posters and Demos

12:25 pm to 1:25 pm

Lunch & Networking

1:25 pm to 2:10 pm

Keynote by Francesco Conti

Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves

The Quest for Open-Source tinyML Heterogeneous Hardware Acceleration: A 10+ Year PULP Journey

Francesco CONTI, Assistant Professor, University of Bologna, Italy

Abstract (English)

In the last few years, our perception of what constitutes a “tinyML device” has shifted from simple microcontrollers to complex heterogeneous SoCs suited to execute DNNs directly at the extreme edge in real time and at minimal power cost. These devices provide the ultra-low latency and high energy efficiency necessary to meet the constraints of advanced use cases that cannot be satisfied by cloud solutions. However, how can tinyML hardware keep up with the evolution of the AI landscape, continuously pushing towards much larger and more complex models? The costs to develop new accelerators and Neural Processing Units for each evolutionary step in AI are hard to sustain. A possible way forward is given by the open-source model for digital hardware, popularized by RISC-V: multiple actors – both academic and industrial – collaborate on the development of digital technology that can benefit all parties. In this keynote, I present a 10+-year “quest” to push the performance and energy efficiency of tinyML further and further by exploiting a fully open-source model based on the PULP Platform initiative. I show how the open-source cooperative model makes it possible to combine different ideas and contributions in a technologically portable way, acting as an innovation catalyst and enabling the fast pace of evolution required to keep up with new ideas in AI within a tiny power budget.

2:10 pm to 3:05 pm

Neuromorphic Computing

Session Moderator: Eiman KANJO, Provost’s Visiting Professor in Pervasive Sensing and tinyML, Imperial College London

Spiking Neural Processor T1: An ultra-low power neuromorphic microcontroller for the sensor edge

Petrut BOGDAN, Neuromorphic Architect, Innatera

Abstract (English)

We present the world’s first ultra-low power neuromorphic microcontroller – the Spiking Neural Processor T1. Neuromorphic hardware adopts an approach inspired by the function and structure of the brain. A representation that particularly closely mimics the brain is the Spiking Neural Network (SNN), a class of event-based neural networks in which information is represented using precisely timed events (spikes). SNN models work by manipulating the timing relationships between these events, and leveraging temporal correlations between events to identify patterns in the data. Key to these capabilities is the inherent notion of time built into the neurons and synapses of the SNNs. The time-varying states of neurons and synapses enable powerful temporal processing to be carried out even with small models, with sparse and efficient event-based communication between computing elements. SNNs enable rapid recognition of patterns in sensor data, in addition to complex signal processing, just like the brain does.
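
How a neuron with a time-varying state responds to the timing of events, rather than just their count, can be shown with a minimal sketch. It assumes a discrete-time leaky integrate-and-fire neuron with hypothetical decay, weight and threshold values; it is a textbook-style model and does not describe the T1's mixed-signal neurons.

```python
def lif_output_spikes(input_spike_times, n_steps, decay=0.5, weight=1.0,
                      threshold=1.4):
    """Simulate one leaky integrate-and-fire neuron on a discrete time grid.

    The membrane state decays every step, jumps by `weight` on each input
    spike, and emits an output spike (then resets) when it reaches
    `threshold`. Returns the list of time steps with output spikes.
    """
    inputs = set(input_spike_times)
    state = 0.0
    output = []
    for t in range(n_steps):
        state *= decay              # leak: the past is gradually forgotten
        if t in inputs:
            state += weight         # integrate the incoming event
        if state >= threshold:
            output.append(t)        # fire
            state = 0.0             # reset after the spike
    return output


# Two input spikes close together push the state over the threshold...
close_pair = lif_output_spikes([0, 1], n_steps=10)
# ...while the same two spikes far apart do not.
far_pair = lif_output_spikes([0, 5], n_steps=10)
```

Both inputs contain two events, but only the closely timed pair produces an output spike, so the neuron acts as a simple coincidence detector.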

The T1 system-on-chip enables powerful neuromorphic processing of sensor data, within a single always-on component. Integrating a triad of processing elements, the T1 incorporates a spiking compute engine for SNNs, an accelerator for CNNs, and a light-weight RISC-V CPU to provide application developers with a heterogeneous platform for sensor data processing. The mixed signal spiking compute engine enables SNN inference within a sub-milliwatt power envelope, for always-on low-latency processing of data streams. The architecture allows the capabilities of SNNs to be blended with conventional non-spiking neural networks to realize a broad range of application capabilities within the same device. Both accelerators are highly customizable in terms of the parameters and connectivity they support. As a companion to sensors, the T1 incorporates a RISC-V core that enables handling of multiple sensors, marshalling of data, as well as conditioning and pre-/post-processing of sensor data and inference results.

The T1 integrates an ample amount of memory for complex workloads at the sensor edge, and incorporates a diverse set of interfaces – QSPI, I2C, I2S, UART, JTAG, and GPIOs, in addition to a front-end ADC – ensuring compatibility with a vast range of sensors. Its 35-lead WLCSP package of 2.16 × 3 mm allows it to be integrated into the most compact of edge applications, right next to the sensor. The T1 effectively serves as the first and only chip a sensor needs to interface with, allowing actionable insights to be derived from raw data, right at the sensor edge.

The Talamo Software Development Kit serves as the gateway for application developers to use the T1. Enabling access to the novel capabilities brought by the T1’s mixed signal computing fabric, Talamo simplifies the process of application development based on SNNs through its integration with PyTorch. This enables comprehensive development, optimization, and deployment of SNN and CNN models onto the T1 without having to deal with the complexity of its underlying compute architecture. An easy-to-use pipeline construction API and model zoo included within Talamo simplify the creation of end-to-end applications.

The T1 evaluation kit is now available to early customers for application trials.

As a follow-up to our previous presentations at tinyML events, we will walk through the architecture of our new T1 chip, and show how the chip and the Talamo SDK come together to enable power-efficient AI at edge applications. We will illustrate the above through an application example, and share insights into the power-performance of the Spiking Neural Processor platform in a context relevant to the tinyML community.

In-Glasses Eye-Tracking with DVS camera, Tiny Memory-Efficient Model for Neuromorphic Computing and Digital Processors

Michele MAGNO, Head of the Project-based learning Center, ETH Zurich, D-ITET

Abstract (English)

This presentation introduces an innovative neuromorphic methodology for eye-tracking, utilizing pure event data captured by a Dynamic Vision Sensor (DVS) camera. Our approach integrates a directly trained Spiking Neural Network (SNN) regression model and capitalizes on the cutting-edge, low-power edge neuromorphic processor – Speck. This combination aims to significantly enhance the precision and efficiency of eye-tracking systems.

Initially, we present the ‘Ini-30,’ a novel event-based eye-tracking dataset collected from thirty volunteers using two DVS cameras mounted on glasses. We then describe our SNN model, named ‘Retina’, which is based on Integrate-and-Fire (IF) neurons. Remarkably, ‘Retina’ has only 64k parameters – 6.63 times fewer than the latest models – yet achieves a pupil tracking error of just 3.24 pixels on a 64×64 DVS input. The continuous regression output is derived by applying a non-spiking temporal 1D filter across the output spiking layer.
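
The idea of reading a continuous value out of a spiking layer with a temporal filter can be sketched in a few lines. This is a toy illustration, not the 'Retina' model: the soft-reset rule, the threshold and the moving-average kernel are assumptions chosen for clarity.

```python
def if_neuron(currents, threshold=1.0):
    """Integrate-and-Fire neuron without leak: accumulate the input and emit
    a spike (1) whenever the membrane state reaches the threshold."""
    state = 0.0
    spikes = []
    for current in currents:
        state += current
        if state >= threshold:
            spikes.append(1)
            state -= threshold      # soft reset (an assumption of this sketch)
        else:
            spikes.append(0)
    return spikes


def temporal_filter(spikes, kernel):
    """Causal 1D filter over a spike train, giving a non-spiking output."""
    filtered = []
    for t in range(len(spikes)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * spikes[t - k]
        filtered.append(acc)
    return filtered


# A constant input of 0.25 makes the neuron fire once every four steps.
spike_train = if_neuron([0.25] * 16)
# Averaging over four steps turns the spike rate back into a continuous value.
decoded = temporal_filter(spike_train, kernel=[0.25] * 4)
```

Once the filter window is full, the decoded value settles at 0.25, matching the input that drove the neuron.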

Furthermore, we evaluate ‘Retina’ on the neuromorphic processor, demonstrating an end-to-end power consumption of only 2.89-4.8 mW and a latency of 5.57-8.01 ms, depending on the time window. We benchmark our model against the latest event-based eye-tracking method, ‘3ET’, built on event frames. The results indicate that ‘Retina’ not only achieves higher precision, with 1.24px less error in pupil centroid estimation, but also boasts a significantly reduced computational complexity, requiring 35 times fewer Multiply-Accumulate (MAC) operations.

Additionally, I will present ‘DigitalRetina’, a TinyML model based on Yolo, optimized for minimal memory usage and evaluated on the GreenWaves GAP9 processor. This talk will offer a comprehensive comparison of the two approaches – neuromorphic and digital – in terms of energy efficiency, latency, and other critical metrics.

A paper currently under review, with more information, is available at https://arxiv.org/abs/2312.00425

3:05 pm to 3:35 pm

Break & Networking

3:35 pm to 4:50 pm

Efficiency and Optimization in tinyML

Session Moderator: Valeria TOMASELLI, Senior Engineer, STMicroelectronics

tinyCLAP: distilling language-audio pretrained models

Francesco PAISSAN, Junior Researcher, Fondazione Bruno Kessler (FBK)

Abstract (English)

Contrastive Language-Audio Pretraining (CLAP) [1], and similarly, its image counterpart, CLIP [2], proved to be an effective technique to pretrain audio and image encoders. In particular, CLAP and some of its variants [3, 4] achieved state-of-the-art performance for sound event detection, showcasing impressive performance also in Zero-Shot (ZS) classification. However, one of the main limitations is the considerable amount of data required in the training process and the overall computational complexity during inference. In this presentation, we will review how we can reduce the complexity of contrastive language-audio pre-trained models, yielding an efficient model called tinyCLAP. We derive a unimodal distillation loss from first principles and explore how the dimensionality of the shared, multimodal latent space can be reduced via pruning. tinyCLAP uses only 6% of the original Microsoft CLAP parameters with a minimal reduction (less than 5%) in zero-shot classification performance across the three sound event detection datasets tested. Specifically, since the model capacity needed for learning the correlations between audio and text is high, the CLAP audio and text encoders are not suited for fast and low-footprint inference. To simplify the pipeline, we employ knowledge distillation [7] and pruning [8], as they proved to be effective techniques for learning smaller models while inheriting the representation capabilities of the teacher model. Standard knowledge distillation is not suited for the CLAP audio encoder because, in CLAP, there are no soft labels since the classification is performed using the learned similarity score – and thus, the number of classes is not selected a-priori. We will present how the knowledge distillation loss for CLAP can be formulated to preserve the similarity score between text and audio. We show that, with this formulation, the distillation and pruning strategies can work with audio samples without text.

Finally, we will showcase the zero-shot classifier on an ARM-Cortex M7-based board, analysing the benefits with respect to a standard classifier, and its complexity-performance tradeoff.

Attached paper: https://arxiv.org/pdf/2311.14517.pdf
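
Classification by similarity score, as opposed to a fixed set of output classes, can be sketched as follows. The embeddings and labels below are hypothetical toy values standing in for the outputs of an audio encoder and a text encoder; the sketch does not reproduce the tinyCLAP distillation loss.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(audio_embedding, text_embeddings):
    """Return the label whose text embedding is most similar to the audio.

    `text_embeddings` maps each candidate label to its embedding, so the
    set of classes can be changed at inference time without retraining."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(audio_embedding,
                                            text_embeddings[label]),
    )


# Hypothetical 3-dimensional embeddings in a shared audio-text space.
candidate_labels = {
    "dog bark": [1.0, 0.0, 0.0],
    "siren": [0.0, 1.0, 0.0],
    "rain": [0.0, 0.0, 1.0],
}
predicted = zero_shot_classify([0.9, 0.1, 0.0], candidate_labels)
```

Because the label set is only a dictionary of text embeddings, new classes can be added by embedding new prompts, which is why no a-priori class count is needed.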

Optimizing Vision Transformers: A Novel Neuron Leveraging Max and Min Operations for Aggressively Prunable DNNs

Philippe BICH, PhD Student, Politecnico di Torino

Abstract (English)

Deep Neural Networks (DNNs) are structures capable of solving complex tasks with the use of a massive number of trainable parameters. These parameters are so numerous that they turn out to be highly redundant. Pruning is a necessary operation to achieve low-power and lightweight inference, which is fundamental in the field of Tiny Machine Learning (TinyML) for the implementation of neural networks on mobile devices with limited battery capacity and constrained computational capabilities. Classical pruning approaches in the literature [1] simply leverage methods to select the interconnections or entire neurons to be pruned in a DNN without modifying its inherent structure, i.e. neurons based on the typical Multiply-and-ACcumulate (MAC) paradigm. In standard MAC-based neurons, the output is computed by first modulating inputs independently of each other (map operation), then by aggregating the outcomes into a single quantity (reduce operation), and finally by reshaping the value through an activation function. For these classical neurons, map is multiply, while reduce is accumulate.

In a recent work [2], we have introduced an alternative neuron structure, whose adoption in a DNN makes it possible to achieve much better results when the entire network is pruned, using virtually any pruning method available in the literature. More specifically, whereas the map-reduce paradigm adopted in a typical neural network is based on a MAC operation, we propose to substitute accumulate with a maximum plus minimum operation, obtaining a different structure for the neurons. We call this novel map-reduce paradigm Multiply-And-Max/min (MAM).

The advantage of using this neuron (which we discuss in detail in [3]) is a much higher prunability of the neural network, which can be aggressively simplified by the removal of redundant interconnections while retaining its original performance.
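
The difference between the two map-reduce paradigms can be made concrete with a minimal sketch. It follows the description above (multiply as map, then either accumulate or maximum plus minimum as reduce); the bias term and activation function are omitted, and the input and weight values are hypothetical.

```python
def mac_neuron(inputs, weights):
    """Classical neuron: map = multiply, reduce = accumulate (sum)."""
    products = [w * x for w, x in zip(weights, inputs)]
    return sum(products)


def mam_neuron(inputs, weights):
    """MAM neuron: map = multiply, reduce = maximum plus minimum."""
    products = [w * x for w, x in zip(weights, inputs)]
    return max(products) + min(products)


# Hypothetical inputs and weights; the products are 0.5, -2.0, -2.0, 2.0.
example_inputs = [1.0, 2.0, -1.0, 0.5]
example_weights = [0.5, -1.0, 2.0, 4.0]
mac_output = mac_neuron(example_inputs, example_weights)
mam_output = mam_neuron(example_inputs, example_weights)
```

In the MAM case only the largest and smallest products contribute to the output for a given input, which gives some intuition for why many interconnections may be removable.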

A Paradigm Shift From Imaging to Vision: Oculi Enables 600x Reduction in Latency-Energy Factor for Visual Edge Applications

Charbel RIZK, Founder and CEO, Oculi

Abstract (English)

Remarkable progress has been achieved in the field of artificial intelligence, particularly in the extensive use of deep neural networks, which have significantly enhanced the reliability of face detection, eye tracking, hand tracking, and people detection. However, performing these tasks still demands substantial computational power and memory resources, making it a resource-intensive endeavor that remains to be solved. Consequently, power consumption and latency pose significant challenges for many systems operating in always-on, edge applications.

The OCULI SPU (Sensing and Processing Unit), ideal for TinyML vision applications, is an intelligent, programmable vision sensor that can be configured dynamically to output select data in various modes depending on use case needs. These modes include images or video, polarity events, smart events, and actionable information that make the vision sensor efficient. Moreover, the SPU allows real-time programmability of spatial and temporal resolution, as well as dynamic range and bit depth. By enabling continuous optimization, computer/machine vision solutions deploying the OCULI SPU, in lieu of imaging sensors, can reduce the latency-energy factor by more than 600x at a fraction of the cost. Smart events and actionable information outputs and modes are distinctive features unique to the Oculi vision sensor.

To showcase tinyML capabilities, Oculi participated in the tinyML Hackathon 2023: Pedestrian Detection. Our initial results demonstrated an always-on solution (24/7) with a latency of less than 4 ms that consumes only 3 Wh in total over a whole year, equivalent to a single AA battery.

Because the OCULI SPU is fully programmable, the solution can be dynamically optimized between latency and power consumption. It will enable the first truly wireless battery-operated always-on vision products in the market. The presentation will provide an overview of Oculi’s novel vision architecture for edge applications, and also include key latency and energy results for multiple use cases of interest to the TinyML community, including presence/people/pedestrian/object, face, hand, and eye detection. Our results will also include a comparison with alternate or conventional solutions that demonstrate significant advantages in adopting a paradigm shift from imaging to vision for visual edge applications.
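
To put the yearly energy figure in perspective, it can be converted to an average power draw. The short calculation below assumes a non-leap year and a constant draw over the whole year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def average_power_mw(energy_wh_per_year):
    """Average power in milliwatts for a given yearly energy budget in Wh."""
    return energy_wh_per_year / HOURS_PER_YEAR * 1000.0


# 3 Wh over a year, the figure quoted for the always-on solution.
oculi_average_mw = average_power_mw(3.0)
```

Under these assumptions, 3 Wh per year corresponds to an average of roughly 0.34 mW.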

Structural Health Monitoring at the Edge for Wind Turbine Blades Using Pressure Sensors

Denis MIKHAYLOV, Researcher, D-ITET, ETH Zürich

Abstract (English)

Wind energy is an increasingly important component of the transition to renewable energy and away from fossil fuels. Technological and physical limitations have led to wind turbines increasing in size and height in recent years. Larger turbines with larger blades allow more energy to be produced more consistently, but come with manufacturing and maintenance challenges. In particular, regular visual inspections are necessary to detect defects or damage to the blades such as delamination, debonding or cracks, and damage due to extreme weather. However, the trend towards tall, offshore wind turbines that are difficult to access makes remote monitoring solutions desirable, which would reduce the need for visual inspections, improve safety, and increase the uptime of the turbines.

Numerous approaches to monitoring wind turbines have been taken; however, until now most have focused mainly on modal analysis and vibration data [1]. Other approaches, such as acoustic monitoring using microphones or visual inspections using drones, have also been investigated, but currently there is no clearly optimal solution to monitoring and detecting faults. Therefore, significant challenges remain in functionality, reliability, cost, and ease of deployment [2].

To address these challenges, a data acquisition system was developed [3] that can be adhesively fitted to the surface of wind turbine blades and features 40 MEMS barometers, 10 microphones, 5 differential pressure sensors and a MEMS IMU. A TI CC2652P microcontroller running at 48 MHz with an ARM Cortex M4F core is used for acquisition, power management and processing, while a BLE connection with a bandwidth of 1.2 Mbps enables connectivity for data transmission and device configuration. A flexible solar panel provides power that is stored in a battery. The entire system is less than 4 mm thick, allowing it to be mounted to the surface of the turbine blades without significantly affecting aerodynamic performance.

This system has already been successfully deployed on a wind turbine [3] as shown in Figure 1, where data is acquired in periodic 10-minute acquisition windows every two hours, before being transmitted to a remote server for further processing and analysis. The deployed system does not perform any processing onboard. This imposes significant latency on the system and consumes large amounts of energy due to the amount of raw sensor data that must be transmitted, generated at a rate of 4.2 Mbps.

More recently, investigations have been conducted in a wind tunnel to allow data from the sensors to be collected in controlled conditions. The test setup for this work is shown in Figure 2 (a). An airfoil is placed on a metal cantilever beam and a crack with increasing size (in 5 mm increments from 5 to 20 mm) is sawn into the cantilever to simulate a blade fault. The system is excited using a motor at the opposite end of the blade to better simulate the effects of turbulence and the system is exposed to different wind speeds from 10 to 20 m/s. In the healthy case with no crack, some experiments are also performed with an additional heaving mass to simulate different load and weather conditions.

Using this data, a tinyML classifier was trained and deployed to the Aerosense system that classifies crack size. The classifier uses only the pressure data from the 40 barometers and is trained using the open-source XGBoost library, before being converted to C using m2cgen and deployed on the MCU. It uses 4 simple statistical features (skew, mean, kurtosis, variance) and one energy feature based on 512-sample windows of data (approximately 5 s at 100 Hz). The energy feature approximates the energy of the modal frequency and is calculated using a narrow FIR bandpass filter tuned to the modal frequency with 200 taps followed by a Hann window. An accuracy of 83% is achieved when using a 66-33 training-test split and stratified 10-fold cross-validation.
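
The four statistical features named above can be computed per window as in the following sketch. It is an assumption-laden illustration rather than the deployed code: population (biased) moment definitions are used, kurtosis is reported without subtracting 3, the energy feature (FIR bandpass plus Hann window) is left out, and the input signal is synthetic.

```python
def window_features(samples):
    """Mean, variance, skew and kurtosis of one window of sensor samples,
    using population moments (an assumption of this sketch)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("skew and kurtosis are undefined for a constant window")
    std = variance ** 0.5
    skew = sum(c ** 3 for c in centered) / n / std ** 3
    kurtosis = sum(c ** 4 for c in centered) / n / std ** 4
    return {"mean": mean, "variance": variance,
            "skew": skew, "kurtosis": kurtosis}


WINDOW_SAMPLES = 512      # window length quoted in the abstract
SAMPLE_RATE_HZ = 100      # sampling rate quoted in the abstract

# Synthetic stand-in for one barometer channel: alternating 0 and 1.
synthetic_window = [float(i % 2) for i in range(WINDOW_SAMPLES)]
features = window_features(synthetic_window)
window_duration_s = WINDOW_SAMPLES / SAMPLE_RATE_HZ
```

A 512-sample window at 100 Hz spans 5.12 s, which matches the "approximately 5 s" stated above.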

The tinyML model consumes just 9.068 mJ per inference and has a latency of just 579 ms, meaning a fault can be detected in less than 6 s after it has occurred. It requires 3 kB of RAM and 35 kB of flash memory. The confusion matrix in Figure 2 (b) shows that the model reliably classifies the crack class. The table in Figure 2 (c) shows the energy consumption and latency of the processing pipeline on the MCU. The energy feature requires the most energy and consumes the most time. However, without it the classification performance of the model drops significantly. While there are limitations to the supervised classification approach and the representativity of the wind tunnel data set of real-world conditions, these results demonstrate that such damage classification is possible using only pressure sensors and onboard processing capabilities. The classifier has the potential to be applied as part of an event-based data acquisition strategy, where sensor node data is continuously monitored onboard and data is only transmitted to the remote processing system if a fault is suspected. Since the data transmission makes up a significant proportion of the energy consumption and latency of the Aerosense system (transmitting a single 5 s window of only the barometer data would require 119 mJ of energy and 1 s), such an approach would offer significant energy savings, without compromising the usefulness of the data collected.
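
The figures quoted above allow a rough estimate of the detection delay and of the energy saved by transmitting only suspicious windows. The sketch assumes that every window is classified onboard, that a window is transmitted only when a fault is flagged, and that sensing and idle energy are ignored; the `fault_rate` parameter is hypothetical.

```python
# Figures quoted in the abstract.
WINDOW_SAMPLES = 512
SAMPLE_RATE_HZ = 100
INFERENCE_LATENCY_S = 0.579
INFERENCE_ENERGY_MJ = 9.068
TRANSMIT_ENERGY_MJ = 119.0   # one 5 s window of barometer data


def detection_delay_s():
    """Worst-case delay: fill one window, then run one inference."""
    return WINDOW_SAMPLES / SAMPLE_RATE_HZ + INFERENCE_LATENCY_S


def energy_per_window_mj(fault_rate):
    """Expected energy per window when only flagged windows are sent.

    `fault_rate` is the assumed fraction of windows flagged as faulty."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return INFERENCE_ENERGY_MJ + fault_rate * TRANSMIT_ENERGY_MJ


delay = detection_delay_s()
# Saving relative to transmitting every window, if no window is flagged.
best_case_saving = 1.0 - energy_per_window_mj(0.0) / TRANSMIT_ENERGY_MJ
```

Under these assumptions the delay is about 5.7 s, consistent with the "less than 6 s" stated above, and the per-window energy drops by roughly 92% when no fault is flagged.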

REFERENCES
[1] Zonzini, F., Malatesta, M. M., Bogomolov, D., Testoni, N., Marzani, A., & De Marchi, L. (2020). Vibration-based SHM with upscalable and low-cost sensor networks. IEEE Transactions on Instrumentation and Measurement, 69(10), 7990-7998.
[2] Rezamand, M., Kordestani, M., Carriveau, R., Ting, D. S.-K., Orchard, M. E., & Saif, M. (2020). Critical wind turbine components prognostics: A comprehensive review. IEEE Transactions on Instrumentation and Measurement, 69(12), 9306-9328.
[3] Polonelli, T., Deparday, J., Abdallah, I., Barber, S., Chatzi, E., & Magno, M. (2023). Instrumentation and Measurement Systems: Aerosense: A Wireless, Non-Intrusive, Flexible, and MEMS-Based Aerodynamic and Acoustic Measurement System for Operating Wind Turbines. IEEE Instrumentation & Measurement Magazine, 26(4), 12-18.

7:00 pm to 10:00 pm

Networking Dinner

You need to be registered

9:00 am to 9:05 am

Welcome

Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves

9:05 am to 9:50 am

Keynote by Diana Trojaniello - Luxottica SPA

Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves

Empowering Vision: The Revolution of Tiny Machine Learning in Smart Eyewear

Diana TROJANIELLO, Project Manager | Smart Eyewear Research Stream Camera & Sensors, Luxottica

Abstract (English)

In the digital age, the convergence of artificial intelligence and wearable technology has paved the way for transformative innovations. Smart eyewear is now a tangible reality reshaping how we interact with the world. At the heart of this revolution lies Tiny Machine Learning (TinyML), a groundbreaking technology that empowers smart eyewear to understand, adapt, and respond to our needs with unprecedented efficiency and accuracy.

In this keynote speech, we will discuss the fascinating realm of TinyML applications in smart eyewear. From improving hearing capabilities for individuals with mild to moderate hearing loss to revolutionizing real-time, vision-driven awareness of the surrounding context and social interactions, TinyML-powered smart eyewear will redefine the boundaries of human-machine interaction. We will explore real-world examples, go into the technical aspects of TinyML algorithms optimized for resource-constrained devices, and discuss the ethical implications and challenges of deploying AI at the edge.

Join us on a journey through the cutting-edge developments, promising opportunities, and potential pitfalls of integrating Tiny Machine Learning into smart eyewear. Together, let us envision a future where our vision is not only augmented but also empowered by the seamless integration of AI and wearable technology.

9:50 am to 10:20 am

Efficiency and Optimization in tinyML - Part II

Session Moderator: Andrea DUNBAR, Head of Sector Edge AI and Vision, CSEM

Streamlining ML Operations for Enhanced Collaboration and Efficiency

Alessandro GRANDE, Head of Product, Edge Impulse

Abstract (English)

In the era of expanding tinyML applications across diverse environments, there is a heightened need for robust and customizable models. This presentation introduces a novel MLOps solution, reshaping the ML workflow to enable teams to efficiently develop, deploy, and monitor models tailored using real-world field data.

Our approach simplifies the development of base ML models, which serve as a robust starting point for further customization. This MLOps solution empowers product engineering teams with a streamlined process for model development, validation, and refinement, specifically tailored to production environments. Emphasizing scalability and adaptability, our solution ensures efficient customization to meet the unique requirements of various use cases.

The presentation explores how our MLOps solution caters to field engineers deploying and monitoring models in real-world environments. Leveraging on-device testing capabilities, engineers can assess and customize the model for optimal functionality in dynamic settings. Once the model is in production, this flow ensures continuous monitoring and refinement, allowing teams to track and fine-tune model performance over time and sustain accuracy and relevance throughout the operational life of ML models.

Join us to delve into the practical applications and tangible benefits of this transformative MLOps workflow for production and field engineering teams. Witness firsthand how this streamlined workflow enhances collaboration, accelerates deployment cycles, and maximizes the value derived from machine learning in real-world scenarios.

10:20 am to 10:50 am

Break & Networking

11:00 am to 12:30 pm

Orange Hall

tinyML on Arduino from the Edge to the Cloud

Did you know that AI can run on Arduino devices too? In this workshop we’ll demonstrate how to develop a license plate detection algorithm on Arduino Nicla Vision using Edge Impulse. Additionally, we’ll show how to transmit processed data to Arduino Cloud, enabling logging, data sharing, and action triggers. This practical and intuitive session is designed to be accessible to anyone interested in diving into the world of Embedded AI on Arduino.

Session Chair: Leonardo CAVAGNIS, Firmware Engineer, Arduino

10:50 am to 11:40 am

Green Hall

Efficiency and Optimization in tinyML - Part III

Session Moderator: Andrea DUNBAR, Head of Sector Edge AI and Vision, CSEM

Deploying neural networks in sensors with near zero memory budget

Danilo PAU, Technical Director, IEEE & ST Fellow, System Research and Applications, STMicroelectronics

Danil ZHEREBTSOV, Head of Machine Learning & Analytics, Neuton.AI

Abstract (English)

Sensors are evolving from pure measurement devices into devices that natively provide intelligent information for several industrial and consumer use cases. AI is the key to achieving this capability, but it is challenged by deployment constraints, driven mainly by the scarcity of embedded memory, which is essentially RAM-based and tightly integrated into the sensor itself. Careful use of this memory is crucial to enabling AI acceleration within the same power budget as the sensor itself. Moreover, machine learning engineers are required to deliver solutions at the highest productivity level, which calls for tools that automate the design of machine learning algorithms. Neuton.AI and ST have devised a groundbreaking end-to-end methodology to address these challenges, providing practical solutions for machine learning developers focused on sensor computing by utilizing the ISPU (intelligent sensor processing unit) technology.
During the session, we will present three use cases illustrating the application of these technologies in practice:
• Hands-free interaction for smartwatch: the solution can recognize five gestures and consumes only 9.3 Kbyte of program RAM
• On-device package tracking for logistics: the solution can track seven package states and consumes only 9.3 Kbyte of program RAM
• Smart ring remote control: the solution can recognize eight gestures and consumes only 10 Kbyte of program RAM

6 years of open-source TinyML with emlearn

Jon NORDBY, CTO, Soundsensing

Abstract (English)

The availability of accessible, high-quality and open software is critical for the adoption of any new technology and paradigm. The area of TinyML is no exception. In 2018, when we started researching and developing in this area, there was not much software for Machine Learning on microcontrollers. This is why we started the emlearn project, an open-source Python library that allows converting scikit-learn and Keras models to efficient C code.
The goal of the project is to make it easy to deploy efficient models to any microcontroller with a C99 compiler, while keeping a Python-based workflow that is familiar to Machine Learning Engineers.
Over the years the library has been used in a wide range of applications: from detection of vehicles using acoustic sensor nodes, to tracking the wellbeing of grazing cows using accelerometers, to hand gesture recognition based on skin electrical activity, to real-time intrusion detection in IoT networks.
In this presentation, we will showcase a selection of use cases and discuss the impact that the library has had so far. We will also cover some of the latest improvements aimed at increasing its future impact, such as improved documentation, integrated tooling for model size evaluation, and support for MicroPython.
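
The general idea of turning a trained model into portable C code can be illustrated with a toy sketch. The snippet below uses only the Python standard library, and every name in it (`tree_to_c`, `predict_py`, `TREE`) is hypothetical. It is not emlearn's API or its generated output, only an illustration of how a small decision tree might be emitted as a C99 function.

```python
# Toy illustration of "trained model -> portable C code".
# NOTE: all names are hypothetical; this is NOT emlearn's API or output format.

def tree_to_c(node, name="predict", n_features=2):
    """Emit C99 source for a function that evaluates a binary decision tree."""
    def emit(n, depth):
        pad = "    " * depth
        if "label" in n:  # leaf node: return the class index
            return f"{pad}return {n['label']};\n"
        return (
            f"{pad}if (x[{n['feature']}] < {float(n['threshold'])!r}f) {{\n"
            + emit(n["left"], depth + 1)
            + f"{pad}}} else {{\n"
            + emit(n["right"], depth + 1)
            + f"{pad}}}\n"
        )

    header = f"static inline int {name}(const float x[{n_features}]) {{\n"
    return header + emit(node, 1) + "}\n"


def predict_py(node, x):
    """Reference evaluation of the same tree in Python."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]


# A hand-written tree standing in for a trained model (assumption).
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 1.25,
        "left": {"label": 1},
        "right": {"label": 2},
    },
}

if __name__ == "__main__":
    print(tree_to_c(TREE))
```

The generated function contains only comparisons and returns, so any C99 compiler can build it without a runtime or dynamic memory, which is the property that makes this style of code generation attractive for microcontrollers.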

11:40 am to 12:40 pm

Posters and Demos

12:40 pm to 1:40 pm

Lunch & Networking

1:40 pm to 2:25 pm

On Device Learning

Session Moderator: Theocharis THEOCHARIDES, Associate Professor, University of Cyprus

Training on the Fly: On-device Self-supervised Learning aboard Nano-UAVs within 20mW

Elia CEREDA, PhD Student, IDSIA, USI-SUPSI

Abstract (English)

Pocket-sized autonomous unmanned aerial vehicles (UAVs) leveraging tiny machine learning (TinyML) perception pipelines represent an upcoming technology with many valuable applications in the Internet-of-Things domain. Thanks to their tiny form factor (i.e., ∼10 cm diameter), they can fly and land in small/cluttered spaces and safely operate near humans, acting as universal/ubiquitous dynamic smart sensors. However, TinyML algorithms for small UAVs equipped with onboard ultra-low-power processors suffer from the inherent domain shift problem, i.e., perception performance drops when moving from the training domain to a different deployment/testing one. To mitigate this general problem, we present a novel on-device fine-tuning approach that relies only on ultra-limited memory and computational capabilities. We build our study, analysis, deployment, and field testing on top of a real-world robotic application of vision-based human pose estimation. Our 512-image on-device training requires only 19 mW and 1 MB of memory, and runs in only 510 ms (five epochs) on a GWT GAP9 System-on-Chip. Finally, we propose a self-supervised technique based on an ego-motion consistency loss to handle the absence of training labels aboard our tiny UAV. In-field results of our closed-loop system show an improvement of up to 26% in the horizontal mean pose error vs. a non-fine-tuned baseline. Compared to previous state-of-the-art work, our on-device learning makes the difference between failing and succeeding in a challenging, never-before-seen environment.

Unleashing the Potential of TinyML Directly on Intelligent Sensors

Wassim KEZAI, Software Engineer, Innovation Academy

Abstract (English)

The year 2023 was a key year for tinyML, unleashing a new age of intelligent sensors that push intelligence from the MCU to the source of the data at the sensor level, enabling sensors to run sophisticated algorithms and machine learning models in real time. This presentation delves into the TinyML solution implemented by the team of Innovation Academy, a non-profit in Algeria, North Africa, that won first place in the IEEE COINS 2023 Contest for In-Sensor Machine Learning Computing. The solution is based on the STMicroelectronics Intelligent Sensor Processing Unit (ISPU), coupled with a highly optimized machine learning model developed using the Neuton.ai tinyML platform, to create an ultra-tiny, general-use-case human activity recognition (HAR) model. This application showcases the industrial potential and feasibility of tinyML solutions implemented directly on the sensor, which can lead to a revolution in the industrial vibration monitoring and predictive maintenance domain. In this study, we created a general-use-case HAR model spanning up to 24 classes that achieves an accuracy of 83.16% on the test set while using only 2 KB of stack memory, thanks to the capacity of the platform. The implemented approach permits running DSP feature algorithms and complex machine learning models written in C directly on the ISPU core, leaving room for the MCU to perform other tasks. Moreover, the ISPU operates at ultra-low power, drawing only 0.5 mA, pushing the benefits of TinyML to their limits. This application demonstrates the feasibility of running tinyML model inference directly on the sensor, highlighting its ability to operate within the limited computational resources of the sensor while still delivering high performance and power efficiency.
This opens up new possibilities for the industrial application of tinyML, such as industrial vibration monitoring and predictive maintenance.

2:25 pm to 3:20 pm

tinyML 4Good

Session Moderator: Thomas BASIKOLO, Programme Officer, ITU

Smart and Tiny UWB-radar Sensors: a Distributed Solution for Enhanced Home Care

Massimo PAVAN, PhD Student, Politecnico di Milano

Abstract (English)

As our global population ages, the demand for effective and personalized elderly care is on the rise. While societies struggle to find qualified people to perform this type of job, the technology for elderly care is advancing rapidly, integrating Machine Learning (ML) and robotic solutions to help ease the work of carers [1]. The application of ML in elderly care, in particular, goes beyond mere innovation; it promises to enhance the quality of life for the elderly by providing personalized and proactive solutions to address their unique needs. Despite the promising results achieved by this type of technology, it is still being adopted at a slow pace. Among the factors that contribute to this are the privacy concerns raised by these solutions and the aversion that end users feel towards technologies perceived as limiting their personal freedom [2] [3].

The deployment of ML technologies, in fact, often involves the collection and analysis of vast amounts of personal data, including health records, daily activity patterns, and even biometric information. While these data-driven insights can contribute to personalized and effective care, they also raise serious privacy concerns. Finding the right balance between leveraging data for improved care and safeguarding individual privacy is a critical challenge that needs thoughtful and robust solutions.

At the same time, a crucial aspect in the design of technological solutions is how they are perceived by seniors. To be used and accepted by the end user, technologies in this field need to be non-invasive, discreet, and respectful of each person's habits and lifestyle. Wearables and smart cameras are examples of technologies that have met this kind of rejection from the elderly population: wearables must be carried constantly by the user in order to work, while cameras are often perceived as too invasive of the end user's privacy.
In this context, we propose an innovative UWB-radar-based distributed solution employing in-sensor TinyML, with the objective of enhancing privacy while taking into account the perception of older users.

Ultra-wide-band (UWB) is an innovative radar technology mainly used for short-range communication and imaging [4]. UWB radar systems typically operate at low power levels and work on a wide frequency spectrum, enabling high resolution and precision in applications like imaging and localization. UWB technology is of particular interest in the context of home care since images produced by this type of sensor do not allow for recognition of the identity of a target, but are precise enough to track their activities. UWB-radar devices employing this type of sensor can be very discreet, considering they can even be embedded inside walls. Nevertheless, when applied to embedded devices, UWB-radar produces high-dimensional and noisy data that require deep analysis and processing in order to be fully exploited. For this reason, TinyML [5] is used for the in-sensor analysis [6] of UWB-radar data, making the solution presented in this paper the first smart UWB-radar sensor.

Meet the Acoustic Smart Weather Station Aurora

Jona BEYSENS, Senior R&D Engineer, CSEM

Abstract (English)

In this talk, we will present Aurora, a low-power, intelligent, low-cost, and easy-to-install weather station. By listening to the surrounding environment, Aurora estimates the rain and wind intensity without the need for any mechanical moving part. To accomplish this, Aurora collects audio signals with an integrated microphone and processes this data through ultra-low-power Machine Learning (tinyML) at the edge. Aurora aims to unleash the potential of tinyML for positive change by creating a reliable, maintenance-free and low-cost weather station.

3:20 pm to 3:50 pm

Break & Networking

3:50 pm to 5:00 pm

Panel - Challenges of tinyML Applications

Panelists:

Laleh Makarem – Logitech

Nicolo Annino – Idealarm Ltd

Eric Benhaim – Oro Sound

Armando Caltabiano – Truesense Srl

Jerome Schang – Alif Semiconductor

Session Moderator: Martin CROOME, Vice President Marketing, GreenWaves

9:00 am to 9:05 am

Welcome

Session Moderator: Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco

9:05 am to 9:50 am

Keynote from Bosch

Session Moderator: Hajar MOUSANNIF, Associate Professor, Cadi Ayyad University, Morocco

Between two worlds, high performance and ultra low power AI: challenges, trends, and solutions

Taha SOLIMAN, Research Project Leader, Robert Bosch GmbH

Abstract (English)

In recent years, neural networks have emerged as an effective candidate for various tasks targeting a wide range of products. From tiny devices all the way up to autonomous driving, one of the main concerns is energy efficiency, which goes hand in hand with the environmental footprint. Increasing performance requirements and energy-efficiency demands, together with the rapid development of new neural network models and architectures, are some of the main challenges. Leveraging achievements from different disciplines, such as emerging memory technologies, NPU architectures, and hardware-aware optimization frameworks, can be an enabler for using new and powerful neural networks in the embedded domain. In this talk, the challenges and trends will be explored and discussed from the perspectives of industrial viability and research promise.

9:50 am to 10:20 am

Efficiency and Optimization in tinyML - Part IV

Session Moderator: Dirk STANEKER, Group Leader, Bosch Sensortec GmbH

AutoStreamLib: Efficient execution of Temporal Convolutional Networks through Seamless Transformation from Non-Streaming to Streamable Inference

Seyed Ahmad MIRSALAR, PhD Student, University of Bologna

10:20 am to 10:50 am

Break & Networking

10:50 am to 11:15 am

TinyML: From Concept to Reality

Carmelo SANSONE, Director Strategic Business Development, Renesas

Abstract (English)

TinyML is no longer a futuristic concept; it’s gaining prominence. Discover how TinyML has transitioned from a futuristic concept to a game-changer. We’ll explore its benefits, market trends, and Renesas’ vision for democratizing AI/ML. Join us on this journey toward real-time analytics and Edge AI.

11:15 am to 12:15 pm

Demos & Posters

12:15 pm to 1:15 pm

Lunch & Networking

1:15 pm to 2:35 pm

Hardware Acceleration and Model Compression - Part III

Sensorless Pattern Recognition of PMSM/BLDC Motors against Aerodynamic, Uncontrollability and Asymmetrical Load Influence at the Extreme Edge using Intelligent Neural Processing

Nabarun DASGUPTA, System Research and Applications Group Manager, STMicroelectronics

Abstract (English)

The study introduces a deep learning framework designed for diagnosing electric motor-actuated systems by leveraging only the current and voltage data gathered during the execution of a field-oriented control algorithm. The framework operates at the edge, in parallel with the motor control algorithm, and eliminates the need for additional sensors, which are normally employed for anomaly detection. The study focuses on detecting issues such as aerodynamic effects, control parameter mismatches, and asymmetric loads. The algorithm employs sensorless pattern recognition with a deep convolutional neural network running on an STM32G4 microcontroller, analyzing electrical signatures from the motor control algorithm. This method promises early anomaly detection, thereby enhancing motor health and operation by using the motor itself as a sensor.

Accelerating Quantized DNN Inference through Precision-Scalable Multipliers Integrated in Hardware Accelerators and RISC-V Processors

Luca URBINATI, PhD Student, Politecnico di Torino

Abstract (English)

Research Context and Motivation: Deep Learning (DL) algorithms, crucial for a myriad of applications, heavily rely on Multiply-and-Accumulate (MAC) units to perform essential operations such as convolutions and matrix multiplications. Furthermore, since many DL applications require low inference time, faster MAC units and hardware accelerators are needed. In the context of edge devices, there is also a pressing need for quantization of activations and weights to reduce energy consumption and inference latency.

Addressed Research Questions/Problems: Our work delves into the realm of Mixed-Precision Quantization (MPQ), which explores the optimal bit precision for the activations and weights of Deep Neural Network (DNN) layers. Practical implementation of MPQ demands precision-scalable (PS) and reconfigurable hardware. Despite the many PS alternatives [1], few exploit Sum-Together (ST) and Sum-Apart (SA) multipliers (Fig. 1A) in DNN hardware accelerators [1] and RISC-V cores [2].

Novel Contributions:
1. We conducted a comprehensive comparison of the state-of-the-art ST multipliers, evaluating their power, performance and area (PPA) characteristics through a design space exploration (DSE).
2. We expanded the portfolio of accelerators based on ST multipliers [1], proposing three novel implementations: 2D-Convolution (2D-Conv), Depth-wise Convolution (DW-Conv), and Fully-Connected (FC). High-Level Synthesis (HLS) is employed for the first time to generate PS hardware accelerators.
3. Until now SA and ST multipliers have been proposed as alternative implementations. Instead, we proposed a new Sum-Together/Apart Reconfigurable (STAR) multiplier, capable of operating in either SA or ST mode. Based on a Baugh-Wooley (BW) architecture (Fig. 1B), its partial product matrix (Fig. 2B) can be configured to execute N=1 scalar multiplication, or N=2, 4 parallel low-precision multiplications for both SA and ST operating modes, with 16/N-bit operands (Tab. 1B). STAR enables a more efficient utilization of hardware resources by dynamically sharing them to perform different tasks, such as both 2D-Conv and DW-Conv in a single hardware accelerator.
4. We are the first to embed STAR in the MAC unit of a low-end RISC-V CPU. We replaced the default 16-bit multiplier inside the MULT/DIV unit of the Ibex processor [4] and added new MAC instructions: standard 32-bit and 16/8/4-bit MAC operations in ST/SA mode.
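
The difference between the SA and ST operating modes described above can be sketched as a purely functional Python model. This is an illustrative assumption, not the authors' hardware design: the names `split_lanes`, `pack_lanes`, and `star_multiply` are hypothetical, operands are assumed to be signed two's-complement values (in line with the Baugh-Wooley architecture), and lane 0 is assumed to occupy the least significant bits.

```python
# Functional sketch of Sum-Apart (SA) vs Sum-Together (ST) multiplication.
# Assumptions (not from the abstract): signed two's-complement lanes,
# lane 0 in the least significant bits, hypothetical function names.

WORD_BITS = 16


def split_lanes(word, n):
    """Split a 16-bit word into n signed lanes of 16/n bits each."""
    if n not in (1, 2, 4):
        raise ValueError("n must be 1, 2 or 4")
    width = WORD_BITS // n
    mask = (1 << width) - 1
    lanes = []
    for i in range(n):
        value = (word >> (i * width)) & mask
        if value >= 1 << (width - 1):  # sign-extend two's complement
            value -= 1 << width
        lanes.append(value)
    return lanes


def pack_lanes(values, n):
    """Pack n signed values of 16/n bits into one 16-bit word."""
    width = WORD_BITS // n
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * width)
    return word


def star_multiply(a, b, n, mode):
    """SA mode returns the n lane-wise products; ST mode returns their sum."""
    products = [x * y for x, y in zip(split_lanes(a, n), split_lanes(b, n))]
    if mode == "SA":
        return products
    if mode == "ST":
        return sum(products)
    raise ValueError("mode must be 'SA' or 'ST'")


if __name__ == "__main__":
    a = pack_lanes([3, -2, 7, -8], 4)
    b = pack_lanes([1, 5, -4, 2], 4)
    print(star_multiply(a, b, 4, "SA"))  # four independent 4-bit products
    print(star_multiply(a, b, 4, "ST"))  # one dot-product style result
```

In this model ST mode behaves like a short dot product, which suits FC and 2D-Conv layers that accumulate across inputs, while SA mode keeps the products separate, which matches the per-channel arithmetic of DW-Conv layers.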

Adopted Methodologies and Results:
1. The DSE of ST multipliers, performed varying the clock frequency from 100 to 1000 MHz and synthesizing the multipliers on a 28-nm 0.9V CMOS technology, established the Pareto optimality of the Booth ST multiplier in terms of area across all frequencies (Fig. 2A).
2. We made a wide DSE for each ST accelerator, using our HLS flow (Fig. 3A) and exploring many hardware parameters (e.g. the MAC units parallelism). The DSE enables designers to choose the optimal accelerator for their PPA targets.
3. We showcased the advantages of ST-based accelerators when integrated into System-on-Chips (SoCs) in three different scenarios (low-area, low-power, and low-latency). The case study involved running inference on MLPerf Tiny models quantized in mixed-precision (MP). The results showed a significant latency and energy reduction with negligible area overhead (0.9%–8.0%) across the three scenarios, when compared to SoCs with accelerators based on fixed-precision 16-bit multipliers (Tab. 1A).
4. At the cost of limited overhead in area (<10%) and power (<3%) compared to the original Ibex, our modified Ibex showed an acceleration up to 4.5x for FC and 3x for 2D-Conv layers, configuring STAR MAC in ST mode, and up to 2.3x for DW-Conv layers, configuring STAR MAC in SA mode.

Future Work: In the future, we plan to develop new hardware accelerators harnessing the potential of STAR multipliers to accelerate MP-quantized DL workloads.

Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO

Julian MOOSMANN, Doctoral Student, ETH Zürich

Abstract (English)

Smart glasses rapidly gain advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. However, integrating AI algorithms into smart glasses featuring a small form factor and limited battery capacity is still challenging when targeting full-day usage for a satisfactory user experience. For example, the recently released RayBan-Meta smart glasses have an estimated battery runtime of approximately four hours under moderate use¹. While these wearable devices predominantly run on Qualcomm’s Snapdragon AR1 Gen1 computing platform, on-device AI inference is rarely performed [1].
This work² illustrates the design and implementation of tiny machine-learning algorithms exploiting novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy and latency efficiency of smart glasses in the case of real-time object detection. To this end, a smart glasses prototype was designed as a research platform (see Fig. 1a and Fig. 1b), featuring two microcontrollers, including a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a Bluetooth low-power module for communication. A modular design was targeted so that the same platform can be reused for different low-power AIoT applications. All components are tightly integrated onto a custom, miniaturized Printed Circuit Board (PCB), which can fit inside the plastic housing of glasses and even replace existing PCBs of commercial smart glasses (see Fig. 1a). The smart glasses integrate power cycling mechanisms for imagers and audio sensing interfaces. A demonstrator firmware performs always-on object detection using the following pipeline: capturing an image, demosaicing, running AI inference, and postprocessing. In the postprocessing step, the network’s output is passed through a non-max suppression algorithm to extract the bounding boxes of the predicted detections. The whole demonstrator loop (capture, pre/post-processing, and inference) takes 56 ms, resulting in about 18 fps of continuous demonstrator execution (see Fig. 1c). Furthermore, a family of novel tiny deep-learning models based on YOLO with sub-million parameters, customized for microcontroller-based inference and dubbed TinyissimoYOLO v1.3, v5, and v8, was developed, aiming at benchmarking object detection networks running on smart glasses for energy and latency. Evaluations on the smart glasses prototype demonstrate TinyissimoYOLO’s 17 ms inference latency and 1.59 mJ energy consumption per inference (see Fig. 1c) while ensuring acceptable detection accuracy (see Table I and Table II).
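
The non-max suppression step in the pipeline above can be sketched in a few lines of Python. This is the generic greedy form of the algorithm, not the authors' firmware; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
# Generic greedy non-max suppression (NMS) sketch.
# Assumptions: boxes are (x1, y1, x2, y2) tuples, threshold is illustrative.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

In a multi-class detector this routine would typically be applied per class, so that overlapping boxes of different classes do not suppress each other.
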
The proposed family of new TinyissimoYOLO versions uses the YOLOv3 [2] detection head and the backbones of the architectures proposed in TinyissimoYOLOv1 [3], YOLOv5 [4] (with an LCM³ of 0.15), and YOLOv8 [5] (with an LCM³ of 0.18 and a DM⁴ of 0.3). The new networks contain 50x to 100x fewer parameters than the initial YOLOv1 version [6] and have been evaluated and compared on the PascalVOC [7] dataset. Further evaluations show the mean-average precision achieved by the new TinyissimoYOLO versions for different network sizes (see Fig. 1d), as well as which TinyissimoYOLO versions fit inside the GAP9’s on-chip memory. The best-performing large versions of TinyissimoYOLOv5 and v8, with 890 k and 840 k parameters respectively, achieve 42% mean-average precision (mAP) (see Table I) while being executed within 38.3 ms and 34 ms on the GAP9 microcontroller’s NE16 accelerator. The fastest small version, TinyissimoYOLOv1.3, is executed within 16.2 ms, consuming only 1.27 mJ of energy per inference, and achieves 32% mAP. As such, this work presents a highly generalized multi-class network family for detection, running with SOTA performance in real time (>16 fps) on the GAP9 microcontroller.
Further evaluation reveals an end-to-end latency from image capture to the algorithm’s prediction of 56 ms, or equivalently 18 frames per second (FPS), with a total power consumption of 62.9 mW, equivalent to 9.3 hours of continuous run time on a 154 mAh battery. These results outperform MCUNet [8] (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS. To sum up, this work proposes a novel smart-glasses architecture. Furthermore, new TinyissimoYOLO versions are proposed, adapting the latest YOLOv5 and YOLOv8 versions to fit onto microcontrollers.
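
The frame rate and battery life quoted above can be cross-checked with a short calculation. The abstract does not state the battery voltage; the 3.8 V nominal value below is an assumption that happens to reproduce the quoted 9.3 hours (a 3.7 V cell would give roughly 9.1 hours). The function names are hypothetical.

```python
# Back-of-the-envelope check of the quoted frame rate and battery runtime.
# ASSUMPTION: 3.8 V nominal battery voltage (not stated in the abstract).

def frames_per_second(latency_ms):
    """Frame rate achievable when each frame takes latency_ms end to end."""
    return 1000.0 / latency_ms


def runtime_hours(capacity_mah, voltage_v, power_mw):
    """Ideal runtime: battery energy (mWh) divided by average power (mW)."""
    return capacity_mah * voltage_v / power_mw


if __name__ == "__main__":
    print(f"{frames_per_second(56):.1f} fps")
    print(f"{runtime_hours(154, 3.8, 62.9):.1f} h")
```

This estimate ignores converter losses and battery derating, so it should be read as an upper bound under the stated assumption.
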
REFERENCES
[1] F. Samie, L. Bauer, and J. Henkel, “From cloud down to things: An overview of machine learning in internet of things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4921–4934, 2019.
[2] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[3] J. Moosmann, M. Giordano, C. Vogt, and M. Magno, “Tinyissimoyolo: A quantized, low-memory footprint, tinyml object detection network for low power microcontrollers,” in 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2023, pp. 1–5.
[4] G. Jocher, “YOLOv5 by Ultralytics,” May 2020. [Online]. Available: https://github.com/ultralytics/yolov5
[5] G. Jocher, A. Chaurasia, and J. Qiu, “YOLO by Ultralytics,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
[7] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, pp. 303–338, 2010.
[8] J. Lin, W.-M. Chen, Y. Lin, C. Gan, S. Han et al., “Mcunet: Tiny deep learning on iot devices,” Advances in Neural Information Processing Systems, vol. 33, pp. 11711–11722, 2020.
* These authors contributed equally.
¹ https://www.meta.com/ch/en/legal/ray-ban-meta/disclosures/
² Moosmann J, Bonazzi P, Li Y, Bian S, Mayer P, Benini L, Magno M. Ultra-efficient on-device object detection on ai-integrated smart glasses with tinyissimoyolo. arXiv preprint arXiv:2311.01057. 2023 Nov 2.