tinyML Summit 2024

Unleashing a world of possibilities with Tiny ML

April 22-24, 2024

About

The 2024 tinyML© Summit provides the ultimate convergence point for the tinyML community, from trailblazing suppliers to forward-thinking end-users, ingenious engineers to visionary business leaders. Join the world-wide tinyML community to discover the latest innovations in Tiny ML, understand successfully deployed commercial solutions, and network with thought-leaders bringing AI/ML closer to every sensor and edge device. By attending the summit and joining the tinyML community, you help unleash the possibilities of Tiny ML.

The tinyML Summit features an impactful mix of exhibitions, presentations, and networking. In the exhibition area, leaders in tinyML technology give live demonstrations of the latest innovations. The summit will offer multiple opportunities to network and connect with other community members to share ideas, build collaborative solutions, and enhance business networks.

Venue

Hyatt Regency San Francisco Airport

1333 Bayshore Highway, Burlingame, CA 94010

Contact us

Rosina Haberl

Elfego Solares

The tinyML Research Symposium will be held in conjunction with the tinyML Summit. The Research Symposium is the premier annual worldwide gathering of technical experts and researchers representing the global tinyML research and educational community. The tinyML Research Symposium will focus on emerging tinyML technology, research, and theory that will potentially come to market in the coming decade. The tinyML Summit focuses on the technology and solutions available now or in the immediate future. Join us in harnessing the power of the future, today. By attending the 2024 tinyML Summit and becoming part of our dynamic community, you are not just an observer; you are a catalyst for unleashing the possibilities of Tiny ML.

Schedule

9:00 am to 5:00 pm

tinyML Research Symposium


7:30 am to 8:30 am

Registration & Breakfast

8:30 am to 9:00 am

Welcome

9:00 am to 9:50 am

Keynote by Vikas Chandra from Meta

Session Moderator: Mouna ELKHATIB, CEO, CTO, and Co-Founder, AONDevices Inc.

On-device Contextual AI: Challenges and Opportunities

Vikas CHANDRA, Senior Director, Meta Reality Labs

9:50 am to 10:15 am

Solutions that moved ML/AI

Session Moderator: Lennart BAMBERG, Senior-Principal AI Architect, NXP

ExecuTorch: A PyTorch Software Stack for On-Device Machine Learning Execution

Mengtao YUAN, Tech Lead Manager, Meta

Mergen NACHIN, Software Engineer, Meta

Abstract (English)

The fast-paced evolution of Machine Learning (ML) and the variety of TinyML hardware designs make it difficult for a single group or framework to simplify the path from research to development. Historically, ML frameworks have underserved the TinyML community, resulting in friction during deployment and slower adoption of on-device ML. PyTorch’s ExecuTorch addresses these issues by capturing PyTorch semantics in a platform-independent representation, which enables cross-platform compatibility and the ability to tailor both models and the execution environment (e.g., the runtime) to specific platforms. This provides a foundation on which the TinyML community can build solutions and standards. ExecuTorch provides a PyTorch-based process that captures programs so that they run without Python dependencies, and it provides entry points for full or partial compilation. It offers a platform-portable interpreter with pluggable memory planning and kernel libraries, as well as platform-specific extensions such as logging and error handling. ExecuTorch is in an alpha state and already provides solutions for common platforms, with state-of-the-art performance on high-impact models through partnerships with top companies in the TinyML space. We expect ExecuTorch to become a foundational technology for the community to jointly develop solutions across a wide variety of use cases and foster innovation in the TinyML space.
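To make "pluggable memory planning" concrete, here is a minimal, hypothetical planner in Python. It assigns each intermediate tensor a byte offset in one shared arena so that tensors whose lifetimes overlap never share memory, while tensors that are never live at the same time can reuse the same bytes. The `Tensor` fields, the greedy first-fit strategy, and the 16-byte alignment are illustrative assumptions. This is not ExecuTorch's planner or API.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int       # bytes
    first_use: int  # index of the operator that produces the tensor
    last_use: int   # index of the last operator that reads the tensor


def lifetimes_overlap(a: Tensor, b: Tensor) -> bool:
    return not (a.last_use < b.first_use or b.last_use < a.first_use)


def plan_memory(tensors, alignment=16):
    """Greedy first-fit offset assignment (hypothetical, not ExecuTorch's planner).

    Returns (offsets, arena_size), where offsets maps tensor name -> byte offset.
    """
    def align(n):
        return (n + alignment - 1) // alignment * alignment

    placed = []   # (offset, aligned_size, tensor) for tensors already assigned
    offsets = {}
    # Place larger tensors first; this usually reduces fragmentation.
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        size = align(t.size)
        # Byte ranges held by tensors that are live at the same time as t.
        busy = sorted((off, off + sz) for off, sz, other in placed
                      if lifetimes_overlap(t, other))
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # the gap before this range is large enough
            offset = max(offset, end)
        placed.append((offset, size, t))
        offsets[t.name] = offset
    arena_size = max((off + sz for off, sz, _ in placed), default=0)
    return offsets, arena_size


# A chain a -> b -> c: a is dead by the time c is produced, so they can share bytes.
tensors = [Tensor("a", 100, 0, 1), Tensor("b", 200, 1, 2), Tensor("c", 100, 2, 3)]
offsets, arena_size = plan_memory(tensors)
print(offsets, arena_size)  # {'b': 0, 'a': 208, 'c': 208} 320
```

In the example, `a` and `c` reuse the same offset. The arena therefore needs 320 bytes instead of the 432 bytes required if every tensor had its own buffer.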

10:15 am to 10:45 am

Break & Networking

10:45 am to 11:10 am

Solutions that moved ML/AI - Part 2

Session Moderator: Lennart BAMBERG, Senior-Principal AI Architect, NXP

Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO

Michele MAGNO, Head of the Project-based learning Center, ETH Zurich, D-ITET

Abstract (English)

Smart glasses are rapidly gaining advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. Integrating AI into smart glasses with a small form factor and limited battery capacity is still challenging when targeting full-day usage for a satisfactory user experience. This paper illustrates the design and implementation of tiny machine-learning algorithms that exploit novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy and latency efficiency of smart glasses in the case of real-time object detection. To this end, we designed a smart glasses prototype as a research platform featuring two microcontrollers, including a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a low-power Bluetooth module for communication. The smart glasses integrate power cycling mechanisms, including image and audio sensing interfaces. Furthermore, we developed a family of novel tiny deep-learning models based on YOLO with sub-million parameters, customized for microcontroller-based inference and dubbed TinyissimoYOLO v1.3, v5, and v8, to benchmark object detection on smart glasses in terms of energy and latency. Evaluations on the smart glasses prototype demonstrate TinyissimoYOLO’s 17 ms inference latency and 1.59 mJ energy consumption per inference while ensuring acceptable detection accuracy. Further evaluation reveals an end-to-end latency of 56 ms from image capture to the algorithm’s prediction, or equivalently 18 frames per second (FPS), with a total power consumption of 62.9 mW, equivalent to 9.3 hours of continuous run time on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS.
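The figures quoted above are related by simple arithmetic:

* Frame rate is the reciprocal of end-to-end latency.
* Battery run time is stored energy divided by average power.
* Average power during inference is energy per inference divided by inference latency.

The abstract does not state the battery voltage. The sketch below assumes a nominal 3.8 V because that value reproduces the quoted 9.3 hours; a 3.7 V assumption would give roughly 9.1 hours.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second when frames are processed back to back."""
    return 1000.0 / latency_ms


def runtime_hours(capacity_mah: float, voltage_v: float, power_mw: float) -> float:
    """Battery run time: stored energy (mAh x V = mWh) divided by average power (mW)."""
    return capacity_mah * voltage_v / power_mw


def avg_power_mw(energy_mj: float, latency_ms: float) -> float:
    """Average power during one inference: mJ / ms equals W, so scale by 1000 for mW."""
    return energy_mj / latency_ms * 1000.0


NOMINAL_VOLTAGE_V = 3.8  # assumption: the battery voltage is not stated in the abstract

print(f"{fps_from_latency(56):.1f} FPS")                       # 17.9 FPS, i.e. about 18
print(f"{runtime_hours(154, NOMINAL_VOLTAGE_V, 62.9):.1f} h")  # 9.3 h
print(f"{avg_power_mw(1.59, 17):.1f} mW during inference")     # 93.5 mW during inference
```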

11:10 am to 11:25 am

Lightning Talks Posters

Session Moderator: Mouna ELKHATIB, CEO, CTO, and Co-Founder, AONDevices Inc.

11:25 am to 11:40 am

Pitches of the Demo tables

Session Moderator: Mouna ELKHATIB, CEO, CTO, and Co-Founder, AONDevices Inc.

11:40 am to 12:40 pm

Demos (Exhibits) + Posters

12:40 pm to 1:40 pm

Lunch & Networking

1:40 pm to 2:05 pm

Solutions that moved ML/AI - Part 3

Session Moderator: Lennart BAMBERG, Senior-Principal AI Architect, NXP

Transformer-Based Model Deployment on Edge Devices through MicroNPUs Operator Converter

Shinkook CHOI, Lead Core Research, Nota Inc

Abstract (English)

As deep learning progresses, the development of diverse edge devices has surged, including Microcontroller Units (MCUs) and micro Neural Processing Units (microNPUs), accompanied by advancements in compilers and runtimes. However, these devices were initially tailored for conventional computer vision tasks and for accelerating convolution operations, so they pose challenges for running transformer-based AI models. Recent work has shown that vision transformers can achieve higher accuracy than convolution-based models. Even so, microNPU compilers do not support some of the operators used in transformers. These operators must then execute on Central Processing Units (CPUs), which causes context switches between the CPUs and microNPUs and increases latency. We propose a novel method that converts transformer operators unsupported by microNPUs into combinations of supported operators, circumventing the need for compiler or runtime modifications. We tested our approach with Conformer and Vision Transformer (ViT) models on an Alif Ensemble E7, which uses Arm Cortex-M55 CPUs with Arm Ethos-U55 microNPUs. We achieved remarkable speed improvements with 100% utilization of the microNPUs: Conformer is 5.7 times faster, and ViT is 8.8 times faster. This innovation paves the way for the efficient deployment of transformer-based models across a spectrum of edge devices.
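The general idea of rewriting an unsupported operator as a combination of supported ones can be sketched with LayerNorm, a common transformer operator. Here it is decomposed into a mean reduction, subtraction, elementwise multiplication, a reciprocal square root, and addition. The choice of LayerNorm and the set of "supported" primitives are assumptions for illustration. The abstract does not list which operators were converted, and this is not Nota's converter or the Ethos-U55 operator list. The sketch uses plain Python lists instead of tensors.

```python
import math


def layer_norm_reference(x, gamma, beta, eps=1e-5):
    """Fused LayerNorm over a 1-D vector: the operator assumed to be unsupported."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


# Primitive operators assumed (hypothetically) to be supported by the accelerator.
def reduce_mean(x):
    return sum(x) / len(x)


def sub_scalar(x, s):
    return [v - s for v in x]


def mul(x, y):
    return [a * b for a, b in zip(x, y)]


def mul_scalar(x, s):
    return [v * s for v in x]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def rsqrt(s):
    return 1.0 / math.sqrt(s)


def layer_norm_decomposed(x, gamma, beta, eps=1e-5):
    """The same computation expressed as a chain of primitive operators."""
    centered = sub_scalar(x, reduce_mean(x))       # x - mean(x)
    var = reduce_mean(mul(centered, centered))     # mean((x - mean(x))^2)
    normalized = mul_scalar(centered, rsqrt(var + eps))
    return add(mul(normalized, gamma), beta)       # gamma * normalized + beta


x = [1.0, 2.0, 3.0, 4.0]
gamma, beta = [1.0, 0.5, 2.0, 1.0], [0.0, 0.1, -0.1, 0.0]
reference = layer_norm_reference(x, gamma, beta)
decomposed = layer_norm_decomposed(x, gamma, beta)
print(max(abs(r - d) for r, d in zip(reference, decomposed)))  # agrees up to rounding
```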

2:05 pm to 3:20 pm

Real-World Success Stories

Session Moderator: Grant STRIEMER, Director, Corporate R&D, The Procter & Gamble Company

Integrating TinyML into Schneider Electric’s connected products

Pierre BARET, Senior Embedded Artificial Intelligence Engineer, Schneider Electric

Abstract (English)

As one of the tinyML sponsors, Schneider Electric is actively exploring the potential of embedded artificial intelligence in driving the evolution of energy management and industrial automation towards a more sustainable future. In this session you will discover how to combine extensive domain expertise with cutting-edge technology to bring intelligence to the most compact products. Practical examples showing how Schneider embeds AI in their connected products will demonstrate the potential of tinyML: advanced functionalities and control, increased efficiency, and longer life span of assets.

Intelligent monitoring of Permanent Magnet Synchronous Motors with Stellar-E micro-controllers

Andrea ZANELLINI, ML Engineer, HPE Group

Abstract (English)

Problem statement

Electric motors, specifically Permanent Magnet Synchronous Motors (PMSMs), are vital in numerous industrial settings due to their high efficiency and performance. However, these motors are prone to demagnetization from excessive temperatures and loads, leading to severe performance degradation or complete failure.

Solution

To combat potential costly mechanical failures and operational downtime, we have introduced an innovative edge computing solution. Utilizing the STMicroelectronics Stellar-E MCU, our system employs a dual neural network approach. The first neural network emulates a temperature sensor for the magnets, while the second calculates a health index by tracking motor vibrations in real time and, when anomalies occur, identifies the faulty component.

Relevance to tinyML

This project serves as a practical example of tinyML’s effectiveness in solving intricate challenges with significant implications for the automotive industry. Our application of tinyML demonstrates the transformative potential of this technology in real-world industrial scenarios.

The near-term impact of the work

The fruits of our work are set to refine the control mechanisms of motors orchestrated by the inverter ECU in HPE Group’s electric powertrains, signalling the tangible advantages of our developments.

Technical approach and its novelty

Our methodology for developing both applications unfolded in three key stages:

  • Initially, we designed and constructed a bespoke sensing board to gather the data necessary for our AI models.
  • Next, we utilized this data to train a data-driven model capable of deducing the required measurements.
  • Lastly, we implemented the refined model into practical use, where it now provides instantaneous forecasts.

Because the rotor’s rotation makes it challenging to measure magnet temperatures directly, we opted to train a recurrent neural network to estimate the magnet temperatures from data collected by accessible sensors.
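A recurrent estimator of this kind can be sketched as a single-layer Elman RNN. At each time step it reads one vector of accessible sensor measurements and outputs a magnet temperature estimate, which is then scored with the mean absolute error used in the results below. The architecture, input layout, and weights are hypothetical placeholders; a real model would be trained offline on data from the sensing board. This is not HPE Group's network.

```python
import math


def rnn_estimate(sequence, w_in, w_rec, b_h, w_out, b_out):
    """Run a single-layer Elman RNN and return one temperature estimate per time step.

    sequence: list of input vectors, one per time step
    w_in:  hidden_size x input_size weights; w_rec: hidden_size x hidden_size weights
    """
    hidden = [0.0] * len(b_h)
    estimates = []
    for x in sequence:
        # The comprehension reads the previous hidden state before it is replaced.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(w_in[j], x))
                      + sum(w * hk for w, hk in zip(w_rec[j], hidden))
                      + b_h[j])
            for j in range(len(b_h))
        ]
        estimates.append(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return estimates


def mean_absolute_error(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


# Hypothetical 2-input, 2-unit model with made-up weights, for illustration only.
w_in = [[0.8, -0.3], [0.1, 0.5]]
w_rec = [[0.2, 0.0], [0.0, 0.2]]
estimates = rnn_estimate([[0.5, 1.0], [0.6, 1.1], [0.7, 1.2]], w_in, w_rec,
                         b_h=[0.0, 0.0], w_out=[10.0, 5.0], b_out=60.0)
print(mean_absolute_error(estimates, [62.0, 63.0, 64.0]))
```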

Results and their significance to the tinyML community

Our system shows the capability to approximate rotor temperatures with a mean absolute error of less than 2 °C. Additionally, it identifies mechanical anomalies with 98% precision. These results underscore the system’s precision and reliability, marking a substantial contribution to the tinyML community and highlighting the potential for broader application in various industrial domains.

Call to Action for the tinyML Community

The automotive industry needs Tiny ML solutions, and the proposed work is a contribution in this respect. Through this work, we encourage the industry to become more involved than it has been in the past.

Embedded Joint Acoustic Echo Cancellation and Noise Suppression

Francesco CASTELLI, DSP engineer, NXP Semiconductors

Abstract (English)

In full-duplex audio communication systems, overall speech quality and intelligibility are undermined by the simultaneous presence of acoustic echo and noise. Acoustic Echo Cancellation (AEC) and Noise Suppression (NS) systems aim to remove these disturbances, respectively, while preserving speech quality. In the last few years, state-of-the-art solutions have transitioned from separate digital signal processing systems that handle echo and noise independently to end-to-end deep learning models that address AEC and NS simultaneously. These solutions deliver better performance, but they substantially increase the computational load. Combined with real-time processing requirements, this makes them infeasible on resource-constrained embedded hardware. To address these challenges, we developed a set of state-of-the-art joint AEC and NS models of different sizes, designed to operate on distinct hardware configurations:
– A tiny model with fewer than 100k parameters that runs on the i.MX RT600 MCU with an Arm Cortex-M33 (300 MHz) and a Cadence Xtensa HiFi4 Audio DSP (600 MHz)
– A larger model that runs on the i.MX 8M Plus MPU with a quad-core Arm Cortex-A53 processor and an NPU (2.3 TOPS)
All models are end-to-end deep learning systems based on the state-of-the-art Convolutional Recurrent Network (CRN) architecture [1][2], as presented in Figure 1. Microphone and reference frequency-domain signals are first aligned and combined by an alignment block. They are then processed by a convolutional encoder and decoder with skip connections and a recurrent bottleneck. The model output is a set of time-frequency masks applied to the microphone input signal to jointly suppress acoustic echo and noise and restore the clean speech. The novelty of our approach is twofold. First, we define a multi-objective loss function that computes separate signal distortion losses on the echo, noise, and clean speech estimates. It then combines them based on their contribution to the microphone input signal. Second, we apply a set of architectural optimizations to each model block to reduce the computational burden.
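The masking step and the multi-objective loss can be sketched on magnitude spectrograms stored as lists of frames. Two details are assumptions because the abstract does not specify them. First, the distortion measure is a mean squared error. Second, each component's loss is weighted by its share of the total energy of the reference components. The sketch also assumes that the model's echo, noise, and speech estimates are available separately. It is not NXP's implementation.

```python
def apply_mask(mic_mag, mask):
    """Multiply each time-frequency bin of the microphone magnitude by its mask value.

    Both arguments are lists of frames; each frame is a list of frequency bins.
    """
    return [[m * g for m, g in zip(frame, gains)] for frame, gains in zip(mic_mag, mask)]


def mse(estimate, reference):
    """Mean squared error over all time-frequency bins (assumed distortion measure)."""
    errors = [(e - r) ** 2 for fe, fr in zip(estimate, reference) for e, r in zip(fe, fr)]
    return sum(errors) / len(errors)


def energy(spec):
    return sum(v * v for frame in spec for v in frame)


def multi_objective_loss(estimates, references):
    """Sum of per-component losses, each weighted by its share of reference energy.

    estimates / references: dicts keyed by component, e.g. "speech", "echo", "noise".
    The energy-share weighting is an assumption, not the authors' published formula.
    """
    energies = {name: energy(spec) for name, spec in references.items()}
    total = sum(energies.values())
    return sum(energies[name] / total * mse(estimates[name], references[name])
               for name in references)


mic = [[2.0, 4.0], [1.0, 3.0]]           # 2 frames x 2 bins (toy values)
mask = [[0.5, 0.25], [1.0, 0.0]]         # model output, values in [0, 1]
print(apply_mask(mic, mask))             # [[1.0, 1.0], [1.0, 0.0]]

references = {"speech": [[1.0, 0.0]], "echo": [[0.0, 3.0]]}
estimates = {"speech": [[0.0, 0.0]], "echo": [[0.0, 3.0]]}
print(multi_objective_loss(estimates, references))  # 0.05
```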

3:20 pm to 3:50 pm

Break & Networking

3:50 pm to 4:40 pm

Real-World Success Stories - Part II

Session Moderator: Grant STRIEMER, Director, Corporate R&D, The Procter & Gamble Company

Oscar the Sorter – Revolutionizing Recycling Through Robotics and No-Code ML Development

Michael GAMBLE, Director, Product Management, TDK Qeexo

Abstract (English)

Oscar the Sorter, a 2024 CES Innovation Award Honoree, is a recycling and order-picking machine learning application developed by TDK Qeexo and Doosan Robotics to demonstrate the potential of combining robotics and machine learning. Qeexo AutoML, TDK Qeexo’s no-code tool, is integrated with Doosan Robotics’ DART Suite. This lets users easily customize, train, and deploy machine learning models to supported Doosan robots, or access pre-trained ML models for tasks like recycling and order picking. With just a few clicks, Oscar is ready to sort and pick efficiently.
Oscar the Sorter uses only time-series current sensor and gripper position data to classify objects, eliminating the need for traditional vision technology. By avoiding camera data collection, Oscar enhances privacy and minimizes potential security risks. Detailed image data is not collected at all, much less streamed to the cloud as it is in many vision-based robotics solutions. While developing Oscar the Sorter, TDK Qeexo used only sensors already built into off-the-shelf gripper hardware. This avoided the cost of purchasing separate hardware and made the machine learning solution easier to adopt.
During this talk, we will discuss how Qeexo’s AutoML no-code machine learning platform enabled Oscar’s development. We will also show how AutoML’s fully automated machine learning workflows help teams develop more efficiently and choose the best-performing model for their machine learning problem.
Join Qeexo to learn more about how Qeexo AutoML can power collaborative robotics applications and help create the future of robotics!
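Classifying objects from gripper signals alone can be sketched as follows. Each grasp is summarized by a few statistics of the motor-current trace plus the final gripper opening. A new grasp is then assigned the label of the nearest class centroid. The features, the nearest-centroid classifier, and the sample values are hypothetical. Qeexo AutoML searches over its own feature and model choices, which this sketch does not reproduce.

```python
import math
from statistics import mean, pstdev


def grasp_features(current, position):
    """Summarize one grasp: current mean, spread, and peak, plus final gripper opening."""
    return [mean(current), pstdev(current), max(current), position[-1]]


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    grouped = {}
    for vec, label in samples:
        grouped.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}


def classify(vec, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Made-up recordings: a rigid can stops the gripper early and draws more current.
training = [
    (grasp_features([0.2, 0.9, 1.1, 1.0], [40, 30, 30, 30]), "can"),
    (grasp_features([0.2, 0.8, 1.2, 1.1], [40, 31, 30, 30]), "can"),
    (grasp_features([0.2, 0.4, 0.5, 0.5], [40, 25, 18, 15]), "plastic bottle"),
    (grasp_features([0.2, 0.3, 0.5, 0.6], [40, 24, 17, 14]), "plastic bottle"),
]
centroids = train_centroids(training)
print(classify(grasp_features([0.2, 0.85, 1.15, 1.05], [40, 30, 29, 29]), centroids))  # can
```

The features are not normalized here, so the gripper position dominates the distance. A real pipeline would typically standardize each feature first.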

tinyML driven open-source edge portable spectrophotometer for rapid and non-destructive quality assessment of fruits and vegetables

Arun SHARMA, Assistant Professor, Department of Food Engineering at the National Institute of Food Technology Entrepreneurship and Management

Abstract (English)

Consumers often face uncertainty when purchasing fruits and vegetables because outer appearance can be deceptive. What appears fresh on the surface may not be fresh inside, leading to customer dissatisfaction. Most research on intact fruit spectroscopy is derivative in nature, as it primarily showcases applications of existing spectroscopy devices, which are often proprietary. The results of such studies often remain theoretical because there are no mechanisms for incorporating the developed models back into proprietary devices. This poses a challenge for the commercial adoption of vibrational spectroscopy in the food quality supply chain. In response to this issue, a tinyML-driven chemometrics and machine learning framework is used to develop a portable spectrophotometer. It is an open-source edge device for rapid, real-time, non-destructive quality assessment of fruits and vegetables, and it eliminates the need for costly, labour-intensive, and time-consuming laboratory-based assessments.
This device, currently at technology readiness level (TRL) 6, will be a win-win for both consumers and farmers. On one hand, it will empower consumers to make data-driven decisions and access premium-quality fruits and vegetables. On the other hand, premium-quality horticultural produce commands better prices, enabling farmers to increase their income. The real-time assessment capability of the instrument will also help farmers identify fruits with potential issues before they deteriorate further. This will reduce post-harvest losses, ensuring that more of the farmers’ produce reaches the market in optimal condition and further increasing their income through efficient resource utilization.

The present work addresses this research gap by integrating the developed, optimised machine learning models back into a microcontroller assembly to predict and classify the quality of tomatoes in real time. A first-of-its-kind edge-computing portable short-wave near-infrared (SWNIR) spectrophotometer has been developed from open-source hardware and software. The hardware is the AS7265x multispectral chipset (wavelength range 410-940 nanometres (nm)) and an Arduino Uno microcontroller, and the software is the R platform. The assembly is housed in an ergonomically designed, 3D-printed cabinet that ensures noise-free spectra acquisition.
Food technologists and horticulturists often use tomatoes as a model fruit for investigating the ripening process of climacteric fruits. Tomatoes contain bioactive compounds, fibre, vitamins, minerals, micronutrients, and carotenoids such as lycopene. Epidemiological studies suggest that lycopene consumption is inversely related to the risk of developing cancer, cataracts, osteoporosis, male infertility, peritonitis, and cardiovascular disease. We ran a 15-day post-harvest storage study of over 100 samples of raw tomatoes. During the study, spectral data were acquired at 18 different wavelengths ranging from 410 nm to 940 nm, along with laboratory estimates of 14 physicochemical attributes. In operation, the prototype emits Vis-SWNIR radiation that penetrates the fruit and is reflected back, carrying information about the fruit’s quality attributes. The machine learning models capture and analyse changes in the intensity of the reflected waves to make accurate predictions about fruit quality. Statistical and chemometric analysis revealed two patterns:
* The blue (380-440 nm) and green (440-600 nm) spectra varied with attributes associated with water content loss.
* The red (600-750 nm) and SWNIR (750-1100 nm) spectra varied with attributes associated with carotenoid content, such as lycopene, antioxidant activity, and colour.

Various linear models were developed, including multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLSR). Non-linear models were also developed, including random forest (RF), support vector machine (SVM), and artificial neural network (ANN). All models used 10-fold cross-validation on an 80-20% train-test split of the dataset. MLR showed a linear relationship between lycopene and the wavelengths 560 nm, 645 nm, and 730 nm, with the highest R-squared. However, SVM outperformed all models with a test RMSE of 0.087 (p<0.05). In addition to the machine learning regression models, we developed optimised probabilistic and non-probabilistic classification models, including logistic regression, Linear Discriminant Analysis (LDA), RF, ANN, and SVM. In agreement with the literature, the 500-750 nm wavelength range dominated the classification of lycopene content. On the test dataset, logistic regression and RF achieved an accuracy of 80% and LDA and SVM reached 90%. ANN outperformed all models with an accuracy of 95%. The results obtained in this study are better than those of similar studies conducted in South America, Europe, East Asia, and Australia.
This study presents a novel, first-of-its-kind technological advance in the field of spectroscopy for non-invasive quality assessment of fruit through a tinyML framework. Similar detailed research on other climacteric fruits under different experimental settings is recommended to establish the universal applicability of the proposed technology. The tinyML community is called upon to consider the present work for the commercial adoption of this technology.
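The evaluation protocol described above can be sketched as index bookkeeping. It holds out 20% of the samples as a test set, runs 10-fold cross-validation on the remaining 80%, and uses RMSE as the regression metric. The random seed, the interleaved fold assignment, and the 100-sample example size are assumptions. The abstract reports over 100 samples but does not say how they were partitioned.

```python
import math
import random


def train_test_split(indices, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them as the test set."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; fold i takes every k-th index from i."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def rmse(predictions, targets):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))


train_idx, test_idx = train_test_split(range(100))  # 100 samples: hypothetical size
# Each fold would fit a model (e.g. PLSR or SVM) on its train part and score RMSE on the rest.
fold_sizes = [len(val) for _, val in k_fold(train_idx, k=10)]
print(len(train_idx), len(test_idx), fold_sizes)  # 80 20 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
```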

4:40 pm to 5:30 pm

Advancements in enablement and ecosystems for Tiny ML

Session Moderator: Jenny PLUNKETT, Senior Developer Relations Engineer, Edge Impulse

The State of TinyML Benchmarking: Current Landscape, Challenges, and Emerging Trends

Vijay Janapa REDDI, Associate Professor, Harvard University

Abstract (English)

Tiny machine learning (TinyML) promises to bring intelligence to devices at the edge of the network, but how do we measure success? This talk explores how benchmarking techniques are helping to advance TinyML hardware and models. It will discuss the state of the art, current challenges, emerging trends, and the importance of collaboration in building robust models and data-centric benchmarks. By working together, the TinyML community can unlock the full potential of intelligent edge devices and accelerate the progress of TinyML.

Deploying ONNX models on embedded devices with TensorFlow Lite inference engine using conversion-based approach

Robert KALMAR, Principal Machine Learning Engineer, NXP

Abstract (English)

In the context of embedded platforms, there are two widely used formats for deploying machine learning models for inference: ONNX and TFLite. The TensorFlow Lite ecosystem provides well-established means for inference on both embedded and mobile devices, with extensive support for various NPUs (Neural Processing Units). ONNX, on the other hand, focuses on the interchangeability of model representations.
In this work, we present a converter-based approach for running models from machine learning frameworks such as PyTorch or MATLAB on embedded devices. The approach uses ONNX as the interchange format and TensorFlow Lite as the inference engine, through a tool called ONNX2TFLite, which is part of NXP eIQ®.
Communities and interest groups have already developed multiple converters. However, these typically convert to a TensorFlow/Keras graph representation and then export with the TensorFlow Lite Converter. This makes it challenging to map some ONNX features to TensorFlow Lite, such as quantization, which is essential for most embedded accelerators, or optimal operator representations.
We present a means of converting directly from ONNX to TensorFlow Lite without an intermediate framework. The converter aims to produce a mathematically equivalent representation of an ONNX model in TF Lite, supporting both FP32 and quantized models. Apart from the challenges of mapping between the two formats, a substantial part of the work focuses on introducing NPU-aware optimizations during the generation of the corresponding TensorFlow Lite model.
In our experiments, we achieved a 10x inference speed-up on NXP’s i.MX 8M Plus reference platform by leveraging the existing NPU support in TensorFlow Lite. The baseline was execution of the ONNX models on the CPU with the ONNX Runtime inference engine.
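Quantized models can be carried across the two formats because ONNX (QuantizeLinear/DequantizeLinear) and TensorFlow Lite both use affine quantization. In this scheme a real value is approximated as scale * (q - zero_point), so a converter has to preserve each tensor's scale and zero point exactly. The sketch below shows only that arithmetic, for int8 with simple min/max calibration. The calibration rule is an assumption, and this is not ONNX2TFLite code.

```python
def choose_params(x_min, x_max, qmin=-128, qmax=127):
    """Min/max calibration: map [x_min, x_max] (widened to include 0) onto [qmin, qmax]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """q = saturate(round(x / scale) + zero_point); Python's round() rounds half to even."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """x ~= scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


x = [0.0, 0.4, 1.0, 2.55]
scale, zero_point = choose_params(min(x), max(x))
q = quantize(x, scale, zero_point)
print(scale, zero_point, q)
print(dequantize(q, scale, zero_point))  # each value within scale / 2 of the original
```

Widening the calibration range to include 0 guarantees that zero is exactly representable. This matters for padding and ReLU outputs.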

5:30 pm to 5:35 pm

Closing

5:35 pm to 7:30 pm

Dinner & Networking

7:30 am to 8:30 am

Registration & Breakfast

8:30 am to 8:45 am

Welcome, Recap Day 1 and Agenda for Day 2

8:45 am to 9:45 am

Keynote

Session Moderator: Mouna ELKHATIB, CEO, CTO, and Co-Founder, AONDevices Inc.

Accelerating Edge AI Innovation

Parag BEERAKA, Senior Director, arm

Abstract (English)

Compute at the Edge is poised for a revolution, fueled by the transformative power of AI with on-device ML inference. This talk provides a vibrant picture of the future. Imagine edge devices capable of tailoring experiences in real time and optimizing operations on the fly. AI empowers edge computing with intelligent data synthesis, anomaly prediction, and personalized content generation, all at the edge, minimizing latency and maximizing privacy. Discover how Arm empowers you to unlock the full potential of AI at the edge, transforming your business and shaping a smarter, more connected future.

9:45 am to 10:05 am

Unlock the secrets of Tiny ML

Session Moderator: Nina DROZD, Software Developer, arm

Designing Embedded NPUs for Transformer Networks

Rakesh GANGARAJAIAH, Principal Engineer, arm

Abstract (English)

Energy efficiency is key for embedded neural processing units (NPUs), which often operate on a tight power budget. Optimizing networks, software, and hardware is essential to get the best performance out of every microjoule of energy spent on inference in an edge device. This presentation addresses the challenges of designing configurable NPUs that provide best-in-class performance and energy efficiency while running the most popular classes of neural networks, including transformer-type networks.

10:05 am to 10:35 am

Break & Networking

10:35 am to 11:00 am

Unlock the secrets of Tiny ML - Part 2

Session Moderator: Nina DROZD, Software Developer, arm

Pascal-range accurate ultra tiny pressure drift compensation

Danilo PAU, Technical Director, IEEE & ST Fellow, System Research and Applications, STMicroelectronics

Abstract (English)

Problem statement:
Pressure sensors are subject to thermo-mechanical stresses during their life span, for example exposure to high temperatures for short and prolonged periods. Many efforts have gone into improving sensor MEMS geometries and the metals and materials used to fabricate them. Even so, these stresses create hysteresis effects that cause the sensor’s pressure measurements to drift with respect to the expected values. These effects also decrease the accuracy and increase the uncertainty of the sensor’s pressure measurements. Unfortunately, approaches such as one-point calibration are highly inadequate and inaccurate.
Solution:
To overcome these problems, we have introduced an innovative in-sensor edge computing solution. Using an ultra-tiny neural network, our proposed solution can drastically reduce drift and inaccuracy in various case studies:

1) after the sensor solder process, dealing with up to reflows at 260 °C for 10-40 seconds;
2) high-temperature stress at 150 °C for 1,000 hours;
3) on-the-shelf storage at 25 °C for 1,000 hours.
Relevance to tinyML:
This project serves as a practical example of in-sensor AI computing. It solves the challenge of devising a tinyML workload under significant restrictions in memory footprint and computational complexity, with significant improvements for many industry contexts such as consumer and automotive. Our solution demonstrates the transformative potential of in-sensor AI technology in the realistic industrial scenarios where the data were acquired.
The near-term impact of the work:
The results of our work are set to demonstrate that TinyML can be practically deployed in a severely resource-restricted sensor (far beyond what was initially conceived). This paves the way for truly achievable ultra-tiny in-sensor AI computing.
Technical approach and its novelty:
Our solution explores:
• A binary tiny temporal convolution with 101 parameters, requiring 182 multiply-accumulate operations (MACs).
• The network has been designed in hardware in just 7K gates without multipliers, at 24 bits. It has also been implemented in software on the ISPU intelligent sensor and on the STM32U5 low-power microcontroller to demonstrate the versatility of deployment across different system architectures.
• Lastly, we demonstrated different ways to arrange the data to mimic supervised and on-device learning across the sensor devices that were the subject of the data acquisitions.
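To make the parameter and multiply-accumulate (MAC) counts concrete, here is a generic causal temporal convolution that counts the MACs it performs. It is paired with the standard parameter and MAC formulas for a dense 1-D convolution layer. The layer sizes in the example are hypothetical. The abstract does not describe the architecture behind the 101 parameters and 182 MACs, so this sketch does not reproduce those numbers. It also does not model the binary, multiplier-free hardware design.

```python
def causal_conv1d(x, weights, bias=0.0, dilation=1):
    """Single-channel causal temporal convolution.

    y[t] = bias + sum_k weights[k] * x[t - k * dilation], treating x as 0 before t = 0.
    Returns (outputs, macs), where macs counts only the products actually computed.
    """
    outputs, macs = [], 0
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
                macs += 1
        outputs.append(acc)
    return outputs, macs


def conv1d_cost(in_ch, out_ch, kernel, out_len):
    """Parameters and MACs of a dense 1-D convolution layer with biases."""
    params = out_ch * in_ch * kernel + out_ch
    macs = out_ch * in_ch * kernel * out_len
    return params, macs


print(causal_conv1d([1, 2, 3], [1, 1], bias=0))               # ([1, 3, 5], 5)
print(conv1d_cost(in_ch=1, out_ch=4, kernel=3, out_len=10))  # (16, 120), hypothetical layer
```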
Results and their significance to the tinyML community:
Our solutions reduced the mean absolute error to a few pascals on the HPS22DF pressure sensor used for data acquisition. These results underscore the system’s accuracy. They mark a substantial contribution for members of the tinyML community interested in in-sensor AI computing and highlight the potential for broader application in various industrial domains.
Call to Action for the tinyML Community:
System integrators (automotive and consumer) should urgently adopt such a Tiny ML solution, and the proposed work is a major contribution in this respect that is ready for adoption. Through this work, we strongly encourage this industry to become more involved than in the past.

11:00 am to 11:30 am

tinyML Award Ceremony

  • Best Tiny ML chip
  • Best Audio or Vision Application Product
  • Best Prototype of the Year
  • Best Paper tinyML Research Symposium

Session Moderator: Davis SAWYER, Co-founder, Deeplite

11:30 am to 12:30 pm

Demos (Exhibits) + Posters

12:30 pm to 1:30 pm

Lunch & Networking

1:30 pm to 2:30 pm

Data, datasets, and benchmarking for Tiny ML

Session Moderator: Petrut BOGDAN, Neuromorphic Architect, Innatera

Accelerating Model Optimization on the Edge Through Automated Performance Benchmarking and End-to-End Profiling

Nayara AGUIAR, Performance Engineer, MathWorks

Abstract (English)

The resource-constrained nature of edge devices poses unique challenges in meeting strict performance requirements. However, performance benchmarks for deployed models are often run manually and infrequently. Other phases of the development workflow, such as the conversion of high-level languages to C/C++ code, might not be evaluated for performance at all. This strategy still gives important insights for improving the final product. Integrating performance testing throughout the product development process, however, enables early detection and mitigation of performance issues. In this work, we propose an automated workflow that streamlines the performance evaluation and optimization of deep learning models deployed on edge devices.
First, our approach automates the conversion of high-level code to optimized C/C++ code, seamlessly deploys it to the target device, and runs the performance benchmark to evaluate the deployed code. The benchmark incorporates the MLPerf LoadGen interface [1, 2] to evaluate metrics such as latency and accuracy. This lets us effectively assess the impact of optimization techniques like pruning, quantization, and projection. Automating this workflow saves time, reduces the likelihood of errors, and enables continuous evaluation of model performance. It also allows developers to iteratively refine their models and strike the right balance between performance and resource constraints. Second, once a performance gap is identified, developers can use our in-house profiling tool to investigate the issue. We show how this tool can be used across the end-to-end development cycle, from the high-level code to the generated code running on the edge device. The tool visualizes code execution stacks as a timeline, making it easy for developers to pinpoint the source code responsible for performance bottlenecks and regressions. We make both automated benchmarking and end-to-end profiling integral parts of the development cycle. This helps development teams meet performance requirements and improve the quality of the final shipping product. The insights shared can be applied to other use cases, and we can use our collective knowledge to drive performance advancements in tinyML applications.
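The kind of latency statistics such an automated benchmark reports can be sketched with a host-side timing harness. It runs a few untimed warm-up iterations, then repeated timed runs, and reports nearest-rank percentiles. This does not use the MLPerf LoadGen API. `run_inference` is a hypothetical stand-in for a call into the deployed model, and on a real target the timing would come from the device's own timer.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def benchmark(run_inference, inputs, warmup=5, repeats=50):
    """Time run_inference(x) on the given inputs and summarize latency in milliseconds.

    run_inference is a hypothetical stand-in for a call into the deployed model.
    """
    for x in inputs[:warmup]:
        run_inference(x)  # warm-up runs are not timed
    latencies = []
    for i in range(repeats):
        x = inputs[i % len(inputs)]
        start = time.perf_counter()
        run_inference(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": sum(latencies) / len(latencies),
        "p50_ms": percentile(latencies, 50),
        "p90_ms": percentile(latencies, 90),
        "max_ms": max(latencies),
    }


# Dummy workload standing in for model inference.
stats = benchmark(lambda x: sum(v * v for v in x), [list(range(1000))] * 10)
print({name: round(value, 4) for name, value in stats.items()})
```

Running a harness like this after every code-generation change is what turns one-off measurements into the continuous evaluation the abstract describes.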

BiomedBench – A benchmark suite of ML-based biomedical applications targeting ultra-low-power wearable devices

Dimitrios SAMAKOVLIS, Ph.D. Student, Ecole Polytechnique Fédérale de Lausanne (EPFL) Embedded Systems Laboratory (ESL)

Abstract (English)

Machine learning (ML) has emerged as a transformative force in the biomedical domain, forging a symbiotic relationship that promises to revolutionize healthcare and research. This interdisciplinary synergy capitalizes on the vast quantities of biological and clinical data generated daily, offering innovative solutions for diagnostics, treatment optimization, drug discovery, and disease prediction. Over the last decade, special attention has been given to monitoring patients through wearable devices. Real-time patient monitoring has enabled preventive action in chronic and critical conditions such as arrhythmia and epileptic seizures. Technological advances in chip manufacturing have significantly boosted the wearables domain, enabling complex ML algorithms to run on ultra-low-power (ULP) devices in the μW range.
Numerous publications in the wearables domain have targeted either the software (SW) or the hardware (HW) side in isolation. Continued progress in the wearables domain requires a more systemic approach that bridges SW and HW research efforts. In this work, we propose BiomedBench, a new benchmark suite composed of state-of-the-art (SoA) biomedical applications for real-time patient monitoring using wearable devices. The applications cover the complete pipeline, from signal acquisition through signal preprocessing and feature extraction to the ML inference that classifies the patient's condition. As a benchmark suite, BiomedBench aims to standardize HW evaluation and, through a systematic application characterization that reveals current needs and challenges, to define directions for HW design in the ULP wearables domain. To this end, BiomedBench will be open-sourced, maintained, improved, and extended based on user feedback and research advancements.
Complementary to the benchmark suite, we performed a latency and energy analysis on multiple SoA ULP platforms featuring RISC-V and ARM microarchitectures. The results show that no current ULP platform efficiently handles all applications, demonstrating the variety of challenges posed by different applications and validating the utility of BiomedBench for evaluating HW. In the process, we identified SW optimization patterns and HW inefficiencies in SoA platforms. From these, we derived a set of key takeaways that offer guidance to future SW and HW developers in the wearables domain.
BiomedBench is closely aligned with the TinyML benchmarking area and can be used directly by the TinyML benchmarking group. It includes four ML inference applications, based on SVM, CNN, kNN, and Random Forest models, and one application for on-device training of a CNN using a novel training method. BiomedBench not only offers resource-constrained deployments of ML training and inference but also provides complete applications that lay the foundations for further research on applying ML in this domain. We therefore believe these contributions are directly applicable to the TinyML community: BiomedBench could become a key healthcare benchmark in the TinyML community and inspire future ML deployments and HW designs for resource-constrained devices. Open-sourcing BiomedBench can greatly benefit both the TinyML and biomedical communities. On the one hand, BiomedBench provides the ground for resource-constrained ML to flourish and evolve further. On the other hand, to achieve maximum efficiency and keep the implementations up to date with the SoA, we invite the TinyML community to use BiomedBench, provide feedback, and suggest improvements and optimizations to its ML components.

2:30 pm to 3:00 pm

Break & Networking

3:00 pm to 4:00 pm

Panel - Empower Edge AI with Gen AI

Panelists:

Maxime Loidreau – Schneider Electric
Chintan Shah – NVIDIA
Pete Warden – Useful Sensors
Michael Dolbec – Momenta VC


Session Moderator: Davis SAWYER, Co-founder, Deeplite

4:00 pm to 4:15 pm

Closing

Schedule subject to change without notice.

Committee

Elias FALLON

Chair

Qeexo Co.

Mouna ELKHATIB

Co-chair

AONDevices Inc.

Lennart BAMBERG

NXP

Nina DROZD

arm

Evgeni GOUSEV

Qualcomm Research, USA

Anders HARDEBRING

imagimob

Wiebke HUTIRI

Sony AI

Sumeet KUMAR

Innatera

Mallik P. MOTURI

Syntiant

Danilo PAU

STMicroelectronics

Max PETRENKO

Amazon

Jenny PLUNKETT

Edge Impulse

Grant STRIEMER

The Procter & Gamble Company

Chetan SINGH THAKUR

Indian Institute of Science (IISc)

Brenda ZHUANG

Awards Committee

MathWorks

Sam Al-ATTIYAH

Awards Committee

imagimob

Ali O. ORS

Awards Committee

NXP Semiconductors

Davis SAWYER

Awards Committee

Deeplite

Speakers

Vikas CHANDRA

Keynote Speaker Tuesday

Meta Reality Labs

Parag BEERAKA

Keynote Speaker Wednesday

arm

Nayara AGUIAR

MathWorks

Pierre BARET

Schneider Electric

Francesco CASTELLI

NXP Semiconductors

Shinkook CHOI

Nota Inc

Michael GAMBLE

TDK Qeexo

Rakesh GANGARAJAIAH

arm

Vijay Janapa REDDI

Harvard University

MLCommons

Robert KALMAR

NXP

Michele MAGNO

ETH Zurich, D-ITET

Danilo PAU

STMicroelectronics

Dimitrios SAMAKOVLIS

Ecole Polytechnique Fédérale de Lausanne (EPFL) Embedded Systems Laboratory (ESL)

Arun SHARMA

Department of Food Engineering at the National Institute of Food Technology Entrepreneurship and Management

Mengtao YUAN

Meta

Andrea ZANELLINI

HPE Group

Sponsors
