March 28-March 30, 2022

About

The tinyML Summit 2022 brought together senior-level technical experts and decision-makers representing the fast-growing global tinyML community. This diverse ecosystem is composed of professionals from industry, academia, start-ups, and government labs worldwide working on leading-edge ultra-low power machine learning technologies for end-to-end solutions.

Venue

Hyatt Regency San Francisco Airport

1333 Bayshore Highway, Burlingame, CA 94010.

Room Booking

Contact us

Bette COOPER

News

November 04, 2021

Summit 2022 Awards

We will continue in 2022 to give out three awards to recognize the achievements of the industry and academia toward fulfilling the tinyML Foundation's vision of ultra-low power machine learning devices at the very edge of the cloud. We are seeking nominations for Best Product of the Year and Best Innovation of the Year.

October 19, 2021

tinyML Summit 2022 Sponsorship Opportunities

tinyML Summit 2022 will continue the tradition of high-quality, state-of-the-art presentations. Find out more about sponsoring and supporting the tinyML Foundation.

Schedule

Pacific Daylight Time / UTC-7

7:30 am to 9:00 am

Registration / Breakfast

8:00 am to 5:00 pm

Research Symposium

For agenda details, see the Research Symposium Schedule.

Pacific Daylight Time / UTC-7

8:00 am to 9:00 am

Registration and Breakfast

9:00 am to 4:40 pm

Posters and Sponsor tables open all day

9:00 am to 9:15 am

Welcome/Introduction

9:15 am to 10:00 am

Miniature dreams can come true!

Kate KALLOT, Head of Emerging Areas, NVIDIA

10:00 am to 10:15 am

Break

10:15 am to 11:45 am

tinyML Vision

Computer vision is a popular application for tiny machine learning. Many vision applications require real-time, low-latency processing, and privacy is a big priority, so recognizing images and video locally on tiny edge devices offers significant opportunity. However, the large computation and memory footprint poses a challenge for these low-power devices, one that grows worse as resolution requirements increase. Opportunities including model compression and neural network/accelerator co-design open up a larger design space for tiny machine learning for vision, which has a large market in smart homes, smart factories, smart driving, smart healthcare, and more.

Session Moderator: Adam FUKS, Fellow, MCU/MPU Architecture, NXP

TinyML for All: Full-stack Optimization for Diverse Edge AI Platforms

Di WU, Co-founder and CEO, OmniML

Abstract (English)

Today’s AI is too big: modern deep learning demands massive computational resources and engineering effort, and it carries a large carbon footprint. This makes TinyML extremely difficult because of limited hardware resources, tight power budgets, and deployment challenges. The fundamental cause of this problem is the mismatch between AI models and hardware, and we are solving it at the root by improving the efficiency of neural networks through model compression, neural architecture rebalancing, and new design primitives. Our research is characterized by full-stack optimizations, spanning the neural network topology, inference library, and hardware architecture, which open a larger design space in which to uncover the underlying principles. This enables us to deploy real-world AI applications on tiny microcontroller units (MCUs), despite their limited memory size and compute power. Based on this technology, we also launched a commercial company that helps the industry solve its Edge AI problems.
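The abstract lists model compression among its levers. As one generic illustration (not OmniML's own technique), unstructured magnitude pruning zeroes the smallest-magnitude weights until a target sparsity is reached. The function name `magnitude_prune` and the flat-list weight representation are hypothetical simplifications of a real weight tensor.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    weights: flat list of floats (a stand-in for one layer's weight tensor).
    sparsity: fraction in [0, 1) of weights to set to zero.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

In practice, pruning is usually followed by fine-tuning to recover accuracy, and only structured or sparsity-aware kernels turn the zeros into real memory and latency savings on an MCU.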

Tiny models with big appetites: Cultivating the perfect data diet

Jelmer NEEVEN, Deep learning scientist and software engineer, Plumerai

Abstract (English)

Although lots of research effort goes into developing small model architectures for computer vision, real gains cannot be made without focusing on the data pipeline. Production-worthy computer vision models need large quantities of training data, even when the models themselves are tiny. But since tiny models are eager to take shortcuts that don’t generalize in practice, we can’t tolerate low-quality data. In this talk, we cover the wide variety of techniques we use for curating optimal training datasets and designing better data sampling strategies. We also show the importance of measuring model robustness in diverse real-world environments. All of this is made possible by Plumerai’s in-house data infrastructure and tooling, built specifically for producing tiny computer vision models.
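Better data sampling strategies can take many forms. One common, simple example (assumed here for illustration, not necessarily what Plumerai does) is class-balanced sampling, in which under-represented classes are drawn more often. The helper `balanced_sample_weights` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def balanced_sample_weights(labels):
    """Per-example weights so every class gets the same total sampling mass."""
    counts = Counter(labels)
    n_classes = len(counts)
    # Each class receives total weight 1/n_classes, split evenly among its examples.
    return [1.0 / (n_classes * counts[y]) for y in labels]


labels = ["person"] * 8 + ["no_person"] * 2
weights = balanced_sample_weights(labels)
rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=1000)
print(Counter(labels[i] for i in batch))
```

With 8 `person` and 2 `no_person` examples, each `no_person` example gets four times the weight of a `person` example, so the sampled batch should contain roughly equal counts of both classes.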

Enabling tiny camera sensors for Augmented Reality

Rakesh RANJAN, Senior Research Scientist Manager in Meta Reality Labs, Facebook

Abstract (English)

AR devices promise to enable immersive user experiences; however, this requires overcoming the form-factor, power, and thermal constraints of such all-day wearable devices. In this talk, we will cover some of those challenges for an efficient AR camera and make a case for AI hardware and algorithm co-design to overcome them.

NDP200 tinyML Vision Processing

Dave GARRETT, Chief Architect, Syntiant

Abstract (English)

The Syntiant NDP200 is a special-purpose neural decision processor for deep learning and is ideal for always-on applications in battery-powered devices. The NDP200 applies neural processing to run multiple applications simultaneously with minimal battery power consumption. Built using the Syntiant Core 2™ programmable deep learning architecture, the NDP200 is designed to natively run a variety of deep neural networks (DNNs), such as CNNs, RNNs, and fully connected networks, and it performs vision processing with highly accurate inference at under 1 mW. With support for up to 900k 8-bit parameters, a direct DVP image interface, and SPI and I2C peripheral controllers, the NDP200 enables ultra-low-power vision, sensor, and speech interfaces in battery-powered systems and supports always-on person presence detection and object classification use cases. The talk will show several use cases on the NDP9200 development board, jointly developed with PixArt, which includes an NDP200 neural decision processor as well as the PAG7920 image sensor on a Raspberry Pi platform.

11:45 am to 12:15 pm

Networking / Break

12:15 pm to 1:15 pm

Lunch

1:15 pm to 3:00 pm

tinyML Audio

Talk to Me! Tiny opportunities in smart audio

Audio is a uniquely attractive target for tiny machine learning. On one hand, audio, especially speech, carries richly layered information about communication intent, identity, location, emotion, and events. On the other hand, it is densely encoded, so this rich diversity of information is surprisingly hard to extract and interpret. Machine learning methods prove remarkably effective, opening up countless applications in speech recognition, event detection, voice triggering, and speech transformation and generation. The relatively low bit rates of audio make it possible to capture and process audio within small power budgets, and privacy and latency concerns often make it necessary to concentrate audio processing at the edge: a perfect storm for tinyML audio opportunity.

Session Moderator: Chris ROWEN, VP of AI Engineering, Cisco

On device speech models optimization and deployment

Oleg RYBAKOV, Software Engineering Manager, Google

Abstract (English)

Real-time, on-device execution of deep neural networks imposes significant constraints on model design: latency and memory footprint must meet requirements set by mobile hardware. Another aspect is end-to-end model deployment, which affects the speed of production iterations. To address both, the model should be designed to support streaming and quantization (sparsity is another option, but it is out of scope for this presentation). We will discuss using the TensorFlow functional and subclassing APIs for streaming-aware model design and combining it with quantization. There are several model quantization techniques: post-training quantization, and quantization-aware training based on either fake-quantized or native integer operations. We will review their pros and cons, with selection criteria that depend on the ML problem. Finally, we will review mobile benchmarks of several of the most popular model topologies used for speech processing, based on residual convolutional and transformer neural networks.
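To make the quantization vocabulary above concrete, here is a minimal sketch, in plain Python rather than any TensorFlow API, of symmetric per-tensor int8 quantization and of "fake quantization" (quantize, then dequantize, so values stay floating point but carry the rounding error). The function names are hypothetical, and real toolchains add details such as per-channel scales and zero points.

```python
def symmetric_int8_scale(values):
    """Per-tensor scale that maps the largest magnitude to 127 (symmetric int8)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest int8 level and clamp to [-127, 127].

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [max(-127, min(127, round(v / scale))) for v in values]


def fake_quantize(values, scale):
    """Quantize, then dequantize: floats that carry int8 rounding error.

    This is the idea behind fake-quantized quantization-aware training.
    """
    return [q * scale for q in quantize(values, scale)]


w = [0.5, -1.27, 0.013, 1.0]
s = symmetric_int8_scale(w)
print(quantize(w, s))  # [50, -127, 1, 100]
```

Post-training quantization would compute `s` once from a trained model (and from calibration data for activations), while quantization-aware training inserts `fake_quantize` into the forward pass so the network learns to tolerate the rounding error.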

AnalogML: Analog Inferencing for System-Level Power Efficiency

David GRAHAM, Co-founder and Chief Science Officer, Aspinity Inc.

Abstract (English)

In always-listening edge applications, audio events of interest can occur unpredictably, infrequently, or both. To ensure that no event is missed, significant power is wasted continuously digitizing and processing all sensor data, even though most of it is irrelevant noise. We will describe an innovative approach to eliminating this power inefficiency by using analog machine learning (analogML) to perform inferencing on raw, analog sensor data to determine relevancy prior to digitization. By using an extensive library of software-configurable analog circuits that can be programmed with standard machine learning techniques, analogML enables highly discriminating, ultra-low-power audio event detection. Downstream digital systems can remain in sleep mode until an event has occurred or further processing is needed, saving additional power. AnalogML brings more intelligence into the analog domain, enabling new voice-first devices, acoustic-based security systems, and other always-listening edge devices that benefit from dramatically improved battery life.

Dissecting a low power AI/ML edge application: Noise Suppression

Raj PAWATE, Group Director, Tensilica IPG, Cadence Design Systems Inc.

Abstract (English)

Noise suppression is an important pre-processing function that reduces cognitive load, whether you are listening to music or on a conference call in today’s work-from-home setting. These algorithms have transitioned from traditional spectral-subtraction-based approaches to RNN-based algorithms and, more recently, to Transformer-based algorithms, with noticeable improvements in mean opinion score (MOS). But these AI/ML-based algorithms are computationally demanding, with model sizes ranging from a few hundred kilobytes to several megabytes, which makes deploying them in battery-powered devices such as earbuds or mobile phones challenging. In this talk, we discuss an example RNN-based noise suppression algorithm and demonstrate how some of the ML processing is offloaded from a DSP to an ML-optimized IP block, resulting in a significant reduction in energy. We also discuss a robust software framework that enables developers to mix and match the best of traditional and ML-based processing functions for a pleasant user experience.
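For reference, the traditional spectral-subtraction approach mentioned above can be sketched as a per-frequency-bin gain. This minimal version assumes the magnitude spectra (for example, from an FFT of each frame) and a noise estimate are already available; the function name and the gain floor value are illustrative assumptions.

```python
def spectral_subtraction_gains(noisy_mag, noise_mag, floor=0.05):
    """Per-bin gains for magnitude spectral subtraction.

    noisy_mag: magnitude spectrum of the current noisy frame (one value per bin).
    noise_mag: running estimate of the noise magnitude in each bin.
    floor: minimum gain, limiting 'musical noise' from over-subtraction.
    """
    gains = []
    for y, n in zip(noisy_mag, noise_mag):
        # Gain = 1 - N/Y; bins dominated by noise fall to the floor.
        g = 1.0 - n / y if y > 0 else 0.0
        gains.append(max(g, floor))
    return gains


frame = [4.0, 1.0, 0.5, 2.0]
noise = [1.0, 1.0, 1.0, 0.5]
gains = spectral_subtraction_gains(frame, noise)
# Enhanced magnitudes; these would be recombined with the noisy phase
# before the inverse FFT.
enhanced = [g * y for g, y in zip(gains, frame)]
print(gains)  # [0.75, 0.05, 0.05, 0.75]
```

RNN-based suppressors such as RNNoise follow a similar structure but let a network predict the gains instead of deriving them from a noise estimate.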

Real-time deep speech enhancement system for embedded voice UI

Tess BOIVIN, ML Software Engineer, NXP

Abstract (English)

In this session, we will look at a low-power, real-time, embedded mask-based beamformer for voice UI systems. Our solution is designed to improve wake-word and voice-command trigger rates in real-life noisy scenarios and does not require any cloud interaction.

The voice UI system is built with a denoising audio front-end, a wake-word engine, and a voice-command engine. Such a system is constrained by both low-power and high-performance requirements; in particular, real-time processing and noise robustness are the most challenging issues.

To meet these challenges, our solution is designed for embedded systems and is hybrid: a neural network feeds a multichannel Wiener filter (MWF)-based processing algorithm. The 18k-parameter network is quantized to 16 bits and runs efficiently at 12 MHz on an NXP i.MX RT1060 MCU. In a three-microphone configuration, the complete speech enhancement solution runs at an average of 160 MHz on the Arm Cortex-M7 device and yields a 40% improvement in hit rate.
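The figures in this abstract support some quick back-of-the-envelope arithmetic. Assuming "18k" means 18,000 parameters and counting weights only (no activations or code), 16-bit storage needs 36,000 bytes. If the network's 12 MHz is counted within the 160 MHz average load of the complete solution, the network accounts for 7.5% of it. The helper name is hypothetical.

```python
def weight_bytes(n_params, bits_per_param):
    """Storage needed for the weights alone, in bytes."""
    return n_params * bits_per_param // 8


nn_bytes = weight_bytes(18_000, 16)  # 36000 bytes, about 35 KiB
nn_share = 12 / 160                  # network's share of the average clock load
print(nn_bytes, nn_share)
```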

3:00 pm to 3:45 pm

Poster session / Break

3:45 pm to 5:15 pm

Sensing for tinyML

tinySensing: Scaling tinyML solutions with optimized edge sensing capabilities

As more is done at the edge with less power, code, and real estate, what does this mean for sensor requirements if they are to keep pace with tinyML’s explosive growth? Going beyond popular tinyML applications such as camera sensing for vision and MEMS microphones for audio, will traditional inertial, environmental, medical, and other sensing capabilities suffice? What new innovations and ‘smart capabilities’ will be required to squeeze more performance, bandwidth, memory, analog and ASIC capability, and lower power into edge sensing solutions? Will changes in machine learning methods and tools be needed? Will printed sensing, with the promise of tiny footprints and power, be the future darling of tinyML? This session will delve into some of the opportunities, challenges, and trends in the tinySensing world and foster a rigorous dialogue on what is needed.

Session Moderator: Steve WHALLEY, CEO, Strategic World Ventures

Sensing Applications as a Driver for TinyML Solutions

Victor PANKRATIUS, Director / Head of Global Software Engineering, Bosch Sensortec GmbH

Abstract (English)

New generations of sensors are increasingly equipped with microcontrollers and computing capabilities that enable local machine learning in millimeter-sized packages. This talk presents examples and use cases where sensing applications have become a major driver for TinyML in ultra-low-power contexts. Applications are shown for intelligent Micro-Electro-Mechanical Systems (MEMS) in motion learning, sports analytics, and gas and environmental sensing. Looking at the software stack, the talk also addresses the importance of formalizing domain knowledge and incorporating it into ML as an additional lever for optimizations, such as shrinking memory footprints, making trade-offs in signal processing, and algorithmic choice. Learning from individual success stories, our insights help sketch a bigger picture of the TinyML ecosystems and platforms that are beginning to take shape, and of how various groups and communities can be engaged.

Brains into sensors with AI in the Edge

Andrea ONETTI, Executive Vice President, STMicroelectronics

Abstract (English)

AI solutions can be made significantly more efficient when data pre-processing and initial analysis are performed as close as possible to the sensing and actuating elements, rather than in the cloud. This approach vastly reduces the amount of data transferred and offers enhanced data security and privacy. It decreases the processing and data storage resources required in cloud servers while allowing processing to take place in power-optimized components like ultra-low-power microcontrollers and sensors. It also minimizes latency, allowing real-time responses in critical situations.

Today, MEMS sensors with embedded AI can operate at microwatt levels of power consumption, with ultra-low latency and minimal silicon area, thanks to on-die integration of sensor and logic processing. These devices are paving the way for the Onlife Era, in which embedded sensors enable innovative products to sense, process, and take action. To demonstrate that this is already a reality, ST will present the world’s tiniest sensor solution bringing intelligence to the edge.

Sensors and ML: waking smarter for less

Abbas ATAYA, Director of Machine Learning and Software team, TDK InvenSense

Abstract (English)

Machine Learning at the Edge – where does the buzz stop and the value begin? If the edge is where the internet stops and the real world begins, sensors define the edge. By application, sensors convert physical phenomena into digital signals: data. What happens next is where ML is driving innovation.

Edge implies the connectivity of a node to a bigger system. The node has a sensor chip, created in a process optimized for cost, size, and power. Transmitting raw sensor data is resource-intensive; converting the data into information or knowledge before transmission can greatly reduce the overall burden on the system, and machine learning is revolutionizing this conversion. Nodes also have a radio chip for connectivity, similarly optimized, though radio chips are made in more advanced process nodes. As such, the radio chip will always have more resources than the sensor chip. The HW/SW node design must balance the conversion, giving each chip a clearly defined role to improve accuracy and power conservation in the application.

I will explore how machine learning creates new opportunities to partition far-reaching sensor systems and greatly reduce required resources such as dollars, volume, and watts. Using examples of system-level optimization studied in our lab, I will show how phasing the wake-up of the nodes in the system hierarchy reduces resources while improving response. The analogies we discuss are similar to a person waking up in the night in response to a noise, a smell, a light, or a vibration.
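The phased wake-up idea can be captured in a simple expected-power model: each stage runs only when every earlier stage has decided to wake it. The sketch below is a hypothetical illustration, not TDK InvenSense's model; the stage powers and pass rates in the usage lines are made-up numbers, and the model ignores wake-up latency and transition energy.

```python
def cascade_average_power(stage_power_mw, pass_prob):
    """Expected average power of a wake-up cascade, in mW.

    stage_power_mw[i]: power drawn while stage i is active.
    pass_prob[i]: fraction of time stage i wakes stage i + 1.
    Stage 0 is always on; stage i runs only when every earlier stage passes.
    """
    if len(pass_prob) != len(stage_power_mw) - 1:
        raise ValueError("need one pass probability per stage transition")
    total = 0.0
    active_fraction = 1.0
    for i, p_mw in enumerate(stage_power_mw):
        total += active_fraction * p_mw
        if i < len(pass_prob):
            active_fraction *= pass_prob[i]
    return total


# Hypothetical placeholder numbers: sensor, MCU, radio.
stages_mw = [0.01, 1.0, 20.0]
passes = [0.1, 0.05]
print(cascade_average_power(stages_mw, passes))  # about 0.21 mW
```

With those placeholder numbers, the cascade averages about 0.21 mW, versus about 21 mW if all three stages ran continuously.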

5:15 pm to 7:15 pm

Awards / Networking / Reception

Pacific Daylight Time / UTC-7

8:00 am to 9:00 am

Registration and Breakfast

9:00 am to 3:30 pm

Posters and Sponsor tables open all day

9:00 am to 9:15 am

Welcome

9:15 am to 11:10 am

Hardware

tinyHardware – there’s plenty of room at the bottom

tinyHardware stands at the crossroads of algorithm, technology, and architecture. Aggressive algorithmic optimizations such as quantization, pruning, and analog computation do not destroy machine learning’s capability to decode and interpret information; rather, they introduce smooth degradations that can often be tolerated. On the other hand, these optimizations offer a large “attack surface” for emerging technologies such as non-volatile memory, analog in-memory computing, and innovative chip-to-chip links, as well as for novel architectures in terms of dataflow choices, sparsity support, core/memory coupling, and caching schemes. Altogether, these opportunities define a huge design space spanning from low-bitwidth ISA extensions to multi-chip systolic arrays to Systems-on-Chip with core-coupled in-memory accelerators: a space that we have just started to navigate in the quest to raise the system-level energy efficiency of tinyML applications while enabling more complex tiny applications.

Session Moderator: Francesco CONTI, Assistant Professor, University of Bologna, Italy

Perspectives & Challenges for TinyML Hardware: a System-Level View

Francesco CONTI, Assistant Professor, University of Bologna, Italy

Abstract (English)

TinyML inference on ultra-low-power hardware is finally here to stay. But what are the perspectives for the future of this technology? Will Analog In-Memory Computing take up the scepter of crucial technology for TinyML from digital accelerators? Will TinyML hardware evolve to support learning as well as inference? In this session, we discuss the main challenges and opportunities ahead for TinyML hardware from the perspective of using it in real embedded systems.

Programmable In-Memory Computing (IMC) Accelerator with >100 SRAM IMC Macros

Jae-Sun SEO, Associate Professor, ASU

Abstract (English)

Artificial intelligence and deep neural networks (DNNs) have been successful across many practical applications, but state-of-the-art algorithms require a large amount of computation and memory. To resolve the computation and memory access bottleneck in conventional hardware accelerators, in-memory computing (IMC) has emerged as a promising technique. While many single-macro-level IMC prototypes have been demonstrated, integration and programmability challenges remain for system-level IMC accelerators. To that end, we present a programmable IMC accelerator (PIMCA) integrating 108 capacitive-coupling-based IMC SRAM macros with a total size of 3.4 Mb, together with 1.5 Mb of off-the-shelf activation memory, demonstrating large-scale SRAM-based IMC system hardware. We will discuss the circuit techniques and architecture design employed for the PIMCA chip, including a custom ISA featuring IMC and SIMD functional units with special hardware loop control, supporting a range of DNN layer types with up to 4X smaller program size. The 28nm prototype chip achieves a high system-level peak/average energy efficiency of 437/289 TOPS/W.

Mastering the 3 Pillars of AI Acceleration: Algorithms, Hardware and Software

Swagath VENKATARAMANI, Research Staff Member, IBM

Abstract (English)

The success of Deep Neural Networks (DNNs) in performing complex AI tasks across many domains has been largely attributed to scale: the scale of the network and of the dataset on which it is trained, among other factors. Hardware specialization and acceleration are regarded as key to satiating the computational demands of DNNs, which requires synergistic cross-layer design spanning the compute stack. Guided by the evolution of AI workloads, this talk describes a holistic approach to designing specialized AI systems pioneered by IBM Research. This involves mastering the 3 key pillars of AI accelerator design: (i) approximate computing techniques to design low-precision DNN models that maintain the same level of accuracy, (ii) hardware techniques to design scalable dense/sparse computational arrays that support a spectrum of precisions, and (iii) software methodologies to systematically map DNNs with diverse computational characteristics so as to extract maximum performance while presenting intuitive (and familiar) programming and user interfaces.

Next-Generation Deep-Learning Accelerators: From Hardware to System

Sophia SHAO, Assistant Professor of Electrical Engineering and Computer Sciences, University of California, Berkeley

Abstract (English)

Machine learning is poised to substantially change society in the next 100 years, just as electricity transformed the way industries functioned in the past century. In particular, deep learning has been adopted across a wide variety of domains, including computer vision, natural language processing, autonomous driving, and robotic manipulation. Motivated by the high computational requirements of deep learning, academia and industry have proposed a large number of novel deep-learning accelerators to meet the performance and efficiency demands of deep-learning applications. In this talk, I will discuss challenges and opportunities for the next generation of deep-learning accelerators, with a special focus on the system-level implications of designing, integrating, and scheduling future deep-learning accelerators.

11:10 am to 11:55 am

Posters / Exhibits

11:55 am to 12:40 pm

Lunch

12:40 pm to 1:25 pm

Software / Tools

tinySW/Tools: Enabling tiny experiences for the real-world

Software and tools specifically designed for tiny machine learning (tinyML) are empowering researchers, data scientists, and embedded engineers to pave the way for a whole new class of experiences. Initially, the focus was on creating software and tools that made each step of the process possible: on one hand, enabling data scientists to create datasets and to train and optimize models able to run on resource-constrained devices; on the other, helping embedded engineers use those models to develop applications that perform inference efficiently on the microcontrollers themselves. We are now starting to see a shift toward software and tools that allow scientists and engineers to leverage tinyML in real-world use cases. Opportunities lie in tightly integrated end-to-end platforms, new data augmentation methods, performance analysis tools, and optimization techniques. This tinySW/Tools session will explore some of the recent advancements in the field while uncovering some of the challenges we are facing as an industry.

Session Moderator: Alessandro GRANDE, Head of Product, Edge Impulse

Ecosystem of tools for better productivity

Danilo PAU, Technical Director, IEEE and ST Fellow, STMicroelectronics

Abstract (English)

As embedded developers increasingly demand a true ecosystem of productivity tools to help optimize their design phases, STMicroelectronics’ Danilo Pau will discuss how the ML community can help them increase productivity and introduce automation. An IEEE Fellow, associate editor of IEEE TNNLS, and member of the Machine Learning, Deep Learning and AI in the CE (MDA) Technical Stream Committee of the IEEE Consumer Electronics Society (CESoc), Danilo is uniquely qualified to talk about how embedded developers, including those working on deeply quantized neural networks, can benefit from ongoing research to drive the ML pipeline optimization process.

Compiling TinyML Models with microTVM

Andrew REUSCH, Software Engineer, OctoML

Abstract (English)

Apache TVM is an open-source deep-learning compiler that accepts models from popular frameworks like TFLite, ONNX, or PyTorch and compiles them for devices ranging from datacenter GPUs to mobile phones to bare-metal microcontrollers. In this talk, I’ll describe TVM’s architecture-agnostic approach to the model deployment problem and introduce microTVM, a project within Apache TVM that targets deeply embedded TinyML applications. I’ll also give an update on our recent efforts around interpreter-free model execution, static memory planning, and heterogeneous compute support.
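Static memory planning means deciding, at compile time, where every intermediate tensor lives in a single preallocated arena, so tensors whose lifetimes do not overlap can share bytes. The greedy largest-first planner below is a generic sketch of that idea, not microTVM's actual algorithm; the `Tensor` record and the example lifetimes are hypothetical.

```python
from typing import NamedTuple


class Tensor(NamedTuple):
    name: str
    size: int       # bytes
    first_use: int  # index of the first operator that touches the tensor
    last_use: int   # index of the last operator that reads the tensor


def plan_offsets(tensors):
    """Greedy static memory plan: largest tensors first, lowest non-conflicting offset.

    Returns (offsets, arena_size). Two tensors may share bytes only if their
    [first_use, last_use] lifetimes do not overlap.
    """
    placed = []  # (offset, tensor) pairs already assigned
    offsets = {}
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        # Address ranges of placed tensors that are alive at the same time as t.
        conflicts = sorted(
            (off, off + p.size)
            for off, p in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        offset = 0
        for start, end in conflicts:
            if offset + t.size <= start:
                break  # t fits in the gap before this conflicting range
            offset = max(offset, end)
        placed.append((offset, t))
        offsets[t.name] = offset
    arena = max((off + t.size for off, t in placed), default=0)
    return offsets, arena


tensors = [
    Tensor("in", 100, 0, 1),
    Tensor("a", 200, 1, 2),
    Tensor("b", 100, 2, 3),
    Tensor("out", 50, 3, 4),
]
offsets, arena = plan_offsets(tensors)
print(offsets, arena)  # {'a': 0, 'in': 200, 'b': 200, 'out': 0} 300
```

Here the four tensors total 450 bytes, but because `out` is only alive after `a` is dead, the plan fits in a 300-byte arena.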

Building data-centric AI tooling for embedded engineers

Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse

Abstract (English)

In this talk, Daniel Situnayake, founding TinyML engineer at Edge Impulse and author of the book TinyML, will share the technical fundamentals behind the tooling that tens of thousands of engineers use to build edge AI applications. The talk will cover the infrastructure required to enable rapid prototyping on edge devices, the technologies that unlock a data-centric AI workflow, and the techniques that enable developers to close feedback loops while working end-to-end. To close, Daniel will share his technology predictions for the next five years of edge AI tooling.

1:25 pm to 1:55 pm

Break

1:55 pm to 2:40 pm

Software / Tools

Session Moderator: Alessandro GRANDE, Head of Product, Edge Impulse

Expedited Model Deployment on the Edge with Recipes

Haya SRIDHARAN, Technical Product Manager, Latent AI

Abstract (English)

It is no secret that AI models need to be highly optimized to work efficiently on the edge. But model optimization is challenging due to resource constraints on edge devices, lack of visibility into optimization tools, and developer frustration from dealing with the idiosyncrasies of different hardware targets, compilers, and development frameworks. To mitigate these challenges, we have come up with a recipe-based framework. These recipes abstract the complexity of ML optimization away from developers and allow them to easily optimize their ML models. All you have to do is “Bring your own data,” and the recipe helps you crank out models through deployment in a robust and consistent manner.

Suitability of TinyML for addressing predictive maintenance in high tech manufacturing

Christopher KNOROWSKI, CTO, SensiML Corp

Abstract (English)

In this talk, we discuss how SensiML’s Analytic Toolkit was combined with the Adapdix EdgeOps autonomous systems platform to predict equipment failures at the machine and sensor level of the industrial edge in high-tech manufacturing. The net result is a reduction in unplanned downtime, increased throughput, reduction in supply chain cost, and an increase in remote worker productivity.

Challenges for Large Scale Deployment of Tiny ML Devices

Gopal RAGHAVAN, Embedded AI Strategy, Microsoft

Abstract (English)

The last couple of years have shown remarkable progress in extending the limits of ML on tiny devices through innovations in device hardware and software. However, this has not resulted in a sizable increase in the number of deployed tiny devices for commercial customers. In this talk, we will examine what we have heard from our commercial customers about their edge AI needs and the challenges associated with large-scale deployment of ML on tiny devices. The highly fragmented nature of this market requires a broad MLDevOps solution supporting a variety of devices and toolchains. We will discuss how Azure and other cloud-based services can offer a solution to this problem and help us achieve the deployment of billions of devices from the cloud to the heavy and light edge, all the way to the tiny edge.

2:40 pm to 2:55 pm

Summit Wrap-up

2:55 pm to 3:10 pm

Break

3:10 pm to 5:30 pm

Auto tinyML

Latest-generation microcontrollers, sensors, digital signal processors, and ultra-low-power accelerometers are opening numerous possibilities for compact machine learning (ML) applications with limited storage and power consumption. Now, the more than 10 million C developers in the embedded community are calling for useful productivity tools to support ML pipeline design from the earliest concept and design phases. The tools should automate ML topology design and optimize configurations without requiring developers to craft new solutions for each issue.

This session will focus on tools for automated tinyML design and hyperparameter search, which the embedded community will certainly appreciate. Industrial and university experts will discuss the current and next-generation tools for embedding ML in small, everyday applications. This is an excellent opportunity for attendees to learn how to deploy their models on tiny devices and ramp up ML design productivity.

Session Moderator: Danilo PAU, Technical Director, IEEE and ST Fellow, STMicroelectronics

EON Tuner: AutoML for constrained devices

Jan JONGBOOM, CTO, Edge Impulse

Abstract (English)

Finding the best machine learning model for analyzing sensor data isn’t easy. What pre-processing steps yield the best results, for example? And what signal processing parameters should you pick? The selection process is even more challenging when the resulting model needs to run on a microcontroller with significant latency, memory and power constraints. AutoML tools can help, but typically only look at the neural network, disregarding the important roles that pre-processing and signal processing play with tinyML.

The EON Tuner is here to help. It can search through thousands of combinations of pre-processing, DSP, ML, and post-processing blocks (including custom code), and it has a powerful profiler to get accurate latency and memory numbers for these models. Together, they’ll help you find the optimal tinyML model within the constraints of your device. This session gives a quick introduction to the EON Tuner and shows how to customize the search space and how to evaluate models against real-world data. Because no one *just* trusts an automatically found model!
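The core idea of searching over pre-processing/DSP and model blocks under device constraints can be sketched as a constrained search. This is not the EON Tuner's implementation: exhaustive search stands in for whatever strategy the real tool uses, the `evaluate` callback (which would train and profile each candidate on the target) is hypothetical, and the accuracy, latency, and RAM figures in the table are invented for illustration.

```python
from itertools import product


def search(dsp_blocks, model_blocks, evaluate, max_latency_ms, max_ram_kb):
    """Try every (DSP, model) pair; keep the most accurate one that fits the device.

    evaluate(dsp, model) -> (accuracy, latency_ms, ram_kb) is a hypothetical
    callback standing in for training plus on-device profiling.
    Returns (accuracy, dsp, model), or None if nothing fits.
    """
    best = None
    for dsp, model in product(dsp_blocks, model_blocks):
        acc, latency, ram = evaluate(dsp, model)
        if latency > max_latency_ms or ram > max_ram_kb:
            continue  # violates a device constraint
        if best is None or acc > best[0]:
            best = (acc, dsp, model)
    return best


# Invented numbers: (accuracy, latency in ms, RAM in kB).
table = {
    ("mfcc", "small_cnn"): (0.91, 40, 30),
    ("mfcc", "large_cnn"): (0.95, 180, 120),
    ("spectrogram", "small_cnn"): (0.89, 25, 20),
    ("spectrogram", "large_cnn"): (0.93, 150, 90),
}
best = search(["mfcc", "spectrogram"], ["small_cnn", "large_cnn"],
              lambda d, m: table[(d, m)], max_latency_ms=100, max_ram_kb=64)
print(best)  # (0.91, 'mfcc', 'small_cnn')
```

The most accurate candidate overall (`mfcc` + `large_cnn`) is rejected because it exceeds the latency and RAM budgets, which is exactly why the profiler's numbers matter as much as accuracy.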

Optimizing AutoML for the tinyML Future

Elias FALLON, Vice President for Machine Learning, Qeexo Co.

Abstract (English)

Automatic Machine Learning (AutoML) is a set of techniques applying optimization on top of machine learning hyperparameters to achieve the best ML performance. In traditional cloud-based machine learning/AI, that just means achieving the highest accuracy, often without regard for other metrics such as latency and power efficiency. In our tinyML community, we have the challenge that latency and power efficiency are often as important as absolute accuracy.

In the tinyML context, one key to achieving the best performance is taking full advantage of the unique capabilities of the inference hardware platform. AutoML tools and techniques have started to become more hardware-aware with ideas like quantization-aware training. But to truly optimize overall system performance, AutoML tools need to optimize across the full signal flow, from the raw sensor data to the inference result.

Qeexo AutoML uses optimization and search techniques to select signal processing filters, feature extraction, and machine learning model optimizations, all aware of the unique hardware selected for the project. In this talk, we will describe the overall AutoML flow that incorporates full optimization across the sensor-to-inference flow. The automatic selection of signal processing filters for STMicroelectronics’ Machine Learning Core (MLC) will be detailed. The full flow for an activity detection wake-up model executing on the MLC, along with its expected power consumption and accuracy, will be demonstrated. AutoML tools and full signal flow optimization are key innovations for enabling the tinyML future.

1 kB and not a bit more! The ideal weight for a tinyML model

Blair NEWMAN, CTO, Neuton

Abstract (English)

Nowadays, we are witnessing tiny smart devices steadily expand their capabilities and take over all domains of our lives. This generates a huge demand for automated tools that can support machine learning pipeline design and streamline the process of embedding ML models into the smallest apps.

Our team has automated the best data science practices and created a unique no-code platform, Neuton. Thanks to a patented neural network algorithm under the hood, Neuton automatically creates machine learning models that are optimal in size and accuracy, eliminating the need for compression, quantization, and pruning.

In our 10-minute tutorial, we’ll explain how Neuton enables embedded engineers to:

automatically create compact ML models, up to 1,000x smaller than TensorFlow models
embed models into memory-constrained hardware, even with 8-bit and 16-bit precision
perform tasks quickly and without any data science skills

We will demonstrate the capabilities of our platform and the tinyML approach using the case of determining food quality. You will learn the end-to-end process of creating a super tiny ML model, embedding it into an 8-bit sensor’s microcontroller, and monitoring food quality based on data from gas sensors.

Model Optimization with QKeras’ Quantization-Aware Training and Vizier’s Automatic Neural Architecture Search

Daniele MORO, Software Engineer, Google

Abstract (English)

QKeras and Vizier are two open-source libraries from Google that enable you to achieve huge efficiency gains for your quantized models. This talk will demonstrate how to use QKeras for advanced quantization-aware training in combination with Vizier for automated neural architecture and quantization scheme search. We will discuss how to train low-bit mixed-precision networks, how to use advanced quantization layers for activation calibration and efficient batch normalization, and how to build your own custom quantized layers. We will also see how to use the new Vizier API to define architecture and quantization search spaces that hundreds of workers can search simultaneously through a centralized black-box optimization algorithm. We will end with some examples and case studies of the potential gains from using these tools, highlighting a paper published in Nature Machine Intelligence by a joint CERN-Google collaboration.
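In quantization-aware training, the forward pass sees weights and activations rounded onto a low-bit grid, so the network learns to tolerate the rounding error. The plain-Python function below is a simplified stand-in that only shows the arithmetic of a signed fixed-point grid with `bits` total bits and `integer` integer bits. It is not QKeras code. It ignores details such as learned or per-channel scale factors and the straight-through gradient estimator used during training, and the function names are hypothetical.

```python
from typing import List


def quantize_fixed_point(x: float, bits: int, integer: int) -> float:
    """Round x onto a signed fixed-point grid: 1 sign bit, `integer` integer bits,
    and the remaining bits fractional."""
    frac_bits = bits - 1 - integer
    if frac_bits < 0:
        raise ValueError("bits must be at least integer + 1")
    step = 2.0 ** -frac_bits         # spacing between representable values
    lo = -(2.0 ** integer)           # most negative representable value
    hi = 2.0 ** integer - step       # most positive representable value
    q = round(x / step) * step       # nearest grid point (Python rounds ties to even)
    return min(max(q, lo), hi)       # saturate instead of overflowing


def fake_quantize(weights: List[float], bits: int, integer: int) -> List[float]:
    """Quantize-then-dequantize a weight list, as a QAT forward pass would."""
    return [quantize_fixed_point(w, bits, integer) for w in weights]


# 4-bit grid with no integer bits: step 0.125, range [-1.0, 0.875]
print(fake_quantize([0.3, -0.6, 1.7], bits=4, integer=0))  # [0.25, -0.625, 0.875]
```

Mixed-precision search, as described in the talk, amounts to choosing a different `bits`/`integer` pair per layer and letting the optimizer trade accuracy against model size.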

Automated Machine Learning under model’s deployability on tiny devices

Antonio CANDELIERI, Assistant Professor, University of Milano-Bicocca, Italy

Abstract (English)

Nowadays, Automated Machine Learning and Neural Architecture Search tools are widely and successfully used to search for accurate Machine Learning models within a limited number of trials or a limited wall-clock time. However, these tools typically run on large computational platforms (often cloud-based) without addressing whether the final Machine Learning model can be deployed on a tiny device, such as a microcontroller. On the other hand, the need to deploy and run accurate Machine Learning models on tiny devices is emerging as one of the most relevant challenges, with a massive untapped market.

This talk presents an approach bridging the gap between Automated and Tiny ML. More specifically, it extends Bayesian Optimization to include the black-box constraints related to the limited hardware resources of the tiny device on which the final model is going to run. The talk provides step-by-step examples of how the approach works on benchmark classification tasks, with associated baseline Machine Learning models, and STMicroelectronics microcontrollers. These examples show that the approach identifies models that are both accurate and deployable, outperforming the baselines within a short wall-clock time compared with manual refinement by a Machine Learning expert. Moreover, the approach does not preclude applying model reduction or quantization techniques to the identified model.
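In constrained Bayesian Optimization, each candidate model must satisfy resource constraints in addition to scoring well. These constraints are "black-box" because their values are obtained by evaluating the candidate (for example, with the vendor's deployment tool) rather than from a closed-form expression. The sketch below shows what such a check can look like, using a rough, hypothetical footprint estimate for a fully connected network. It assumes 1-byte weights and activations and 4-byte biases, and it estimates RAM as the largest input-plus-output activation buffer. Real tools measure these figures differently, and the function names are illustrative only.

```python
from typing import List, Tuple


def estimate_dense_footprint(layer_sizes: List[int], weight_bytes: int = 1,
                             bias_bytes: int = 4, act_bytes: int = 1) -> Tuple[int, int]:
    """Rough (flash_bytes, ram_bytes) for an MLP with the given layer widths."""
    flash = 0
    ram = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # Weights and biases are stored in flash.
        flash += n_in * n_out * weight_bytes + n_out * bias_bytes
        # Assume one layer's input and output buffers must coexist in RAM.
        ram = max(ram, (n_in + n_out) * act_bytes)
    return flash, ram


def is_deployable(layer_sizes: List[int], flash_limit: int, ram_limit: int) -> bool:
    """Constraint check an optimizer could call for every candidate architecture."""
    flash, ram = estimate_dense_footprint(layer_sizes)
    return flash <= flash_limit and ram <= ram_limit


print(estimate_dense_footprint([64, 32, 10]))        # (2536, 96)
print(is_deployable([64, 32, 10], 4096, 1024))       # True
print(is_deployable([64, 256, 10], 4096, 1024))      # False
```

A constrained optimizer would use results like these to steer its search away from infeasible regions, rather than discarding an accurate model only after training it.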

Automating Model Optimization for Efficient Edge AI: from automated solutions to open-source toolkit

Dave CHENG, Senior Deep Learning Researcher, Qualcomm AI Research

Abstract (English)

Edge devices, including smartphones and IoT devices, typically operate under stringent power and thermal budgets. Running deep neural networks (DNNs) on such edge devices is extremely challenging because of DNNs’ high memory, compute, and energy requirements. While significant research has been dedicated to optimizing DNNs, providing techniques and tools that automate DNN optimization in a user-friendly manner is still an ongoing challenge.

In this talk, we present the leading techniques for the automated design of DNNs for edge devices. Starting from our research work “Distilling Optimal Neural Network Architectures (DONNA)” on hardware-aware neural architecture search (NAS), we show that neural architectures can be effectively shrunk to improve latency while maintaining accuracy. Furthermore, we provide automated quantization methods that enable energy-efficient fixed-point inference of these optimized models. We also discuss our open-source projects, such as the AI Model Efficiency Toolkit (AIMET) and the AIMET Model Zoo, which close the gap between research and practically useful tools, thus enabling the AI community to meet its efficient edge inference needs.
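Fixed-point inference requires mapping each float tensor to integers with a scale and a zero point. The sketch below shows one simple way to choose them, min/max calibration with 8-bit asymmetric quantization. AIMET's actual workflow is richer, and the function names here are hypothetical, not AIMET APIs.

```python
from typing import List, Tuple


def calibrate_affine(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Derive (scale, zero_point) for unsigned asymmetric quantization from observed min/max."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Map a float to its integer code, saturating at the ends of the range."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate float value represented by an integer code."""
    return (q - zero_point) * scale


observed = [-1.0, 0.5, 2.0]
scale, zero_point = calibrate_affine(observed)
print(zero_point)  # 85
print([dequantize(quantize(x, scale, zero_point), scale, zero_point) for x in observed])
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (`scale / 2`). Values outside the range saturate, which is why the choice of calibration data matters.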