tinyML Asia 2021

tinyML Asia Technical Forum – Online LIVE

November 2-5, 2021

About tinyML Asia

Machine learning (ML) is at the forefront of providing artificial intelligence to all aspects of computing. It is the technology powering many of today’s advanced applications, from image recognition to voice interfaces to self-driving vehicles and beyond. Many of these initial ML applications require significant computational resources, most often found in cloud-scale data centers. To enable industry usage and adoption, it is therefore necessary to significantly reduce the power these applications consume, so that they can run on end devices at the cloud edge (smartphones, wearables, vehicles, IoT devices, etc.) and reduce the load on data center resources.

tinyML Asia Technical Forum 2021 will be held on November 2-5, 2021, from 9 to 11:30 am (China Standard Time, UTC+8) each day. The online workshop will focus on applications, end users, and the supply chain for tinyML from both global and Asian perspectives. Unlike existing large industry and academic events that lack a focus on low-power ML solutions, tinyML events cover the entire ecosystem, bringing industry and academia together.

The inaugural tinyML Asia 2020 was attended by more than 1,800 people. As the tinyML community continues to grow rapidly, even greater participation is expected at tinyML Asia 2021.

Contact us


周小磊 / Xiaolei “Joe” ZHOU


China Standard Time (CST) / UTC+8

9:00 am to 9:45 am

Welcome & Plenary

Plenary: ML@ExtremeEdge of Always-on Intelligent Sensor Networks

Mahesh MEHENDALE, TI Fellow and lead of Nano-power Foundational Technology at Kilby Labs, Texas Instruments; Adjunct Professor, IISc Bangalore

Abstract (English)

In always-on intelligent IoT sensor nodes, detecting the event of interest at the End Node (Extreme Edge), rather than at the Gateway or in the Cloud, provides significant advantages, including low latency, privacy, reduced communication bandwidth, and operation with no or unreliable connectivity. Deep Neural Networks (DNNs) have emerged as a promising machine learning technology for a number of such sensing applications, including voice activity detection, voice command recognition, acoustic signature detection, object detection, face recognition and anomaly detection, working with different sensing modalities such as acoustic, image, vibration, current and voltage. DNNs are compute and data intensive, so implementing them on End Nodes that are highly resource constrained (in both cost and power) while meeting latency and real-time constraints presents a huge challenge. In this talk we present optimization techniques at the system, algorithm, architecture, circuit and process technology levels, and highlight how co-optimization across all these levels is key to achieving the targeted two to three orders of magnitude reduction in the area-power figure of merit (FoM).

9:45 am to 11:50 am

tinyML Hardware/Software co-design, security

Endpoint AI Revolution Driven by Standardized Computing Platform

Odin SHEN 沈綸銘, Principal Field Application Engineer, Arm

Abstract (English)

AI is a once-in-a-generation change in computing that is extending capabilities from cloud servers to the tiniest IoT devices. Today, most ML is still performed on Arm processors across CPUs, GPUs and dedicated neural processing units (NPUs). Arm has been on a mission to create the foundations to realize the opportunity of AI. Especially for TinyML, Arm provides a robust open-source software stack and related endpoint AI processors, which can dramatically improve the efficiency and performance of systems in a world where more and more TinyML use cases must run. The software is ready for use and silicon is coming to market soon, so it’s time for TinyML developers to get ready today.

Efficient on-device deep learning

Yunxin LIU, Guoqiang Professor at Institute for AI Industry Research (AIR), Tsinghua University

Abstract (English)

With advances in hardware, software, and artificial intelligence (AI), the computing paradigm is shifting from centralized intelligence in the cloud to distributed intelligence on the edge. In the era of edge computing, it is critical to infuse AI into diverse edge devices and applications. This talk overviews the challenges and opportunities of on-device deep learning and introduces our recent research on making on-device deep learning more efficient, focusing on how to build affordable AI models customized for diverse edge devices and how to maximize the performance of on-device model inference by fully utilizing heterogeneous computing resources.

Imaging Radars – Learning for Enhanced Vision

Ankit SHARMA, System Architect, Steradian

Abstract (English)

Modern radar is quickly moving beyond 4-D imaging, let alone 3-D imaging. The fine resolution offered by millimeter-wave radars has pushed the limits of imaging to 5-D, where the fifth dimension refers to the type/class of the object imaged by the radar. This opens up strong use cases for supervised and unsupervised learning in imaging radar. Although machine learning in its various forms could be used to estimate the traditional radar parameters such as range, Doppler velocity and direction of arrival (DoA), estimating object dimensions, shapes, orientations and types opens a very relevant problem set to be solved by supervised learning. For traffic enforcement applications this could mean identifying the vehicle type, and for autonomous driving it could mean classifying the object as road/pedestrian/car/bus and taking an appropriate action. These applications, though traditional in image signal processing, are quite novel in radar signal processing. As such, applications of machine learning that add new dimensions to an imaging radar are crucial solution differentiators. To this end, we propose an application of machine learning to classify objects detected by the radar into an array of predefined classes. The classification algorithm runs in real time and primarily uses the point cloud detected by the radar and the reflectivity of electromagnetic (EM) waves at 80 GHz as inputs. The resulting classification rate is quantified and reported as an accuracy measure.

Hardware software co-optimizations for efficient privacy preserving computing in AIoT devices

Weifeng ZHANG, Fellow of Alibaba Cloud Intelligence and the Chief Scientist of Heterogeneous Computing, Alibaba Cloud Infrastructure

Abstract (English)

With the emergence of more and more regulations on data privacy and protection, privacy-preserving computing has become critical in the machine learning domain. However, existing data protection mechanisms, whether based on a trusted execution environment (TEE) or on encryption technology such as homomorphic encryption (HE), often suffer a large performance loss, due either to the limited computing resources dedicated to the TEE or to extremely complex HE algorithms. This is particularly challenging for AIoT devices, which are even more resource constrained. This talk will shed some light on how to make privacy-preserving computing more efficient through novel hardware-software co-optimizations.

Make the signal chain more intelligent and efficient with mixed-signal processing and in-memory computing

Hongjie LIU, Founder and CEO, Reexen

Abstract (English)

The traditional signal chain performs most signal processing after digitization, which creates an analog-to-digital (AD) conversion bottleneck. The area and power of the AD converter dominate those of the whole analog front end.

The power and latency costs of transferring data between the processing unit and the cache/DRAM likewise dominate those of the whole digital processing part.

Reexen’s innovative architecture splits signal processing into mixed-signal, low-level feature extraction before digitization and mixed-signal, high-level in-memory computing after digitization.

Reexen’s products can offer a 1-2 orders of magnitude improvement in energy consumption and a 2-5x cost improvement.

11:50 am to 12:20 pm

Partner Session with Edge Impulse & Syntiant

De-risking embedded machine learning projects using novel technologies

Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse

Abstract (English)

It’s estimated that 85% of machine learning projects end in failure. Since embedded machine learning comes with even more challenges, how do we avoid a situation where the majority of projects are unsuccessful? This talk introduces the design and engineering philosophy used by Edge Impulse to help ensure customer success.

Accelerate your Edge Compute with Syntiant

Mallik P. MOTURI, VP Product and Business Development, Syntiant

Abstract (English)

Syntiant Corp. is a provider of machine learning solutions making edge AI a reality for always-on applications in battery-powered devices. We introduce the Syntiant® TinyML platform, which is built upon Neural Decision Processors™ (NDPs), purpose-built AI-centric chips capable of 10-100x the neural processing throughput of MCUs while consuming 10-100x less power for the same task. The user-friendly architecture does not require a compiler; models can be developed and managed using the Edge Impulse front end, and developers simply download the model package. In collaboration with Edge Impulse, Syntiant is delivering an easy-to-use platform that enables developers to build and deploy strong machine learning algorithms, making AI development accessible to anyone who wants to make their products smarter and more energy efficient, improve privacy, and reduce latency.

China Standard Time (CST) / UTC+8

9:00 am to 9:30 am

Plenary: Putting AI on a Diet: TinyML and Efficient Deep Learning

Song HAN, Assistant Professor, MIT EECS

Abstract (English)

Today’s AI is too big. Deep neural networks demand extraordinary levels of data and computation, and therefore power, for training and inference. Amid the global silicon shortage, this severely limits the practical deployment of AI applications. I will present techniques to improve the efficiency of neural networks through model compression, neural architecture search, and new design primitives. I’ll present MCUNet, which enables ImageNet-scale inference on microcontrollers with only 1 MB of Flash. Next, I will introduce the Once-for-All Network, an efficient neural architecture search approach that can elastically grow and shrink model capacity according to the target hardware resources and latency constraints. Finally, I’ll present new primitives for video understanding and point cloud recognition, which were the winning solutions in the 3rd/4th/5th Low-Power Computer Vision Challenges and the AI Driving Olympics NuScenes Segmentation Challenge. We hope such TinyML techniques can make AI greener, faster, and more accessible to everyone.

9:30 am to 11:30 am

Frameworks, Tools, tinyML for Good

Graphical Programming for TinyML, the Easiest Way to Start with Embedded ML

Huiying LAI, Application Engineer, Seeed Studio

Abstract (English)

Seeed will introduce Graphical Programming for TinyML, the easiest way to get started with embedded machine learning. Simply drag and drop blocks to acquire data, train models and deploy them. Embedded machine learning becomes much easier and more accessible to beginners when using Codecraft graphical programming and the Wio Terminal. Besides the introduction, we will also use Codecraft to build an application step by step. We hope the demo will inspire more developers to build interesting TinyML applications in the future.

Learning compact representation with less (labelled) data from sensors

Flora SALIM, Professor, RMIT University, Melbourne, Australia

Abstract (English)

The proliferation of sensors and the Internet of Things leads to new opportunities and challenges for modelling human behaviours. However, most representation learning techniques require large, well-labelled training sets to achieve high performance. Due to the high expense of labelling human and/or system behaviours, approaches that require minimal or no labelled data are becoming more favourable. This motivated us to explore data-efficient learning techniques that achieve efficient and compact representations. Approaches including domain adaptation (with minimal data) and pretraining (without labelled data) will be introduced.

Tiny ONNC: unleash the intelligent power of your IoT devices

Peter CHANG, Co-founder and Technical Marketing Manager, Skymizer Taiwan Inc.

Abstract (English)

While AI is already well established on servers, there is still huge room for it on IoT devices. For IoT developers, however, the hardware limitations of IoT devices may hinder their potential intelligent power.

Therefore, Skymizer introduces Tiny ONNC, an easy and efficient AI compiler. Tiny ONNC leverages the unique power of MLIR to support a wide range of neural network frameworks, including PyTorch, Open Neural Network Exchange Format (ONNX), TensorFlow, TensorFlow Lite, TVM Relay, and even Keras.

Tiny ONNC offers many optimizations, such as automatic operator splitting and tensor splitting, that address the memory constraints of microcontrollers. When an operator or a tensor is too large to fit in the cache, Tiny ONNC splits it into small pieces and reorganizes the network to reuse memory. Tiny ONNC also supports operators that CMSIS-NN does not directly support, using mathematically equivalent or approximate transformations.
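To illustrate the general idea of tensor splitting, the sketch below tiles a matrix-vector product by rows so that only one tile of weights has to be resident within a fixed memory budget at a time. It is a minimal sketch under stated assumptions, not Tiny ONNC's actual algorithm: the row-wise tiling strategy, the byte budget, and the function names `split_rows` and `tiled_matvec` are all hypothetical.

```python
def split_rows(num_rows, row_bytes, budget_bytes):
    """Return (start, end) row ranges whose working set fits in budget_bytes.

    Hypothetical helper: at least one row per tile, even if a single row
    exceeds the budget.
    """
    rows_per_tile = max(1, budget_bytes // row_bytes)
    return [(s, min(s + rows_per_tile, num_rows))
            for s in range(0, num_rows, rows_per_tile)]


def tiled_matvec(weights, x, budget_bytes, bytes_per_elem=1):
    """Compute weights @ x one tile of rows at a time.

    Each tile is the only slice of the weight tensor touched in an iteration,
    mimicking how a compiler could stream an oversized tensor through a
    small buffer and reuse that buffer for every piece.
    """
    row_bytes = len(x) * bytes_per_elem
    out = [0] * len(weights)
    for start, end in split_rows(len(weights), row_bytes, budget_bytes):
        tile = weights[start:end]  # the piece that must fit in memory
        for i, row in enumerate(tile):
            out[start + i] = sum(w * v for w, v in zip(row, x))
    return out


if __name__ == "__main__":
    w = [[1, 2], [3, 4], [5, 6]]
    # A 2-byte budget forces one row per tile; the result matches the
    # untiled product.
    print(tiled_matvec(w, [1, 1], budget_bytes=2))
```

The tiled result is identical to the untiled product; only the peak amount of weight data touched per step changes, which is the property that matters on a microcontroller with little SRAM.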

These optimizations deliver strong empirical results, maintaining high memory utilization and high performance at the same time. On the MLPerf Tiny benchmark, Tiny ONNC is on par with TensorFlow Lite for Microcontrollers (TFLM) in performance and precision (within 2%). At similar performance and precision, the memory footprint of the generated program is only 3/5 of TFLM’s, and its code size is only 1/10 of TFLM’s in the best case. Moreover, with aggressive optimizations, the code generated by Tiny ONNC is up to 4.9 times faster than the code generated by TFLM.

In this talk, we will first introduce Tiny ONNC and how to use it. Next, we will dive into our optimization strategies and approaches. Finally, we will walk through the experimental results to show how Tiny ONNC outperforms competing solutions.

Enterprise Health & Wellness using wearables

Anil BHASKARAN, Vice President APJ Innovation Office, SAP

Abstract (English)

Wearables such as the Apple Watch and Fitbit pack substantial compute power and can calculate several vital parameters non-invasively. They are fundamentally changing the way users look at health and wellness. As a result, the adoption of wearables has increased significantly over the years, and a Stanford study concludes that over 54% of people in the US use digital health tracking. This is prompting employers to look at using wearables to promote health and wellness and elevate their employee experience. In this session, we will look at the trends, experiences, opportunities and future of health and wellness using wearables, and help you formulate a strategy for your organization.

Extremely low-bit quantization for Transformers

DongSoo LEE 이동수, Executive Officer, NAVER CLOVA

Abstract (English)

Deploying the widely used Transformer architecture is challenging because of the heavy computational load and memory overhead during inference, especially when the target device has limited computational resources, as mobile and edge devices do. Quantization is an effective technique to address such challenges. Our analysis shows that, for a given number of quantization bits, each Transformer block contributes to model accuracy and inference computation in different ways. Moreover, even inside an embedding block, individual words contribute very differently. Correspondingly, we propose a mixed-precision quantization strategy that represents Transformer weights with an extremely low number of bits (e.g., under 3 bits). For example, for each word in an embedding block, we assign a different number of quantization bits based on its statistical properties. We also introduce a new matrix multiplication kernel that does not require dequantization steps.
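As a rough illustration of per-word mixed precision in an embedding table, the sketch below gives more bits to a small set of "important" words and fewer to the rest, then applies symmetric uniform quantization row by row. The abstract does not say which statistic is used, so ranking words by frequency is an assumption here, as are the 4-bit/2-bit split, the `top_fraction` parameter, and the function names; this is not the presenters' method or kernel.

```python
def quantize_row(row, bits):
    """Symmetric uniform quantization of one embedding row to `bits` bits.

    Returns integer codes and the scale needed to reconstruct the values.
    """
    if bits < 2:
        raise ValueError("this symmetric scheme needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # e.g. bits=2 -> codes in [-1, 1]
    max_abs = max(abs(v) for v in row) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate float values from codes and scale."""
    return [c * scale for c in codes]


def assign_bits(frequencies, high_bits=4, low_bits=2, top_fraction=0.25):
    """Hypothetical allocation rule: the most frequent words get more bits."""
    order = sorted(range(len(frequencies)), key=lambda i: -frequencies[i])
    n_high = max(1, int(len(frequencies) * top_fraction))
    bits = [low_bits] * len(frequencies)
    for i in order[:n_high]:
        bits[i] = high_bits
    return bits


if __name__ == "__main__":
    freqs = [10, 1, 5, 2]  # toy word counts
    bits = assign_bits(freqs)
    print(bits, sum(bits) / len(bits))  # average bits per word
    print(quantize_row([0.6, -1.0, 0.2], bits[1]))
```

With these toy settings a quarter of the words use 4 bits and the rest 2 bits, giving an average of 2.5 bits per word, which shows how a mixed allocation can stay under 3 bits on average while protecting the words that matter most.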

11:30 am to 11:50 am

Video Posters

Imagimob – content pack for the Texas Instruments mmWave Radar Sensor IWR6843AOP

Sam Al-ATTIYAH, Head of Projects & Customers, Imagimob

Abstract (English)

* Why a content pack
* What is included in a content pack
* What are the benefits for developers
Invited session. Details can be found here: https://www.imagimob.com/news/imagimob-announces-content-pack-for-texas-instruments-mmwave-radar-sensor-iwr6843aop

Bringing Big Ideas into Tiny Devices: A Bottom-Up Approach to Building Extremely Small Models from Neuton.ai

Blair NEWMAN, CTO, Neuton.ai

Abstract (English)

Bringing intelligence to edge devices will measure the world in new ways, providing opportunities for making smarter data-driven decisions that can change human lives for the better.

Why is our world not there yet? Why do we still face the difficulty of embedding large ML models into edge devices and evaluating model quality? Why do traditional data science algorithms fail for TinyML?

To highlight the matter from different angles, we’ll tackle the following points:

  • Explain why models built with traditional frameworks are not optimal in size and accuracy.
  • Compare traditional mathematical algorithms with a novel approach to building compact, self-organizing neural networks with excellent generalization capability.
  • Demonstrate this novel approach in action and explain how even non-data scientists can build predictive models in a few clicks, without any coding or loss of accuracy.
  • Compare the output metrics of our model with those of a model built with a traditional approach using TensorFlow Lite.
At Neuton.ai, we believe that users of any technical level should be able to get actionable insights from their data effortlessly. That’s why we strive to make solving real-world challenges with machine learning super easy and intuitive.

11:50 am to 12:20 pm

Partner Session with SensiML & Latent AI

Create IoT Edge AI Code for 32-bit Down To 8-bit MCUs

Christopher B. ROGERS, CEO, SensiML Corp

Abstract (English)

Are you looking for ultra-low-power, low-memory AI for your commercial, agricultural, or industrial sensor node, smart home IoT device, or wearable design? Think your device’s architecture and resources are too limited to realistically use machine learning at the edge? Then this session is for you. We’ll cover how to generate pattern recognition and ML algorithms that can run in <10 kB of memory with power consumption measured in microwatts. You’ll learn why SensiML, the industry’s only solution certified by leading silicon manufacturers for use on 8-bit MCUs, is your best choice for quickly deploying accurate, production-quality sensor recognition models across a broad range of embedded platforms.

System Engineering Aspects of End-to-End tinyML

Jan ERNST, Director of AI, Latent AI

Abstract (English)

As the tinyML domain grows and matures, new and more complex applications become feasible and desirable. Their requirements will be less formulaic and will demand flexible ways of expressing new tasks on new (or old) data. In building such systems, how does one choose the abstractions for the components of an ML pipeline, from data to deployment, while remaining open to the outside and agnostic to the constituent parts? How can one set the stage for scaling the number of tasks, models and data without getting in the way of going deep into tinyML technologies (quantization, pruning, throttling, NAS, etc.)? What are the caveats in composing a system from disparate components (data, model, evaluation) of varying origin? This talk will briefly describe one approach to some of these questions in the context of building tiny models across edge devices.

China Standard Time (CST) / UTC+8

9:00 am to 9:30 am

Plenary: A review of on-device fully neural end-to-end speech recognition and synthesis algorithms

Chanwoo KIM, Corporate Vice President, Samsung

Abstract (English)

In this talk, we review various end-to-end automatic speech recognition and speech synthesis algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components, such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, and a decoder based on a Weighted Finite-State Transducer (WFST). To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. The corresponding WFST therefore becomes enormous, which prohibits on-device implementation. Recently, fully neural end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Connectionist Temporal Classification (CTC), Recurrent Neural Network Transducers (RNN-T), Attention-based Encoder-Decoder models (AED), Monotonic Chunk-wise Attention (MoChA), and transformers.

The inverse process of speech recognition is speech synthesis, where a text sequence is converted into a waveform. Conventional speech synthesizers are usually based on parametric or concatenative approaches. Even though Text-to-Speech (TTS) systems based on concatenative approaches have shown relatively good sound quality, they cannot easily be employed for on-device applications because of their immense size. Neural speech synthesis approaches based on Tacotron and WaveNet started a new era of TTS with significantly better speech quality. More recently, vocoders based on LPCNet require significantly less computation than WaveNet, which makes it feasible to run these algorithms on devices. These fully neural systems require much smaller memory footprints than conventional algorithms.
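To make the CTC approach mentioned above concrete, here is a minimal sketch of greedy (best-path) CTC decoding, the simplest way to turn per-frame label scores into text: pick the best label in each frame, merge consecutive repeats, then drop the blank symbol. The function name, the choice of index 0 as the blank, and the toy label set are illustrative assumptions; practical on-device recognizers often use beam search instead.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Greedy CTC decoding.

    frame_scores: one list of per-label scores for each audio frame.
    labels: symbol for each label index (labels[blank] is the blank).
    """
    # Best label index per frame (first index wins on ties).
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame's
        # label and is not the blank; a blank between two identical
        # labels therefore lets a repeated character survive.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    labels = ["_", "c", "a", "t"]

    def onehot(i):
        return [1.0 if j == i else 0.0 for j in range(len(labels))]

    frames = [onehot(i) for i in [1, 1, 0, 2, 2, 0, 0, 3]]
    print(ctc_greedy_decode(frames, labels))
```

Because decoding needs only an argmax per frame and no WFST or external language model, it illustrates why CTC-style end-to-end models are attractive for on-device use.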

9:30 am to 11:30 am

Voice/Audio/Predictive analysis

TinyML in TmallGenie

Conggang HU 见明 见之则明, Staff Engineer, Alibaba

Abstract (English)

TmallGenie is Alibaba’s smart speaker and AIoT business unit. In past years, we have been focusing on the research and development of AIoT devices. One of the main problems we faced was how to integrate AI capabilities into compact hardware devices. To solve this problem, we developed a TinyML framework to make AI models smaller and faster. This easy-to-use, auto-optimized framework is now applied to all of our AI capabilities (speech, CV, NLU and more) running on a variety of small devices, providing automatic NAS, pruning, and quantization to make ML tiny.
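As one concrete example of the pruning step mentioned above, the sketch below shows generic magnitude pruning: the smallest-magnitude fraction of weights is set to zero. This is a textbook technique used here for illustration only; it is not TmallGenie's framework, and the function name and `sparsity` parameter are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    weights: flat list of floats; returns a new list, leaving the input intact.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    print(magnitude_prune([0.1, -2.0, 0.05, 1.5], sparsity=0.5))
```

In practice pruning is usually followed by fine-tuning to recover accuracy, and the zeros only save memory or time if the runtime stores or skips them efficiently.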


Jingpeng XIANG, Product Director, Beijing Soundplus Technology Co.Ltd

Abstract (English)

Considering the limited battery capacity and processor performance of earphones, it is extremely challenging to provide users with call quality on TWS (true wireless stereo) earphones comparable to that of calls on a phone.

SoundPlus has extensively applied machine learning methods throughout its speech enhancement algorithm (SVE-AI), which runs on low-power SoCs and DSPs, achieving a balance between power consumption and market-leading performance.

The SVE-AI solution has been adopted in TWS earphone products from mainstream mobile brands and international audio brands, including TWS earphones equipped with one to four microphones.

SVE-AI also enhances active noise control performance and the voice interaction experience on TWS earphones. As a result, a complete AI-enhanced audio solution can be rapidly deployed on TWS and other wearable devices.

Lightweight visual localization with deep learning

Yihong WU, Professor, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, and at School of Artificial Intelligence, University of Chinese Academy of Sciences

Abstract (English)

Virtual reality (VR), augmented reality (AR), robotics, and autonomous driving have recently attracted much attention from the academic and industrial communities. Visual localization, or SLAM (simultaneous localization and mapping), plays an important role in these fields. While tremendous progress has been made in autonomous navigation, many challenges remain. In this talk, I will present our recent research efforts to address these challenges. First, I will give an overview of visual localization with learning. I will then introduce fast localization (or SLAM relocalization) in large-scale environments that leverages local and global CNN descriptors in parallel, co-training both real-valued and binary descriptors, followed by flexible and efficient loop closure detection based on motion knowledge and CNN hash codes. A robust SLAM system with accurate and fast feature tracking is also presented. Finally, I will share future trends in visual localization.
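To show why binary hash codes make loop closure detection cheap, the sketch below sign-binarizes real-valued descriptors into bit codes and compares them by Hamming distance, a popcount of the XOR. Sign binarization, the distance threshold, and the function names are assumptions for illustration only; the talk's learned hash codes and motion-knowledge filtering are not reproduced here.

```python
def binarize(descriptor):
    """Sign-binarize a real-valued CNN descriptor into an integer bit code."""
    code = 0
    for i, v in enumerate(descriptor):
        if v > 0:
            code |= 1 << i  # bit i is set when component i is positive
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def loop_closure_candidates(query, database, max_distance):
    """Indices of stored keyframe descriptors within max_distance bits of query."""
    q = binarize(query)
    return [i for i, d in enumerate(database)
            if hamming(q, binarize(d)) <= max_distance]


if __name__ == "__main__":
    keyframes = [
        [0.4, -0.1, 0.3, -0.5],   # same sign pattern as the query
        [-0.4, 0.1, -0.3, 0.5],   # opposite pattern
        [0.4, 0.1, 0.3, -0.5],    # differs in one bit
    ]
    print(loop_closure_candidates([0.5, -0.2, 0.1, -0.9], keyframes, 1))
```

Comparing integer codes with XOR and a bit count is far cheaper than computing distances between float vectors, which is what makes hash-based candidate search practical on constrained hardware; candidates would still need geometric verification before a loop is accepted.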

Powering innovation of low-power smart speaker with Voice-Dedicated AI Chip

Shaorui HUANG, General Manager of Product R&D Center, Allwinner Technology

Abstract (English)

More and more application scenarios are migrating AI computing from the cloud to embedded devices. Due to the diversification of application requirements, the main problem facing embedded AI hardware is how to balance power, performance and cost. In applications, we also need to achieve a unified deployment and operating environment across different deep learning frameworks. In the past few years, we have been focusing on research into smart speaker chip solutions. To meet the challenge of continuously rising power and performance demands in smart speaker products, we have developed a dedicated chip for smart speakers that uses a heterogeneous CPU+DSP+AIPU architecture, based on deep learning frameworks, which can maximize parallel computing performance for voice AI and provide better performance at lower cost and power consumption.

Airborne sound maintenance in remote sites using low power federated learning

Anton KROGER, Senior Director Natural Resources, SAP

Abstract (English)

In this presentation, we’ll detail the business and technical reasons for selecting TinyML to contextualize airborne sound for predictive maintenance. The objectives of this solution are to:

  • Minimize planned & unplanned operational downtime by maximizing asset efficiency and availability.
  • Minimize the working capital tied up in expensive spare parts held for planned & unplanned operational downtime.
  • Minimize retrofitting expenses for upgrading existing machine infrastructure so that it can be monitored and included in existing predictive maintenance models.

Because these operations are remote, we use low-power sensors and a federated learning approach to provide a solution that continuously learns and shares only the scores associated with the sound data, in adherence with GDPR.
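The abstract does not specify how the federated learning step aggregates what each site learns, so the sketch below uses standard federated averaging (FedAvg) as an illustrative stand-in: each sensor trains locally on its own sound data, and only a parameter vector plus its sample count leaves the device, never the raw audio. The function name and the flat-list representation of model parameters are assumptions, and this is not presented as SAP's implementation.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average client parameter vectors weighted by sample count.

    client_weights: one flat list of model parameters per client (same length).
    client_sizes: number of local training samples behind each client's update.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for j in range(dim):
            avg[j] += w[j] * n / total  # clients with more data weigh more
    return avg


if __name__ == "__main__":
    # Two remote sensors: the second trained on three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

The aggregated vector is sent back to every sensor as the new shared model, so each site benefits from the others' data without that data ever being transmitted.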

11:30 am to 11:50 am

Video Posters

Bird Hotspots: A tinyML acoustic classification system for ecological insights

Hemanth Reddy SABBELLA, Research Assistant at NeuRonICS lab, Indian Institute of Science (IISc)

Abstract (English)

Global bird distribution and abundance are currently reported to the eBird community by birdwatchers who manually travel from place to place and log sightings with a smartphone application. Creating chronological hotspots across landscapes can help automate bird identification and map spaces relevant to birds. In this video poster, we present an in-house developed, low-power neuromorphic acoustic classifier system that uses a novel in-filter computing framework with a neuromorphic cochlea as the audio front end. The framework uses an acoustic feature extractor based on a Cascade of Asymmetric Resonators with Inner Hair Cells (CAR-IHC) that can act as a Support Vector Machine (SVM) kernel, resulting in a template-based SVM. The template SVM classifier is designed to be computationally efficient and can run on resource-constrained hardware. For hardware integration, we developed an easy-to-deploy framework to port this classifier to battery-powered, low-power microcontrollers. The complete system is demonstrated detecting and logging bird species and their occurrences in real time on an Arm Cortex-M4 processor with a mean current draw of 1.6 mA. It is estimated to run for at least 2 months on 3 AA batteries while detecting multiple bird species, and could be optimized to achieve a battery life of up to 1 year. The logged data is used to create chronological hotspots of bird occurrences on Google Maps, which could provide valuable information for species conservation.
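The battery-life figures above can be sanity-checked with simple arithmetic: ideal runtime is capacity divided by mean current. The sketch assumes a capacity of 2500 mAh, a typical value for alkaline AA cells that is not stated in the poster. Three cells in series raise the voltage but keep the same mAh capacity. It ignores regulator losses, temperature, and voltage cut-off, so it gives an upper-bound estimate rather than a measurement.

```python
HOURS_PER_DAY = 24.0


def battery_life_days(capacity_mah, mean_current_ma):
    """Ideal battery life in days: capacity divided by mean current draw."""
    return capacity_mah / mean_current_ma / HOURS_PER_DAY


def required_current_ma(capacity_mah, target_days):
    """Mean current (mA) needed to last target_days on capacity_mah."""
    return capacity_mah / (target_days * HOURS_PER_DAY)


if __name__ == "__main__":
    ASSUMED_AA_CAPACITY_MAH = 2500  # assumption: typical alkaline AA cell
    # 1.6 mA mean current, as reported in the poster abstract
    print(round(battery_life_days(ASSUMED_AA_CAPACITY_MAH, 1.6), 1))
    # Mean current needed for a 1-year lifetime on the same cells
    print(round(required_current_ma(ASSUMED_AA_CAPACITY_MAH, 365), 3))
```

Under this assumption, 1.6 mA gives about 65 days, consistent with the "at least 2 months" estimate, and a 1-year lifetime would require cutting the mean current to roughly 0.29 mA, about a 5.6x reduction.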

Cyberon DSpotter: A phoneme-based local voice recognition solution

Alex LIOU, Vice President, Cyberon Corp

Abstract (English)

With speech recognition becoming popular in tiny, resource-constrained edge and endpoint IoT devices, Cyberon proposes DSpotter, an offline algorithm with a small memory footprint and low power consumption for always-on voice triggers and recognition of multiple command sets. It uses a phoneme-based approach, enabling developers to create a speaker-independent (SI) voice recognition model that achieves high performance without a costly training-data collection process for specific commands. The adoption of DNN-based modeling also gives DSpotter high robustness and accuracy in noisy conditions, even without speech preprocessing. DSpotter covers more than 40 languages, which helps developers use voice technology on edge devices and reach the global market quickly. https://www.cyberon.com.tw/index.php?lang=en#

AI Enabled Low-Cost Stethoscope

Pratyush MALLICK, Firmware Engineer, Analog Devices

Abstract (English)

A significant majority of the Indian population suffers from cardiovascular and respiratory diseases. Physical examination may be mandatory for a correct diagnosis. Stethoscope-based lung auscultation is the clinical standard for detecting and treating respiratory disorders. Clinical signs are an integral part of the diagnosis and management of these diseases. However, the use of a stethoscope is limited by the sporadic nature of data acquisition and by human subjectivity in recognizing symptoms. Indications of a respiratory complication may include shortness of breath, coughing, wheezing, and laboured breathing. Unfortunately, there is a lack of tools to objectively monitor these signs.

Diagnosis through auscultation of the respiratory system normally requires the presence of an experienced doctor, but recent advances in artificial intelligence (AI) make it possible for laypeople to perform this procedure themselves. To achieve this, the system needs two main components: an algorithm for fast and accurate detection of breath phenomena in stethoscope recordings, and an AI agent that interactively guides the end user through the auscultation process.

This non-invasive, clinical-grade medical device uses proprietary machine learning algorithms to identify key changes in pulmonary sounds and breathing patterns and to notify the user about the patient’s respiratory health status. The device captures lung sounds and chest wall motion, from which it extracts key features in the time and frequency domains to identify vital respiratory symptoms. Proprietary machine learning techniques, derived from state-of-the-art speech recognition algorithms, then use the characterized data to train models that automatically label areas of interest.

11:50 am to 12:20 pm

Partner Session with Qualcomm & Arm

Always-on audio/speech network architectures, personalization, and framework

Kyuwoong HWANG, Senior Director, Qualcomm Research, Korea

Abstract (English)

We introduce efficient network architectures for always-on audio/speech applications, such as keyword spotting and acoustic scene classification, that leverage domain characteristics to achieve good performance with minimal memory and computation. Personalized on-device pruning, which requires only forward passes, shrinks the network when a specific person uses it, which is the typical case. We also briefly introduce our deep learning framework.
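The abstract gives no details of how the personalized pruning works. One forward-pass-only criterion, sketched below as an assumption, is to run the user's own inputs through the network, accumulate each hidden unit's activation, and drop the units that stay quiet for that person. The single-layer network, the mean-activation criterion, and the names `forward_hidden` and `personalize_prune` are all hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward_hidden(x: Sequence[float], w1: Matrix, b1: List[float]) -> List[float]:
    """ReLU activations of one dense hidden layer (forward pass only)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]


def personalize_prune(user_inputs, w1: Matrix, b1: List[float], w2: Matrix, keep: int):
    """Hypothetical forward-pass-only pruning for one user.

    Accumulates each hidden unit's activation over the user's inputs and keeps
    the `keep` most active units. w1 is hidden x input, b1 has one bias per
    hidden unit, and w2 is output x hidden. Returns (w1, b1, w2, kept_indices).
    """
    totals = [0.0] * len(w1)
    for x in user_inputs:
        for j, a in enumerate(forward_hidden(x, w1, b1)):
            totals[j] += a
    # Most active units first; ties go to the lower index.
    ranked = sorted(range(len(w1)), key=lambda j: (-totals[j], j))
    kept = sorted(ranked[:keep])
    new_w1 = [w1[j] for j in kept]
    new_b1 = [b1[j] for j in kept]
    # Drop the matching input columns of the next layer.
    new_w2 = [[row[j] for j in kept] for row in w2]
    return new_w1, new_b1, new_w2, kept
```

Because no gradients are computed, a criterion like this fits devices that can run inference but not training; the trade-off is that the pruned network is tuned to one user.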

Building and Enabling Voice Control with ARM Cortex-M

Abstract (English)

Sensory will present its Edge-AI technology suite and, in particular, demonstrate its VoiceHub platform, which is free for developers, for creating custom wake words, phrase-spotted voice control, and large-vocabulary grammars with NLU. All embedded voice models will be built, exported, and demonstrated on ARM Cortex-M.


China Standard Time (CST) / UTC+8

9:00 am to 11:30 am

Sensor Fusion using Machine Learning: Smart Forehead Temperature Sensing

Joshua CHANG 張廷仰, Product Manager, PixArt Imaging Inc. Taiwan

Abstract (English)

Since the outbreak of the COVID-19 pandemic, measuring and recording forehead temperature has become an essential part of our daily lives. In response to the demand for an efficient, automated temperature-measuring method, PixArt implemented a sensor fusion solution that combines its FIR sensor and its ultra-low-power CMOS image sensor with its ultra-low-power machine learning processing chip. This highly integrated solution can quickly detect the presence of people, measure their forehead temperature, and identify whether they are wearing a mask.

The talk will elaborate on the background of this PixArt fusion solution and the applications it can help enable, and will share further benefits of tiny machine learning for this application.
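One way to picture the fusion step is to let the image sensor locate the face and then read the FIR thermal grid only inside the forehead region. The sketch below assumes that both sensors share the same field of view, that the face box arrives in normalized coordinates, and that the forehead is the top third of the box; the function name `forehead_temperature` and these conventions are hypothetical, not PixArt's implementation.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def forehead_temperature(face_box: Optional[Box], thermal: List[List[float]]) -> Optional[float]:
    """Hypothetical fusion: hottest FIR reading in the top third of the face box."""
    if face_box is None:  # no person detected, so skip the measurement
        return None
    x0, y0, x1, y1 = face_box
    rows, cols = len(thermal), len(thermal[0])
    # Forehead region: the top third of the face box, mapped onto the FIR grid.
    forehead_bottom = y0 + (y1 - y0) / 3
    r0 = min(rows - 1, int(y0 * rows))
    r1 = min(rows - 1, max(r0, int(forehead_bottom * rows)))
    c0 = min(cols - 1, int(x0 * cols))
    c1 = min(cols - 1, max(c0, int(x1 * cols)))
    return max(thermal[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))
```

Gating the thermal read on a detected face is what lets the image sensor reject readings from non-human heat sources in this sketch.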


TinyML Heat Image Face Recognition on Wio-Terminal

Yuanhao ZOU, Senior Undergraduate Student, Southern University of Science and Technology

Abstract (English)

Secure live face recognition has long been considered a computationally demanding task. Our work explores the possibility of running face recognition on a Cortex-M4 development board using a thermal imaging sensor. We use Edge Impulse as the training platform and Wio Terminal as the computation platform to build a low-cost, lightweight, real-time TinyML computer vision demo.


An approach to dynamically integrate heterogeneous AI components in a multimodal user authentication system use case

Haochen XIE 謝 昊辰; コトイ コウシン, Project Leader, Team Dragon, AnchorZ Inc.

Abstract (English)

In this talk, we will introduce our approach to a challenging task: effectively and dynamically integrating multiple AI-backed components, each based on a different kind of AI technology, to implement a single function, namely continuous multimodal user authentication.
In building our next-generation user authentication system, DZ Security, we needed a flexible and effective way to integrate multiple elemental authentication methods, such as facial recognition, voice recognition, and touch patterns, that employ very different types of AI technologies, such as DNNs, RNNs, and analytical regression. We also needed the combination method to support an open set of elemental authentication methods, some of which may be provided by third parties. Furthermore, we needed a high degree of confidence that the overall system would perform well enough on certain critical metrics, such as overall security assurance and energy consumption. The latter is especially critical for battery-powered devices.
Our approach tackles this challenge by first defining a common interface that all components must comply with, and then developing a DSL (domain-specific language) in which a “fusion” or “integration” program is written. The component interface contains unified APIs for invoking the components and provides access to each component's performance metrics. On top of the DSL, we built a framework that ensures the final system always meets predefined minimal performance requirements, expressed in a few key metrics such as security risk indicators (e.g. estimated false acceptance rate) and power consumption estimates. The framework also effectively reduces the integration program's degrees of freedom to those of a dynamic strategy that decides when and how each available component should be invoked: a “smarter” strategy achieves a higher “score” (e.g. a lower false rejection rate), but no strategy can ever break the predefined requirements. We can therefore optimize the component invocation strategy aggressively without worrying about violating the minimal performance requirements. The DSL also includes a simulator that can be used to evaluate the performance of an integration program in simulated deployment situations, alongside a toolchain that compiles it for execution on different platforms. We can then use the simulator to guide the writing of the best strategies, using human intelligence, artificial intelligence, or both combined.
We hope that sharing our approach provides useful hints to others who need to implement similar systems.
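The key property described in this abstract, that no invocation strategy can break the minimal requirements, can be illustrated with a small guard loop. In this sketch the guard, not the strategy, decides acceptance: it accepts only when the combined estimated false acceptance rate (FAR) is at most a limit, and it refuses any invocation that would exceed an energy budget. The `Component` fields, the independence assumption behind multiplying FARs, and all names are hypothetical; this is not the DZ Security DSL.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Component:
    """Hypothetical view of one elemental authenticator and its metrics."""
    name: str
    far: float                 # estimated false acceptance rate if this check passes
    energy_mj: float           # estimated energy per invocation, in millijoules
    check: Callable[[], bool]  # runs the underlying authenticator


Strategy = Callable[[List[Component], List[str]], Optional[Component]]


def authenticate(components: List[Component], strategy: Strategy,
                 max_far: float, energy_budget_mj: float) -> Tuple[bool, float]:
    """Run any invocation strategy under fixed FAR and energy guarantees.

    The strategy only chooses which component to call next; the guard alone
    decides acceptance. Returns (accepted, energy_spent_mj).
    """
    combined_far = 1.0  # product of FARs of passed checks (assumes independence)
    spent = 0.0
    used: List[str] = []
    while True:
        choice = strategy(components, used)
        if choice is None or choice.name in used:
            return False, spent  # strategy gave up or repeated a component
        if spent + choice.energy_mj > energy_budget_mj:
            return False, spent  # refuse: this call would exceed the budget
        used.append(choice.name)
        spent += choice.energy_mj
        if choice.check():
            combined_far *= choice.far
            if combined_far <= max_far:
                return True, spent


def cheapest_first(components: List[Component], used: List[str]) -> Optional[Component]:
    """Example strategy: try the lowest-energy unused component next."""
    remaining = [c for c in components if c.name not in used]
    return min(remaining, key=lambda c: c.energy_mj) if remaining else None
```

A smarter strategy than `cheapest_first` could lower false rejections or energy use, but under this guard it cannot push the accepted FAR above `max_far` or the spent energy above the budget.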

A lightweight face detection method working with Himax Ultra-Low Power WE-I Plus AI Processor

Justin KAO, Master's Student in Electrical Engineering, National Cheng Kung University in Taiwan

Abstract (English)

For longer battery life, an always-on sensing product often has constrained system memory and computing power. These tinyML constraints can make product design difficult, and finding a computer vision solution that meets all requirements isn’t easy. In this talk, we will share our experience implementing face detection on the Himax WE-I Plus platform. Our implementation not only runs in real time but also achieves a reasonable degree of accuracy in most face detection scenarios.

H3Dynamics powers Smart cities to get infrastructure-safe in a smarter way

Eric FEDDAL, Chief Revenue Officer, H3 Dynamics Holdings

Abstract (English)

H3Dynamics digitizes audits and inspections across various industries to offer actionable intelligence that enables rectification work.
Our actionable AI value proposition focuses on operations and maintenance (O&M) and addresses immediate infrastructure pain points: traditional inspections are manual, involve professionals working at heights, and end with a lengthy report-generation process. As digital natives, H3Dynamics also offers API integration into customer ERP software to automatically send work orders and initiate rectification workflows.
If your customers value TCO reduction and an immediately actionable innovation roadmap, why wait? Reach out to H3Dynamics.

An Introduction to the Always On Vision (AONV) Sensor and Its Trend

YY SUNG 宋尤昱, Associate Vice President, Himax Imaging Inc.

Abstract (English)

In general, CMOS image sensors (CIS) are used for photo and video recording. Beyond these traditional uses, CIS are becoming increasingly popular in computer vision and machine learning, as AIoT applications grow and low-power AI processors become more widely available. Low-power, battery-operated edge devices that run for a long time without recharging are entering our lives. Such smart detection devices are becoming more practical, improving the user experience and making life more convenient. In these applications, low power is key: not only must the sensor itself be low power, but the whole subsystem must be as well. Since the CIS is the first stage of such a detection system, it should play an important role in smartly waking up the whole system and making the wake-up process efficient. This talk will show how the CIS can play this role.
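The abstract does not say how the CIS performs its smart wake-up. A common generic approach, sketched here as an assumption, is low-resolution frame differencing: the sensor compares consecutive frames and wakes the host only when enough pixels change. The thresholds and the name `should_wake` are hypothetical.

```python
from typing import Sequence


def should_wake(prev_frame: Sequence[Sequence[int]],
                curr_frame: Sequence[Sequence[int]],
                pixel_threshold: int = 20,
                min_changed: int = 4) -> bool:
    """Hypothetical wake-up test: True when enough pixels changed between frames."""
    changed = sum(
        1
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > pixel_threshold  # count only clearly changed pixels
    )
    return changed >= min_changed
```

Running a test like this in the sensor lets the larger AI processor stay asleep while the scene is static, which is the kind of system-level saving the talk refers to.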

11:30 am to 11:50 am

Video Posters

Fixed complexity tiny reservoir heterogeneous network for on-device ECG learning of anomalies

Danilo PAU, Technical Director, IEEE and ST Fellow, STMicroelectronics Italia

Abstract (English)

The electrocardiogram (ECG), one of the most extensively used signals for monitoring cardiovascular diseases (CVDs), captures the heart’s arrhythmias. Patients with such pathologies are often monitored for extended periods, which requires data storage and a very time-consuming offline search for anomalies. This is especially inefficient when indicative patterns in the biological signals are infrequent: it demands more analysis time from physicians and makes diagnosis a difficult visual search task.

We propose an automated hybrid deep learning and machine learning pipeline based on reservoir computing (RC), followed by principal component analysis (PCA) and one-class support vector machine (OC-SVM). This machine learning pipeline can be used to perform on device personalized learning and real-time anomaly detection of pathological conditions and therefore enable an application to raise warnings.

The on-device learning step requires fixed computational complexity, latency, and memory so that it fits on an off-the-shelf low-power microcontroller (MCU). During the learning phase, it uses a very limited amount of normal input data (e.g. 10,000 bytes), which makes it suitable for fast personalization whenever the device must be restarted, for instance after a change in the position in which it is carried. Detection accuracy was evaluated on the publicly available MIT-BIH arrhythmia dataset. This dataset, originally segmented into individual heartbeats, was modified to mimic the temporal sliding of input tensors over streaming ECG data. The best F1 score and accuracy are 91.5% and 95.4%, respectively, with a variance of 0.05 over the processed data. On an STM32 Cortex-M4 MCU at 80 MHz, with a 360 Hz sampling frequency, learning at initialization time runs within a latency of 83 seconds, achieving 43 seconds of on-device learning and 2 inferences per second (estimated from multiply-accumulate operation counts). On an STM32 Cortex-M7 at 480 MHz, learning takes less than 5 seconds and the system achieves 19 inferences per second.

We conclude that recurrent reservoir networks combined with PCA and OC-SVM machine learning modules achieve adequate performance in terms of both detection accuracy and execution time, competitive with that of more complex models such as LSTMs trained via backpropagation (which require up to 13 minutes), while requiring only a few seconds to be trained online.

This makes this work suitable for on-device learning with fast personalization in real-time embedded applications.
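The core idea of reservoir computing is that the recurrent network's weights are random and fixed, so on-device learning only touches the lightweight downstream stages, which keeps the learning cost fixed. The sketch below shows a tiny echo-state reservoir that turns a 1-D signal window into a feature vector, followed by a centroid-distance one-class detector. The detector is a deliberately simple stand-in for the PCA + OC-SVM stages in the abstract, and the sizes, scaling, and names are assumptions.

```python
import math
import random
from typing import List, Sequence


class TinyReservoir:
    """Fixed random echo-state reservoir; its weights are never trained."""

    def __init__(self, size: int = 16, seed: int = 0, input_scale: float = 0.5):
        rng = random.Random(seed)
        self.w_in = [rng.uniform(-input_scale, input_scale) for _ in range(size)]
        raw = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]
        # Scale every row to an absolute sum of 0.9. With tanh (1-Lipschitz), an
        # infinity norm below 1 makes the state update a contraction, a
        # sufficient condition for the echo-state property.
        self.w = [[0.9 * v / sum(abs(u) for u in row) for v in row] for row in raw]

    def features(self, window: Sequence[float]) -> List[float]:
        """Drive the reservoir with a 1-D window and return the mean state."""
        size = len(self.w_in)
        state = [0.0] * size
        total = [0.0] * size
        for u in window:
            state = [
                math.tanh(self.w_in[i] * u + sum(w * s for w, s in zip(self.w[i], state)))
                for i in range(size)
            ]
            total = [t + s for t, s in zip(total, state)]
        return [t / len(window) for t in total]


class CentroidDetector:
    """One-class detector used here as a simple stand-in for PCA + OC-SVM."""

    def fit(self, vectors: List[List[float]], margin: float = 1.5) -> "CentroidDetector":
        n, dim = len(vectors), len(vectors[0])
        self.center = [sum(v[i] for v in vectors) / n for i in range(dim)]
        # Accept anything within `margin` times the farthest normal sample.
        self.radius = margin * max(self.distance(v) for v in vectors)
        return self

    def distance(self, v: Sequence[float]) -> float:
        return math.sqrt(sum((a - c) ** 2 for a, c in zip(v, self.center)))

    def is_anomaly(self, v: Sequence[float]) -> bool:
        return self.distance(v) > self.radius
```

Only `CentroidDetector.fit` learns anything, and its cost depends only on the number of windows and the reservoir size, which mirrors the fixed-complexity learning step the abstract emphasizes.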

Efficient inference of low-resolution optic flow on low power neuromorphic hardware

Felix BAUER, R&D Engineer, SynSense

Abstract (English)

Motion can be inferred from visual scenes by determining the optic flow. For tasks such as ego-motion regression, the spatial resolution of the optic flow field can be very low. This work presents a Spiking Neural Network (SNN) that infers a low-resolution optic flow field from Dynamic Vision Sensor (DVS) input. Based on a non-leaky Integrate-and-Fire (IAF) neuron model, it can be deployed on neuromorphic hardware such as Speck. Combining an event-based sensor with a low-power, event-driven asynchronous processor, Speck is suitable for efficient use in autonomous agents such as drones, where low energy consumption is crucial. The model is trained and tested on a standard PC with data from drone-mounted DVS sensors and then deployed on Speck. We demonstrate that the network can perform the highly temporal task of low-dimensional optic flow inference, exploiting the temporal nature of the visual input in streaming mode. The approach therefore constitutes a viable alternative or extension to IMU-based motion estimation for drones.
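The non-leaky IAF neuron mentioned in this abstract can be sketched in a few lines: the membrane potential integrates input without decay and emits a spike whenever it reaches a threshold. The reset-by-subtraction rule, the one-spike-per-step limit, and the discrete time steps are assumptions of this sketch, not a description of Speck's hardware.

```python
from typing import List, Sequence


def iaf_neuron(input_currents: Sequence[float], threshold: float = 1.0) -> List[int]:
    """Simulate a non-leaky integrate-and-fire neuron; returns a 0/1 spike train.

    At most one spike is emitted per time step in this sketch.
    """
    v = 0.0  # membrane potential
    spikes = []
    for current in input_currents:
        v += current  # integrate: no leak term, so charge is never lost
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # reset by subtraction keeps any residual charge
        else:
            spikes.append(0)
    return spikes
```

Because there is no leak, sparse input is still counted in full: inputs of 0.25, two silent steps, and then 0.75 still produce a spike on the last step, which suits event-driven inputs that arrive irregularly in time.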

Plant Growth and LAI Estimation using quantized Embedded Regression models for high throughput phenotyping

Dhruv Sheth, Intern, Edge Impulse

Abstract (English)

Because of climate change and its unpredictable nature, the majority of agricultural crops have been affected in terms of production and maintenance. Hybrid and cost-effective crops are making their way into the market, but the factors that affect their yield and the conditions favorable for their growth still have to be monitored and structured manually to achieve high throughput. Farmers are transitioning from traditional methods to hydroponic systems for growing annual and perennial crops. These crop arrays have growth patterns that depend on the environmental conditions in the hydroponic units. Semi-autonomous systems that monitor this growth may prove beneficial, reduce costs and maintenance effort, and predict future yield in advance to indicate how the crop will perform. Such systems are also effective for understanding crop drooping and wilt or disease from visual systems and plant traits. Forecasting crop yield well ahead of harvest time would help strategists and farmers take suitable measures for selling and storage. Accurate prediction of crop development stages plays an important role in crop production management. In this article, I propose an embedded machine learning approach to predicting crop yield and estimating crop biomass, using an image-based regression model built with Edge Impulse that runs in real time on an edge system, the Sony Spresense. It uses a few of the six Cortex-M4F cores on the Sony Spresense board for image processing, inference, and predicting a regression output in real time. The system uses image processing to analyze the plant in a semi-autonomous environment and predicts a numerical series for the biomass allocated to plant growth. This numerical series contains a biomass threshold, which is then predicted for the plant.
The predicted biomass is also processed by a linear regression model to analyze efficacy and is compared with the ground truth to identify growth patterns. Together, the image regression and linear regression models form an algorithm that is finally used to test and predict the biomass of each plant semi-autonomously.
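The abstract leaves open exactly how the linear regression stage is set up. One plausible reading, sketched here as an assumption, fits an ordinary least-squares line between predicted and ground-truth biomass and reports R²; a slope near 1, an intercept near 0, and R² near 1 would indicate that predictions track the ground truth. The function name `fit_line` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: share of ground-truth variance explained.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2
```

In use, `xs` would hold the model's per-plant biomass predictions and `ys` the measured values; tracking the fitted slope over time is one simple way to spot drift in the growth pattern.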


11:50 am to 12:05 pm

Partner Session with SynSense

SPECK: A low-power, low-latency neuromorphic visual solution in a single chip

Yannan XING, Senior Algorithms Engineer, SynSense

Abstract (English)

While DNNs with accelerators have demonstrated remarkable performance in visual tasks, this gain comes with undesired costs, such as redundant computation and long processing latency. These costs are especially undesirable in IoT and other edge computing applications.

Neuromorphic computing takes inspiration from the efficient computation of biological neural networks to build low-power ML applications. By using special-purpose computing architectures, neuromorphic computing offers significant benefits for low-power, low-latency processing.

SynSense has developed SPECK, a commercial neuromorphic smart vision sensor that integrates an event-based Dynamic Vision Sensor (DVS) and neuromorphic computing cores into a single-die SoC. SPECK provides end-to-end, ultra-low-power, real-time solutions for application scenarios such as smart homes, surveillance, and AI toys.

Schedule subject to change without notice.


Wei Xiao




Qualcomm Research, USA


Himax Technologies

Sean KIM

LG Electronics CTO AI Lab

Joo-Young KIM




Eric PAN

Seeed Studio and Chaihuo makerspace


Arm China


Indian Institute of Science (IISc)

Jacky XIE


Shouyi YIN 尹首

Tsinghua University


Tsinghua University




Joshua CHANG 張廷仰

PixArt Imaging Inc. Taiwan


Skymizer Taiwan Inc.


H3 Dynamics Holdings

Song HAN


Conggang HU 见明 见之则明


Shaorui HUANG

Allwinner Technology

Kyuwoong HWANG

Qualcomm Research, Korea

Justin KAO

National Cheng Kung University in Taiwan

Chanwoo KIM


Anton Kroger


Huiying LAI

Seeed Studio

DongSoo LEE 이동수


Hongjie LIU


Yunxin LIU

Tsinghua University


Texas Instruments & IISc Bangalore

Mallik P. MOTURI



RMIT University, Melbourne, Australia



Odin SHEN 沈綸銘



Himax Imaging Inc.

Yihong WU

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, and at School of Artificial Intelligence, University of Chinese Academy of Sciences

Jingpeng XIANG

Beijing Soundplus Technology Co., Ltd.

Haochen XIE 謝 昊辰; コトイ コウシン

AnchorZ Inc.

Yannan XING


Weifeng ZHANG

Alibaba Cloud Infrastructure

Yuanhao ZOU

Southern University of Science and Technology

