tinyML Asia 2021

tinyML Asia Technical Forum – Online LIVE

November 2-5, 2021

About tinyML Asia

Machine learning (ML) is at the forefront of providing artificial intelligence to all aspects of computing. It is the technology powering many of today’s advanced applications from image recognition to voice interfaces to self-driving vehicles and beyond. Many of these initial ML applications require significant computational resources most often found in cloud-scale data centers. To enable industry usage and adoption, it is therefore necessary to significantly reduce the power consumed to bring applications to end devices at the cloud edge (smartphones, wearables, vehicles, IoT devices, etc.) and to reduce the load on required data center resources.

tinyML Asia Technical Forum 2021 will be held on November 2-5, 2021 from 9 to 11:30 am (China Standard Time, UTC+8) each day. The online workshop will focus on applications, end users, and the supply chain for tinyML from both a global and an Asian perspective. Unlike other large industry and academic events, which lack a focus on low-power ML solutions, tinyML events cover the entire ecosystem, bringing industry and academia together.

The inaugural tinyML Asia 2020 was attended by more than 1800 people. As the tinyML community continues to grow quickly, even greater participation is expected at tinyML Asia 2021.

Contact us


周小磊 / Xiaolei “Joe” ZHOU


China Standard Time (CST) / UTC+8

9:00 am to 9:45 am

Welcome & Plenary

Plenary: ML@ExtremeEdge of Always-on Intelligent Sensor Networks

Mahesh MEHENDALE, Adjunct Professor & TI Fellow, leading Nano-power Foundational Technology at Kilby Labs, Texas Instruments & IISc Bangalore

Abstract (English)

In always-on intelligent IoT sensor nodes, detecting the event of interest at the End Node (Extreme Edge), rather than at the Gateway or in the Cloud, provides significant advantages, including low latency, privacy, reduced communication bandwidth and operation with no or unreliable connectivity. Deep Neural Networks (DNNs) have emerged as a promising machine learning technology for a number of such sensing applications, including voice activity detection, voice command recognition, acoustic signature detection, object detection, face recognition and anomaly detection, working with different sensing modalities such as acoustic, image, vibration, current and voltage. DNNs are compute and data intensive, so implementing them on highly resource-constrained End Nodes (in terms of both cost and power) while meeting latency and real-time constraints presents a huge challenge. In this talk we present system, algorithm, architecture, circuit and process technology level optimization techniques and highlight how co-optimization across all these levels is key to achieving the target two to three orders of magnitude reduction in area-power FoM.

9:45 am to 11:30 am

tinyML Hardware/Software co-design, security

Endpoint AI Revolution Driven by Standardized Computing Platform

Odin SHEN 沈綸銘, Principal Field Application Engineer, Arm

Abstract (English)

AI is a once-in-a-generation change in computing that is extending capabilities from the cloud server to the tiniest IoT device. Today, most ML is still performed on Arm processors across CPUs, GPUs and dedicated neural network processors (NPUs). Arm has been on a mission to create the foundations to realize the opportunity of AI. For TinyML in particular, Arm provides a robust open-source software stack and related endpoint AI processors, which can dramatically improve the efficiency and performance of systems in a world where more and more TinyML use cases must run. The software is ready for use and silicon is coming to market soon, so now is the time for TinyML developers to get ready.

Efficient on-device deep learning

Yunxin LIU, Guoqiang Professor at Institute for AI Industry Research (AIR), Tsinghua University

Abstract (English)

With the advances of hardware, software, and artificial intelligence (AI), there is a new computing paradigm shift from centralized intelligence in the cloud to distributed intelligence on the edge. In the era of edge computing, it is critical to infuse AI to empower diverse edge devices and applications. This talk overviews the challenges and opportunities of on-device deep learning and introduces our recent research work on making on-device deep-learning more efficient, focusing on how to build affordable AI models customized for diverse edge devices and how to maximize the performance of on-device model inference by fully utilizing the heterogeneous computing resources.

Imaging Radars – Learning for Enhanced Vision

Ankit SHARMA, System Architect, Steradian

Abstract (English)

Modern radar is rapidly advancing beyond 3-D, and even 4-D, imaging. The fine resolution offered by millimeter-wave radars has pushed the limits of imaging to 5-D, where the fifth dimension refers to the type or class of the object imaged by the radar. This creates strong use cases for supervised and unsupervised learning in imaging radar. Although machine learning in its various forms could be used to estimate the traditional radar parameter set, such as range, Doppler velocity and direction of arrival (DoA), estimating object dimensions, shapes, orientations and types opens a highly relevant set of problems for supervised learning. For traffic enforcement applications this could mean deciphering the vehicle type, and for autonomous driving this could mean classifying the object type as road/pedestrian/car/bus and taking an appropriate action. These applications, though traditional in the field of image signal processing, are quite novel to radar signal processing. As such, applications of machine learning that add new dimensions to an imaging radar are crucial solution differentiators. To this end, we propose an application of machine learning to classify objects detected by the radar from an array of predefined classes. The classification algorithm runs in real time and primarily uses the point cloud detected by the radar and the reflectivity of electromagnetic (EM) waves at 80 GHz as inputs. The classification rate thus obtained is quantified and shown as an accuracy measure.

Hardware software co-optimizations for efficient privacy preserving computing in AIoT devices

Weifeng ZHANG, Fellow of Alibaba Cloud Intelligence and the Chief Scientist of Heterogeneous Computing, Alibaba Cloud Infrastructure

Abstract (English)

With the emergence of more and more regulations on data privacy and protection, privacy-preserving computing has become critical in the machine learning domain. However, existing data protection mechanisms, whether based on a trusted execution environment (TEE) or on encryption technology such as homomorphic encryption (HE), often suffer a huge performance loss due to the limited computing resources dedicated to the TEE or to extremely complex HE algorithms. This is particularly challenging for AIoT devices, which have even tighter resource constraints. This talk will shed some light on how to make privacy-preserving computing more efficient via novel hardware-software co-optimizations.

Make the signal chain more intelligent and efficient with mixed signal processing and in memory computing

Hongjie LIU, Founder and CEO, Reexen

Abstract (English)

The traditional signal chain does the majority of its signal processing after digitization, which makes AD conversion a bottleneck. The area and power of the AD converter dominate those of the whole analog front end.

The power and latency cost of data transfer between the processing unit and the cache/DRAM likewise dominates that of the whole digital processing part.

Reexen’s innovative architecture splits signal processing into mixed-signal low-level feature extraction before digitization and mixed-signal high-level in-memory computing after digitization.

Reexen’s product can offer a 1-2 orders of magnitude improvement in energy consumption and a 2-5 times improvement in cost.

China Standard Time (CST) / UTC+8

9:00 am to 9:30 am

Plenary: Putting AI on a Diet: TinyML and Efficient Deep Learning

Song HAN, Assistant Professor, MIT EECS

Abstract (English)

Today’s AI is too big. Deep neural networks demand extraordinary levels of data and computation, and therefore power, for training and inference. Amid the global shortage of silicon, this severely limits the practical deployment of AI applications. I will present techniques to improve the efficiency of neural networks through model compression, neural architecture search, and new design primitives. I’ll present MCUNet, which enables ImageNet-scale inference on microcontrollers that have only 1MB of Flash. Next I will introduce Once-for-All Network, an efficient neural architecture search approach that can elastically grow and shrink the model capacity according to the target hardware resource and latency constraints. Finally I’ll present new primitives for video understanding and point cloud recognition, which were the winning solutions in the 3rd, 4th and 5th Low-Power Computer Vision Challenges and the AI Driving Olympics NuScenes Segmentation Challenge. We hope such TinyML techniques can make AI greener, faster, and more accessible to everyone.

9:30 am to 11:30 am

Frameworks, Tools, tinyML for Good

Graphical Programming for TinyML, the Easiest Way to Start with Embedded ML

Huiying LAI, Application Engineer, Seeed Studio

Abstract (English)

Seeed will introduce Graphical Programming for TinyML, the easiest way to start with embedded machine learning. Simply drag and drop blocks to acquire data, then train and deploy models. Embedded machine learning becomes much easier and more accessible to beginners when using Codecraft graphical programming and Wio Terminal. Besides the introduction, we will also use Codecraft to complete an application step by step. We hope the demo will inspire more developers to build more interesting TinyML applications in the future.

Learning compact representation with less (labelled) data from sensors

Flora SALIM, Professor, RMIT University, Melbourne, Australia

Abstract (English)

The proliferation of sensors and the Internet of Things leads to new opportunities and challenges for modelling human behaviours. However, most representation learning techniques require large, well-labelled training sets to achieve high performance. Because labelling human and/or system behaviours is expensive, approaches that require minimal to no labelled data are becoming more favourable. This motivated us to explore data-efficient learning techniques that achieve efficient and compact representations. Approaches including domain adaptation (with minimal data) and pretraining (without labelled data) will be introduced.

Tiny ONNC: unleashes your IoT device intelligent power

Peter CHANG, Co-founder and Technical Marketing Manager, Skymizer Taiwan Inc.

Abstract (English)

While AI is well established on servers, there is still huge room for it on IoT devices. For IoT developers, however, the hardware limitations of IoT devices may hinder the devices’ potential intelligence.

Therefore, Skymizer introduces Tiny ONNC, an easy and efficient AI compiler. Tiny ONNC leverages the unique power of MLIR to support a rich set of neural network frameworks, including PyTorch, Open Neural Network Exchange Format (ONNX), TensorFlow, TensorFlow Lite, TVM Relay, and even Keras.

Tiny ONNC offers abundant optimization approaches, such as automatic operator splitting and tensor splitting, that address the memory constraints of microcontrollers. When an operator or a tensor is too big to fit in the cache, Tiny ONNC splits the large object into small pieces and reorganizes the network to reuse memory. Tiny ONNC also supports operators that are not directly supported by CMSIS-NN by applying mathematically equivalent or approximate transformations.

These optimization approaches deliver strong empirical results, combining high memory utilization with high performance. On the MLPerf Tiny benchmark, Tiny ONNC matches TensorFlow Lite for Microcontrollers (TFLM) to within 2% in performance and precision. At similar performance and precision, the memory footprint of the generated program is only 3/5 that of TFLM, and its code size is only 1/10 that of TFLM in the best case. Moreover, with aggressive optimizations, the code generated by Tiny ONNC can be up to 4.9 times faster than the code generated by TFLM.

In this talk, we will first introduce Tiny ONNC and how to use it. Second, we will dive into our optimization strategies and approaches. Finally, we will explain the experimental results to show how Tiny ONNC outperforms its competitors.
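
The operator-splitting idea mentioned in this abstract can be illustrated with a minimal sketch. This is not Tiny ONNC's implementation or API. The function names (`dense`, `dense_split`) and the `max_rows` parameter are hypothetical, and a fully connected layer is assumed as the operator being split. The sketch only shows that computing a layer in row tiles gives the same result while bounding how many weight rows must be resident at once.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dense(weights: Matrix, x: Vector) -> Vector:
    """Reference fully connected layer y = W @ x, computed in one piece."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def dense_split(weights: Matrix, x: Vector, max_rows: int) -> Vector:
    """Same layer, split into row tiles of at most `max_rows` rows.

    Each tile is an independent, smaller operator, so only `max_rows` rows of
    weights (and their outputs) need to be in the working buffer at a time.
    `max_rows` is a hypothetical stand-in for a memory budget.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    y: Vector = []
    for start in range(0, len(weights), max_rows):
        tile = weights[start:start + max_rows]  # the piece that fits in the buffer
        y.extend(dense(tile, x))                # run the small operator, keep its outputs
    return y


# Illustrative data only.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
x = [1.0, -1.0]
full = dense(W, x)
tiled = dense_split(W, x, max_rows=2)
```

A real compiler would also have to choose the tile size from the target's memory map and reschedule neighbouring operators. The sketch leaves both out.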

Enterprise Health & Wellness using wearables

Anil BHASKARAN, Vice President APJ Innovation Office, SAP

Abstract (English)

Wearables such as the Apple Watch and Fitbit pack considerable compute power and can calculate several vital parameters non-invasively. They are fundamentally changing the way users look at health and wellness. As a result, the adoption of wearables has increased significantly over the years, and a Stanford study concludes that over 54% of people in the US use digital health tracking. This is prompting employers to look at using wearables to promote health and wellness and elevate the employee experience. In this session, we will look at the trends, experiences, opportunities and future of health and wellness using wearables, and help you formulate a strategy for your organization.

Extremely low-bit quantization for Transformers

DongSoo LEE 이동수, Executive Officer, NAVER CLOVA

Abstract (English)

Deploying the widely used Transformer architecture is challenging because of its heavy computational load and memory overhead during inference, especially when the target device has limited computational resources, as mobile and edge devices do. Quantization is an effective technique to address such challenges. Our analysis shows that, for a given number of quantization bits, each block of a Transformer contributes to model accuracy and inference computation in a different manner. Moreover, even inside an embedding block, each word makes a vastly different contribution. Correspondingly, we propose a mixed-precision quantization strategy that represents Transformer weights with an extremely low number of bits (e.g., under 3 bits). For example, for each word in an embedding block, we assign a different number of quantization bits based on its statistical properties. We also introduce a new matrix multiplication kernel that does not require dequantization steps.
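
As a rough illustration of per-word mixed-precision quantization, the sketch below gives more bits to the embedding rows of frequent words and fewer to the rest. It is not the authors' method. The abstract does not say which statistical property is used, so word frequency is an assumption, and the function names, the `top_fraction` parameter and the uniform quantizer are all hypothetical. For simplicity the sketch returns dequantized values, whereas the talk describes a kernel that avoids dequantization.

```python
from typing import List, Sequence, Tuple


def quantize_row(row: Sequence[float], bits: int) -> List[float]:
    """Uniformly quantize one row to 2**bits levels over its own value range."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(row), max(row)
    if hi == lo:
        return list(row)  # constant row: nothing to quantize
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in row]


def assign_bits(frequencies: Sequence[int], high_bits: int = 3,
                low_bits: int = 1, top_fraction: float = 0.25) -> List[int]:
    """Give `high_bits` to the most frequent words and `low_bits` to the rest.

    Using word frequency as the deciding statistic is an assumption.
    """
    n_high = max(1, int(len(frequencies) * top_fraction))
    ranked = sorted(range(len(frequencies)),
                    key=lambda i: frequencies[i], reverse=True)
    bits = [low_bits] * len(frequencies)
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits


def quantize_embedding(table: Sequence[Sequence[float]],
                       frequencies: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """Quantize each word's embedding row with its own bit width."""
    bits = assign_bits(frequencies)
    return [quantize_row(row, b) for row, b in zip(table, bits)], bits


# Illustrative data only: four words, four embedding dimensions.
table = [[0.1, -0.4, 0.8, 0.3],
         [0.5, 0.2, -0.1, 0.0],
         [-0.3, 0.3, 0.1, -0.2],
         [0.9, -0.9, 0.4, 0.0]]
freqs = [100, 5, 3, 1]
quantized, bits = quantize_embedding(table, freqs)
average_bits = sum(bits) / len(bits)
```

In this toy setup one word keeps 3 bits and three words drop to 1 bit, so the average is below 3 bits per weight, the range the abstract mentions.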

China Standard Time (CST) / UTC+8

9:00 am to 9:30 am

Plenary: A review of on-device fully neural end-to-end speech recognition and synthesis algorithms

Chanwoo KIM, Corporate Vice President, Samsung

Abstract (English)

In this talk, we review various end-to-end automatic speech recognition and speech synthesis algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, a decoder based on a Weighted Finite-State Transducer (WFST), and so on. To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. Hence, the corresponding WFST becomes enormous, which prohibits on-device implementation. Recently, fully neural network end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Connectionist Temporal Classification (CTC), Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder models (AED), Monotonic Chunk-wise Attention (MoChA), transformer-based speech recognition systems, and so on. The inverse process of speech recognition is speech synthesis, where a text sequence is converted into a waveform. Conventional speech synthesizers are usually based on parametric or concatenative approaches. Even though Text-to-Speech (TTS) systems based on concatenative approaches have shown relatively good sound quality, they cannot easily be employed for on-device applications because of their immense size. Recently, neural speech synthesis approaches based on Tacotron and WaveNet started a new era of TTS with significantly better speech quality. More recently, vocoders based on LPCNet have required significantly less computation than WaveNet, which makes it feasible to run these algorithms on-device. These fully neural network-based systems require much smaller memory footprints than conventional algorithms.

9:30 am to 11:30 am

Voice/Audio/Predictive analysis

TinyML in TmallGenie

Conggang HU 见明 见之则明, Staff Engineer, Alibaba

Abstract (English)

TmallGenie is Alibaba’s smart speaker and AIoT business unit. In past years, we have focused on the research and development of AIoT devices. One of the main problems we faced was how to integrate AI capabilities into compact hardware devices. To solve this problem, we developed a TinyML framework that makes AI models smaller and faster. This easy-to-use, auto-optimized framework is now applied to all of our AI capabilities (speech, CV, NLU and more) running on a variety of small devices, providing automatic NAS, pruning and quantization to make ML tiny.


Jingpeng XIANG, Product Director, Beijing Soundplus Technology Co.Ltd

Abstract (English)

Given the limited battery capacity and processor performance of earphones, it is extremely challenging to provide users with premier call quality on TWS earphones comparable to calls on a phone.

SoundPlus has extensively applied machine learning methods throughout its speech enhancement algorithm (SVE-AI), which runs on low-power SoCs and DSPs, achieving a balance between power consumption and market-leading performance.

The SVE-AI solution has been adopted in TWS earphone products from mainstream mobile brand manufacturers and international audio brands, including TWS earphones equipped with anywhere from one to four microphones.

Furthermore, SVE-AI also enhances active noise control performance on TWS earphones and the voice interaction experience. As a result, a complete AI-enhanced audio solution can be rapidly deployed on TWS earphones and other wearable devices.

Lightweight visual localization with deep learning

Yihong WU, Professor, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, and at School of Artificial Intelligence, University of Chinese Academy of Sciences

Abstract (English)

Virtual reality (VR), augmented reality (AR), robotics, and autonomous driving have recently attracted much attention from both the academic and the industrial communities. Visual localization, or SLAM (simultaneous localization and mapping), plays an important role in these fields. While tremendous progress in autonomous navigation has been made in the past, many challenges remain. In this talk, I will present our recent research efforts to take up these challenges. First, I will give an overview of visual localization with learning. I will then introduce fast localization (or SLAM relocalization) in large-scale environments, which leverages local and global CNN descriptors in parallel and co-trains both real and binary descriptors, followed by a flexible and efficient loop closure detection method based on motion knowledge with CNN hash codes. A robust SLAM system with accurate and fast feature tracking is also presented. Finally, future trends for visual localization are shared.

Airborne sound maintenance in remote sites using low power federated learning

Anton Kroger, Senior Director Natural Resources, SAP

Abstract (English)

In this presentation, we’ll detail the business and technical reasons for selecting TinyML to contextualize airborne sound for predictive maintenance. The objectives of this solution are to:

  • Minimize planned & unplanned operational downtime by maximizing asset efficiency and availability.
  • Minimize the working capital tied up in expensive spare-parts holdings following planned & unplanned operational downtime.
  • Minimize retrofitting expenses for upgrading existing machine infrastructure to be monitored and included in existing predictive maintenance models.

Given the remote nature of these operations, we use low-power sensors and a federated learning approach to provide a solution that learns continuously and shares only the scores associated with the sound data, in order to adhere to GDPR.

China Standard Time (CST) / UTC+8

9:00 am to 11:30 am

Sensor Fusion using Machine Learning: Smart Forehead Temperature Sensing

Joshua CHANG 張廷仰, Product Manager, PixArt Imaging Inc. Taiwan

Abstract (English)

Since the outbreak of the COVID-19 pandemic, measuring and recording forehead temperature has become an essential part of our daily lives. In response to the demand for an efficient and automated temperature measuring method, PixArt implemented a sensor fusion solution that combines its FIR sensor and its ultra-low-power CMOS image sensor with its ultra-low-power machine learning processing chip. This highly integrated solution can quickly detect the presence of human beings, measure their forehead temperature, and identify whether they are wearing a mask.

The talk will elaborate on the background and on the applications that this PixArt fusion solution can help enable. Further tinyML benefits for this application will also be shared.


TinyML Heat Image Face Recognition on Wio-Terminal

Yuanhao ZOU, Senior Undergraduate Student, Southern University of Science and Technology

Abstract (English)

Secure live face recognition has long been considered a computation-demanding task. Our work explores the possibility of running face recognition on a Cortex-M4 development board using a thermal imaging sensor. We use Edge Impulse as the training platform and Wio Terminal as the computation platform to build a low-cost, lightweight, real-time TinyML CV demo.

An approach to dynamically integrate heterogeneous AI components in a multimodal user authentication system use case

Haochen XIE 謝 昊辰; コトイ コウシン, Project Leader, Team Dragon, AnchorZ Inc.

Abstract (English)

In this talk, we will introduce our approach to a challenging task: effectively and dynamically integrating multiple AI-backed components, each of which uses a different kind of AI technology, in order to implement a single function, continuous multimodal user authentication.
In building our next-generation user authentication system, DZ Security, we needed a flexible and effective way to integrate multiple elemental authentication methods, such as facial recognition, voice recognition and touch patterns, that employ very different types of AI technologies, such as DNNs, RNNs and analytical regression. We also needed the combination method to support an open set of elemental authentication methods, some of which may be provided by third parties. Furthermore, we needed a high degree of confidence that the overall system would perform well enough on certain critical metrics, such as overall security assurance and energy consumption. The latter is especially critical for a battery-powered device.
Our approach tackles this challenge by first defining a common interface that all components must comply with, and developing a DSL (domain-specific language) in which a “fusion” or “integration” program is written. The component interface contains unified APIs for invoking the components and provides access to the performance metrics of each component. On top of the DSL, we then built a framework that ensures the final system always meets predefined minimum performance requirements expressed in a few key metrics, such as security risk indicators (e.g. estimated false acceptance rate) and power consumption estimates. This framework also reduces the degrees of freedom of the integration program to the equivalent of writing a dynamic strategy that decides when and how each available component should be invoked. A “smarter” strategy achieves a higher “score” (e.g. a lower false rejection rate), and no strategy can ever break the predefined requirements. We can therefore optimize the component invocation strategy aggressively without worrying about breaking the minimum performance requirements. The DSL also includes a simulator that can be used to evaluate the performance of an integration program in simulated deployment situations, alongside a toolchain to compile programs for execution on different platforms. We can then use the simulator to guide the writing of the best strategies, using human intelligence, artificial intelligence, or both combined.
We hope that sharing our approach provides hints to others who need to implement similar systems.
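
The idea of a common component interface plus a framework-enforced minimum requirement can be sketched as follows. This is not the DZ Security DSL or its API. Every name here (`Component`, `authenticate`, `far`, `energy_mj`) is hypothetical, the "cheapest component first" ordering is just one possible strategy, and multiplying false acceptance rates assumes that components are statistically independent, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Component:
    """Hypothetical unified interface: invocation plus performance metrics."""
    name: str
    far: float                  # estimated false acceptance rate of this method
    energy_mj: float            # estimated energy per invocation, in millijoules
    check: Callable[[], bool]   # returns True if the user passes this method


def authenticate(components: Sequence[Component], max_far: float,
                 energy_budget_mj: float) -> Tuple[bool, float, float]:
    """Invoke components cheapest-first until the combined FAR is low enough.

    The guard is what the strategy cannot bypass: the user is accepted only
    if the combined estimated FAR is at most `max_far`, and no component is
    invoked once it would exceed `energy_budget_mj`.
    Returns (accepted, combined_far, energy_spent_mj).
    """
    combined_far = 1.0
    spent = 0.0
    for comp in sorted(components, key=lambda c: c.energy_mj):
        if spent + comp.energy_mj > energy_budget_mj:
            break                       # out of energy budget: stop invoking
        spent += comp.energy_mj
        if not comp.check():
            return False, combined_far, spent
        combined_far *= comp.far        # independence assumption
        if combined_far <= max_far:
            return True, combined_far, spent
    return False, combined_far, spent   # requirement not met: never accept


# Illustrative components with made-up metrics; every check passes here.
touch = Component("touch", far=0.1, energy_mj=1.0, check=lambda: True)
voice = Component("voice", far=0.01, energy_mj=5.0, check=lambda: True)
face = Component("face", far=0.001, energy_mj=20.0, check=lambda: True)

accepted, far, energy = authenticate([face, touch, voice],
                                     max_far=0.005, energy_budget_mj=30.0)
```

A different ordering changes how much energy is spent and how often genuine users are rejected, but acceptance still requires the combined estimate to meet `max_far`.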

A lightweight face detection method working with Himax Ultra-Low Power WE-I Plus AI Processor

Justin KAO, Master Student of Electrical Engineering, National Cheng Kung University in Taiwan

Abstract (English)

The talk will elaborate on the background of this PixArt fusion device, which fulfills TinyML’s goal of realizing machine-vision applications at the lowest power consumption and cost, which is exactly what PixArt has to offer. By combining PixArt’s ultra-low-power CMOS image sensor with its ultra-low-power neural-network processing chip, the temperature measuring device maintains its high accuracy without an additional heat sink or fan. PixArt’s processing chip can also reduce the cost of data computing and transfer by directly outputting readily usable information (not the bulky image data), making it friendlier to the cloud AIoT environment. We also managed to stretch the measuring range of our 8×8 thermal array sensor to 1 meter to maximize the cost-effectiveness of the device. More TinyML benefits for this application will be shared in this talk.

H3Dynamics powers Smart cities to get infrastructure-safe in a smarter way

Eric FEDDAL, Chief Revenue Officer, H3 Dynamics Holdings

Abstract (English)

H3Dynamics digitizes audits & inspections across various industries to offer actionable intelligence that enables rectification work.
Our actionable AI value proposition focuses on O&M and addresses immediate infrastructure pain points: traditional inspections are manual, involve professionals working at heights, and end with a lengthy report-generation process. Since we are digital natives, H3Dynamics also offers API integration into customer ERP software to automatically send work orders and initiate rectification workflows.
If your customers value a TCO reduction and an immediately actionable innovation roadmap, why wait? Do reach out to H3Dynamics.

An Introduction about Always On Vision(AONV) Sensor and its Trend

YY SUNG 宋尤昱, Associate Vice President, Himax Imaging Inc.

Abstract (English)

CMOS image sensors (CIS) are generally used for photography and video recording. Beyond these traditional uses, CIS are becoming increasingly popular in computer vision and machine learning. As AIoT applications gain prominence and low-power AI processors become more widely available, battery-powered, low-power edge devices that operate for long periods without recharging are entering our lives. Such smart detection devices are becoming more practical and improve the user experience, making our lives more convenient. In these applications, low power is the key. Not only must the sensor itself be low power, but the whole subsystem must be low power as well. Since the CIS is the first stage of such a detection system, it should play an important role in smartly waking up the whole system and making the wake-up process efficient. This talk will show how the CIS can play this role.

Schedule subject to change without notice.


Wei Xiao




Qualcomm Research, USA


Himax Technologies

Sean KIM

LG Electronics CTO AI Lab

Joo-Young KIM




Eric PAN

Seeed Studio and Chaihuo makerspace


Arm China


Indian Institute of Science (IISc)

Jacky XIE


Shouyi YIN 尹首

Tsinghua University


Tsinghua University




Joshua CHANG 張廷仰

PixArt Imaging Inc. Taiwan


Skymizer Taiwan Inc.


H3 Dynamics Holdings

Song HAN


Conggang HU 见明 见之则明


Kyuwoong HWANG

Qualcomm Research, Korea

Justin KAO

National Cheng Kung University in Taiwan

Chanwoo KIM


Anton Kroger


Huiying LAI

Seeed Studio

DongSoo LEE 이동수


Hongjie LIU


Yunxin LIU

Tsinghua University


Texas Instruments & IISc Bangalore

Mallik P. MOTURI



RMIT University, Melbourne, Australia



Odin SHEN 沈綸銘



Himax Imaging Inc.

Yihong WU

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, and at School of Artificial Intelligence, University of Chinese Academy of Sciences

Jingpeng XIANG

Beijing Soundplus Technology Co.Ltd

Haochen XIE 謝 昊辰; コトイ コウシン

AnchorZ Inc.

Weifeng ZHANG

Alibaba Cloud Infrastructure

Yuanhao ZOU

Southern University of Science and Technology

