November 16-19, 2020

About tinyML Asia

Machine learning (ML) is at the forefront of providing artificial intelligence to all aspects of computing. It is the technology powering many of today’s advanced applications from image recognition to voice interfaces to self-driving vehicles and beyond. Many of these initial ML applications require significant computational resources most often found in cloud-scale data centers. To enable industry usage and adoption, it is therefore necessary to significantly reduce the power consumed to bring applications to end devices at the cloud edge (smartphones, wearables, vehicles, IoT devices, etc.) and to reduce the load on required data center resources.

tinyML Asia Technical Forum 2020 will be the first tinyML “regional” event and will be held on November 16-19, 2020 from 9:00 to 11:30 am (China Standard Time, UTC+8) each day. The online workshop will focus on applications, end users, and the supply chain for tinyML from both a global and an Asian perspective. Unlike other large industry and academic events, which lack a focus on low-power ML solutions, tinyML events cover the entire ecosystem, bringing industry and academia together.

Contact us

Rosina HABERL

Phone

Mail

周小磊 / Xiaolei “Joe” ZHOU

Phone

Mail

Schedule

China Standard Time (CST) / UTC+8

9:00 am to 11:30 am

State of the Art

Teaching Old Sensors New Tricks: the Algorithms Underpinning TinyML

从数据到智慧，设计传感器解决方案

Jan JONGBOOM, CTO, Edge Impulse

Abstract (English)

TinyML promises to teach the smallest of devices to feel, hear and see things, an amazing feat. But machine learning used to be the domain of large data centers full of servers, so how do we suddenly run these algorithms on battery-powered devices with kilobytes of RAM? In this talk we’ll look at the technologies that underpin TinyML, how traditional signal processing is still incredibly helpful, and the state-of-the-art algorithms for processing vibration, biosignal, audio and image data directly on your sensors.

摘要 (Chinese)

tinyML 能实现以最小设备进行感知、倾听并看到—这真了不起！机器学习一直被用于满是大型数据中心的领域，而我们该如何突然在仅有数千字节 RAM 的电池供电设备上运行这些算法呢？在本次演讲中，我们将了解到 tinyML 的基础技术，传统信号处理仍将如何发挥出不可思议的作用，以及用于直接在传感器上处理振动、生物信号、音频和图像数据的最新算法。

tinyMLAsia2020d1p1-Jongboom

Tiny but powerful: Hardware for High Performance, Low Power Machine Learning

小而强大：高性能，低功耗机器学习硬件

Matthew MATTINA, Distinguished Engineer and Senior Director, Arm

An overview of the state of the art of hardware for ultra-low power machine learning at the edge.

tinyMLAsia2020d1p2-Mattina

Big Opportunities for tinyML Applications: Everywhere and Always-On

tinyML 应用的巨大机遇：“无处不在，无时不在”

Evgeni GOUSEV, Senior Director, Qualcomm Research

Abstract (English)

Recent progress in computing hardware, machine learning algorithms and networks, and the availability of large datasets for model training have created strong momentum in the development and wide deployment of game-changing AI applications. Dedicated hardware is becoming tiny and very energy efficient (consuming milliwatts or less), algorithms and models are becoming smaller (down to tens of kB of memory), and software is becoming lighter, down to deployment on deeply embedded platforms. This enormous wave of technology innovation and the fast-growing tinyML ecosystem create enormous opportunities for new applications and business models. This presentation will review the state of the art of tinyML and its applications in various “verticals”, describe some practical examples of technologies and products pioneered at Qualcomm and elsewhere, and discuss near-term trends and opportunities, as well as calls to action. As an end result, tinyML will create a better, healthier and more sustainable environment for all.

摘要 (Chinese)

在计算硬件，机器学习算法和网络以及用于模型训练的大型数据集的可用性方面的最新进展，为改变游戏规则的 AI 应用程序的开发和广泛部署创造了强大的推动力。专用硬件变得非常小巧，并且非常节能（功耗只有 mW 甚至更少），算法和模型更小（内存需求降低至10s kB），软件更为轻巧，可以部署在深度嵌入式平台上。这种巨大的技术创新浪潮和快速增长的 tinyML 生态系统为新的应用和商业模式创造了巨大的机会。本演讲将回顾 tinyML 的最新技术及其在各种“垂直领域”中的应用，介绍高通公司率先推出的技术和产品的一些实际示例，并讨论近期趋势和机遇，并呼吁大家行动起来投身于此。最终，tinyML 将为所有人创造一个更好，更健康，更可持续的环境。

tinyMLAsia2020d2p1-Welcome-Gousev

China Standard Time (CST) / UTC+8

9:00 am to 10:00 am

Welcome & Plenary

Welcome & Opening Remarks

Evgeni GOUSEV, Senior Director, Qualcomm Research

Plenary: “The Road to Innovation for AI Chips in China”

中国AI芯片的创新之路

Shaojun WEI 魏少军, Director, Institute of Microelectronics, Tsinghua University

Abstract (English)

According to CCID, an industry consulting agency, China’s AI chip market will maintain a growth rate of more than 50% over the next three years and will grow further to 30.57 billion yuan in 2021. Driven by this rapid market development, China’s AI chips continue to attract attention from both industry and academia. A large number of start-ups have sprung up in China, such as SenseTime, Alibaba’s Pingtouge, and Cambricon, which are entering emerging market segments, overtaking by switching lanes, and catching up with international giants, and they have attracted extensive attention. Domestic academia is also advancing new AI chip technologies in China. In recent years, combining chips based on traditional computing architectures with various hardware and software acceleration schemes has achieved great success in some scenarios, but general-purpose AI chip technology that can flexibly adapt to diverse needs has not yet emerged. Reconfigurable computing technology supports real-time dynamic configuration, so software definition is no longer limited to the functional level: the computational accuracy, performance, and energy efficiency of an algorithm can all be brought within its scope. As a result, intelligent computing chips gain not only high flexibility but also an advantage in computing energy efficiency. Therefore, to better adapt to the characteristics and requirements of intelligent computing and to improve the resource utilization and energy efficiency of neural network computing, highly parallel computing architectures with hardware reconfiguration capability will continue to be a research hotspot. In short, this field is still in the early stage in which “a hundred schools of thought contend”, and there is huge room for innovation in both scientific research and industrial application.

摘要 (Chinese)

赛迪顾问预计，未来三年中国 AI 芯片市场规模仍将保持 50%以上的增长速度，2021 年将进一步增长至 305.7 亿元。在快速发展的市场推动下，中国 AI 芯片持续吸引产业界、学术界共同关注。国内涌现大量创业公司，如商汤科技、阿里平头哥、寒武纪等多家初创企业切入新兴细分领域，换道超车，追赶国际巨头，受到广泛关注。国内学术界也在持续推进我国人工智能芯片新技术的发展。近几年，基于传统计算架构的芯片和各种软硬件加速方案相结合，在一些场景下取得了巨大成功，但能够灵活适应多样化需求的通用化人工智能芯片技术尚未出现。可重构计算技术具有实时动态配置特点，使得软件定义不再局限于功能层面，算法的计算精度、性能和能效等都可以纳入软件定义的范畴，这使得智能计算芯片不仅能获得极高的灵活度，而且在计算能效方面也更具优势。因此为了更好地适应智能计算特点及需求，提升神经网络计算的资源利用率和计算能效比，具有硬件重构能力的高度并行计算架构将持续成为研究热点。总而言之，该领域仍处于“百家争鸣”的初级阶段，科研和产业应用都存在巨大的创新空间.

tinyMLAsia2020d2p2-Plenary-Wei

10:00 am to 11:30 am

Session #1 – Hardware

Embedding AI in Everything: mW-level Neural Network Processor

让人工智能无处不在：毫瓦级神经网络处理器

Shouyi YIN 尹首一, Professor, Tsinghua University

Abstract (English)

Deep neural networks (DNNs) have achieved great success in many applications of artificial intelligence (AI). To enable always-on and pervasive AI applications in mobile and IoT devices, ultra-low power neural network processors are required. With progress in both neural network algorithms and computing architectures, it is now possible to design mW-level neural network processors. In this talk, we introduce the Thinker processors, which have the potential to embed AI in everything.

摘要 (Chinese)

摘要：深度神经网络在大量人工智能应用中取得了巨大成功。要将人工智能进一步普及到移动和物联网设备，则需要超低功耗的神经网络处理器。随着神经网络算法和计算体系结构的发展，设计毫瓦级神经网络处理器成为可能。此次报告的主要内容就是介绍能够让人工智能无处不在的毫瓦级超低功耗神经网络处理器——Thinker。

tinyMLAsia2020d2p3-Yin

Powering innovation in a new world of AI devices on the edge with microNPUs

Arm Ethos-U 微型 NPU 引领终端侧人工智能设备的创新

Tanuj ARORA, Product Manager, Arm

Abstract (English)

With the explosion of endpoint/IoT devices, the demand for compute intelligence on the edge continues to increase. However, accelerating machine learning (ML) and AI on the edge is a complex task. Newly available micro-network processor units (micro-NPUs) enable a significant change in machine learning processing on the edge. By enabling low-cost and highly efficient AI solutions, micro-NPUs such as Arm’s Ethos-U unlock new applications and capabilities on embedded devices like never before. In this session, learn more about the latest advancements in machine learning solutions for the next generation of embedded devices.

摘要 (Chinese)

随着端点和物联网设备的爆炸性增长，对人工智能终端侧设备的需求持续攀升。然而，加速终端侧人工智能仍然面临很多挑战。Arm 最新推出的微型 NPU 将在终端侧带来崭新的机器学习处理能力。通过支持更加低功耗和高效能的人工智能技术，Ethos-U 微处理器将在嵌入设备中解锁新的应用并带来前所未有的用户体验。在这个讲座中，我们将分享机器学习领域的最新突破如何帮您打造新一代智能嵌入式解决方案。

tinyMLAsia2020d2p4-Arora

Toward Data-Driven Applications with AIoT Devices

以 AIoT 裝置邁向資料驅動的創新應用

Chenyi LEE 李鎮宜, Professor, National Chiao Tung University

Abstract (English)

More and more data-driven applications are being accepted and deployed in the market to improve quality of life. In these applications, local intelligence is embedded in IoT devices to achieve better data security control and real-time performance under limited bandwidth. The first part of this talk addresses model reduction based on network architecture search (NAS) and adaptive quantization (AQ), showing how both storage capacity and computing complexity can be reduced while maintaining acceptable accuracy, energy efficiency, and service quality. The second part introduces a new approach based on meta-learning that enhances inference accuracy for AIoT devices deployed in different environments. Examples from image and video services will demonstrate the feasibility of our proposal in real-life applications. Finally, some topics for future AIoT research will be outlined.

摘要 (Chinese)

摘要：目前在市場已佈建越來越多的資料驅動應用與裝置，來改善我們周遭環境與生活品質，其中所使用的智慧物聯網(AIoT)裝置，逐漸導入區域智慧解決方案，在有限頻寬的條件下，達到更佳的資料安全與即時作業效能。首先我們探討如何透過網路架構搜尋(NAS)與適應性量化(AQ)機制，有效降低現有使用模型的容量，進而降低儲存容量與運算複雜度，在維持可接受的應用準確度下，提升智慧物聯網裝置的能源效率與服務品質。接著將更深入探討新的學會學習(Meta-Learning)，透過此新的機制可有效提升智慧物聯網裝置佈建於不同環境下的推演準度。最後藉由影像/視訊服務相關的應用案例，來展示上述所提技術方案的可行性，同時也列出一些與 AIoT 未來發展的研究議題.

tinyMLAsia2020d2p5-Lee

China Standard Time (CST) / UTC+8

9:00 am to 10:00 am

How TinyML Could Help Developing Countries

TinyML 如何帮助发展中国家

Pete WARDEN, Technical Lead, Google

Abstract (English)

Machine learning on embedded hardware is a new field that’s only beginning to be used in the developed world, but it has some characteristics that may make it particularly appropriate for developing countries. The hardware and software required to train students, advance research, and build products are much cheaper than the equivalents for Cloud ML, and since the devices don’t need reliable network or even mains power connections, they can be deployed almost anywhere. Pete Warden is the Technical Lead of the TensorFlow Micro open source project from Google, and has previously helped projects like Plant Village use on-device deep learning in West Africa. In this talk he’ll introduce some of the possibilities that edge compute opens up for the world outside the West, and will be looking to learn from the audience what problems in their domains it might be applicable to.

摘要 (Chinese)

嵌入式设备上的机器学习是一个新兴领域。这项技术在发达国家方兴未艾，但是它的一些特性可能更适合于发展中国家。与基于云端的机器学习相比，嵌入设备机器学习所需要的硬件，软件价格相对低廉。无论是培训学生，推进研究和构建产品所需的成本更少，并且由于设备不需要依赖网络甚至是主电源连接，它们可以部署在任何地方。Pete Warden 是 Google TensorFlow Micro 开源项目的技术负责人，之前曾在西非应用基于设备上的深度学习帮助诸如 Plant Village 之类的项目。在本次演讲中，他将介绍边缘计算在西方以外的世界所带来的可能性，并将向听众寻求嵌入设备机器学习如何能在他们各自领域中的用例.

tinyMLAsia2020d3p1-Plenary-Warden

10:00 am to 11:00 am

Session #2 – Algorithms

Structured Quantization for Neural Network Language Model Compression

神经网络语言模型压缩中的结构化量化方法

Kai YU 俞凯, Research Professor, Shanghai Jiao Tong University

Abstract (English)

The neural network language model (NNLM) has been shown to be a fundamental component of speech recognition and natural language processing in the deep learning era. Unfortunately, its large memory consumption prohibits its use in many resource-constrained scenarios. Effective NNLM compression approaches that are independent of NN structures are therefore of great interest. However, most compression approaches achieve a high compression ratio at the cost of significant performance loss. We will show that, with advanced structured quantization techniques, it is possible to achieve a very high NNLM compression ratio of 70-100x without losing performance compared to the uncompressed models.

摘要 (Chinese)

神经网络语言模型是深度学习时代的语音识别和自然语言处理的基础组件。虽然在云端得到了广泛使用，但在低资源的条件下，内存占用过大的问题严重影响了大词汇神经网络语言模型的小型化。因此，与网络结构无关的高效的神经网络压缩算法越来越得到学术和产业界的高度重视。大多数压缩算法，在实现较高压缩比的时候，往往都伴随着性能的显著下降。本报告将介绍一种新型的“结构化量化”算法，可以在几乎没有性能损失或损失很小的情况下，实现70-100倍的极高的神经网络语言模型压缩比，这为大规模神经网络模型在语音和语言处理任务中的小型化提供了坚实的基础。

tinyMLAsia2020d3p2-Yu

When will we enter the era of in-memory computing with thousandfold energy efficiency? An algorithm-architecture co-design approach

何时进入千倍能效优势的存算一体热兵器时代？——一种算法-架构协同设计方法

Li JIANG 蒋力, Associate Professor, Shanghai Jiao Tong University

Abstract (English)

The rapidly rising computing power of the past decade has supported the advance of artificial intelligence. Still, in the post-Moore era, AI chips built on traditional CMOS processes and von Neumann architectures face huge bottlenecks in the form of the memory wall and the energy-efficiency wall. In-memory computing architecture based on emerging memristor technology has become a very competitive computing paradigm, delivering two orders of magnitude higher energy efficiency. The memristor process has apparent advantages in power consumption, multi-bit capability, and cost. However, it faces challenges of low manufacturing scalability and process variation, which lead to unstable computation and a limited capability to accommodate large and complex neural networks. This talk will introduce an algorithm and architecture co-optimization approach to solve these challenges.

摘要 (Chinese)

过去十年算力以惊人的速度发展，支撑着人工智能领域的前行，然而在后摩尔时代，传统工艺和架构的计算芯片面临着存储墙和能效墙的巨大瓶颈。为了解决存储墙和能效墙的问题，基于忆阻器新工艺的存算一体架构成为极具竞争力的计算形态。忆阻器工艺在智能计算，功耗，集成度和成本都有明显优势。然而，也面临着集成密度、可扩展性不足，工艺波动导致计算不稳定问题，最终使其承载大型复杂神经网络的能力受到限制。本次报告将会介绍我们通过算法/架构的协同设计与优化方法，来解决基于忆阻器的存算一体架构可扩展性和计算稳定性方面的问题，加速这一新兴技术的产业落地。

tinyMLAsia2020d3p3-Jiang

11:00 am to 11:30 am

Video Posters

Enabling Embedded Vision For All With Ultra-Low Power Image Classification

Semir HADDAD, Senior Director Product Marketing, Eta Compute

Applying machine learning capabilities to wearable IoT devices for boxing technique management

Anthony JOSEPH, Chief Technology Officer, My House Geek

China Standard Time (CST) / UTC+8

9:00 am to 10:30 am

Session #3 – Applications & Systems

System Software for Machine Learning at the Edge

边缘侧机器学习的系统软件

Tulika MITRA, Professor, National University of Singapore

Abstract (English)

Recent years have witnessed unprecedented advances in machine learning (ML) accelerators on mobile and IoT devices at the edge. Systems-on-chip that pair regular processor cores with ML-capable GPU, DSP, and ASIC accelerators are dominating the edge landscape. However, the current state of the art is inadequate in the software dimension despite tremendous progress at the hardware level. This talk will put the spotlight on compiler and runtime software approaches that unleash the full potential of the hardware for ML in mobile and IoT applications. In particular, we will present deep neural network specific optimizations, workload partitioning, and voltage-frequency scaling to orchestrate the different on-chip compute resources in a synergistic manner and achieve low-power, real-time edge ML.

摘要 (Chinese)

近年来机器学习在边缘侧的移动设备及物联网设备上取得了突飞猛进的发展。片上系统将常规处理器内核与支持机器学习的 GPU、DSP 和 ASIC 加速器配对的方式在边缘场景中占主导地位。但尽管硬件方面有了巨大进步，当前的最新技术在软件方面还不够。本次演讲将聚焦编译器和运行时的软件方法，以便为移动端和物联网应用中的机器学习释放全部硬件潜力。尤其是，我们将为深度神经网络提供特定优化、工作负载分配和电压频率缩放，用协同方式协调不同的片上计算资源，以实现低功耗、实时的边缘机器学习。

tinyMLAsia2020d4p1-Mitra

Acceleration of Deep Learning Inference on Raspberry Pi’s VideoCore GPU

利用树莓派的VideoCore 核心处理器加速深度学习推理

Koichi NAKAMURA, CEO and Founder, Idein Inc.

Abstract (English)

Raspberry Pi, a small, inexpensive computer favored by developers around the world, is equipped with a VideoCore IV/VI GPU, but the GPU is underutilized as a computing resource. We have developed device programming tools, libraries, math kernels and an optimizing graph compiler specialized for machine learning models in order to use VideoCore for GPGPU. We have also achieved a significant speedup of deep learning inference on the Raspberry Pi series without using computation-reducing techniques, such as quantization, that may compromise accuracy. This talk will introduce the architectural features of VideoCore and our acceleration techniques, as well as our other research and development results on ARM CPUs, Intel GPUs, FPGAs, etc.

摘要 (Chinese)

树莓派是一款受到世界各地开发人员青睐的小型，廉价的计算机。它配备了VideoCore IV / VI GPU，但是其提供的计算资源并没有被充分利用。我们针对VideoCore GPGPU开发了设备编程工具，软件库，数学内核以及专门用于机器学习模型的优化编译器。这套产品在不使用量化或者其他可能影响模型准确性的优化方案的前提下，大大提高了树莓派深度学习推理的速度。本讲座将介绍VideoCore的架构功能和加速技术，以及我们在ARM CPU，Intel GPU，FPGA等方面的其他研发成果。

tinyMLAsia2020d4p2-Nakamura

Accelerating AIoT development with TencentOS

赋能物联终端，TencentOS 让AIoT解决方案开发更便捷

Jack ZHAO 赵健, Sr. Engineer, Tencent

Abstract (English)

In this session, we will introduce the TencentOS Tiny open source project and show how it accelerates end-to-end AIoT application development on MCUs. We will also present our solutions in retail, smart conferencing and other industry verticals.

摘要 (Chinese)

1. TencentOS Tiny开源项目简介
2. 基于TencentOS Tiny快速构建在微处理器上端到端 AIoT应用
3. TencentOS Tiny 在AIoT领域的思考与行动