tinyML Talks: Demoing the world’s fastest inference engine for Arm Cortex-M

Recently we announced Plumerai’s inference engine for 8-bit deep learning models on Arm Cortex-M microcontrollers. We showed that it is the world’s most efficient inference engine on MobileNetV2, beating TensorFlow Lite for Microcontrollers with CMSIS-NN kernels by 40% in latency and 49% in RAM usage with no loss in accuracy. However, that was just a single network, and it might have been cherry-picked. Therefore, we will give a live demonstration of a new service that you can use to test your own models with our inference engine. In this talk we will explain how we achieved these speed and memory improvements, and we will show benchmarks for the most important publicly available neural network models.
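
For readers who want to prepare a model for this kind of comparison, the sketch below shows one way to produce a fully int8-quantized TFLite model, which is the kind of model that both TensorFlow Lite for Microcontrollers and 8-bit inference engines such as Plumerai’s consume. It uses only the standard TensorFlow post-training quantization APIs; the specific model configuration (a small MobileNetV2 at 96×96 input with width multiplier 0.35), the random calibration data, and the output file name are illustrative assumptions, not details taken from the talk.

```python
# Minimal sketch: post-training int8 quantization of a small MobileNetV2.
# The model size, input resolution, and calibration data are assumptions
# for illustration; substitute your own trained model and real samples.
import numpy as np
import tensorflow as tf

# A reduced MobileNetV2, roughly the scale used in microcontroller benchmarks.
model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.35, weights=None)

def representative_data():
    # Calibration samples used to pick quantization ranges;
    # random data here, real inputs in practice.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full int8 quantization, including the input and output tensors.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("mobilenet_v2_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file is the sort of artifact you would submit to a model-benchmarking service or compile into firmware alongside an 8-bit inference runtime.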

Date

January 4, 2022

Location

Virtual

Schedule

Timezone: PST

Demoing the world’s fastest inference engine for Arm Cortex-M

Cedric NUGTEREN, Deep learning software engineer

Plumerai

Cedric Nugteren is a software engineer focused on writing efficient code for deep learning applications. After receiving his MSc and PhD from Eindhoven University of Technology, he optimized GPU and CPU code for various companies using C++, OpenCL, and CUDA. He then worked for four years on deep learning for autonomous driving at TomTom, after which he joined Plumerai, where he now writes fast code for the smallest microcontrollers.

Schedule subject to change without notice.