tinyML Talks: A Practical Guide to Neural Network Quantization

Neural network quantization is an effective way of reducing the power requirements and latency of neural network inference while maintaining high accuracy. The success of quantization has led to a large volume of literature and competing methods in recent years, and Qualcomm has been at the forefront of this research. This talk aims to cut through the noise and introduce a practical guide for quantizing neural networks inspired by our research and expertise at Qualcomm. We will begin with an introduction to quantization and fixed-point accelerators for neural network inference. We will then consider implementation pipelines for quantizing neural networks with near floating-point accuracy for popular neural networks and benchmarks. Finally, you will leave this talk with a set of diagnostic and debugging tools to address common neural network quantization issues.

 

You can find more information about the theory and algorithms we will discuss in this talk in our White Paper on Neural Network Quantization at the following arXiv link: https://arxiv.org/abs/2106.08295

Date

September 28, 2021

Location

Virtual

Contact us

Discussion

Schedule

Timezone: PDT

A Practical Guide to Neural Network Quantization

Marios FOURNARAKIS, Deep Learning Researcher

Qualcomm AI Research, Amsterdam

Neural network quantization is an effective way of reducing the power requirements and latency of neural network inference while maintaining high accuracy. The success of quantization has led to a large volume of literature and competing methods in recent years, and Qualcomm has been at the forefront of this research. This talk aims to cut through the noise and introduce a practical guide for quantizing neural networks inspired by our research and expertise at Qualcomm. We will begin with an introduction to quantization and fixed-point accelerators for neural network inference. We will then consider implementation pipelines for quantizing neural networks with near floating-point accuracy for popular neural networks and benchmarks. Finally, you will leave this talk with a set of diagnostic and debugging tools to address common neural network quantization issues.

 

You can find more information about the theory and algorithms we will discuss in this talk in our White Paper on Neural Network Quantization at the following arXiv link: https://arxiv.org/abs/2106.08295

Marios FOURNARAKIS, Deep Learning Researcher

Qualcomm AI Research, Amsterdam

Marios Fournarakis is a Deep Learning Researcher at Qualcomm AI Research in Amsterdam, working on power-efficient training and inference of neural networks, focusing on quantization techniques and compute-in-memory. He is also interested in low-power AI applications and equivariant neural networks. He completed his graduate work in Machine Learning at University College London and holds a Master’s in Engineering from the University of Cambridge. Prior to Qualcomm, he worked as a Computer Vision research intern at Niantic Labs in London on ML-based video anonymization, and at Arup as a structural engineering consultant.

Schedule subject to change without notice.