tinyML Talks: Tutorial on micro-kernel based hardware acceleration


August 13, 2020






Timezone: PDT

Tutorial on micro-kernel based hardware acceleration

Manu RASTOGI, Machine Learning Engineer


Energy and compute are both scarce for deep learning deployment at the edge. Rapid innovation in new layer types and network topologies makes deployment even more challenging, and it puts increased pressure on hardware designs and toolchain development for automated, efficient model deployment; hardware and toolchains often lag behind in supporting new layers. As deep learning becomes more ubiquitous, hardware vendors compete fiercely to provide the most energy-efficient solutions. A key piece of model deployment at the edge is the micro-kernels, or micro-code, that orchestrate the data movement and computation of these networks on hardware. In this talk, we will walk through matrix-multiplication micro-code, examine the trade-offs between different optimization strategies, and extend these principles to neural networks.
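To make the trade-off concrete ahead of the talk, the sketch below (not the speaker's code; matrix size and tile size are illustrative assumptions) contrasts a naive matrix-multiplication kernel with a cache-blocked one. Both compute the same result; the blocked version restructures the loops so each tile's working set stays resident in cache, trading extra loop overhead for far less data movement — the kind of micro-kernel decision the abstract refers to.

```c
#include <assert.h>
#include <string.h>

#define N 64       /* matrix dimension; illustrative */
#define BLOCK 16   /* tile size; in practice tuned to the cache hierarchy */

/* Naive micro-kernel: the inner loop strides down a column of B,
   so every iteration touches a new cache line — poor locality. */
static void matmul_naive(const float *A, const float *B, float *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* Blocked micro-kernel: works on BLOCK x BLOCK tiles so the active
   pieces of A, B, and C fit in cache; the innermost loop streams a
   contiguous row of B and C. Same arithmetic, less memory traffic. */
static void matmul_blocked(const float *A, const float *B, float *C) {
    memset(C, 0, N * N * sizeof(float));
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int kk = 0; kk < N; kk += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int k = kk; k < kk + BLOCK; k++) {
                        float a = A[i * N + k];
                        for (int j = jj; j < jj + BLOCK; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

The same reordering idea carries over to convolution and other neural-network layers, where the loop nest is deeper and the tiling choices (and their energy cost) matter even more.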



Manu Rastogi received his B.Tech in India and his MS and Ph.D. from the University of Florida in 2012. Since graduation, he has worked at Qualcomm Research and HP Labs. As a member of the Qualcomm research team, he worked on the Qualcomm Zeroth processor in various capacities and later on the Qualcomm deep learning engine. His roles at Qualcomm ranged from developing signal processing algorithms to model development and deep learning model optimization. At HP he led efforts around machine learning at the edge and self-supervised learning methods using mutual information for speaker identification.

Schedule subject to change without notice.