tinyML Talks: The extreme compression of LSTM models using sparse structured additive matrices


November 3, 2020






Timezone: PST

The extreme compression of LSTM models using sparse structured additive matrices

Urmish THAKKER, Principal Engineer

SambaNova Systems Inc

Structured matrices, such as those derived from Kronecker products (KP), are effective at compressing neural networks, but can lead to unacceptable accuracy loss when applied to large models. In this paper, we propose the notion of doping: the addition of an extremely sparse matrix to a structured matrix. Doping provides additional degrees of freedom for a small number of parameters, allowing them to independently diverge from the fixed structure. To train LSTMs with doped structured matrices, we introduce the additional parameter matrix while slowly annealing its sparsity level. However, we find that performance degrades as we slowly sparsify the doping matrix, due to co-matrix adaptation (CMA) between the structured and the sparse matrices. We address this overdependence on the sparse matrix using a co-matrix dropout regularization (CMR) scheme. We provide empirical evidence to show that doping, CMA and CMR are concepts generally applicable to multiple structured matrices (Kronecker Product, LMF, Hybrid Matrix Decomposition). Additionally, results with doped Kronecker product matrices demonstrate state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications with minor loss in accuracy. The doped KP compression technique outperforms previous state-of-the-art compression results by achieving a 1.3-2.4x higher compression factor at a similar accuracy, while also beating strong alternatives like pruning and low-rank methods by a large margin (8% or more). Finally, we show that doped KP can be deployed on commodity hardware using the current software stack and achieve a 2.5-5.5x inference run-time speed-up over the baseline.
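As a rough illustration of the doping idea described in the abstract, the following NumPy sketch builds a weight matrix as a Kronecker product plus an extremely sparse additive matrix and applies it to a vector. The shapes, the 1% density, the random initialization, and the names `doped_kp_matvec`, `B`, `C`, and `S` are all assumptions chosen for the example; none of them come from the talk, and the sketch covers only the forward matrix-vector product, not the sparsity annealing or the CMR training scheme.

```python
import numpy as np

def doped_kp_matvec(B, C, S, x):
    """Compute (kron(B, C) + S) @ x without materializing kron(B, C).

    B: (m1, n1) and C: (m2, n2) are the Kronecker factors.
    S: (m1*m2, n1*n2) is the doping matrix, assumed to be mostly zeros.
    x: vector of length n1*n2.
    """
    n1, n2 = B.shape[1], C.shape[1]
    # Identity for row-major reshapes: kron(B, C) @ x == (B @ X @ C.T).ravel()
    X = x.reshape(n1, n2)
    structured = (B @ X @ C.T).reshape(-1)
    return structured + S @ x

# Hypothetical sizes: a 256 x 256 weight matrix built from two 16 x 16 factors.
rng = np.random.default_rng(0)
m, n = 256, 256
B = rng.standard_normal((16, 16))
C = rng.standard_normal((16, 16))

# Doping matrix with about 1% non-zero entries (density is an assumption).
k = (m * n) // 100
S = np.zeros(m * n)
S[rng.choice(m * n, size=k, replace=False)] = rng.standard_normal(k)
S = S.reshape(m, n)

x = rng.standard_normal(n)
y = doped_kp_matvec(B, C, S, x)

# Parameter-count ratio versus a dense matrix (ignores sparse index storage).
compression = (m * n) / (B.size + C.size + np.count_nonzero(S))
```

Here `S` is stored as a dense array only to keep the sketch dependency-free; a real deployment would presumably use a sparse format, whose index overhead would lower the effective compression factor somewhat.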

Urmish is a Principal Engineer at SambaNova Systems. Previously, he was a Senior Research Engineer at Arm’s ML Research Lab, where he worked on efficient execution of neural networks (NNs) on Arm devices. His work spanned both algorithms and hardware for ML. He has published 20+ papers and patents on topics such as model compression (pruning, quantization, low-rank decomposition, structured matrices, NAS and conditional computation), efficient libraries for NNs, and hardware and accelerators for NLP applications.

Along with extensive experience in the field of machine learning, Urmish has also worked on performance modeling, design and verification of CPUs and memory controllers at Texas Instruments, AMD and Broadcom.

He holds a Master’s from the University of Wisconsin-Madison, USA, and a Bachelor’s from Birla Institute of Technology and Science, India.

Schedule subject to change without notice.