Title: Factorized Layers Revisited: Compressing Deep Neural Networks Without Playing the Lottery
Abstract: Machine learning models are rapidly growing in size, leading to increased training and deployment costs. While the most popular approach for training compressed models is trying to guess good "lottery tickets" or sparse subnetworks, we revisit the low-rank factorization approach, in which weight matrices are replaced by products of smaller matrices. We extend recent analyses of optimization of deep networks to motivate simple initialization and regularization schemes for improving the training of these factorized layers. Empirically, these methods yield higher accuracies than popular pruning and lottery ticket approaches at the same compression level. We further demonstrate their usefulness in two settings beyond model compression: simplifying knowledge distillation and training Transformer-based architectures such as BERT. This is joint work with Neil Tenenholtz, Lester Mackey, and Nicolo Fusi.
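To make the idea of a factorized layer concrete, here is a minimal sketch in plain Python. It assumes a dense m x n weight matrix W is replaced by the product of U (m x r) and V (r x n) with a small rank r. The helper names (`dense_params`, `factorized_params`, `factorized_forward`) and the sizes are hypothetical and chosen only for illustration; this is not the speakers' code and does not cover the initialization or regularization schemes mentioned in the abstract.

```python
# Illustrative sketch only: hypothetical helpers, not the authors' implementation.

def dense_params(m: int, n: int) -> int:
    """Parameter count of a dense m x n weight matrix."""
    return m * n

def factorized_params(m: int, n: int, r: int) -> int:
    """Parameter count when W (m x n) is replaced by U (m x r) times V (r x n)."""
    return r * (m + n)

def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def factorized_forward(U, V, x):
    """Compute (U V) x as U (V x), without ever forming the m x n product."""
    return matvec(U, matvec(V, x))

# Assumed example sizes: a 512 x 512 layer factorized at rank 32.
m, n, r = 512, 512, 32
ratio = factorized_params(m, n, r) / dense_params(m, n)
print(f"dense: {dense_params(m, n)}, factorized: {factorized_params(m, n, r)}, ratio: {ratio:.3f}")

# Tiny worked example with a 3 x 2 factor U and a 2 x 4 factor V.
U = [[1, 0], [0, 1], [1, 1]]
V = [[1, 2, 3, 4], [0, 1, 0, 1]]
x = [1, 1, 1, 1]
print(factorized_forward(U, V, x))
```

The factorized layer saves parameters only when r(m + n) < mn, that is, when r is below mn / (m + n). For the square 512 x 512 case above, that threshold is a rank of 256.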
CMU AI Seminar website: [ Link ]