CORE MACHINE LEARNING

Align, then memorise: the dynamics of learning with feedback alignment

July 18, 2021

Abstract

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to backpropagation for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory of feedback alignment algorithms. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy-breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment, and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorise process occurs sequentially from the bottom layers of the network to the top.
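The abstract does not spell out the update rule, so the following is only a minimal sketch of the idea, not the authors' code. It assumes the standard DFA formulation, in which the output error reaches a hidden layer through a fixed random feedback vector instead of the transposed forward weights. The two-layer tanh student, the teacher, and all names and hyperparameters (`grads`, `train_dfa`, `f`, `lr`, the layer sizes) are illustrative assumptions. The sketch records the cosine similarity between the DFA update and the true backpropagation gradient of the first layer, which is the gradient alignment the abstract refers to.

```python
import numpy as np


def grads(W1, w2, f, x, y):
    """Gradients for a two-layer net y_hat = w2 . tanh(W1 x) with loss 0.5 * e**2.

    Returns the backprop gradient for W1, the DFA pseudo-gradient for W1
    (feedback vector f replaces w2), and the gradient for w2.
    """
    h = np.tanh(W1 @ x)            # hidden activations
    e = w2 @ h - y                 # scalar output error
    dact = 1.0 - h ** 2            # derivative of tanh at the pre-activations
    g_bp = np.outer(e * w2 * dact, x)    # true gradient: error sent back through w2
    g_dfa = np.outer(e * f * dact, x)    # DFA: error sent back through fixed random f
    g_w2 = e * h                   # last layer sees the true error in both schemes
    return g_bp, g_dfa, g_w2


def cosine(u, v):
    """Cosine similarity between two arrays, flattened."""
    return float(np.sum(u * v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def train_dfa(steps=2000, d=20, hidden=30, lr=0.02, seed=0):
    """Online DFA training on a hypothetical teacher; returns per-step alignment."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.normal(size=(hidden, d)) / np.sqrt(d)
    w2 = 0.1 * rng.normal(size=hidden)
    f = rng.normal(size=hidden) / np.sqrt(hidden)   # fixed feedback, never updated
    w_teacher = rng.normal(size=d) / np.sqrt(d)     # assumed data-generating model
    alignment = np.empty(steps)
    for t in range(steps):
        x = rng.normal(size=d)
        y = np.tanh(w_teacher @ x)
        g_bp, g_dfa, g_w2 = grads(W1, w2, f, x, y)
        alignment[t] = cosine(g_bp, g_dfa)
        W1 -= lr * g_dfa           # the first layer only ever uses the DFA signal
        w2 -= lr * g_w2
    return alignment


if __name__ == "__main__":
    a = train_dfa()
    print("mean alignment, first 100 steps:", a[:100].mean())
    print("mean alignment, last 100 steps: ", a[-100:].mean())
```

Comparing the early and late averages gives a rough view of whether the approximate gradient moves towards the true one during training. How quickly this happens depends on the assumed initialisation and learning rate.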

AUTHORS

Written by

Maria Refinetti

Stephane d’Ascoli

Ruben Ohana

Sebastian Goldt

Publisher

ICML 2021

Research Topics

Core Machine Learning

Related Publications

November 03, 2020

CORE MACHINE LEARNING

Robust Embedded Deep K-means Clustering

Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction.…

Rui Zhang, Hanghang Tong, Yinglong Xia, Yada Zhu

December 07, 2020

CORE MACHINE LEARNING

Adversarial Example Games

The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks to guide the development…

Avishek Joey Bose, Gauthier Gidel, Andre Cianflone, Pascal Vincent, Simon Lacoste-Julien, William L. Hamilton

