November 03, 2021
Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms that optimize neural network parameters remain largely hand-designed and computationally inefficient. We study whether we can use deep learning to directly predict these parameters by exploiting past knowledge from training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures, DeepNets-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving 60% accuracy on CIFAR-10. On ImageNet, the top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, could lead to a new, more computationally efficient paradigm for training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.
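To make the idea concrete, below is a minimal sketch of a graph hypernetwork, not the authors' released implementation: a small message-passing network embeds each node of a target architecture's computational graph by its operation type, refines the node states along the graph's edges, and decodes a weight tensor per parameterized node in one forward pass. The operation vocabulary, the three rounds of message passing, the per-node linear decoders, and the toy two-layer target network are all illustrative assumptions.

```python
# Minimal graph-hypernetwork sketch under toy assumptions; not the paper's GHN code.
import math
import torch
import torch.nn as nn

class GraphHyperNetwork(nn.Module):
    def __init__(self, num_op_types, hidden_dim, param_shapes):
        super().__init__()
        self.embed = nn.Embedding(num_op_types, hidden_dim)   # op-type embedding
        self.msg = nn.Linear(hidden_dim, hidden_dim)          # message function
        self.update = nn.GRUCell(hidden_dim, hidden_dim)      # node-state update
        # One decoder per parameterized node (a simplification: a shared decoder
        # tiled to arbitrary shapes would be needed for unseen architectures).
        self.decoders = nn.ModuleList(
            nn.Linear(hidden_dim, math.prod(s)) for s in param_shapes)
        self.param_shapes = param_shapes

    def forward(self, op_types, adj, param_nodes):
        h = self.embed(op_types)                # (N, hidden) initial node states
        for _ in range(3):                      # a few rounds of message passing
            m = adj.t() @ self.msg(h)           # each node aggregates from predecessors
            h = self.update(m, h)               # GRU-style state update
        # Decode one weight tensor per parameterized node of the graph.
        return [dec(h[i]).view(shape) for dec, i, shape
                in zip(self.decoders, param_nodes, self.param_shapes)]

# Toy target: a 2-layer MLP whose graph is linear -> relu -> linear.
shapes = [(16, 8), (4, 16)]                     # weight shapes of the two layers
ghn = GraphHyperNetwork(num_op_types=3, hidden_dim=32, param_shapes=shapes)
op_types = torch.tensor([0, 1, 0])              # node op ids: linear, relu, linear
adj = torch.tensor([[0., 1., 0.],               # adjacency: adj[i, j] = edge i -> j
                    [0., 0., 1.],
                    [0., 0., 0.]])
w1, w2 = ghn(op_types, adj, param_nodes=[0, 2])  # predicted weights, one pass
x = torch.randn(5, 8)
y = torch.relu(x @ w1.t()) @ w2.t()             # run the target net with them
```

In training, the predicted weights would be plugged into the target network and the task loss backpropagated through the hypernetwork itself, so that a single model learns to parameterize many architectures.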
Written by
Boris Knyazev
Michal Drozdzal
Graham Taylor
Adriana Romero Soriano
Publisher
NeurIPS
Research Topics
Core Machine Learning