From Low Probability to High Confidence in Stochastic Convex Optimization

Feb 26, 2021

Abstract

Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on “light-tail” noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. The procedure we propose, called proxBoost, is elementary and builds on two well-known ingredients: robust distance estimation and the proximal point method. We discuss consequences for both streaming (online) algorithms and offline algorithms based on empirical risk minimization.
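To make the two ingredients concrete, the sketch below illustrates the general recipe in Python: run a cheap stochastic base solver several times independently, keep the candidate that is robustly closest to the others (robust distance estimation), and repeat this inside a proximal point loop that re-centers a quadratic regularizer at the current estimate. This is an illustrative reconstruction under assumed placeholder choices, not the paper's proxBoost procedure or its parameter schedule; the selection rule, the toy base solver noisy_sgd, and the parameters rounds, trials, and lam are all ours.

import numpy as np


def robust_select(candidates):
    """Robust distance estimation: among several candidate solutions, return
    the one whose median distance to the other candidates is smallest. If a
    majority of runs landed near the minimizer, the selected point is near it
    too, with failure probability decaying exponentially in the number of runs."""
    candidates = np.asarray(candidates)
    pairwise = np.linalg.norm(candidates[:, None, :] - candidates[None, :, :], axis=-1)
    scores = np.median(pairwise, axis=1)
    return candidates[np.argmin(scores)]


def prox_boost(base_solver, dim, rounds=5, trials=9, lam=1.0, seed=None):
    """Proximal point outer loop: in each round, the base solver approximately
    minimizes f(x) + (lam / 2) * ||x - center||^2, and the new center is the
    robust selection over `trials` independent runs."""
    rng = np.random.default_rng(seed)
    center = np.zeros(dim)
    for _ in range(rounds):
        runs = [base_solver(center, lam, rng) for _ in range(trials)]
        center = robust_select(runs)
    return center


if __name__ == "__main__":
    # Toy strongly convex objective f(x) = 0.5 * ||x - x_star||^2, observed
    # only through noisy gradients.
    x_star = np.array([1.0, -2.0, 0.5])

    def noisy_sgd(center, lam, rng, steps=50, lr=0.05):
        # A few SGD steps on the proximally regularized objective.
        x = center.copy()
        for _ in range(steps):
            grad = (x - x_star) + lam * (x - center) + rng.standard_normal(x.shape)
            x -= lr * grad
        return x

    estimate = prox_boost(noisy_sgd, dim=3, seed=0)
    print("error:", np.linalg.norm(estimate - x_star))

Each run of the base solver only succeeds in expectation, so a single run lands close to the minimizer merely with constant probability; selecting the candidate with the smallest median distance to the others amplifies this to a failure probability decaying exponentially in the number of trials, which is the source of the logarithmic dependence on the confidence level noted in the abstract.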


AUTHORS

Damek Davis

Dmitriy Drusvyatskiy

Lin Xiao

Junyu Zhang

Research Topics

Theory

Core Machine Learning

Related Publications

June 10, 2019

THEORY

Stochastic Gradient Push for Distributed Deep Learning

Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed…

Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Mike Rabbat

March 12, 2018

THEORY

Geometrical Insights for Implicit Generative Modeling

Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum…

Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

April 30, 2018

THEORY

mixup: Beyond Empirical Risk Minimization

Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup…

Hongyi Zhang, Moustapha Cisse, Yann Dauphin, David Lopez-Paz

April 01, 2020

THEORY

Residual Energy-Based Models for Text Generation

Text generation is ubiquitous in many NLP tasks, from summarization, to dialogue and machine translation…

Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc’Aurelio Ranzato, Arthur Szlam
