
Bandits with Knapsacks beyond the Worst-Case Analysis

November 12, 2021

Abstract

Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization of logarithmic, instance-dependent regret rates. Second, we consider “simple regret” in BwK, which tracks the algorithm’s performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general “reduction” from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from Agrawal and Devanur (2014), providing new analyses thereof.
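For readers unfamiliar with the model, the sketch below illustrates the BwK interaction protocol in Python: the learner pulls arms, collects rewards, consumes a shared budget, and stops once the budget runs out. All specifics here (Bernoulli rewards and consumptions, the arm means, the budget, and the simple optimistic ratio index) are hypothetical choices for illustration; this is not the Agrawal–Devanur (2014) algorithm analyzed in the paper.

```python
# Minimal sketch of the Bandits-with-Knapsacks (BwK) protocol, for illustration only.
# Assumptions (not from the paper): Bernoulli rewards and consumptions, one resource
# with budget B, and a simplified UCB-style index on the reward-to-consumption ratio.

import numpy as np

rng = np.random.default_rng(0)

K = 3                                     # number of arms
B = 500.0                                 # total budget of the single resource
T = 10_000                                # time horizon (safety cap)
mu_reward = np.array([0.3, 0.5, 0.7])     # hypothetical mean rewards
mu_cost = np.array([0.2, 0.5, 0.9])       # hypothetical mean per-round consumptions

pulls = np.zeros(K)
reward_sum = np.zeros(K)
cost_sum = np.zeros(K)
budget_left = B
total_reward = 0.0
t = 0

while budget_left >= 1.0 and t < T:       # stop when the budget is (nearly) exhausted
    t += 1
    if t <= K:
        arm = t - 1                        # pull each arm once to initialize estimates
    else:
        r_hat = reward_sum / pulls
        c_hat = cost_sum / pulls
        bonus = np.sqrt(2 * np.log(t) / pulls)
        # Simplified optimistic index: optimistic reward over pessimistic cost.
        arm = int(np.argmax((r_hat + bonus) / np.maximum(c_hat - bonus, 1e-3)))

    reward = rng.binomial(1, mu_reward[arm])
    cost = rng.binomial(1, mu_cost[arm])

    pulls[arm] += 1
    reward_sum[arm] += reward
    cost_sum[arm] += cost
    budget_left -= cost
    total_reward += reward

print(f"rounds played: {t}, total reward: {total_reward:.0f}, budget left: {budget_left:.0f}")
```

The sketch only captures the protocol; in BwK, regret is measured against the best fixed distribution over arms that respects the budget, which is the benchmark the regret bounds in the paper refer to.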


AUTHORS

Karthik Abinav Sankararaman

Aleksandrs Slivkins

Publisher

NeurIPS

Research Topics

Theory

Reinforcement Learning

Core Machine Learning

Related Publications

December 06, 2021

THEORY

CORE MACHINE LEARNING

Learning on Random Balls is Sufficient for Estimating (Some) Graph Parameters

Takanori Maehara, Hoang NT


December 05, 2021

REINFORCEMENT LEARNING

Local Differential Privacy for Regret Minimization in Reinforcement Learning

Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta


December 05, 2021

REINFORCEMENT LEARNING

Hierarchical Skills for Efficient Exploration

Jonas Gehring, Gabriel Synnaeve, Andreas Krause, Nicolas Usunier


November 09, 2021

REINFORCEMENT LEARNING

Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

Simone Parisi, Victoria Dean, Deepak Pathak, Abhinav Gupta
