December 05, 2021
Reinforcement learning (RL) algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains, user data often contains sensitive information that must be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side. We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework. We establish a lower bound for regret minimization in finite-horizon MDPs with LDP guarantees, which shows that guaranteeing privacy has a multiplicative effect on the regret. This result shows that, while LDP is an appealing notion of privacy, it makes the learning problem significantly more complex. Finally, we present an optimistic algorithm that simultaneously satisfies $\varepsilon$-LDP requirements and achieves $\sqrt{K}/\varepsilon$ regret in any finite-horizon MDP after $K$ episodes, matching the lower bound's dependence on the number of episodes $K$.
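To make "obfuscated on the user side" concrete, here is a minimal sketch of one standard way to achieve $\varepsilon$-LDP for a single bounded value: the Laplace mechanism. It is a generic illustration and not necessarily the mechanism used in the paper. The function name `privatize_reward` and the assumption that rewards lie in $[0, 1]$ are hypothetical choices made for this example.

```python
import random


def privatize_reward(reward, epsilon, rng):
    """Return an epsilon-LDP version of a reward assumed to lie in [0, 1].

    Hypothetical helper for illustration only. The user clips the reward to
    [0, 1] (so its sensitivity is 1) and adds Laplace noise with scale
    1 / epsilon before anything leaves the device.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = min(max(reward, 0.0), 1.0)
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is
    # Laplace-distributed with location 0 and scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return clipped + noise


if __name__ == "__main__":
    rng = random.Random(0)
    true_reward = 0.7
    for eps in (0.1, 1.0, 10.0):
        reports = [privatize_reward(true_reward, eps, rng) for _ in range(10_000)]
        mean = sum(reports) / len(reports)
        print(f"epsilon={eps}: mean of 10,000 private reports = {mean:.3f}")
```

The noise scale grows as $1/\varepsilon$, so a learner that only sees the private reports needs more samples to estimate the same quantity when $\varepsilon$ is small. This gives some intuition for why $\varepsilon$ appears in the denominator of the $\sqrt{K}/\varepsilon$ regret bound.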
Written by
Evrard Garcelon
Vianney Perchet
Ciara Pike-Burke
Matteo Pirotta
Publisher
NeurIPS