UCB REVISITED: IMPROVED REGRET BOUNDS FOR THE STOCHASTIC MULTI-ARMED BANDIT PROBLEM PETER AUER AND RONALD ORTNER ABSTRACT. In the stochastic multi-armed bandit problem we consider a modification of…
Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes Ronald Ortner…
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning Odalric-Ambrym Maillard odalricambrym.maillard@gmail.com Montanuniversität Leoben, Franz-Josef-Strasse 18,…
Competing with an Infinite Set of Models in Reinforcement Learning Phuong Nguyen Australian National University and NICTA Canberra ACT 0200, AUSTRALIA Odalric-Ambrym Maillard Technion, Faculty of…
Regret Bounds for Restless Markov Bandits Ronald Ortner∗, Daniil Ryabko∗∗, Peter Auer∗, Rémi Munos∗∗ Abstract We consider the restless Markov bandit problem, in which the state of each arm evolves…
Selecting Near-Optimal Approximate State Representations in Reinforcement Learning Ronald Ortner¹, Odalric-Ambrym Maillard², and Daniil Ryabko³ ¹ Montanuniversitaet Leoben, Austria ² The Technion,…