Structured and Continuous Reinforcement Learning

FWF project P 26219-N15 (April 2014 - May 2016)

Project leader: Ronald Ortner

Department für Mathematik und Informationstechnologie
Lehrstuhl für Informationstechnologie
Montanuniversität Leoben
Franz-Josef-Straße 18
A-8700 Leoben

Tel.: +43 3842 402-1503
Fax: +43 3842 402-1502
E-mail: ronald.ortner@unileoben.ac.at

About the project

In the precursor project (see below), we defined very general similarity structures for reinforcement learning problems in finite domains and derived improved theoretical regret bounds for the case where the underlying similarity structure is known. The techniques and algorithms developed there also led to the first theoretical regret bounds for reinforcement learning in continuous domains (see the NIPS 2012 paper below).

In the current project we want to take the research on continuous reinforcement learning, a setting of particular importance for applications, a step further, not only by improving on the known bounds, but also by developing efficient algorithms. Moreover, we want to investigate more general settings in which the learner does not have direct access to the domain information, but only to a set of possible models. For this setting, too, the precursor project produced first theoretical results, under the assumptions that the domain is finite and that the set of possible models contains the correct model (see the ICML 2013 and AISTATS 2013 papers below). In the current project we aim to generalize these results to infinite domains and to relax the assumption on the model set, which then need not contain the correct model itself, but only a good approximation of it.
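
For orientation, the following LaTeX sketch states the standard notion of regret in the undiscounted (average-reward) setting studied in this line of work; it is a simplified version, as the precise definitions vary slightly between the papers listed below.

  % Undiscounted regret of a learning algorithm after T steps in an MDP M,
  % measured against the optimal average reward rho^*(M):
  \[
    \Delta(M, T) \;=\; T \rho^*(M) \;-\; \sum_{t=1}^{T} r_t ,
  \]
  % where r_t denotes the reward collected at step t. Sublinear regret,
  % \Delta(M, T) = o(T), means that the algorithm's average reward
  % converges to the optimal average reward rho^*(M).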


Precursor project (Erwin Schrödinger Fellowship, FWF project J3259-N13)

Final report

Jobs

Currently, no jobs are available.

Publications (including those of the precursor project)

P. Auer and C.-K. Chiang: An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits,
JMLR Workshop and Conference Proceedings Volume 49: Proceedings of the 29th Conference on Learning Theory, COLT 2016, pp. 116-120.
(pdf)

P. Auer, C.-K. Chiang, R. Ortner, and M. Drugan: Pareto Front Identification from Stochastic Bandit Feedback,
JMLR Workshop and Conference Proceedings Volume 51: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016.
(pdf)

V. Gabillon, A. Lazaric, M. Ghavamzadeh, R. Ortner, and P. Bartlett: Improved Learning Complexity in Combinatorial Pure Exploration Bandits,
JMLR Workshop and Conference Proceedings Volume 51: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016.
(pdf)

R. Ortner: Optimal Behavior is Easier to Learn than the Truth,
Minds and Machines, to appear.
(open access pdf)

K. Lakshmanan, R. Ortner, and D. Ryabko: Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 37: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015.
(preprint pdf)

R. Ortner, O. Maillard, and D. Ryabko: Selecting Near-Optimal Approximate State Representations in Reinforcement Learning,
In: Proceedings of the 25th International Conference on Algorithmic Learning Theory, ALT 2014.
Lecture Notes in Computer Science 8776, Springer 2014, pp. 140-154.
(extended submission pdf)

R. Ortner, D. Ryabko, P. Auer, and R. Munos: Regret Bounds for Restless Markov Bandits,
Theoretical Computer Science 558, 62-76 (2014).
(preprint pdf)

R. Ortner: Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes,
Annals of Operations Research 208(1), 321-336 (2013).
(preprint pdf)

R. Ortner, D. Ryabko, P. Auer, and R. Munos: Regret Bounds for Restless Markov Bandits,
In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory, ALT 2012.
Lecture Notes in Computer Science 7568, Springer 2012, pp. 214-228.
(preprint pdf)

R. Ortner and D. Ryabko: Online Regret Bounds for Undiscounted Continuous Reinforcement Learning,
In: Advances in Neural Information Processing Systems 25 (2012), pp. 1772-1780.
(preprint pdf)

O. Maillard, P. Nguyen, R. Ortner, and D. Ryabko: Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 28: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pp. 543-551.
(corrected preprint pdf)

P. Nguyen, O. Maillard, D. Ryabko, and R. Ortner: Competing with an Infinite Set of Models in Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 31: Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, AISTATS 2013, pp. 463-471.
(preprint pdf)