Ronald Ortner

Department für Mathematik und Informationstechnologie
Lehrstuhl für Informationstechnologie
Montanuniversität Leoben
Franz Josef-Straße 18
A-8700 Leoben

Tel.: +43 3842 402-1503
E-mail: ronald.ortner(at)unileoben.ac.at

Sprechstunde: nach Übereinkunft

Lehre/Abschlussarbeiten

meine Lehrveranstaltungen im MU Online
Themen für Bakkalaureats- und Masterarbeiten

Projects

FWF project J 3259-N13 ''Structure in Reinforcement Learning'' (2012)

FWF project P 26219-N15 ''Structured and Continuous Reinforcement Learning'' (2014-2016)

OEAD project SI 13/2020 ''Structural and symmetry properties of graph products '' (2020-2022)

FWF project TAI-590 ''Reinforcement Learning: Beyond Optimality'' (2022-2024)

FWF project PAT6918624 “From Satisficing to Optimization” (2025-2027)

Book

Christos Dimitrakakis and Ronald Ortner:
Decision Making Under Uncertainty and Reinforcement Learning - Theory and Algorithms.
Intelligent Systems Reference Library 223, Springer 2022, ISBN 978-3-031-07612-1, pp. 1-238

Papers

Markov Decision Processes and Reinforcement Learning

H.Hajiabolhassan, R.Ortner: Online Regret Bounds for Satisficing in MDPs,
to appear in: Mathematics of Operations Research,
(pdf)
R.Ortner: Markov Chain Estimation, Approximation, and Aggregation for Average Reward Markov Decision Processes and Reinforcement Learning,
to appear in: Handbook of Statistics 52 (2025),
(pdf)
T.Michel, H.Hajiabolhassan, R.Ortner: Regret Bounds for Satisficing in Multi-Armed Bandit Problems,
In: Transactions on machine learning research. 08 2023,
(pdf)
P.Gajane, P.Auer, R.Ortner: Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms,
In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023). pp. 3714-3722.
(pdf)
R.Ortner: Regret Bounds for Reinforcement Learning via Markov Chain Concentration,
Journal of Artificial Intelligence Research 67, 115-128 (2020)
(pdf)
Note: Corollarys 2 and 3 actually only bound the difference to the (expected) distribution after n steps, not the stationary distribution as claimed. However, the distance between the respective terms µ^n·r and µ·r can simply be bounded by b/n, where b is the bias span of the Markov reward process. The bounds one obtains accordingly contain the maximal bias span over all policies as additional parameter, but are otherwise only worse by a constant.
R.Ortner, M.Pirotta, A.Lazaric, R.Fruit, O.Maillard: Regret Bounds for Learning State Representations in Reinforcement Learning,
In: Advances in Neural Information Processing Systems 32 (2019), pp. 12717-12727.
(pdf)
Note: Unfortunately, the proofs of the regret bounds for the schemes given in Sections 5 and 6.2 (when guessing the diameter resp. the size of the state space) have a gap that seems not easy to close. The other results are however not affected.
R.Ortner, P.Gajane, P.Auer: Variational Regret Bounds for Reinforcement Learning,
Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019, paper 16
(improved version pdf)
P.Auer, P.Gajane, R.Ortner: Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes,
PMLR Volume 99 : Conference on Learning Theory, COLT 2019, pp. 138-158
(pdf)
P.Auer, Y.Chen, P.Gajane, C.Lee, H.Luo, R.Ortner, C.Wei: Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information (joint abstract),
PMLR Volume 99 : Conference on Learning Theory, COLT 2019, pp. 159-163
(pdf)
R. Fruit, M. Pirotta, A. Lazaric, R. Ortner: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning,
PMLR Volume 80 : Proceedings of the 35th International Conference on Machine Learning, ICML 2018, pp. 1573-1581
(pdf)
P.Auer, C.-K.Chiang, R.Ortner, M.Drugan: Pareto Front Identification from Stochastic Bandit Feedback,
JMLR Workshop and Conference Proceedings Volume 51 : Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, pp. 939-947.
(pdf)
V.Gabillon, A.Lazaric, M.Ghavamzadeh, R.Ortner, P.Bartlett: Improved Learning Complexity in Combinatorial Pure Exploration Bandits,
JMLR Workshop and Conference Proceedings Volume 51 : Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, pp. 1004-1012.
(pdf)
R.Ortner: Optimal Behavior is Easier to Learn than the Truth,
Minds and Machines 26(3), 243-252 (2016)
(open access pdf)
K.Lakshmanan, R.Ortner, and D.Ryabko: Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 37 : Proceedings of The 32nd International Conference on Machine Learning, ICML 2015, pp. 524-532.
(preprint pdf)
R.Ortner, O.Maillard, and D.Ryabko: Selecting Near-Optimal Approximate State Representations in Reinforcement Learning,
Proceedings of the 25th International Conference on Algorithmic Learning Theory, ALT 2014.
Lecture Notes in Computer Science 8776, Springer 2014, pp. 140-154.
(extended submission pdf)
Note: The paper uses material from ICML 2013 paper that contains an error, see below. However, some of the results should still hold when using the algorithm suggested in the NeurIPS 2019 paper.
R.Ortner, D.Ryabko, P.Auer, and R.Munos: Regret Bounds for Restless Markov Bandits,
Theoretical Computer Science 558, 62-76 (2014)
(preprint pdf)
P.Nguyen, O.Maillard, D.Ryabko, R.Ortner: Competing with an Infinite Set of Models in Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 31 : Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, AISTATS 2013, pp. 463-471.
(preprint pdf)
O.Maillard, P.Nguyen, R.Ortner, and D.Ryabko: Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning,
JMLR Workshop and Conference Proceedings Volume 28 : Proceedings of The 30th International Conference on Machine Learning, ICML 2013, pp. 543-551.
(corrected preprint pdf)
Note: Unfortunately, there is a serious error detected by Ronan Fruit in the proof of the main result that seems not easy to repair, see App. A.2 of this paper for details. However check out the NeurIPS 2019 paper, which is able to re-establish some of the results with a simpler algorithm.
R.Ortner: Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes,
Ann.Oper.Res. 208(1), 321-336 (2013)
(preprint pdf)
R.Ortner, D.Ryabko: Online Regret Bounds for Undiscounted Continuous Reinforcement Learning,
In: Advances in Neural Information Processing Systems 25 (2012), pp. 1772-1780.
(preprint pdf)
R.Ortner, D.Ryabko, P.Auer, and R.Munos: Regret Bounds for Restless Markov Bandits,
In: Proceedings of the 23th International Conference on Algorithmic Learning Theory, ALT 2012.
Lecture Notes in Computer Science 7568, Springer 2012, pp. 214-228.
(extended version pdf)
P.Auer and R.Ortner: UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem,
Period.Math.Hungar. 61(1-2), pp. 55-65 (2010).
(corrected preprint pdf)
T.Jaksch, R.Ortner, and P.Auer: Near-optimal Regret Bounds for Reinforcement Learning,
J.Mach.Learn.Res. 11, pp. 1563-1600 (2010).
R.Ortner: Exploiting Similarity Information in Reinforcement Learning. Similarity Models for Multi-Armed Bandits and MDPs.
In: ICAART 2010. Proceedings of the 2nd International Conference on Agents and Artificial Intelligence, Volume 1 (Artificial Intelligence), pp. 203-210.
(pdf)
R.Ortner: Online Regret Bounds for Markov Decision Processes with Deterministic Transitions,
Theoretical Computer Science 411(29-30), pp. 2684-2695 (2010).
(pdf)
P.Auer, T.Jaksch, and R.Ortner: Near-optimal Regret Bounds for Reinforcement Learning,
In: Advances in neural information processing systems 21 (2009), pp. 89-96.
R.Ortner: Optimism in the Face of Uncertainty Should be Refutable,
Minds and Machines 18(4), pp. 521 - 526 (2008).
(preprint pdf)
R.Ortner: Online Regret Bounds for Markov Decision Processes with Deterministic Transitions,
In: Proceedings of the 19th International Conference on Algorithmic Learning Theory, ALT 2008.
Lecture Notes in Computer Science 5254, Springer 2008, pp. 123-137.
(preprint pdf), (improved journal version pdf)
R.Ortner: Pseudometrics for State Aggregation in Average Reward Markov Decision Processes,
In: Proceedings of the 18th International Conference on Algorithmic Learning Theory, ALT 2007.
Lecture Notes in Computer Science 4754, Springer 2007, pp. 373-387.
(extended version pdf)
P.Auer, R.Ortner, and C. Szepesvári: Improved Rates for the Stochastic Continuum-Armed Bandit Problem,
In: Learning Theory, 20th Annual Conference on Learning Theory, COLT 2007.
Lecture Notes in Computer Science 4539, Springer 2007, pp. 454-468.
(preprint pdf)
R.Ortner: Linear Dependence of Stationary Distributions in Ergodic Markov Decision Processes,
OR Letters 35(5), pp. 619-626 (2007).
(preprint pdf)
P.Auer, R.Ortner: Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning,
In: Advances in Neural Information Processing Systems 19 (2007), pp. 49-56.
(pdf)

Combinatorial Geometry

R.Ortner: Forcing Subarrangements in Complete Arrangements of Pseudocircles,
J. Comput. Geom. 6(1), pp. 235-248 (2015).
J.Linhart, R.Ortner: A Note on Convex Realizability of Arrangements of Pseudocircles,
Geombinatorics Vol XVIII, Issue 2, pp. 66-71 (2008).
(corrected version pdf)
R.Ortner: Improved Upper Bounds on the Number of Vertices of Weight <= k in Particular Arrangements of Pseudocircles,
24th European Workshop on Computational Geometry, pp. 35-38 (2008).
(corrected version pdf)
R.Ortner: Embeddability of Arrangements of Pseudocircles into the Sphere,
European J. Combin. 29, pp. 457-469 (2008).
(corrected preprint pdf)
J.Linhart, R.Ortner: An Arrangement of Pseudocircles not Realizable with Circles,
Beiträge Algebra Geom. Vol. 46, No. 2, pp. 351-356 (2005).
(preprint pdf)
J.Linhart, R.Ortner: On the Combinatorial Structure of Arrangements of Oriented Pseudocircles
Electron. J. Combin. 11 (2004), Research Paper 30, 13 pp. (electronic).
(pdf)

Graph Theory, Random Walks, and Markov Chains

R.Ortner: A Note on the Bias and Kemeny's Constant in Markov Reward Processes with an Application to Markov Chain Perturbation.
submitted (2024)
(preprint pdf)
E.Dobson, A.Hujdurovic, W.Imrich, R.Ortner: When is Cartesian Graph Product a Cayley Graph?
In: Proceedings of the 12th European Conference on Combinatorics, Graph Theory and Applications, pp. 362-367 (2023)
(pdf)
R.Ortner, W.Woess: Non-backtracking Random walks and Cogrowth of Graphs,
In: Journal Canadian J. Math. 59(4), pp. 828-844 (2007).
(preprint pdf)

None of the above

R.Ortner: Adaptive Algorithms for Meta-Induction.
In: Journal for General Philosophy of Science 54, pp.433-450 (2023).
(preprint)
M.Heininger and R.Ortner: Predicting Packaging Sizes Using Machine Learning.
In: Operations Research Forum Vol. 3, No. 3 (2022).
P. Auer, G. Dósa, T. Dulai, A. Fügenschuh, P. Näser, R. Ortner, and Á. Werner-Stark: A new heuristic and an exact approach for a production planning problem,
In: Central European Journal of Operations Research 29, pp. 1079-1113 (2021).
H.Leitgeb and R.Ortner: Mechanizing Induction,
In: Dov Gabbay, Stephan Hartmann, John Woods (ed.), Handbook for the History of Logic, Vol. 10: Inductive Logic. Elsevier. pp 719-772, 2011.
M.Antenreiter, R. Ortner, and Peter Auer: Combining Classifiers for Improved Multilabel Image Classification,
In: Learning from Multi-label Data, MLD Workshop at ECML 2009, pp. 16-27.
(pdf)
P.Auer, R.Ortner: A New PAC-bound for Intersection-closed Concept Classes,
In: Machine Learning, Vol. 66, No. 2-3, pp. 151-163 (2007).
(preprint)
P.Auer, R.Ortner: A Boosting Approach to Multiple Instance Learning,
In: Machine Learning, 15th European Conference on Machine Learning, ECML 2004.
Lecture Notes in Artificial Intelligence 3201, Springer 2004, pp. 63-74.
(extended version)
P.Auer, R.Ortner: A New PAC-bound for Intersection-closed Concept Classes,
In: Learning Theory, 17th Annual Conference on Learning Theory, COLT 2004.
Lecture Notes in Computer Science 3120, Springer 2004, pp. 408-414.
(extended version)

Wir verwenden Cookies