Applied Markov Decision Processes and Reinforcement Learning

Course

Postgraduate

Semester

Electives

Subject Code

AVD871

Subject Title

Applied Markov Decision Processes and Reinforcement Learning

Syllabus

Review of basic probability and stochastic processes. Introduction to Markov chains. Markov models for discrete-time dynamic systems, reward, policies, policy evaluation, Markov decision processes, optimality criteria, Bellman’s optimality principle, dynamic programming, optimality equations, policy search, policy iteration, value iteration, generalized policy iteration, approximate dynamic programming. Exploration versus exploitation in reinforcement learning, multi-armed and contextual bandits, reinforcement learning setup and model-free learning, Monte Carlo learning, Q-learning and SARSA, temporal-difference learning, function approximation, policy gradient methods, actor-critic methods, stochastic approximation and its applications to reinforcement learning, neural networks in reinforcement learning, deep reinforcement learning. Applications and case studies of Markov decision processes and reinforcement learning in machine learning, control, communication, robotics, and optimization.

Text Books

Same as References

References

1. Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 2018.

2. Dynamic Programming and Optimal Control, Vols. I and II, Dimitri P. Bertsekas, Athena Scientific, 2005.

3. Applied Probability Models with Optimization Applications, Sheldon M. Ross, Courier Corporation, 2013.

4. Introduction to Stochastic Dynamic Programming, Sheldon M. Ross, Academic Press, 2014.

Course Outcomes (COs):

CO1: Understand probability and stochastic processes (especially Markov chains) and their use in modelling discrete-time stochastic systems.

CO2: Understand the theory of Markov decision processes, the problem of controlling discrete-time dynamical systems, and the formulation of such control problems as Markov decision problems.

CO3: Implement various methods (such as value iteration and policy iteration) for solving Markov decision processes.

CO4: Design the reinforcement learning framework and apply it to the fundamental problems of the exploration-exploitation dilemma and credit assignment.

CO5: Design and implement reinforcement learning agents.