Syllabus
Review of basic probability and stochastic processes. Introduction to Markov chains. Markov models for discrete-time dynamic systems: rewards, policies, and policy evaluation. Markov decision processes, optimality criteria, Bellman’s optimality principle, dynamic programming, optimality equations, policy search, policy iteration, value iteration, generalized policy iteration, and approximate dynamic programming. Exploration versus exploitation in reinforcement learning; multi-armed and contextual bandits. The reinforcement learning setup and model-free learning: Monte Carlo learning, Q-learning and SARSA, temporal difference learning. Function approximation, policy gradient methods, and actor-critic methods. Stochastic approximation and its applications to reinforcement learning. Neural networks in reinforcement learning; deep reinforcement learning.
Applications and case studies of Markov decision processes and reinforcement learning in machine learning, control, communication, robotics, and optimization.
Textbooks
Same as References
References
1. Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 2018.
2. Dynamic Programming and Optimal Control, Vols. I and II, Dimitri P. Bertsekas, Athena Scientific, 2005.
3. Applied Probability Models with Optimization Applications, Sheldon M. Ross, Courier Corporation, 2013.
4. Introduction to Stochastic Dynamic Programming, Sheldon M. Ross, Academic Press, 2014.
Prerequisites: Undergraduate probability and random processes; programming background.
Course Outcomes (COs):
CO1: Understand probability and stochastic processes (especially Markov chains) and their use in modelling discrete-time stochastic systems.
CO2: Understand the theory of Markov decision processes, the problem of controlling discrete-time dynamical systems, and the formulation of such control problems as Markov decision problems.
CO3: Apply various methods (such as value iteration and policy iteration) to solve Markov decision processes.
CO4: Understand the reinforcement learning framework and its fundamental problems: the exploration-exploitation dilemma and credit assignment.
CO5: Understand, design, and implement reinforcement learning agents and apply them to real-world problems.