Project Description

Offline reinforcement learning is one of the most active areas in the machine learning community, and an indispensable step toward applying reinforcement learning to real-life applications. Recently, an empirical benchmark for off-policy evaluation (OPE) methods in RL was proposed [1], showing that approaches based on marginalized importance sampling [2] and fitted Q-evaluation can significantly outperform traditional importance-sampling estimators in offline evaluation. This project aims to empirically evaluate these state-of-the-art off-policy evaluation methods and, hopefully, to provide empirical insight into which methods work best in each regime of interest.

References

[1] Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning. https://arxiv.org/abs/1911.06854

[2] Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling. https://arxiv.org/abs/1906.03393

[3] Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation. https://arxiv.org/abs/1810.12429
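To make the methods named in the project description concrete, here is a minimal sketch of fitted Q-evaluation (FQE) on a toy tabular MDP. Everything in it — the MDP, the behavior and target policies, and the hyperparameters — is a made-up assumption for illustration; it is not the benchmark code from [1].

```python
"""Minimal sketch of Fitted Q-Evaluation (FQE) on a toy tabular MDP.
All quantities below (MDP, policies, hyperparameters) are hypothetical."""
import numpy as np

rng = np.random.default_rng(0)

# --- Toy tabular MDP (assumed for illustration) ---
n_states, n_actions, gamma = 5, 2, 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

# Behavior policy (uniform) generates the offline data;
# the target policy is a different fixed stochastic policy we want to evaluate.
behavior = np.full((n_states, n_actions), 1.0 / n_actions)
target = rng.dirichlet(np.ones(n_actions), size=n_states)

def collect(policy, n_transitions=20000):
    """Roll out the behavior policy and log (s, a, r, s') transitions."""
    data, s = [], int(rng.integers(n_states))
    for _ in range(n_transitions):
        a = rng.choice(n_actions, p=policy[s])
        s_next = rng.choice(n_states, p=P[s, a])
        data.append((s, a, R[s, a], s_next))
        s = s_next
    return data

def fitted_q_evaluation(data, target, n_iters=200):
    """Iteratively regress Q(s, a) onto r + gamma * E_{a' ~ target}[Q(s', a')]."""
    S, A, Rw, S2 = (np.array(x) for x in zip(*data))
    Q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (S, A), 1.0)
    for _ in range(n_iters):
        backup = Rw + gamma * (target[S2] * Q[S2]).sum(axis=1)
        targets_sum = np.zeros((n_states, n_actions))
        np.add.at(targets_sum, (S, A), backup)
        # Tabular "regression": average the Bellman targets per (s, a) pair.
        Q = np.divide(targets_sum, counts, out=np.zeros_like(Q), where=counts > 0)
    return Q

data = collect(behavior)
Q_hat = fitted_q_evaluation(data, target)
# Estimated target-policy value under a uniform initial-state distribution.
v_hat = np.mean(np.sum(target * Q_hat, axis=1))
print(f"FQE estimate of target-policy value: {v_hat:.3f}")
```

In the tabular case the regression step of FQE reduces to averaging the Bellman backups per (state, action) pair; in practice, and in the benchmark settings studied in [1], each iteration instead fits a function approximator to those targets.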

Team Members

Professor and Mentors

Meeting Time

  • Meeting with Mentor:
    • Location: Zoom
    • Time: Tuesdays, 10am
  • Meeting with Team:
    • Location: Zoom
    • Time: Saturdays, 2pm
  • Meeting with Prof. Mirza and Prof. Eiers:
    • Location: Zoom
    • Time: Thursdays, 9:30-10am

Links to Proposals and Presentation

  • Proposal link
  • Final presentation:

Individual Logs

Peer Review

Project Documentation and Resources

Poster