Project Description
Offline reinforcement learning is one of the hottest areas in the machine learning community and an indispensable step toward applying reinforcement learning to real-life applications. Recently, an empirical benchmark for off-policy evaluation (OPE) methods in RL was proposed [1], showing that model-based approaches, marginalized importance sampling [2], and fitted Q-iteration can significantly outperform traditional approaches in offline evaluation. This project aims to empirically evaluate these state-of-the-art off-policy evaluation methods and, hopefully, to provide empirical insight into which methods work best in each regime of interest.
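To make the estimator families under study more concrete, below is a minimal, illustrative sketch of tabular fitted Q-evaluation (FQE), one of the method classes benchmarked in [1]. The MDP sizes, dataset format, and uniform target policy are assumptions made for this example only, not the benchmark setup from [1].

```python
# Minimal, illustrative sketch of tabular Fitted Q-Evaluation (FQE) for
# off-policy evaluation. The environment, dataset, and policies below are
# hypothetical placeholders, not the benchmark configuration in [1].
import numpy as np

def fqe_tabular(transitions, pi_e, n_states, n_actions, gamma=0.95, n_iters=200):
    """Estimate Q^{pi_e} from logged transitions (s, a, r, s_next, done).

    transitions: list of (s, a, r, s_next, done) tuples collected by some
                 behavior policy.
    pi_e:        target policy, array of shape (n_states, n_actions) with
                 pi_e[s, a] = probability of taking action a in state s.
    """
    Q = np.zeros((n_states, n_actions))
    data = np.array(transitions, dtype=float)
    s = data[:, 0].astype(int)
    a = data[:, 1].astype(int)
    r = data[:, 2]
    s_next = data[:, 3].astype(int)
    done = data[:, 4]

    for _ in range(n_iters):
        # Bellman backup under the *evaluation* policy pi_e.
        v_next = (pi_e[s_next] * Q[s_next]).sum(axis=1)
        targets = r + gamma * (1.0 - done) * v_next
        # "Fit" step: in the tabular case, average the targets per (s, a) pair.
        Q_new = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        np.add.at(Q_new, (s, a), targets)
        np.add.at(counts, (s, a), 1.0)
        mask = counts > 0
        Q_new[mask] /= counts[mask]
        Q = Q_new
    return Q

# Hypothetical usage: evaluate a uniform-random target policy from a small
# synthetic dataset on a 3-state, 2-action MDP.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 3, 2
    pi_e = np.full((n_states, n_actions), 1.0 / n_actions)
    transitions = [(rng.integers(n_states), rng.integers(n_actions),
                    rng.random(), rng.integers(n_states), 0.0)
                   for _ in range(5000)]
    Q = fqe_tabular(transitions, pi_e, n_states, n_actions)
    # Value of the target policy at state 0 under the fitted Q.
    print("Estimated V^pi_e(s=0):", float(pi_e[0] @ Q[0]))
```

In the tabular case the regression step reduces to averaging Bellman targets per (state, action) pair; in the continuous-state settings benchmarked in [1], that step would instead fit a function approximator to the same targets.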
References:
[1]. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning: https://arxiv.org/abs/1911.06854
[2]. Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling: https://arxiv.org/abs/1906.03393
[3]. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation: https://arxiv.org/abs/1810.12429
Team Members
- Ari Polakof (a_polakof@ucsb.edu)
- Noah Pang (noahpang@ucsb.edu)
- Qiru Hu (qiru@ucsb.edu)
- Sara Mandic (smandic@ucsb.edu)
Professor and Mentors
- Professor: Prof. Yu-Xiang Wang (yuxiangw@cs.ucsb.edu)
- Mentor: Ming Yin (ming_yin@ucsb.edu)
Meeting Time
- Meeting with Mentor:
- Location: Zoom
- Time: Tuesdays, 10am
- Meeting with Team:
- Location: Zoom
- Time: Saturdays, 2pm
- Meeting with Prof. Mirza and Prof. Eiers:
- Location: Zoom
- Time: Th, 9:30-10am
Links to Proposals and Presentation
- Proposal link
- Final presentation: