Project Description
Offline reinforcement learning is one of the hottest areas in the machine learning community and an indispensable step toward applying reinforcement learning to real-life applications. Recently, an empirical benchmark for off-policy evaluation (OPE) methods in RL was proposed [1], showing that model-based approaches, marginalized importance sampling [2], and fitted Q-iteration can significantly outperform traditional approaches in offline evaluation. This project aims to empirically evaluate these state-of-the-art off-policy evaluation methods and, hopefully, to provide empirical insight into which methods work best in each regime of interest.
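To make the estimator families under study more concrete, below is a minimal, illustrative sketch of tabular fitted Q-evaluation (FQE), one of the method classes benchmarked in [1]. The MDP sizes, dataset format, and uniform target policy are assumptions made for this example only, not the benchmark setup from [1].

```python
# Minimal, illustrative sketch of tabular Fitted Q-Evaluation (FQE) for
# off-policy evaluation. The environment, dataset, and policies below are
# hypothetical placeholders, not the benchmark configuration in [1].
import numpy as np

def fqe_tabular(transitions, pi_e, n_states, n_actions, gamma=0.95, n_iters=200):
    """Estimate Q^{pi_e} from logged transitions (s, a, r, s_next, done).

    transitions: list of (s, a, r, s_next, done) tuples collected by some
                 behavior policy.
    pi_e:        target policy, array of shape (n_states, n_actions) with
                 pi_e[s, a] = probability of taking action a in state s.
    """
    Q = np.zeros((n_states, n_actions))
    data = np.array(transitions, dtype=float)
    s = data[:, 0].astype(int)
    a = data[:, 1].astype(int)
    r = data[:, 2]
    s_next = data[:, 3].astype(int)
    done = data[:, 4]

    for _ in range(n_iters):
        # Bellman backup under the *evaluation* policy pi_e.
        v_next = (pi_e[s_next] * Q[s_next]).sum(axis=1)
        targets = r + gamma * (1.0 - done) * v_next
        # "Fit" step: in the tabular case, average the targets per (s, a) pair.
        Q_new = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        np.add.at(Q_new, (s, a), targets)
        np.add.at(counts, (s, a), 1.0)
        mask = counts > 0
        Q_new[mask] /= counts[mask]
        Q = Q_new
    return Q

# Hypothetical usage: evaluate a uniform-random target policy from a small
# synthetic dataset on a 3-state, 2-action MDP.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 3, 2
    pi_e = np.full((n_states, n_actions), 1.0 / n_actions)
    transitions = [(rng.integers(n_states), rng.integers(n_actions),
                    rng.random(), rng.integers(n_states), 0.0)
                   for _ in range(5000)]
    Q = fqe_tabular(transitions, pi_e, n_states, n_actions)
    # Value of the target policy at state 0 under the fitted Q.
    print("Estimated V^pi_e(s=0):", float(pi_e[0] @ Q[0]))
```

In the tabular case the regression step reduces to averaging Bellman targets per (state, action) pair; in the continuous-state settings benchmarked in [1], that step would instead fit a function approximator to the same targets.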
References:
[1]. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning: https://arxiv.org/abs/1911.06854
[2]. Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling: https://arxiv.org/abs/1906.03393
[3]. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation: https://arxiv.org/abs/1810.12429
Team Members
- Ari Polakof (a_polakof@ucsb.edu)
- Noah Pang (noahpang@ucsb.edu)
- Qiru Hu (qiru@ucsb.edu)
- Sara Mandic (smandic@ucsb.edu)
Professor and Mentors
- Professor: Prof. Yu-Xiang Wang (yuxiangw@cs.ucsb.edu)
- Mentor: Ming Yin (ming_yin@ucsb.edu)
Meeting Time
- Meeting with Mentor:
- Location: Zoom
- Time: Tuesdays, 10am
- Meeting with Team:
- Location: Zoom
- Time: Saturdays, 2pm
- Meeting with Prof. Mirza and Prof. Eiers:
- Location: Zoom
- Time: Th, 9:30-10am
Links to Proposals and Presentation
- Proposal link
- Final presentation: