Project Description

Natural Language Processing and Computer Vision have driven breakthroughs in deep learning research in recent years. In particular, pretrained transformer language models such as BERT, RoBERTa, and GPT-3 achieve state-of-the-art results on many language understanding problems, while text-to-image and vision-language models such as OpenAI's DALL-E 2 and CLIP and Google's Imagen and Parti produce photorealistic image generations. We believe the next step for AI is to connect language and vision with actions in embodied AI agents. In this project, we aim to use large language models (LLMs) for robotic planning. Language planning decomposes a complex high-level goal into a sequence of simpler low-level steps (e.g., turning the goal "make a cup of coffee" into the detailed steps required to do so). Such procedural reasoning ability is essential for applications such as household robots and virtual assistants. Although language planning is a basic everyday skill for humans, it remains a challenge for LLMs, which often lack deep commonsense knowledge about the real world. Previous methods require manually written exemplars or annotated programs to elicit this ability from LLMs. In contrast, this project proposes using LLMs themselves to elicit commonsense knowledge for complex planning problems.
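
As a rough illustration of the language-planning setup described above, the sketch below prompts an off-the-shelf causal language model to decompose a high-level goal into numbered low-level steps. The model choice ("gpt2"), the prompt format, and the step parsing are illustrative assumptions for this page, not the project's actual method; a much larger LLM would be needed for usable plans.

```python
# A minimal sketch of zero-shot language planning with an off-the-shelf causal
# language model. The model name ("gpt2"), the prompt format, and the parsing
# below are illustrative assumptions, not the project's actual setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

def plan(goal: str, max_steps: int = 5) -> list[str]:
    """Ask the model to decompose a high-level goal into low-level steps."""
    prompt = f"Goal: {goal}\nStep 1:"
    full_text = generator(prompt, max_new_tokens=80, do_sample=False)[0]["generated_text"]
    # The pipeline returns the prompt plus the continuation; keep the
    # continuation and split it back into individual "Step N: ..." entries.
    continuation = "Step 1:" + full_text[len(prompt):]
    steps = [s.strip() for s in continuation.split("Step") if s.strip()]
    return steps[:max_steps]

if __name__ == "__main__":
    for step in plan("make a cup of coffee"):
        print("Step " + step)
```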

Team Members

  • Ryan He
  • Danny Rose
  • Vaishnavi Himakunthala
  • Andy Ouyang

Professor and Mentors

  • Prof. William Wang
  • Grad mentors: Yujie Lu and Alex Mei

Meeting Time

  • Meeting with the Professor
    • Wednesdays 3:30-4:00 PM, Henley Hall 1002
  • Meeting with grad mentors
    • Wednesdays 5:00-6:00 PM, Henley Hall 1002
  • ERSP meeting with central mentors
    • Chinmay: TBD
    • Diba: TBD
  • ERSP group meetings
    • Fridays 1:30-3:30p

Links to Proposals and Presentation

  • Proposal (first draft): link
  • Proposal (after peer review): link
  • Final Proposal (after instructor's feedback): link
  • Final presentation: link

Individual Logs

Peer Review

Project Documentation and Resources