Project Description

LLMs have demonstrated impressive capabilities in a range of code-related tasks; GitHub Copilot, for example, is an LLM-based coding assistant that generates code from user-provided function descriptions. However, existing systems and evaluation benchmarks focus primarily on function-level tasks. This project aims to evaluate how well LLMs perform on tasks that require understanding an entire codebase, such as adding new functionality and improving code efficiency.

Several challenges arise in this setting, including how to prompt LLMs so that they can process and comprehend multiple code files and identify the specific sub-regions that require modification. As an initial step, we plan to build an evaluation benchmark from public GitHub repositories and develop baselines to explore the capabilities of LLMs.
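
To make the prompting challenge concrete, one naive strategy is to pack file contents into a single prompt with file-path markers and ask the model to point at the regions that need to change. The Python sketch below is illustrative only: the function name, file filter, and character budget are our own assumptions rather than a decided design, and it mainly shows why naive packing breaks down once a repository exceeds the model's context window.

    from pathlib import Path

    def build_repo_prompt(repo_dir: str, instruction: str, max_chars: int = 12000) -> str:
        """Pack source files into one prompt with file-path markers (naive strategy)."""
        parts = [f"# Task: {instruction}\n"]
        used = len(parts[0])
        for path in sorted(Path(repo_dir).rglob("*.py")):
            text = path.read_text(errors="ignore")
            block = f"\n### File: {path}\n{text}\n"
            if used + len(block) > max_chars:
                # Crude truncation: a retrieval step would instead pick only relevant files.
                break
            parts.append(block)
            used += len(block)
        parts.append("\n# Identify the file(s) and lines that must change, then propose an edit.\n")
        return "".join(parts)

A more realistic baseline would likely replace the truncation above with a retrieval step that selects only the files relevant to the requested change.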

The students will be involved in the following tasks:

1. Collecting an evaluation benchmark from diverse sources of information in public GitHub repositories, including commit histories, commit messages, and the feature requests and bug reports filed as repository issues (see the data-collection sketch after this list).

2. Designing and implementing baseline systems to evaluate how well LLMs perform on these tasks (see the baseline sketch after this list).
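
For the data-collection task, a minimal starting point is the GitHub REST API, which exposes both commit history and repository issues. The sketch below is a rough illustration under our own assumptions (function names, the fields kept, and the page size are placeholders); a real pipeline would add pagination, rate-limit handling, and filtering for commits that actually add features or fix bugs.

    from typing import Optional
    import requests

    API = "https://api.github.com"

    def _headers(token: Optional[str]) -> dict:
        headers = {"Accept": "application/vnd.github+json"}
        if token:
            headers["Authorization"] = f"Bearer {token}"
        return headers

    def fetch_commits(owner: str, repo: str, token: Optional[str] = None, per_page: int = 50):
        """Fetch recent commits (SHA + message) for a repository."""
        resp = requests.get(f"{API}/repos/{owner}/{repo}/commits",
                            headers=_headers(token),
                            params={"per_page": per_page},
                            timeout=30)
        resp.raise_for_status()
        return [{"sha": c["sha"], "message": c["commit"]["message"]} for c in resp.json()]

    def fetch_issues(owner: str, repo: str, token: Optional[str] = None, per_page: int = 50):
        """Fetch open issues (title + body), which include feature requests and bug reports."""
        resp = requests.get(f"{API}/repos/{owner}/{repo}/issues",
                            headers=_headers(token),
                            params={"per_page": per_page, "state": "open"},
                            timeout=30)
        resp.raise_for_status()
        # The issues endpoint also returns pull requests; filter those out.
        return [{"title": i["title"], "body": i.get("body") or ""}
                for i in resp.json() if "pull_request" not in i]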

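For the baseline task, one simple design is a single-shot prompt: hand the model the packed repository context plus the requested change, and ask it to name the files and regions to edit. The sketch below assumes the OpenAI Python SDK (v1+) purely for illustration; any chat-style LLM API could be swapped in, and the model name is a placeholder.

    from openai import OpenAI  # assumption: OpenAI Python SDK v1+; any chat-style LLM API would do

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def baseline_edit_suggestion(repo_prompt: str, model: str = "gpt-4o-mini") -> str:
        """Single-shot baseline: ask the model which files/regions to modify and for a patch.

        `repo_prompt` is the packed repository context plus the requested change,
        e.g. the output of a helper like build_repo_prompt sketched earlier.
        """
        response = client.chat.completions.create(
            model=model,  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "You are a coding assistant that edits existing codebases."},
                {"role": "user", "content": repo_prompt},
            ],
            temperature=0.0,
        )
        return response.choices[0].message.content

Suggested edits could then be scored against the ground-truth diffs of the corresponding commits collected for the benchmark.
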
Prerequisite Information

None

Knowledge/Skills to Acquire (with guidance from mentors)

Students will be expected to learn to work with PyTorch and to pick up specific ML, NLP, and numerical optimization concepts as needed for their project.

Team Members

  • Peiyang Song
  • Lindsey Wen
  • Ethan Solomon
  • Riya Gupta

Professor and Mentors

  • Prof. Shiyu Chang
  • Grad mentors: Yujian Liu and Jiabao Ji 

Meeting Times

  • Mentor Meetings
    • TBD
  • ERSP Team Meetings
    • Wednesdays, 3:30-5:30 p.m.

Research Logs