Project Description

Human interaction with the world is inherently multimodal [1]. However, much of HCI research has historically focused on unimodal interaction models, with a single input data stream and a single output data stream. Augmented Reality (AR) makes the physical world itself the interaction interface, allowing users to interact with virtual content situated in their real environment rather than on a screen. Multimodal interaction is even more valuable in this computing paradigm, since the primary goal of AR is to enable seamless interaction with both virtual content and real objects in the environment.

The broader goal of this project is to evaluate and optimize multimodal interaction in augmented reality. This will be achieved by analyzing existing unimodal and multimodal interactions (from large-scale publicly available datasets as well as smaller datasets collected in the lab) and then applying the insights to the design of a multimodal AR interface optimized for the user's current environment and task.

Possible directions for the project (depending on student interest) include:

  • applying computer vision techniques to large-scale publicly available video datasets (e.g., EPIC-KITCHENS, Ego4D [4]) to extract information about the use of specific modalities in real-life situations (see the sketch after this list)

  • predicting the optimal (combination of) modalities for specific tasks using available datasets

  • using Unity to develop and benchmark standard interaction tasks on different headsets with the modalities of interest.
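
As an illustration of the first direction, the sketch below scans frames of an egocentric video with OpenCV and uses a crude skin-color heuristic as a stand-in for a hand detector, estimating how often hands (a rough proxy for gesture/manipulation modality use) are in view. The video path, HSV thresholds, and the heuristic itself are assumptions for illustration only; an actual pipeline on EPIC-KITCHENS or Ego4D [4] would use a trained hand or hand-object detector and the datasets' annotations.

# Minimal sketch (assumptions: video path, HSV skin thresholds, and sampling rate
# are illustrative; a real pipeline would use a trained hand/hand-object detector).
import cv2
import numpy as np

def hands_in_view_ratio(video_path: str, sample_every: int = 30,
                        min_skin_fraction: float = 0.02) -> float:
    """Estimate the fraction of sampled frames in which hands appear to be in view,
    using a naive HSV skin-color mask as a placeholder for a hand detector."""
    cap = cv2.VideoCapture(video_path)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed lower HSV skin bound
    upper = np.array([25, 180, 255], dtype=np.uint8)  # assumed upper HSV skin bound

    sampled, hits, idx = 0, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:  # sample roughly one frame per second at 30 fps
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            mask = cv2.inRange(hsv, lower, upper)
            skin_fraction = cv2.countNonZero(mask) / mask.size
            sampled += 1
            hits += skin_fraction > min_skin_fraction
        idx += 1
    cap.release()
    return hits / sampled if sampled else 0.0

if __name__ == "__main__":
    # Hypothetical clip name; EPIC-KITCHENS / Ego4D videos would be iterated similarly.
    print(hands_in_view_ratio("example_egocentric_clip.mp4"))

Aggregating a statistic like this across many clips and activity labels is one simple way to characterize when a given modality is actually used in real-life situations.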

Prerequisite Information

The project is coding-heavy, so students need to have completed at least one lower-division programming course (CS 9/16/24). Some experience with computer vision (OpenCV) or Unity will be helpful, depending on the direction they choose.

Knowledge/Skills to Acquire (with guidance from mentors)

Machine learning and computer vision concepts; Unity and graphics concepts

Team Members

  • Ajay Liu
  • Qi Wu
  • Frank Zhong
  • Towela Phiri

Professor and Mentors

  • Prof. Tobias Hollerer
  • Grad mentors: Radha Kumaran and Alex Rich

Meeting Times

  • Mentor Meetings
    • TBD
  • ERSP Team Meetings
    • Tuesdays, 5-7 p.m.

Research Logs

References

[1] Turk, M. (2014). Multimodal interaction: A review. Pattern Recognition Letters, 36, 189-195.

[2] Oviatt, S. (2022). Multimodal interaction, interfaces, and analytics. In Handbook of Human Computer Interaction (pp. 1-29). Cham: Springer International Publishing.

[3] Sun, Z., Ke, Q., Rahmani, H., Bennamoun, M., Wang, G., & Liu, J. (2022). Human action recognition from various data modalities: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., Liu, X., & Martin, M. (2022). Ego4D: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18995-19012).

[5] Park, K. B., Choi, S. H., Lee, J. Y., Ghasemi, Y., Mohammed, M., & Jeong, H. (2021). Hands-free human–robot interaction using multimodal gestures and deep learning in wearable mixed reality. IEEE Access, 9, 55448-55464.

[6] Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17, 293-305.