Project Description
Human interaction with the world is inherently multimodal [1]. However, much of HCI
research has historically focused on unimodal interaction models, with a single input
data stream and a single output data stream. Augmented Reality (AR) turns the physical
world into the interaction interface, allowing users to interact with virtual content
situated in their real environment rather than on a screen. Multimodal interaction is
even more valuable in this computing paradigm, since the primary goal of AR is to enable
seamless interaction with both virtual content and real objects in the environment.
The broader goal of this project is to evaluate and optimize multimodal interaction in
augmented reality. This will be achieved by analyzing existing unimodal and
multimodal interactions (from large-scale publicly available datasets, as well as smaller
datasets collected in the lab), and then applying those insights to the design of a
multimodal AR interface that is optimized for the user's current environment and task.
Possible directions for the project (depending on student interest) include:
- applying computer vision techniques to large-scale publicly available video
datasets (e.g., EPIC-KITCHENS, Ego4D [4]) to extract information about the use
of specific modalities in real-life situations
- predicting the optimal (combination of) modalities for specific tasks using
available datasets
- using Unity to develop and benchmark standard interaction tasks on different
headsets, with the modalities of interest
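As a toy illustration of the first direction, the sketch below flags motion-heavy frames in a video via simple frame differencing, a crude proxy for hand or head activity in egocentric footage. This is only a minimal sketch using NumPy and synthetic frames; a real pipeline would decode actual video (e.g., with OpenCV) and use learned hand, gaze, or action detectors, and the threshold here is an arbitrary placeholder.

```python
import numpy as np

def active_frames(frames, thresh=10.0):
    """Flag frames whose mean absolute pixel difference from the
    previous frame exceeds `thresh` -- a crude proxy for hand or
    head motion in egocentric video. `thresh` is a placeholder value."""
    flags = []
    prev = frames[0].astype(np.int16)
    for f in frames[1:]:
        cur = f.astype(np.int16)
        flags.append(float(np.abs(cur - prev).mean()) > thresh)
        prev = cur
    return flags

# Synthetic stand-in for decoded grayscale video frames: two static
# frames, then a bright block that appears and moves.
frames = [np.zeros((64, 64), np.uint8) for _ in range(4)]
frames[2][20:40, 20:40] = 255
frames[3][30:50, 30:50] = 255
print(active_frames(frames))  # [False, True, True]
```

Aggregating such per-frame signals over a large dataset is one simple way to start quantifying when and how often a given modality is used in real-life situations.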
Prerequisite Information
The project is coding-heavy, so students need to have completed at least one
lower-division programming course (CS 9/16/24). Some experience with computer
vision (OpenCV) or Unity will be helpful, depending on the direction they choose.
Knowledge/Skills to Acquire (with guidance from mentors)
Machine learning and computer vision concepts, Unity and graphics concepts
Team Members
- Ajay Liu
- Qi Wu
- Frank Zhong
- Towela Phiri
Professor and Mentors
- Prof. Tobias Höllerer
- Grad mentors: Radha Kumaran and Alex Rich
Meeting Times
- Mentor Meetings
- TBD
- ERSP Team Meetings
- Tuesdays, 5-7 p.m.
Research Logs
References
[1] Turk, M. (2014). Multimodal interaction: A review. Pattern Recognition Letters, 36, 189-195.
[2] Oviatt, S. (2022). Multimodal interaction, interfaces, and analytics. In Handbook of Human-Computer Interaction (pp. 1-29). Cham: Springer International Publishing.
[3] Sun, Z., Ke, Q., Rahmani, H., Bennamoun, M., Wang, G., & Liu, J. (2022). Human action recognition from various data modalities: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., Liu, X., & Martin, M. (2022). Ego4D: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18995-19012).
[5] Park, K. B., Choi, S. H., Lee, J. Y., Ghasemi, Y., Mohammed, M., & Jeong, H. (2021). Hands-free human-robot interaction using multimodal gestures and deep learning in wearable mixed reality. IEEE Access, 9, 55448-55464.
[6] Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17, 293-305.