Contribution

Recording a Dataset of Audiovisual Speech for Augmented Reality Studies

Day / Time: 20.03.2025, 09:40-10:00
Type: Lecture (in a structured session)
Abstract ID: DAS-DAGA2025/541
Abstract: Access to high-quality audio stimuli is crucial for virtual and augmented reality audio research. In particular, audiovisual speech recordings that can be superimposed on real scenes are critical for studies of augmented reality telepresence applications. Here, we present a dataset of anechoic speech recordings that includes 3D point cloud videos. Speech from 21 subjects was captured in a large anechoic chamber using a high-quality, low-noise, calibrated microphone system at a recording distance of 1.5 meters. A depth camera and a green screen were used to capture the speakers visually. The dataset includes Harvard sentences, scripted conversations between three speakers, sentences in the subjects' native languages, and the same sentence spoken at varying voice levels. This contribution documents the dataset and points to initial studies that use the data, highlighting its potential applications.