12:00 |
-
Masking versus Cognition during Speech Recognition in noise and reverberation: Can different sentence tests provide a quantitative estimate?
Anna Warzybok, Jan Rennies-Hochmuth, Birger Kollmeier
[Abstract]
This study investigates the effect of noise and reverberation on speech recognition for an open-set and a closed-set sentence test. While both tests yield approximately the same recognition threshold in trained normal-hearing listeners, their performance may differ due to cognitive factors: the closed-set test is more sensitive to training effects, while the open-set test is more affected by language familiarity. The experimental data were compared to predictions of the speech transmission index as a measure of purely acoustic effects. The largest differences between the open- and closed-set speech tests were measured in reverberation, indicating a considerable influence of non-acoustic, cognitive factors. The recognition scores were on average 50% higher for the closed-set test with syntactically fixed and semantically unpredictable sentences than for the open-set test consisting of everyday sentences. To examine the underlying reasons, the closed-set test was presented to naïve listeners, with no training prior to the measurements and no information about the test’s structure. Without this information, the differences between the tests disappeared, indicating that the degree of familiarity with the speech material has a major impact on speech recognition. This points to a strong cognitive factor which cannot be predicted by the speech transmission index.
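As a rough illustration of the purely acoustic prediction referred to above, the sketch below evaluates a simplified, STI-like index from Schroeder's closed-form modulation transfer function for exponential reverberation plus stationary noise. The uniform band weighting and single-band treatment are simplifying assumptions for brevity; this is not the full IEC 60268-16 procedure, nor the prediction used in the study.

```python
import numpy as np

def mtf(f_mod, t60, snr_db):
    """Modulation transfer function for exponential reverberation plus
    stationary noise (Schroeder's closed form), one modulation frequency."""
    m_rev = 1.0 / np.sqrt(1.0 + (2 * np.pi * f_mod * t60 / 13.8) ** 2)
    m_noise = 1.0 / (1.0 + 10 ** (-snr_db / 10.0))
    return m_rev * m_noise

def sti_like_index(t60, snr_db, f_mods=None):
    """Crude STI-like index: convert MTF values to apparent SNRs,
    clip to +/-15 dB, and average (uniform weighting assumed)."""
    if f_mods is None:
        f_mods = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                           3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])
    m = mtf(f_mods, t60, snr_db)
    snr_app = np.clip(10 * np.log10(m / (1 - m)), -15, 15)
    return float(np.mean((snr_app + 15) / 30))

# Same signal-to-noise ratio, with and without reverberation
print(sti_like_index(t60=0.0, snr_db=5))   # noise only
print(sti_like_index(t60=1.5, snr_db=5))   # noise + reverberation
```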
|
12:20 |
-
A Steered-Response Power (SRP) based Framework for Sound Source Localization using Microphone Arrays in Reverberant Rooms for Enhancement of Speech Intelligibility
Muhammad Imran, Jin Yong Jeon
[Abstract]
Sound source localization methods using multi-channel signal processing have been widely adopted for 3D audio capturing, speech enhancement, and speaker localization. We propose a speech source localization framework based on steered-response power (SRP) methods for closed spaces containing ambient noise and reverberation, capable of robustly estimating the direction of a speech source while enhancing speech intelligibility.
For this study, a speech source localizer (SSL) algorithm utilizing 3D microphone arrays was designed using the array steered response to improve the localization accuracy of single as well as multiple speech sources in full 3D space, with improved intelligibility of the captured sound. To design a framework for SRP, an adaptive beamformer, the minimum variance distortionless response (MVDR) beamformer, was evaluated and proposed for accurate localization applications. In addition, an optimization framework based on clustering algorithms is suggested to improve the accuracy of the localization estimate and the confidence level of the measurements.
The measurements were carried out in a typical room, under different reverberation and background noise conditions, to validate the practicality of the algorithms. The results obtained demonstrate the efficiency of the proposed algorithms for the localization of a speech source with improved speech intelligibility.
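For readers unfamiliar with SRP localization, a minimal single-frame SRP-PHAT grid search over horizontal directions is sketched below. The far-field steering model, array geometry, and angular grid are illustrative assumptions; the MVDR-based variant and the clustering-based optimization described in the abstract are not reproduced.

```python
import numpy as np

def srp_phat(frames, mic_pos, fs, c=343.0,
             az_grid=np.deg2rad(np.arange(0, 360, 5))):
    """Steered-response power with PHAT weighting over a horizontal
    far-field grid. frames: (n_mics, n_samples) snapshot;
    mic_pos: (n_mics, 3) Cartesian microphone positions in metres."""
    n_mics, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    X /= np.abs(X) + 1e-12                            # PHAT whitening
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    power = np.zeros(len(az_grid))
    for i, az in enumerate(az_grid):
        u = np.array([np.cos(az), np.sin(az), 0.0])   # plane-wave direction
        delays = mic_pos @ u / c                      # per-mic delay (s)
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        y = np.sum(X * steer, axis=0)                 # steered sum
        power[i] = np.sum(np.abs(y) ** 2)
    return az_grid[np.argmax(power)], power           # best azimuth, SRP map
```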
|
12:40 |
-
The Effect of Objective Room Acoustic Parameters on Auditory Steady-State Responses
Valentina Zapata Rodriguez, James M Harte, Cheol-Ho Jeong, Jonas Brunskog
[Abstract]
Verification that hearing aids (HA) have been fitted correctly in pre-lingual infants and hard-to-test adults is an important emerging application in technical audiology. These test subjects are unable to undergo reliable behavioral testing, so an objective method is required. Auditory steady-state responses (ASSRs) recorded in the sound field are a promising technology for verifying the hearing aid fitting. The test involves presenting the auditory stimuli via a loudspeaker, unlike the usual procedure of delivering them via insert earphones. Room reverberation may significantly affect the features of the stimulus that are important for eliciting a strong electrophysiological response, and thus complicate its detection. This study investigates the effect of different room acoustic conditions on recorded ASSRs via an auralisation approach using insert earphones. Fifteen normal-hearing listeners were tested using narrow-band (NB) CE-Chirps centered at the octave bands of 0.5, 1.0, 2.0 and 4.0 kHz. These stimuli were convolved with impulse responses of three rooms simulated using a Green’s function approach to recreate different sound-field conditions. Comparisons with recordings of the unmodified stimuli (reference condition) quantified the degree to which room acoustics significantly affects the amplitudes of the ASSRs.
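The auralisation step described above amounts to convolving the stimulus with a room impulse response before insert-earphone presentation. A minimal sketch follows; the logarithmic-chirp surrogate and the exponentially decaying noise RIR are placeholders, not the CE-Chirp stimuli or the Green's-function room simulations used in the study.

```python
import numpy as np
from scipy.signal import fftconvolve, chirp

fs = 48000
t = np.arange(0, 1.0, 1 / fs)

# Placeholder narrow-band stimulus around 1 kHz (not an actual CE-Chirp)
stim = chirp(t, f0=707, f1=1414, t1=t[-1], method='logarithmic')

# Placeholder RIR: exponentially decaying noise with T60 = 0.6 s
t60 = 0.6
n_rir = int(fs * t60)
rir = np.random.randn(n_rir) * np.exp(-6.91 * np.arange(n_rir) / n_rir)
rir /= np.max(np.abs(rir))

# Auralised stimulus = stimulus convolved with the room impulse response
auralised = fftconvolve(stim, rir)[:len(t)]
auralised /= np.max(np.abs(auralised))    # normalise for presentation level
```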
|
14:20 |
-
Reproduction of spherical microphone array impulse response measurements using higher order Ambisonics
Michelle C. Vigeant, David A. Dick
[Abstract]
The perception of listener envelopment is being studied using reproduced spherical microphone array impulse response (IR) measurements taken in a 2,000-seat concert hall. These IR measurements were obtained using a 32-element, 8.4-cm-diameter spherical microphone array (mh acoustics Eigenmike em32) and were reproduced over a 30-loudspeaker array arranged in circular rings in an anechoic chamber. Playback over the loudspeaker array is accomplished using third-order Ambisonics, where the 3D sound field is represented in the spherical harmonics domain. Radial filters are used to equalize the measured spherical harmonic components, and Heller’s Ambisonic Decoder Toolbox is used to decode the spherical harmonic signals into loudspeaker signals, implemented as a VST plug-in in the digital audio workstation software REAPER. The two-band decoder crosses over from basic decoding to max-rE decoding at 400 Hz, which reduces side lobes and improves high-frequency localization cues. In addition, the decoder includes time delay and magnitude corrections to account for the distance from each loudspeaker to the center of the array, and order-dependent high-pass filters for near-field compensation. FIR filters are applied to equalize the signals for the frequency responses of the measurement loudspeaker, microphone capsules, and reproduction loudspeakers. [Work supported by NSF Award 1302741.]
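As an aside on the decoding scheme, the order-dependent max-rE gains applied above the crossover can be approximated in closed form; a small sketch follows. The Legendre-based approximation is a commonly used one and is not taken from Heller's toolbox, whose crossover and normalization details are not reproduced here.

```python
import numpy as np
from scipy.special import eval_legendre

def max_re_weights(order):
    """Per-order max-rE gains g_n = P_n(rE), using the common
    approximation rE ~= cos(137.9 deg / (N + 1.51))."""
    r_e = np.cos(np.deg2rad(137.9 / (order + 1.51)))
    return np.array([eval_legendre(n, r_e) for n in range(order + 1)])

# Third-order weights, applied to the spherical harmonic signals of each
# order above the decoder crossover (here, above roughly 400 Hz)
print(max_re_weights(3))
```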
|
14:40 |
-
Intelligibility of spatially reproduced speech over headphones under ambient noise
Noam R. Shabtai, Jonathan Sheaffer, Zamir Ben-Hur, Itai Nehoran, Matan Ben-Asher, Boaz Rafaely
[Abstract]
In speech communication applications, headset-based spatial audio systems may employ binaural output channels in order to improve the intelligibility of speech in multi-speaker scenarios. In such a case, both the far-end speaker and the far-end noise signals are usually transferred by the binaural spatial audio system to the listener. However, in many applications, such as a train ride, shopping in a busy mall, or call centers, the listener is surrounded by noise signals that are generated in the listener’s own environment. In such scenarios, the source signal cannot be filtered from the noise before it is transmitted to the listener. This work examines the effect of an artificially induced binaural output on the intelligibility of speech as perceived by a listener who is surrounded by bubble noise. Binaural reproduction is obtained through non-personalized head-related impulse responses, room simulations for enhanced externalization, and a low-latency head-tracking system to improve localization. Two systems are investigated: the first is based on the SoundScape Renderer (SSR), and the second is based on a renderer developed by Waves Audio Ltd. The speech reception threshold (SRT) is measured in a listening test and compared with the SRT obtained when a mono source signal is used.
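For context, static (non-tracked) binaural rendering reduces to convolving the mono far-end signal with a head-related impulse response pair for the desired direction, as in the minimal sketch below; the HRIR data are assumed to be supplied externally, and neither the SoundScape Renderer nor the Waves renderer is reproduced.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Static binaural rendering: convolve a mono signal with a
    non-personalized HRIR pair (both HRIRs assumed equal length).
    Head tracking and room simulation, as used in the study, are omitted."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    out = np.stack([left, right], axis=0)
    return out / np.max(np.abs(out))      # normalise to avoid clipping

# Usage (hrir_left/right assumed loaded from any HRIR dataset):
# binaural = render_binaural(speech, hrir_left, hrir_right)
```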
|
15:00 |
-
A robust 3D microphone array development for speaker tracking in ambient and noisy environments using the GCC-PHAT technique with improved SNR in speech
Jong Gak Seo, Jin Yong Jeon, Muhammad Imran
[Abstract]
A robust 3D microphone array system was designed and developed to localize and track speech sources in environments containing ambient noise and reverberation, with improved speech quality. In our array design, six microphone elements were arranged in an open spherical configuration with uniform distribution. Previous methods for speech source localization typically depend on conventional time-delay estimation between microphone pairs. However, such methods often ignore ambient noise, reflections from the surroundings, and reverberation. The Generalized Cross Correlation (GCC) with Phase Transform (PHAT) was adopted as the weighting function in order to accurately detect, localize, and track speech sources. The GCC-PHAT weighting function is capable of localizing speech in noisy and reverberant environmental conditions. The algorithms developed were evaluated for the proposed six-channel microphone array system under different conditions of noise and reverberation to validate the practicality of the system. The results demonstrate an accuracy of 5° in speech source localization (azimuth and elevation), with a root-mean-square error (RMSE) of 3°.
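A minimal GCC-PHAT time-delay estimate for a single microphone pair is sketched below for illustration; the framing and any interpolation of the correlation peak, as well as the six-element spherical geometry and the tracking stage, are outside this sketch.

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival between signals x and y
    using the generalized cross-correlation with PHAT weighting."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                 # PHAT: keep phase information only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                      # estimated delay in seconds
```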
|
15:20 |
-
Room-acoustic investigations of coprime linear microphone arrays
Ning Xiang, Dane Bush
[Abstract]
Coprime linear microphone arrays can obtain narrow beam patterns with significantly fewer, sparsely distributed microphones whose spacing exceeds the half-wavelength limit. A coprime microphone array consists of two nested uniform linear subarrays with M and N microphones, where M and N are coprime with each other. When the subarray outputs are properly combined, they overlap with one another completely in just one direction, retaining the shared beam while mostly canceling the superfluous grating lobes. The resulting narrow beam pattern is on the order of that achieved by a uniform linear array of M times N microphones using delay-and-sum processing. Recently, the present authors experimentally validated the coprime array theory [D. Bush & N. Xiang, J. Acoust. Soc. Am. 138, 447-456 (2015)], confirming that broadband processing of array signals can achieve narrow beam patterns while suppressing unwanted sidelobes. After a brief introduction to coprime array theory and its broadband implementation, this paper discusses results obtained with coprime microphone arrays in extremely reverberant room-acoustic environments. Potential applications for enhancing speech intelligibility will also be discussed.
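To illustrate the combination principle, the sketch below multiplies the delay-and-sum beampatterns of the two nested subarrays at a single frequency; the chosen M, N, spacings, and multiplicative combination are illustrative and do not reproduce the broadband processing of the cited paper.

```python
import numpy as np

def ula_pattern(n_mics, spacing, wavelength, angles, steer=0.0):
    """Delay-and-sum beampattern of a uniform linear array (broadside = 0 rad)."""
    k = 2 * np.pi / wavelength
    pos = np.arange(n_mics) * spacing
    phase = k * pos[:, None] * (np.sin(angles)[None, :] - np.sin(steer))
    return np.abs(np.sum(np.exp(1j * phase), axis=0)) / n_mics

M, N = 4, 5                        # coprime subarray sizes
wavelength = 0.1                   # roughly 3.4 kHz in air
d = wavelength / 2
angles = np.deg2rad(np.linspace(-90, 90, 721))

# Subarray 1: M sensors at spacing N*d; subarray 2: N sensors at spacing M*d
b1 = ula_pattern(M, N * d, wavelength, angles)
b2 = ula_pattern(N, M * d, wavelength, angles)
combined = b1 * b2                 # shared main lobe survives; grating lobes cancel
```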
|
15:40 |
-
Performance Measures for Converting Partial-Sphere Array Recordings
Hannes Pomberger
[Abstract]
Spherical arrays with rigid angular boundaries allow for a representation of the sound field in terms of basis functions orthogonal on the restricted angular range. Such a set of orthogonal functions is non-isotropic due to the bounded angular range. Hence, direct rendering on a surrounding loudspeaker setup might exhibit direction-dependent panning artifacts that disturb the perceived spatial image. This is in contrast to conventional Ambisonics employing spherical harmonics, an isotropic orthogonal basis on the sphere. Furthermore, the Ambisonic independence of recording and playback is impeded because the basis is tailored to the restricted angular range. To obtain this independence, a suitable conversion to spherical harmonics is needed; conversion to an isotropic basis tends to diminish the direction-dependent artifacts. Within this article we discuss perceptually motivated performance measures to evaluate different conversion strategies in terms of mis-localization and direction-dependent variations in loudness, source width, and timbre, and we demonstrate their suitability in several case studies.
|
16:40 |
-
Dynamic voice directivity in room acoustics auralizations
Barteld Postma, Brian Katz
[Abstract]
The use of room acoustic auralizations has been increasing due to the improving computing power available and the quality of numerical modelling software. In such auralizations, it is often possible to prescribe the directivity of an acoustic source in order to better represent the way in which a given acoustic source excites the room. However, such directivities are static, being defined according to source orientation as a function of frequency for the numerical simulation. While sources such as a piano vary little over the course of playing, it is known that voice directivity varies, sometimes considerably, due to both dynamic orientation and phoneme-dependent radiation patterns linked to changes in mouth geometry. This study presents an investigation of the inclusion of dynamic voice directivity in auralizations for room acoustics. The study includes a presentation of the means by which dynamic directivity has been incorporated into the geometrical acoustics modelling software, as well as subjective evaluations of the effect of including dynamic directivity in a room acoustic auralization with a vocal source.
|
17:00 |
-
3D Surround Sound Transmission System between Remote Rooms for Coexistent Reality
Sanghun Nam, Joong-Jae Lee, Ju-Hyun Maeng, Eun-Mi Lee, Jong Gak Seo, Jin Yong Jeon, Bum-Jae You
[Abstract]
In this paper, we propose a 3D surround sound transmission system based on the Coexistent Reality Software Framework (CRSF) that provides users with a coexistent experience, allowing them to communicate and collaborate in a natural manner with a sense of immersive sound. The CRSF registers the remote users in a coexistent space and renders 3D surround sound using binauralization for headphones. The panning of the sound is dynamically updated by localizing the remote speakers in terms of their direction (azimuth and elevation) and distance from the listener, and by updating the rendered sound in accordance with their locomotion in real time using tracking devices. The three-dimensional virtual sound produced by users’ interactions is generated and mixed with the real 3D surround sound. Because the various types of data exchanged with the remote users differ in size and update frequency, the CRNE (Coexistent Reality Network Engine) supports a multi-channel topology to avoid interference between the different data streams. The proposed sound transmission system merges multiple spaces on a network into one coexistent space and generates surround sound, stereoscopic images, and vibration effects simultaneously. The results of experiments with 3D surround sound confirm its important role in improving the sense of coexistence.
|
17:20 |
-
Spatial Sub-Nyquist Sampling Layouts for Compact Microphone Arrays
Till Rettberg, Sascha Spors
[Abstract]
Spatial audio recording with microphone arrays is a challenging task due to the comparatively large bandwidth of interest. At higher frequencies, the number of sensors required to achieve the spatial Nyquist rate is practically infeasible. Since spatial band-limiting filters are likewise not available in practice, the captured sound field is compromised by spatial aliasing. The surge of interest in Compressed Sensing in recent years has produced numerous results for alternative sampling strategies. These can be utilized to partially alleviate the problem of spatial aliasing in certain cases: If the sound field of interest can be represented by a small number of sources, sampling and reconstruction is often possible with fewer sensors than the Shannon-Nyquist criterion dictates. The associated sampling theorems are usually probabilistic in nature and hinge on randomly distributed sampling points. In this paper, we investigate whether these reconstruction guarantees are applicable to the moderate numbers of sensors typically feasible in non-sequential microphone arrays. Empirical results for recovery performance of random sampling layouts are presented in comparison to deterministic non-uniform layouts.
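As a simple illustration of sparse recovery from a random layout, the sketch below reconstructs two plane waves from a randomly positioned linear array using orthogonal matching pursuit; the single-frequency, noiseless far-field model and the dictionary resolution are assumptions made for brevity.

```python
import numpy as np

def omp(A, b, k):
    """Orthogonal matching pursuit: recover a k-sparse coefficient vector x
    such that A @ x approximately equals b."""
    residual, support = b.copy(), []
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(k):
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(idx)
        sol, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x[:] = 0
        x[support] = sol
        residual = b - A @ x
    return x

# Single-frequency plane-wave dictionary over candidate angles, sampled at
# randomly placed sensors (sub-Nyquist spacing on average)
rng = np.random.default_rng(0)
wavelength, n_mics = 0.2, 8
mic_x = rng.uniform(0, 1.0, n_mics)                  # random 1-m aperture
angles = np.deg2rad(np.linspace(-90, 90, 181))
A = np.exp(2j * np.pi / wavelength * np.outer(mic_x, np.sin(angles)))

true = np.zeros(len(angles), dtype=complex)
true[[40, 130]] = [1.0, 0.7]                         # two incident plane waves
b = A @ true
estimate = omp(A, b, k=2)                            # sparse direction recovery
```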
|
17:40 |
-
Speech Intelligibility with Spatially Symmetric Maskers in Reverberant Environments
Thomas Biberger, Stephan D. Ewert
[Abstract]
In daily life, verbal communication often takes place indoors and in the presence of interferers. In such situations, speech intelligibility is affected by (i) masking caused by other interfering sound sources and (ii) reverberation. In spatial configurations with a frontal target speaker and two interfering sources symmetrically placed to either side, spatial release from masking (SRM) is observed in comparison to the configuration with co-located target and interferers. In this case, the auditory system can make use of temporally fluctuating interaural differences. Room reverberation affects the temporal representation of the target and maskers and, moreover, the interaural differences, depending on the spatial positions of the listener, target, and interferers in the room as well as the room acoustical properties. Here, the effect of room acoustical properties (T60, frequency-dependent absorption coefficients), temporal structure of the interferers, and target-masker positions on SRM was assessed. Speech reception thresholds were measured using the Oldenburger Satztest in a simulated room using headphone-based virtual acoustics. The interferers were placed either co-located with or symmetrically (±60°) about the frontal target, and the room acoustical properties were systematically changed. The results are discussed and compared to predictions of a binaural speech intelligibility model.
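For reference, the SRM reported in such experiments is simply the improvement in speech reception threshold when the maskers are separated from the target; the values in the sketch below are hypothetical and are not data from this study.

```python
def srm_db(srt_colocated_db, srt_separated_db):
    """Spatial release from masking in dB: SRT improvement when maskers
    are moved from the co-located to the symmetrically separated condition."""
    return srt_colocated_db - srt_separated_db

# Illustrative (hypothetical) thresholds, not results from the study:
print(srm_db(srt_colocated_db=-6.0, srt_separated_db=-10.5))   # 4.5 dB SRM
```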
|