|
-
Design of Kronecker Product Beamformers with Cuboid Microphone Arrays
Xuehan Wang, Jacob Benesty, Gongping Huang, Jingdong Chen, Israel Cohen
[Abstract]
Microphone array beamforming is widely used in acoustic applications. To suppress noise effectively while preserving the fidelity and quality of the broadband speech signals of interest, the beamformer needs to be designed with a high spatial gain, consistent responses across frequencies, and high robustness against array imperfections. A great deal of effort has been devoted in the literature to achieving this goal, among which the recently developed Kronecker product beamforming method has demonstrated some interesting properties. This method decomposes the entire beamforming filter into two smaller filters, each corresponding to a virtual subarray. Each sub-beamforming filter is then optimized so that the global beampattern and directivity factor meet the design target. This approach has been investigated for one- and two-dimensional microphone arrays such as linear and rectangular ones. In this work, we extend the method to three-dimensional arrays. Focusing on cuboid-shaped microphone arrays, we discuss how to decompose the global beamforming filter into a Kronecker product of two or three sub-filters. Algorithms are presented to design each sub-filter so that the global beamformer has a high directivity factor and can be steered flexibly in three-dimensional space.
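The decomposition can be illustrated numerically: the steering vector of a cuboid array factorizes as a Kronecker product of subarray steering vectors, so a global filter built from sub-filters has a response that factorizes accordingly. Below is a minimal sketch; the geometry, spacing, frequency, and the delay-and-sum sub-filters are illustrative assumptions, not the paper's design.

```python
import numpy as np

# Hypothetical 4x3x2 cuboid array viewed as the Kronecker product of three
# uniform linear subarrays along x, y, z. Spacing and frequency are
# illustrative, not taken from the paper.
c, f = 343.0, 2000.0                     # speed of sound [m/s], frequency [Hz]
k = 2 * np.pi * f / c                    # wavenumber

def ula_steering(n, d, cos_angle):
    """Steering vector of an n-element ULA with spacing d [m]."""
    return np.exp(-1j * k * d * np.arange(n) * cos_angle)

ux, uy, uz = 0.6, 0.64, 0.48             # direction cosines of a unit vector
d1 = ula_steering(4, 0.03, ux)
d2 = ula_steering(3, 0.03, uy)
d3 = ula_steering(2, 0.03, uz)

# Global steering vector of the cuboid array = Kronecker product of subarrays.
d_global = np.kron(d1, np.kron(d2, d3))

# Design each sub-filter independently (here simply delay-and-sum) and combine.
h1, h2, h3 = d1 / 4, d2 / 3, d3 / 2
h_global = np.kron(h1, np.kron(h2, h3))

# The global response factorizes into the product of subarray responses.
resp_global = np.vdot(h_global, d_global)                    # h^H d
resp_factored = np.vdot(h1, d1) * np.vdot(h2, d2) * np.vdot(h3, d3)
```

This factorization is what lets each sub-filter be optimized separately while still controlling the global beampattern.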
|
|
-
Impulse source localization with background noise in a reverberant environment by multiple sensors
Tiangang Wang, Yat-Sze Choy, Jungang Zhang
[Abstract]
During vibration testing of space hardware, breaking sounds can be heard when cracks form in the structure. However, much effort is needed to determine the position of a fault in a complex large-scale model, such as a satellite, in the presence of background noise and reverberation. This paper focuses on non-contact localization of the sound sources attributed to structural failures. To achieve this goal, a fault-detection system based on sound source localization with a limited number of sensors in a moderately reverberant environment is established using the time-difference-of-arrival (TDOA) technique. Low-quefrency filtering is adopted as a dereverberation pre-processing step to alleviate the reverberation effects caused by echoes from the laboratory walls. To reject interfering sound sources, two criteria, a geometrical criterion and a cyclical-check criterion, are introduced. A series of experiments is conducted to verify the performance of the system. The results show that the system can localize crack sources and hitting sources accurately within a short time.
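As a minimal illustration of the underlying TDOA principle (not the paper's system, which additionally includes dereverberation and interference-rejection criteria), the following sketch locates a source on a 2D grid from noiseless time differences of arrival; all positions are hypothetical.

```python
import numpy as np

# Minimal TDOA-based localization sketch: grid-search a 2D plane for the
# point whose predicted time differences of arrival (relative to mic 0)
# best match the measured ones.
c = 343.0
mics = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
src_true = np.array([0.3, 0.7])

dist = np.linalg.norm(mics - src_true, axis=1)
tdoa_meas = (dist - dist[0]) / c          # noiseless "measured" TDOAs

best, best_err = None, np.inf
for x in np.linspace(0, 1, 101):
    for y in np.linspace(0, 1, 101):
        d = np.linalg.norm(mics - np.array([x, y]), axis=1)
        err = np.sum(((d - d[0]) / c - tdoa_meas) ** 2)
        if err < best_err:
            best, best_err = (x, y), err
```

In practice the measured TDOAs are noisy and reverberant, which is exactly what the paper's pre-processing and rejection criteria address.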
|
|
-
On the detection quality of early room reflection directions using compressive sensing on rigid spherical microphone array data
Frank Schultz, Sascha Spors
[Abstract]
The estimation of acoustic reflection coefficients from in-room measurements using one or more sources and microphone arrays has been addressed in various ways. One particularly illustrative method is the plane wave decomposition: peaks in this representation can be assigned to the location and strength of mirror image sources. Traditional approaches employ rigid spherical microphone arrays together with modal beamforming. Compressive sensing (CS) techniques aim at undersampling by assuming sparsity of the sound field in some given representation. By careful design of the sensing matrix, the number of required measurements can be reduced considerably compared to Nyquist sampling and reconstruction. However, the problem at hand is often not sparse, and the requirement of a sufficiently low mutual coherence of the sensing matrix is violated, yielding CS reconstruction with low robustness. We aim at robust detection of early, discrete reflections by means of CS using rigid spherical microphone array data. We discuss the interaction of room characteristics, the sensing matrix, and technical measures to quantify the detection performance. The influence of practical limitations, such as the number of microphones and their self-noise, is investigated, as well as the overall gain of applying CS to the problem compared to traditional modal beamforming techniques.
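The mutual coherence referred to above is straightforward to compute: it is the largest normalized inner product between distinct columns of the sensing matrix. The sketch below evaluates it for a hypothetical random matrix (dimensions are illustrative).

```python
import numpy as np

# Mutual coherence of a sensing matrix: the largest absolute inner product
# between distinct unit-norm columns. Low coherence is a prerequisite for
# robust CS reconstruction.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 64))            # 16 measurements, 64 atoms
A = A / np.linalg.norm(A, axis=0)            # normalize columns
G = np.abs(A.T @ A)                          # Gram matrix magnitudes
np.fill_diagonal(G, 0.0)                     # ignore self-products
mu = G.max()                                 # mutual coherence, in (0, 1]
```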
|
|
-
Reducing Transfer Function Measurement in Local Sound Field Reproduction using Acoustic Modelling
Qiaoxi Zhu, Xiaojun Qiu, Philip Coleman, Ian Burnett
[Abstract]
Broadband local sound field reproduction over an extended spatial region is a challenging problem when only limited transfer function measurements are available. In this paper, an acoustic modelling based approach is proposed to reduce the required transfer function measurements in local sound field reproduction. The proposed method only requires measuring the transfer functions from each source to a few sample points on the boundary of the controlled region; the transfer functions to the points inside the controlled region are then estimated through efficient acoustic modelling. Simulations demonstrate that the proposed method requires fewer transfer function measurements than existing methods such as the least squares and spatial harmonic decomposition methods.
|
|
-
Control of Sound Pressure in Audible Spot using Parametric Speakers
Takumi Hakamata, Hiroyoshi Yamashita, Keisuke Watanabe, Kotaro Hoshiba, Takenobu Tsuchiya, Nobuyuki Endoh
[Abstract]
We have studied forming an audible spot using parametric speakers to play an audible signal to a particular person. In previous work, a method to form an audible spot using two parametric speakers was proposed, in which ultrasonic tones of different frequencies emitted by the two parametric speakers produce an audible signal at the point where the beams cross. However, the formed audible area was not stable, because the frequency characteristics of each speaker were not considered. This paper describes a technique to form a stable audible area, in which the sound pressure of each speaker is controlled taking the frequency characteristics into account. The proposed method was evaluated experimentally by attempting to form a circular audible spot 110 mm in diameter with a sound pressure level above 60 dB. Whereas the audible area varied by about 75% when the sound pressure was uncontrolled, the proposed method formed a stable audible spot with a variation of less than 20%. These results confirm the usability of the proposed method.
|
|
-
Left-right sound localization outside loudspeaker positions in stereo reproduction with parametric loudspeakers
Shigeaki Aoki, Kouki Ito, Kazuhiro Shimizu, Suehiro Shimauchi
[Abstract]
The parametric loudspeaker is known as a super-directional loudspeaker that exploits the nonlinear interaction between ultrasonic waves. First, listening tests of stereo reproduction with parametric loudspeakers and with ordinary loudspeakers were conducted, with the loudspeakers set at angles of -30 and +30 degrees. The test results were analyzed and discussed. It was confirmed that when interaural level differences (ILDs) were used as binaural cues, the directions of sound localization with the parametric loudspeakers were farther outside than those with the ordinary loudspeakers. The results were especially interesting when the ILDs were minus or plus infinity: with the parametric loudspeakers, the sound localized outside the loudspeaker positions, whereas with the ordinary loudspeakers it localized inside rather than outside. Moreover, it was confirmed that in reproduction with left and right parametric loudspeakers using HRTFs of -90 and +90 degrees in the horizontal plane, the sound could be controlled to localize outside the loudspeaker positions.
|
|
-
Pressure-matching-based 2D sound field synthesis with equivalent source array
Izumi Tsunokuni, Kakeru Kurokawa, Yusuke Ikeda
[Abstract]
The pressure matching (PM) method is one of the well-known, effective methods for physical sound field synthesis using a massive loudspeaker array. In the PM method, to match the sound pressure at each matching point to the desired one, all transfer functions between the loudspeakers and the matching points must be measured. Thus, as the number of matching points and loudspeakers increases, measuring the transfer functions with a microphone becomes more difficult. In addition, it is difficult to place a microphone with high accuracy without a robot or a massive microphone array. In this paper, we propose a method of local sound field synthesis with the PM method that avoids large-scale measurements by estimating the transfer functions between the loudspeakers and the matching points from the transfer functions between the loudspeakers and a small number of control points. Using sparse optimization, equivalent sources for each loudspeaker are selected from a dictionary of monopole sources placed near that loudspeaker, based on the small number of measured transfer functions. The loudspeaker driving functions can then be derived from the transfer functions between a large number of virtual matching points and all equivalent sources. To evaluate the proposed method, two-dimensional simulation experiments are conducted.
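As a minimal sketch of the pressure-matching step itself (with free-field monopole transfer functions and an illustrative geometry and regularization, not the paper's measured or estimated setup), the driving functions can be obtained by regularized least squares:

```python
import numpy as np

c, f = 343.0, 500.0
k = 2 * np.pi * f / c

def monopole(src, pts):
    """Free-field Green's function exp(-jkr) / (4 pi r) from src to pts."""
    r = np.linalg.norm(pts - src, axis=1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

# Hypothetical circular array of 16 loudspeakers around a small control region.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
speakers = np.stack([2 * np.cos(angles), 2 * np.sin(angles), np.zeros(16)], axis=1)
gx, gy = np.meshgrid(np.linspace(-0.3, 0.3, 7), np.linspace(-0.3, 0.3, 7))
points = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)

G = np.stack([monopole(s, points) for s in speakers], axis=1)   # (49, 16)
p_des = monopole(np.array([3.0, 0.0, 0.0]), points)             # virtual source

lam = 1e-3                                                      # Tikhonov term
d = np.linalg.solve(G.conj().T @ G + lam * np.eye(16), G.conj().T @ p_des)
err = np.linalg.norm(G @ d - p_des) / np.linalg.norm(p_des)     # relative error
```

The paper's contribution replaces the fully measured matrix `G` with one predicted from a sparse equivalent-source model; the solve step stays the same.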
|
|
-
Determination of Optimal Parameters Using Metaheuristics for the Sound Zone Generation by the Least-Squares
Kazuya Yasueda, Daisuke Shinjo, Akitoshi Kataoka
[Abstract]
We propose a method to determine optimal parameters by metaheuristics for sound zone generation using the least-squares method. We realize reproduction with a sound pressure difference between multiple areas using a loudspeaker array. This is achieved by controlling the sound field with inverse filters based on the transfer functions between each loudspeaker and each control point. To do so, the number of control points, the control point arrangement, and the regularization parameter must be determined. In conventional methods, these parameters are determined experimentally and empirically; in particular, the control points are often arranged in a regular pattern within the target control area. In this paper, we determine these parameters using metaheuristics, i.e., general-purpose methods for mathematical optimization. We study the application of metaheuristics such as the genetic algorithm and simulated annealing to determine these parameters, and propose an appropriate evaluation function for the sound pressure in each zone. We evaluated the performance in computer simulations and in experiments in a real environment. As a result, we confirmed improved performance in terms of relative sound pressure level, spectral distortion, and frequency response.
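As an illustration of the metaheuristic idea, the following bare-bones simulated annealing loop minimizes a toy surrogate objective standing in for "reproduction error versus regularization parameter"; the objective, step size, and cooling schedule are all illustrative assumptions, not the paper's evaluation function.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(log_lam):
    # Hypothetical smooth surrogate with a minimum at log_lam = -3,
    # standing in for an acoustic evaluation function.
    return (log_lam + 3.0) ** 2

x = 0.0                                  # initial log10(regularization)
fx = objective(x)
best_x, best_f = x, fx
T = 1.0                                  # temperature
for _ in range(2000):
    cand = x + rng.normal(scale=0.3)     # random neighbor
    fc = objective(cand)
    # Accept improvements always; accept worse moves with Boltzmann prob.
    if fc < fx or rng.random() < np.exp(-(fc - fx) / T):
        x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
    T *= 0.995                           # geometric cooling
```

A genetic algorithm would replace the single-state loop with a population, but the pattern (propose, evaluate, accept) is the same.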
|
|
-
Single-channel signal Features for Estimating Microphone Utility for Coherent Signal Processing
Michael Günther, Andreas Brendel, Walter Kellermann
[Abstract]
Many microphone array signal processing techniques, e.g.,
for beamforming or localization, rely on coherent input
signals. However, low inter-channel coherence may result
from the occlusion of microphones, reverberation, or the
presence of undesired signal components, so that the
corresponding signals contribute little to the overall algorithmic
performance. Thus, ranking the microphone channels by
their utility for subsequent coherent signal processing
schemes is of considerable interest. Direct estimation of the
pair-wise coherence is often straightforward in compact
microphone arrays when all microphones share a common
sampling clock, while acoustic sensor networks require a
potentially costly time synchronization of the microphone
signals. In this case, estimating the channel utility ranking
from simpler, per-channel features, e.g., statistical moments
of the time-domain signal waveform or of the corresponding
magnitude spectrum, instead of the signal coherence
facilitates the clustering of useful sensor nodes for a particular
task. This approach further offers a way to determine whether
it is worth the effort to synchronize the sensor signals in a
sensor network, thereby saving computational power and
data rate. Therefore, in this contribution, we investigate the
efficacy of different signal features for the estimation of the
microphone channel utility ranking.
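Two such per-channel features, spectral flatness of the power spectrum and kurtosis of the waveform, are sketched below on synthetic channels (one dominated by a clean tone, one by noise). The specific feature choices and signals are illustrative, not necessarily those evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
ch0 = clean + 0.05 * rng.standard_normal(t.size)   # mostly clean tone
ch1 = 0.1 * clean + rng.standard_normal(t.size)    # mostly noise

def features(x):
    """Per-channel features: spectral flatness and waveform kurtosis."""
    pow_spec = np.abs(np.fft.rfft(x)) ** 2
    # Geometric / arithmetic mean: near 1 for noise, << 1 for tonal signals.
    flatness = np.exp(np.mean(np.log(pow_spec + 1e-12))) / np.mean(pow_spec)
    # Fourth standardized moment: ~1.5 for a sine, ~3 for Gaussian noise.
    kurt = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2
    return flatness, kurt

f0, f1 = features(ch0), features(ch1)
```

Ranking channels by such features requires no cross-channel access at all, which is what makes the approach attractive for unsynchronized sensor networks.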
|
|
-
On the Use of Spherical Microphone Arrays in a Classical Musical Recording Scenario
Johann-Markus Batke
[Abstract]
The use of spherical microphone arrays in virtual reality production environments has been steadily increasing. This is a natural progression, since the microphone array's output is easily converted to the Higher Order Ambisonics (HOA) sound field representation. However, for the recording of classical music, there is still much debate as to whether the application of spherical microphone arrays makes sense at all. Depending on the targeted playback format, various established coincident single-microphone setups as well as spaced setups are preferred. Spaced microphone setups are particularly popular since they ensure a certain degree of decorrelation between the resulting loudspeaker signals. On the other hand, microphone array recordings offer much greater freedom for spatial post-processing. This contribution investigates how the advantages of both recording strategies can be combined. The recording of a string quartet is used as a well-known classical music scenario. A spaced AB recording serves as the basis. A microphone array in a central position within the quartet and another one distant from the quartet are used to enhance the AB signals. Different target formats including stereo, surround, and 3D audio are discussed.
|
|
-
A Target Direction Search Algorithm Based on Microphone Array
Xubin Liang, Difeng Sun, Tianqing Zhao, Liangyong Zhang, Houlin Fang, Fang Zhang
[Abstract]
The use of a microphone array to obtain the direction of a far-field target is an important means of impact-point detection. In this paper, a new algorithm is proposed for localizing pulsed acoustic signals. By calculating the Euclidean distance between the projection of the microphone position vectors and the measured acoustic path differences at different azimuth angles, the angle corresponding to the minimum distance is taken as the target azimuth. When the signal quality of some microphones is poor or they are damaged, the minimum distance found will be relatively large; by setting a threshold, such microphones can be excluded from the computation, improving the accuracy and stability of the direction estimate. This paper introduces the principle of the target direction search, and the accuracy and stability of the orientation estimate are verified in simulations and experiments. Compared with a traditional algorithm, the proposed algorithm is shown to offer both high precision and strong robustness. It has important reference value for practical engineering applications of passive sound detection such as impact-point measurement.
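The search principle can be sketched for the far-field 2D case as follows (geometry and angles are illustrative; the real system additionally applies a threshold on the minimum distance to exclude degraded microphones):

```python
import numpy as np

# Far-field azimuth search sketch: for each candidate angle, project the
# microphone position vectors onto the candidate arrival direction and
# compare (Euclidean distance) with the measured path differences.
mics = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]])

def path_diffs(theta):
    u = np.array([np.cos(theta), np.sin(theta)])   # arrival direction
    return mics @ u - mics[0] @ u                  # projections rel. mic 0

theta_true = np.deg2rad(37.0)
measured = path_diffs(theta_true)                  # noiseless for the sketch

grid = np.deg2rad(np.arange(0.0, 360.0, 1.0))
dists = np.array([np.linalg.norm(path_diffs(t) - measured) for t in grid])
theta_hat = grid[np.argmin(dists)]
# If dists.min() stayed large, a microphone fault would be suspected and
# the offending channels excluded (the thresholding idea in the abstract).
```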
|
|
-
Automatic Choice of Microphone Array Processing Methods for Acoustic Testing
Ennes Sarradj, Gert Herold, Simon Jekosch
[Abstract]
If microphone arrays are to be used in acoustic testing, a
signal processing method must be applied to produce
measurement results such as spectra and location of sources
from the data. It is well known that different processing
methods may produce different results, so the practical
question arises which method to choose. We propose some
strategies that allow for a-priori choice of the most suitable
processing method from the raw measured data. One
method uses the eigenvalue spectrum of the measured cross
spectral matrix while another is based on the classical
beamformer output and a neural network. After estimating
the apparent number of sources and the dynamic range, the
best method is looked up based on the statistical analysis of
a large number of synthetic test cases. The procedure is
demonstrated using a practical example.
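The first strategy can be illustrated with a synthetic example: the eigenvalue spectrum of a cross-spectral matrix built from two uncorrelated sources plus weak sensor noise shows two dominant eigenvalues far above the noise floor, which is the kind of cue such a selection scheme can exploit. All parameters below are illustrative.

```python
import numpy as np

# Eigenvalue spectrum of a cross-spectral matrix (CSM) as a cue for the
# apparent number of sources. Synthetic: 2 uncorrelated sources on a
# 16-mic array plus weak sensor noise, 500 snapshots.
rng = np.random.default_rng(0)
m, n_src, n_snap = 16, 2, 500
A = np.exp(1j * rng.uniform(0, 2 * np.pi, (m, n_src)))   # random steering
S = (rng.standard_normal((n_src, n_snap))
     + 1j * rng.standard_normal((n_src, n_snap))) / np.sqrt(2)
N = 0.05 * (rng.standard_normal((m, n_snap))
            + 1j * rng.standard_normal((m, n_snap))) / np.sqrt(2)
X = A @ S + N
csm = X @ X.conj().T / n_snap

eig = np.sort(np.linalg.eigvalsh(csm))[::-1]             # descending, real
ratio = eig[1] / eig[2]                                  # signal vs. noise floor
```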
|
|
-
Wind noise removal from mixture with speech: Using Wiener filter and invariant frequency beamforming
Fan-Jie Kung
[Abstract]
In this paper, we design a system to help hearing-impaired listeners hear a speaker clearly in a noisy environment, especially in high winds. The amplitude and frequency of speech are important for the hearing-impaired: they hear better when speech has high amplitude and low frequency, so how to design a system for them is a critical issue. First, we assume that wind noise occurs mostly at low frequencies, so that a Wiener filter can be used to reconstruct the clean speech. Second, we assume that listener and speaker always face each other when talking, so that beamforming can be used to enhance the speech. For a microphone array beam pattern, the beamwidth varies with frequency. Methods such as minimum variance distortionless response (MVDR) and linearly constrained minimum variance (LCMV) address this, but they are computationally complex. Hence, we use a multi-beamforming method to obtain a constant beamwidth with low complexity. Simulation results show that the system improves speech quality in high winds.
|
|
-
A Maximum-Achievable-Directivity Beamformer with White-Noise-Gain Constraint for Spherical Microphone Arrays
Xi Chen, Gongping Huang, Jingdong Chen, Jacob Benesty
[Abstract]
In microphone array beamforming, it is desirable to achieve as high a directive gain as possible for maximum acoustic noise rejection. The well-known superdirective beamformer was developed for this purpose, but it is sensitive to array imperfections such as the sensors' self-noise, sensor placement errors, and mismatch among sensor responses, which restricts its application in practical systems. To circumvent this lack of robustness, we recently developed a differential beamforming method that can achieve a flexible compromise between the directivity factor (DF) and the level of white noise gain (WNG) by adjusting the value of a control parameter. This principle is further extended in this work. We present a beamforming method for spherical microphone arrays. It first determines the order of the differential beamformer analytically, based on the minimum level of WNG that is tolerable by the array in the frequency band of interest. This order is then used to design the beamformer to achieve the maximum possible DF. The advantage of this approach over existing ones is that it is robust to implement and yet can achieve the maximum possible DF.
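The two quantities traded off here, WNG and DF, can be computed directly from a beamformer's weights, its steering vector, and the diffuse-field coherence matrix. Below is a sketch for an illustrative endfire delay-and-sum ULA, not the paper's spherical design:

```python
import numpy as np

# WNG and DF of a delay-and-sum beamformer on an 8-mic endfire ULA
# (spacing and frequency are illustrative).
c, f, d, m = 343.0, 1000.0, 0.02, 8
k = 2 * np.pi * f / c
pos = d * np.arange(m)

d_vec = np.exp(-1j * k * pos)                  # endfire steering vector
# Spherically isotropic (diffuse) coherence: sin(kr)/(kr) between mics.
Gamma = np.sinc(k * np.abs(pos[:, None] - pos[None, :]) / np.pi)

w = d_vec / m                                  # delay-and-sum weights
wng = np.abs(np.vdot(w, d_vec)) ** 2 / np.real(np.vdot(w, w))
df = np.abs(np.vdot(w, d_vec)) ** 2 / np.real(np.vdot(w, Gamma @ w))
```

For delay-and-sum, WNG equals the number of microphones (the robust extreme); superdirective designs raise the DF at the cost of WNG, which is exactly the trade-off the abstract's order-selection rule controls.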
|
|
-
Neural Network-based Broadband Beamformer with Less Distortion
Mitsunori Mizumachi
[Abstract]
Beamforming has been one of the important issues in multi-channel signal processing, including acoustic signal processing, and a wide variety of beamformers have been proposed for different applications. In general, acoustic beamforming deals with broadband signals such as speech, in contrast to the narrowband beamforming used in antenna array and radar applications. Recently, neural network-based non-linear beamformers have become popular, but they suffer from annoying non-linear distortion in the output signal. For speech enhancement this is a serious problem, because the auditory system is highly sensitive to artificial non-linear distortion of speech signals. This paper proposes to solve the problem with relaxed dual cost functions in a neural network-based beamformer for speech enhancement. The primary cost function aims at sharpening the beam pattern, and the secondary cost function is introduced to reduce speech distortion. The two cost functions are used alternately to optimize the beam pattern in the frequency range of speech signals. The feasibility of the proposed method is confirmed by computer simulation with a small amount of training data, including sinusoidal signals, random noise, and speech signals.
|
|
-
Estimating sound intensity from acoustic data captured by parallel phase-shifting interferometry
Fumihiko Imaeda, Risako Tanigawa, Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa
[Abstract]
Visualizing sound fields is important for understanding them intuitively. Recently, sound field visualization using a parallel phase-shifting interferometer (PPSI) has been proposed. This optical method can observe a sound field instantaneously and quantitatively without placing any object inside the field. Thus, sound fields that are difficult to measure with ordinary instruments, such as the fields inside a small cavity or an air flow, can be investigated by PPSI. After measurement, the observed data must be analyzed to obtain acoustically meaningful information. However, such analysis has not been studied much, as PPSI itself is a newly developed method. In this paper, we estimate sound intensity from the data captured by PPSI. The number of observation points of our PPSI system is up to 262,144, with an interval of 0.22 mm between adjacent points; therefore, sound intensity can be estimated densely at a very large number of points. In addition, noise reduction is considered to aid the estimation. The accuracy of the estimated sound intensity is investigated through numerical experiments, and real data observed by PPSI are then analyzed to visualize the sound intensity.
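A minimal sketch of intensity estimation from a densely sampled pressure field, in the spirit of processing such data: the particle velocity follows from the pressure gradient via the time-harmonic Euler equation, and the time-averaged intensity from pressure and velocity. The 1D plane-wave field and parameters below are synthetic assumptions, not PPSI measurements.

```python
import numpy as np

# Intensity from a densely sampled complex pressure field:
#   v = -grad(p) / (j * omega * rho)   (time-harmonic Euler equation)
#   I = 0.5 * Re(p * conj(v))          (time-averaged intensity)
rho, c, f = 1.21, 343.0, 2000.0
k, omega = 2 * np.pi * f / c, 2 * np.pi * f
dx = 0.22e-3                                   # 0.22 mm point spacing
x = np.arange(2048) * dx
p = np.exp(-1j * k * x)                        # synthetic plane wave, 1 Pa

v = -np.gradient(p, dx) / (1j * omega * rho)   # finite-difference gradient
I = 0.5 * np.real(p * np.conj(v))              # [W/m^2]
I_theory = 1.0 / (2 * rho * c)                 # |p0|^2 / (2 rho c)
```

With a 0.22 mm spacing the finite-difference error at audio wavelengths is tiny, which is what makes dense optical sampling attractive for intensity estimation.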
|
|
-
Investigation into Transaural System with Beamforming Using a Circular Loudspeaker Array set at Off-center Position from the Listener
Yu Ito, Yoichi Haneda
[Abstract]
A transaural system based on a crosstalk canceller is effective
for virtual acoustic imaging when two loudspeakers are
arranged in front of the listener. However, when the two
loudspeakers are arranged off-center from the listener, the
crosstalk canceling performance declines drastically, especially
the stereo dipole system. For example, when both loudspeakers
are located on the left front of the listener, the sound pressure
level in the left ear is higher than that in the right ear. In such a
situation, it is difficult to cancel sound in the left ear and
reproduce the desired sound in the right ear. In order to
reproduce equal sound pressure levels in both ears, we
introduce a method that forms two directivity beams using a
circular loudspeaker array.
Here, we consider the two beams from the array to the listener’s left and right ears as equivalent to the left- and right-channel loudspeakers of the stereo dipole. To confirm the
performance of the proposed method, we compared the
stereo dipole and the proposed method through computer
simulations and subjective evaluation. The proposed method
improved the sound pressure level difference between the left and right ears.
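At the core of any such system is the crosstalk canceller: a regularized inverse of the 2x2 matrix of transfer functions from the two beams (or loudspeakers) to the two ears. A single-frequency numerical sketch with illustrative transfer function values:

```python
import numpy as np

# 2x2 crosstalk canceller at one frequency: rows = ears, cols = beams.
# Values are illustrative complex transfer functions.
H = np.array([[1.0 + 0.0j, 0.4 - 0.2j],    # beams -> left ear
              [0.3 + 0.1j, 0.9 + 0.0j]])   # beams -> right ear

beta = 1e-3                                 # regularization vs. ill-conditioning
C = np.linalg.solve(H.conj().T @ H + beta * np.eye(2), H.conj().T)

# Feeding a "left ear only" binaural signal through C and the room
# should reproduce it at the ears with the crosstalk cancelled.
ears = H @ C @ np.array([1.0, 0.0])
```

In the off-center case the abstract describes, the matrix `H` becomes strongly unbalanced and ill-conditioned, which is why replacing the physical loudspeakers with steered beams helps.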
|
|
-
Subjective evaluation of Head-Related Transfer Functions reconstructed with Spatial Principal Component Analysis and their domain dependency
Shouichi Takane, Keisuke Sakamoto, Koji Abe, Kanji Watanabe, Masayuki Nishiguchi
[Abstract]
It is well known that the amount of data needed to represent the spatial variation of Head-Related Transfer Functions (HRTFs) can be compressed using Principal Component Analysis (PCA) with little perceptual impact. PCA of HRTFs over space is called Spatial PCA (SPCA). The author previously analyzed the effect of the domain selected for SPCA from an objective standpoint [S. Takane, Appl. Acoust., 101, 64-77 (2016)], but the effect has not been analyzed subjectively. In this paper, the effect was investigated using non-individual HRTFs in a three-alternative forced choice (3AFC) experiment. The domains examined for SPCA were the HRIR (Head-Related Impulse Response), the HRTF, the amplitude of the HRTF, and the log-amplitude of the HRTF. In the latter two domains, minimum phase was assumed when generating the phase components of the HRTFs reconstructed from the SPCA results. It was found that the number of principal components required to reconstruct the HRTFs with perceptually undetectable differences is small for all domains, in agreement with many previous studies. Moreover, the HRTFs reconstructed from SPCA in the log-amplitude domain required a relatively small number of principal components, which differs from the objective evaluation in the authors' previous research.
|
|
-
Adhoc method to Invert the Reassigned Time-Frequency Representation
Shristi Rajbamshi, Peter Balazs, Nicki Holighaus
[Abstract]
Time-frequency representation (TFR) techniques have been used extensively for many years to analyze non-stationary signals. Among the many TFR methods, time-frequency reassignment was developed primarily to improve the readability of existing TFRs, proving itself an efficient tool for signal analysis. Despite this, it has not gained much popularity because of its non-bilinear nature and lack of invertibility. As a way to address this shortcoming, we present an ad-hoc approach to invert the reassigned TFR.
|
|
-
Detection of clean time-frequency bins based on phase derivative of multichannel signals
Atsushi Hiruma, Kohei Yatabe, Yasuhiro Oikawa
[Abstract]
In this paper, a method for evaluating the cleanness of each time-frequency bin of multichannel spectrograms is proposed. When observing acoustic signals with noise and/or interference, the degree of noisiness usually differs from bin to bin in the time-frequency domain. Array signal processing techniques could therefore be improved by choosing only "cleaner" bins, i.e., those containing less noise and/or interference, for extracting spatial information. The proposed method aims to distinguish such clean bins from noisy ones. To do so, the similarity of the phase derivative among channels is considered, since phase is sensitive to noise and interference. Constant phase offsets are removed by the derivative, so that convolutive mixtures can be handled without regard to the spatial configuration. The proposed method is applied to direction-of-arrival estimation (the MUSIC method) and blind source separation (independent vector analysis) to demonstrate the potential of the proposed measure.
|
|
-
Column-Wise Update Algorithm for Independent Deeply Learned Matrix Analysis
Naoki Makishima, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo
[Abstract]
In this paper, we propose a robust demixing filter update algorithm for audio source separation. Audio source separation is the task of recovering source signals from multichannel mixtures observed with a microphone array, with applications in, e.g., speech recognition and music signal analysis. Recently, independent deeply learned matrix analysis (IDLMA) has been proposed as a state-of-the-art separation method. IDLMA combines deep neural network (DNN) inference of source models with blind estimation of demixing filters based on the sources' independence. In conventional IDLMA, iterative projection (IP) is used to estimate the demixing filters. Although IP is a fast algorithm, when a specific source model is inaccurate owing to poor SNR conditions, the subsequent filter updates will fail. This is because IP updates the demixing filters in a source-wise manner, where only one source model is used for each update. In this paper, we derive a new microphone-wise update rule that exploits all the information in the source models simultaneously at each update. Moreover, we propose a method to select the appropriate source- or microphone-wise update rule depending on the source signal's pseudo-SNR estimated via the DNNs. Experimental results show the efficacy of the proposed method.
|
|
-
Deep Clustering for Single-Channel Ego-Noise Suppression
Annika Briegleb, Alexander Schmidt, Walter Kellermann
[Abstract]
In the context of audio signal processing for microphone-equipped robots, the robot’s self-created movement noise, so-called ego-noise, is a crucial problem. It massively corrupts the microphone signal and degrades the robot’s capability to interact intuitively with its environment. Therefore, ego-noise suppression is a key processing step in robot audition, which is commonly addressed using learning-based dictionary or template approaches. In this contribution, we introduce a deep-learning framework called Deep Clustering (DC) for ego-noise suppression in a single microphone channel, which was initially introduced by Hershey et al. for the task of speech separation. In DC, a bi-directional recurrent neural network is trained to embed each time-frequency bin of a mixture, containing ego-noise and speech, to a higher dimensional domain under the constraint that embeddings of bins dominated by ego-noise have maximal distance to those dominated by speech. During testing, clustering is performed in the embedding domain to assign each time-frequency bin uniquely to one of the two signal components and thereby allowing the estimation of both. We demonstrate that DC allows a significant reduction of ego-noise in the reconstructed signal. Additionally, we investigate the influence of the embedding size and the amount of training data on the suppression performance.
|
|
-
A Study on the Data Augment Method considering Room Transfer Functions for Acoustic Scene Classification
Minhan Kim, Seokjin Lee
[Abstract]
Acoustic scene classification (ASC) is the problem of recognizing sounds from daily life with a machine-based approach. People can recognize an event from sound alone, without seeing it, so sound-based awareness algorithms have been researched actively. Deep-learning-based algorithms for the ASC problem have been the main focus recently, but such algorithms require massive amounts of data to achieve good performance. Data augmentation is therefore important when applying deep learning to the ASC problem. In this paper, a data augmentation method that takes the acoustic propagation in a room into account is investigated and applied to a convolutional-neural-network-based ASC algorithm. A simulation was performed with the dataset from the DCASE 2018 challenge to evaluate the proposed data augmentation algorithm, and the results show that the proposed algorithm can enhance the state-of-the-art method.
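One simple way to realize such augmentation is to convolve clean scene audio with a room impulse response. The sketch below uses a toy exponentially decaying RIR; a real implementation would use measured RIRs or an image-source room simulator, and the signals here are synthetic placeholders.

```python
import numpy as np

# Room-acoustics data augmentation sketch: convolve "scene audio" with a
# room impulse response to simulate acoustic propagation in a room.
rng = np.random.default_rng(0)
fs = 16000
signal = rng.standard_normal(fs)              # 1 s of toy scene audio

# Toy RIR: direct sound plus an exponentially decaying diffuse tail.
t = np.arange(int(0.3 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)
rir[0] = 1.0                                  # direct path

augmented = np.convolve(signal, rir)[: signal.size]
```

Training on many such room-filtered copies of each clip exposes the classifier to reverberation conditions absent from the original recordings.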
|
|
-
Real-Time Audio Processing on a Raspberry Pi using Deep Neural Networks
Fotios Drakopoulos, Deepak Baby, Sarah Verhulst
[Abstract]
Over the past years, deep neural networks (DNNs) have quickly grown into the state-of-the-art technology for various machine learning tasks such as image and speech recognition or natural language processing. However, as DNN-based applications typically require significant amounts of computation, running DNNs on resource-constrained devices still constitutes a challenge, especially for real-time applications such as low-latency audio processing. In this paper, we aimed to perform real-time noise suppression on a low-cost embedded platform with limited resources, using a pre-trained DNN-based speech enhancement model. A portable setup was employed, consisting of a Raspberry Pi 3 Model B+ fitted with a soundcard and headphones. A basic low-latency Python framework was developed to accommodate an audio processing algorithm operating in a real-time environment. Various layouts and trainable parameter counts of the DNN-based model as well as different processing time intervals (from 64 down to 8 ms) were tested and compared using objective metrics (e.g., PESQ, segSNR) to achieve the best possible trade-off between noise suppression performance and audio latency. We show that 6-layer DNNs with up to 200,000 trainable parameters can successfully be implemented on the Raspberry Pi 3 Model B+ and yield latencies below 16 ms for real-time audio applications.
|
|
-
Underwater Acoustic Recognition System for Detection of Low-altitude Moving Source
Tianqing Zhao, Xubin Liang, Difeng Sun, Fang Zhang, Deyu Sun
[Abstract]
The detection of airborne targets from underwater is an important part of marine safety management. Owing to the difference in physical properties between water and air, detecting airborne sound from underwater is difficult. In this paper, we present a recognition system that uses both time- and frequency-domain characteristics for low-altitude moving source detection. In the detection phase, a short-to-long-term energy ratio (SLR) is first obtained by analyzing the coupled acoustic data. This initial SLR indicates the likelihood that a target is present. For signals exceeding the SLR threshold, cepstral coefficients are extracted by a fourth-order Gammatone filter, which models the cochlear membrane. The validity of the algorithm is demonstrated in a test using a BP network with both synthetic and real data. The results indicate that the joint detection algorithm can detect low-altitude moving targets such as helicopters and has good anti-noise performance.
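The short-to-long-term energy ratio can be sketched as a sliding comparison of frame energies; the window lengths below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def short_to_long_ratio(x, short_len=64, long_len=1024):
    """Short-to-long-term energy ratio (SLR).

    For each sample n, the mean energy of the most recent `short_len`
    samples is compared with that of the last `long_len` samples; a
    transient event (e.g. a passing low-altitude source) raises the ratio.
    """
    x2 = np.asarray(x, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(x2)))
    slr = np.zeros(len(x2))
    for n in range(long_len, len(x2) + 1):
        e_short = (csum[n] - csum[n - short_len]) / short_len
        e_long = (csum[n] - csum[n - long_len]) / long_len
        slr[n - 1] = e_short / (e_long + 1e-12)
    return slr

# Low-level noise with a short burst: the SLR peaks around the burst.
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(4096)
x[2000:2064] += 1.0
slr = short_to_long_ratio(x)
print(int(np.argmax(slr)))  # an index inside or just after the burst
```

Samples whose SLR exceeds a threshold would then be passed to the Gammatone/cepstral stage.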
|
|
-
A Study on a Separation Method Combining Gamma-Process Non-negative Matrix Factorization and Deep Learning
Jomae Satoru, Kenko Ota, Hideaki Yoshino
[Abstract]
Accurate analysis of the fundamental frequency and chord constituent notes is a hard problem, yet solving it is important for tasks such as similar-music retrieval and music arrangement. An accurate method for sound source separation is required to analyze the fundamental frequency accurately. In this research, we propose a sound source separation method that combines gamma-process non-negative matrix factorization (GaP-NMF) and a deep neural network (DNN). In the proposed method, we first estimate the bases with GaP-NMF. Then, a DNN classifies the estimated bases according to musical instrument. Each basis estimated by GaP-NMF is emphasized by multiplying it with the spectrum template of the musical instrument identified by the DNN. We conducted a sound source separation experiment to verify the performance of the proposed method. The sound sources are composed of multiple musical instrument sounds. When separating a single musical instrument sound from these mixtures, we confirmed that the proposed method improved the SNR by up to 1.4 dB over the conventional method, depending on the data.
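The basis-estimation step can be illustrated with a plain multiplicative-update NMF; GaP-NMF itself is a nonparametric Bayesian variant that also infers the number of bases, so this sketch only shows the factorization idea with a fixed rank.

```python
import numpy as np

def nmf_euclidean(V, r, n_iter=500, seed=0):
    # Multiplicative-update NMF (Euclidean cost): V ~ W @ H, all entries >= 0.
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + 1e-3
    H = rng.random((r, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy magnitude "spectrogram" mixing two spectral bases (two instruments).
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 40))
W, H = nmf_euclidean(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err < 0.1)  # the rank-2 mixture is recovered closely
```

In the proposed pipeline, the columns of W would then be classified by instrument with a DNN and emphasized with the matching spectrum template.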
|
|
-
Detection of Boat Noise by a Convolutional Neural Network for a Boat Information System
Haruki Yamaguchi, Kenji Muto
[Abstract]
Some boat noises cause annoyance to residents living near canals. We previously proposed an information system that notifies residents of an approaching noisy boat on their cellphones using an audiovisual effect. However, that system had difficulty detecting approaching boats at night with a camera. We therefore investigated training data for detecting boat noise in environmental sound with a convolutional neural network (CNN). For the detection of boat noise, we used training data consisting of spectrograms of the environmental sound, and we investigated the spectrogram configuration to improve boat noise detection. As a result, with a spectrogram configuration of a 5-second time axis and a frequency axis from 10 to 750 Hz, the detection accuracy exceeded 95%.
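The reported input configuration (5-second frames, 10-750 Hz band) can be sketched as a band-limited spectrogram; the FFT size and hop below are assumptions, not values from the paper.

```python
import numpy as np

def band_limited_spectrogram(x, fs, n_fft=1024, hop=512,
                             f_lo=10.0, f_hi=750.0):
    """Magnitude spectrogram restricted to [f_lo, f_hi] Hz."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))       # frames x bins
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)          # keep 10-750 Hz only
    return spec[:, band], freqs[band]

fs = 8000
t = np.arange(5 * fs) / fs                 # a 5-second segment
x = np.sin(2 * np.pi * 200 * t)            # 200 Hz tone, inside the band
S, f = band_limited_spectrogram(x, fs)
print(S.shape, f[0], f[-1])
```

An image of this shape (time frames by band-limited frequency bins) would then serve as the CNN input.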
|
|
-
Effective Method for Screening Discharged Batteries Using Support Vector Machine and High-Resolution Acoustic Analysis
Tomoaki Magome, Kan Okubo
[Abstract]
Alkaline dry batteries and nickel-metal hydride (NiMH) rechargeable batteries are used worldwide in various portable devices that require continuous current. Nevertheless, visually verifying whether such a battery is discharged remains as difficult as checking a watermelon for ripeness. Although one can detect a dead battery using a battery indicator, such a tool is not always available. Therefore, a simple and intuitive means of ascertaining whether a battery is dead must be found to avoid problems such as battery leakage. In our previous work, we proposed an acoustic-analysis-based method for estimating the discharge state of an alkaline dry battery: a hammering test that screens dead batteries by analyzing the tone color of the tapping sound. In this report, we propose a more effective method and apply it to NiMH rechargeable batteries. To improve the decision accuracy, we also employ a support vector machine (SVM) and a super-high-resolution recording system, which can capture sound up to 100 kHz. Our experimentally obtained results suggest that the proposed method provides effective screening.
|
|
-
Gated convolutional neural network-based voice activity detection under high-level noise environments
Li Li, Kouei Yamaoka, Yuki Koshino, Mitsuo Matsumoto, Shoji Makino
[Abstract]
This paper deals with voice activity detection (VAD) tasks
under high-level noise environments where signal-to-noise
ratios (SNRs) are lower than -5 dB. With the increasing need for hands-free applications, it is unavoidable to face
critically low SNR situations where the noise can be internal
self-created ego noise or external noise occurring in the
environment, e.g., rescue robots in a disaster or navigation
in a high-speed moving car. To achieve accurate VAD
results under such situations, this paper proposes a gated
convolutional neural network-based approach that is able
to capture long- and short-term dependencies in time series
as cues for detection. Experimental evaluations using the high-level ego noise of a hose-shaped rescue robot revealed that the proposed method achieved about 86% VAD accuracy on average in environments with SNRs ranging from -30 dB to -5 dB.
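The gating mechanism of a gated convolutional layer can be sketched in isolation: one convolution produces the features, a second one a sigmoid gate that passes or suppresses each time step. The weights and layer sizes below are illustrative; the paper's architecture is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1d(x, w_lin, w_gate):
    """One gated (GLU-style) convolutional layer on a 1-D feature sequence.

    y[t] = (x * w_lin)[t] * sigmoid((x * w_gate)[t])
    The sigmoid gate lets the network pass or suppress each time step,
    a useful cue for speech presence in low-SNR frames.
    """
    lin = np.convolve(x, w_lin, mode="same")
    gate = sigmoid(np.convolve(x, w_gate, mode="same"))
    return lin * gate

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
y = gated_conv1d(x, w_lin=np.array([0.5, 1.0, 0.5]),
                 w_gate=np.array([0.2, 0.2, 0.2]))
print(y.shape)
```

Because the gate is strictly between 0 and 1, the gated output never exceeds the ungated convolution in magnitude.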
|
|
-
Acoustic Remote Sensing for Irrigation Systems Control in Agriculture
Anna Radionova, Chandra Ghimire, Laura Grundy, Seth Laurenson, Stuart Bradley, Valerie Snow
[Abstract]
The paper addresses the problem of measuring free water on the surface of agricultural soils with an accurate real-time acoustic method. The generation of free water is the result of a fine balance between the irrigation rate and the rate at which the soil can transport water away from the surface, and it is the primary cause of inefficient and environmentally harmful losses during an irrigation event. The innovative component of the project is vested in the development of directional acoustic arrays and sophisticated signal processing which can remotely detect the onset of free water in the soil via changes in reflectivity. The proposed method estimates the amount of free water on the soil surface based on the changes in the amplitude of the sound waves reflected off the soil surface at different moisture levels. Our results show that sound wave reflectivity depends on the proportion of the soil surface covered by water. The presented results are based on both laboratory and field measurements, and therefore form the basis of an inexpensive and accurate free-water sensor for irrigation systems.
|
|
-
Sound Capture from Rolling-shuttered Visual Camera Based on Edge Detection
Koichi Terano, Hiroki Shindo, Kenta Iwai, Takahiro Fukumori, Takanobu Nishiura
[Abstract]
Recently, several studies on sound recording have extracted sounds from images of an object surface vibrated by sound waves. Such methods can capture sound with a high-speed camera instead of an air-conduction microphone. However, they are impractical because of the high cost of high-speed cameras. In this paper, we propose a method to capture sound from an image of an object surface vibrated by sound waves using an ordinary visual camera with a rolling shutter. Such a camera uses a CMOS image sensor and reads out the image row by row from the top. Therefore, when a moving object is photographed, rolling-shutter distortion occurs because each row is captured at a different time. The proposed method applies edge detection to emphasize the edges of the rolling-shutter distortion caused by photographing the vibrating object, and resamples the edge positions as the amplitude of the sound wave. In a sound capturing experiment, we confirmed that the proposed method can capture a pure tone from an image taken with a CMOS image sensor.
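The core idea, that consecutive rows of a rolling-shutter image sample a vibrating edge at successive instants, can be sketched as follows. The simple threshold edge detector is an assumption for illustration; the paper's edge-detection step is more elaborate.

```python
import numpy as np

def edge_positions_to_waveform(img, threshold=0.5):
    """Recover a 1-D waveform from a rolling-shutter image of a vibrating edge.

    Each row of a rolling-shutter (CMOS) image is read out slightly later
    than the previous one, so the horizontal position of a vertical edge,
    traced from top row to bottom row, samples the object's displacement
    over time. The edge is located per row as the first column whose
    intensity exceeds `threshold`.
    """
    pos = np.array([np.argmax(row > threshold) for row in img], dtype=float)
    return pos - pos.mean()   # remove DC offset -> zero-mean waveform

# Synthetic image: a bright region whose left edge oscillates sinusoidally
# (5 cycles over the 200 rows of the frame).
rows, cols = 200, 100
r = np.arange(rows)
edge = 50 + 10 * np.sin(2 * np.pi * 5 * r / rows)
img = np.zeros((rows, cols))
for i in range(rows):
    img[i, int(round(edge[i])):] = 1.0
wave = edge_positions_to_waveform(img)
# Dominant frequency of the recovered waveform (in cycles per frame).
k = np.argmax(np.abs(np.fft.rfft(wave)[1:])) + 1
print(k)  # -> 5
```

The row index plays the role of the time axis; with a known row readout time, `k` converts to a frequency in Hz.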
|
|
-
Designing Nearly Tight Window for Improving Time-Frequency Masking
Tsubasa Kusano, Yoshiki Masuyama, Kohei Yatabe, Yasuhiro Oikawa
[Abstract]
Many audio signal processing methods are formulated in
the time-frequency (T-F) domain which is obtained by the
short-time Fourier transform (STFT). The properties of the STFT are fully characterized by the window function, the number of frequency channels, and the time shift. Thus, designing a
better window is important for improving the performance
of the processing especially when a less redundant T-F
representation is desirable. While many window functions
have been proposed in the literature, they are designed
to have a good frequency response for analysis, which
may not perform well in terms of signal processing. The
window design must take the effect of the reconstruction
(from the T-F domain into the time domain) into account
for improving the performance. In this paper, an
optimization-based design method of a nearly tight
window is proposed to obtain a window performing well
for the T-F domain signal processing.
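For reference, the canonical tight window, the baseline that a "nearly tight" design relaxes, can be computed directly: dividing a window by the square root of its shifted-squared sum yields a tight frame, so analysis and synthesis with the same window reconstruct the signal exactly. This is a standard construction, not the paper's optimization-based method.

```python
import numpy as np

def shifted_squared_sum(w, hop):
    # sum_k w[n + k*hop]^2 for each n (overlap-add of the squared window);
    # the value depends only on n modulo hop.
    return np.array([np.sum(w[n % hop::hop] ** 2) for n in range(len(w))])

def make_tight(w, hop):
    """Canonical tight window for an STFT with the given hop size."""
    return w / np.sqrt(shifted_squared_sum(w, hop))

w = np.hanning(64)
wt = make_tight(w, hop=16)
# Tightness condition: the shifted-squared sums are constant (here 1).
print(np.allclose(shifted_squared_sum(wt, 16), 1.0))  # -> True
```

A nearly tight window deviates slightly from this condition in exchange for a better frequency response of the analysis.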
|
|
-
Noise-reducing Sound Capture Based on Exposure-time of Still Camera
Hiroki Shindo, Koichi Terano, Kenta Iwai, Takahiro Fukumori, Takanobu Nishiura
[Abstract]
A visual microphone has been proposed to capture distant sound. It captures sound from high-frame-rate video using pixel differences, and is expected to be applied in surveillance cameras because it is little affected by distance attenuation. However, this capturing method is impractical because it requires expensive equipment. In this paper, we aim to realize an inexpensive visual microphone by exploiting the rolling-shutter distortion of a CMOS image sensor. An image shot with this sensor carries time information because the sensor reads out the image line by line. Therefore, it is possible to capture sound from a single image, without high-frame-rate video, if acoustic signals can be extracted from the time information contained in the image. However, low-frequency noise is mixed into the sound captured with this sensor due to the inclination and distortion of the photographed object. This noise depends on the exposure time of the sensor. We therefore propose a method for suppressing the noise by designing a digital filter based on the exposure time. The experimental results show that the proposed method suppresses the noise compared with the originally captured sound.
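A minimal sketch of an exposure-time-dependent noise filter, assuming the low-frequency noise is concentrated below roughly 1/exposure_time Hz; the exact cutoff rule and the windowed-sinc design are illustrative assumptions, not the paper's filter.

```python
import numpy as np

def exposure_highpass(fs, exposure_time, numtaps=801):
    """High-pass FIR whose cutoff follows the exposure time.

    A windowed-sinc low-pass with cutoff fc = 1/exposure_time is designed,
    then converted to a high-pass by spectral inversion.
    """
    fc = 1.0 / exposure_time                      # assumed cutoff [Hz]
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h_lp = 2 * fc / fs * np.sinc(2 * fc / fs * n) * np.hamming(numtaps)
    h_lp /= h_lp.sum()                            # unit gain at DC
    h = -h_lp
    h[(numtaps - 1) // 2] += 1.0                  # spectral inversion
    return h

h = exposure_highpass(fs=8000, exposure_time=0.01)  # cutoff ~100 Hz
H = np.abs(np.fft.rfft(h, 8000))    # magnitude response, 1 Hz per bin
print(H[0], H[1000])                # DC is blocked, 1 kHz passes
```

Convolving the captured waveform with `h` removes the slow drift caused by the inclination and distortion of the object while leaving the audio band intact.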
|
|
-
Time-Variant Acoustic Front-End Measurements of Active Noise Cancellation Headphones
Johannes Fabry, David Hilkert, Stefan Liebich, Peter Jax
[Abstract]
A robust design of active noise cancellation (ANC) headphones by means of digital signal processing requires a deep understanding of the underlying acoustic front-end. Of particular interest are the primary path, defined as the transfer path between the outer and inner microphone of the headphone, and the secondary path, the transfer path between the loudspeaker and the inner microphone. These paths may vary due to, e.g., the fit of the headphone, the direction of arrival, or the physique of the user. In this contribution, we present the results of a measurement series whose objective is to examine the acoustic paths for their intra-person variance under influences such as jaw movement, head rotation, and refitting of the headphone, as well as the inter-person variance between different subjects. The measurements were conducted in an anechoic chamber with 25 participants aged 21 to 61. Furthermore, the implications for the performance of time-invariant feed-forward ANC solutions are considered, and the benefit of calibrating the secondary path is investigated.
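How a single acoustic path (e.g. the secondary path from loudspeaker to inner microphone) can be estimated from an excitation and a response may be sketched with a Welch-averaged H1 estimator; the study's actual measurement procedure is more involved, and the simulated FIR path below is purely illustrative.

```python
import numpy as np

def h1_estimate(x, y, n_fft=512, hop=256):
    # Welch-averaged H1 transfer-function estimate: H = S_xy / S_xx.
    win = np.hanning(n_fft)
    s_xx = np.zeros(n_fft // 2 + 1)
    s_xy = np.zeros(n_fft // 2 + 1, dtype=complex)
    for i in range(1 + (len(x) - n_fft) // hop):
        X = np.fft.rfft(win * x[i * hop:i * hop + n_fft])
        Y = np.fft.rfft(win * y[i * hop:i * hop + n_fft])
        s_xx += np.abs(X) ** 2
        s_xy += np.conj(X) * Y
    return s_xy / s_xx

# Simulated path: a short FIR (one-sample delay plus decaying taps).
rng = np.random.default_rng(2)
g = np.array([0.0, 0.8, 0.3, -0.1])
x = rng.standard_normal(20000)          # excitation (e.g. noise playback)
y = np.convolve(x, g)[:len(x)]          # response at the inner microphone
g_hat = np.fft.irfft(h1_estimate(x, y))[:len(g)]
print(np.round(g_hat, 2))               # close to g
```

Repeating such estimates across refittings, head movements, and subjects is what reveals the intra- and inter-person path variances the abstract discusses.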
|
|
-
DFT-Filterbanks with Spectral Refinement and its Comparison with Polyphase Filterbanks
Mohammed Krini
[Abstract]
The most popular uniform analysis scheme applied for speech
enhancement periodically performs DFTs of overlapping and
windowed signal segments. However, due to the windowing of
successive signal segments, often a significant frequency
overlap arises among neighboring subbands. These overlapping
effects are undesirable as they limit the performance of
adaptive filters and the feature estimation in the subband
domain. In order to reduce this overlap without increasing the
DFT order, the so-called spectral refinement (SR) can be utilized.
The SR is based on a linear combination of weighted and shifted
speech segments and can be applied as a post-processing
stage after a DFT-based analysis filterbank. In this contribution
the SR is used as a predecessor to a DFT. It can be shown that
the resulting SR structure in the time-domain for DFT-based
analysis flterbanks looks similar to polyphase filterbanks. For
enhanced frequency selectivity of the analysis, preceding
weighted blocks need to be added before performing the DFT.
A window function of higher order has to be defined that covers
the current as well as previous input segments. In the case of SR, a set of shifted low-order window functions is linearly combined and transformed into a desired window function of higher order.
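The polyphase-like structure described, weighted blocks added before the DFT, can be sketched as follows; the long prototype window here is a plain Hann rather than an SR-optimized combination of shifted low-order windows.

```python
import numpy as np

def folded_dft_analysis(x, w_long, n_dft):
    """Analysis with a prototype window longer than the DFT order.

    The input is weighted with w_long, the weighted blocks are folded
    (time-aliased) onto n_dft samples, and a single DFT is taken; this
    is the polyphase-like structure that spectral refinement leads to.
    """
    seg = x[:len(w_long)] * w_long
    folded = seg.reshape(-1, n_dft).sum(axis=0)  # add preceding weighted blocks
    return np.fft.fft(folded)

n_dft = 64
w_long = np.hanning(2 * n_dft)            # higher-order prototype window
rng = np.random.default_rng(3)
x = rng.standard_normal(2 * n_dft)
X = folded_dft_analysis(x, w_long, n_dft)
# Folding before an N-point DFT samples the 2N-point spectrum at even bins:
X_long = np.fft.fft(x * w_long)
print(np.allclose(X, X_long[::2]))        # -> True
```

The subbands thus inherit the sharper selectivity of the long window without increasing the DFT order.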
|
|
-
Optimal Design of Symmetric and Asymmetric Beampatterns with Circular Microphone Arrays
Xudong Zhao, Gongping Huang, Jacob Benesty, Jingdong Chen
[Abstract]
This paper is devoted to the study of the beamforming
problem with circular microphone arrays (CMAs) and
presents an approach to the design of beamformers with
asymmetric and symmetric frequency-invariant
beampatterns. We first discuss how to express a desired
target directivity pattern, either symmetric or asymmetric,
as a linear weighted combination of sine and cosine
functions as well as circular harmonics of different orders.
Using the hypercardioid pattern as an example, we show
how to determine the weighting coefficients by
maximizing the directivity factor (DF) with different
constraints. Then, by using the Jacobi-Anger expansion, an
approximation of the beamformer’s beampattern is
presented. A linear system is subsequently formed by
forcing the approximated beampattern to be equal to a
target asymmetric or symmetric directivity pattern. The
optimal beamforming filter is finally determined by
solving the linear system. Simulations demonstrate the
properties of the proposed beamforming approach.
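The expansion of a target directivity pattern into circular harmonics can be sketched as a least-squares fit of sine and cosine terms; the paper instead determines the weights by maximizing the directivity factor under constraints, and the target coefficients below are illustrative, not a true hypercardioid.

```python
import numpy as np

def fit_harmonic_weights(target, thetas, order):
    """Least-squares fit of B(theta) by 1, cos(n*theta), sin(n*theta),
    n = 1..order (circular harmonics up to the given order)."""
    cols = [np.ones_like(thetas)]
    for n in range(1, order + 1):
        cols += [np.cos(n * thetas), np.sin(n * thetas)]
    A = np.column_stack(cols)
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return w, A

thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
# Second-order symmetric pattern with illustrative coefficients.
target = -0.25 + 0.5 * np.cos(thetas) + 0.75 * np.cos(2 * thetas)
w, A = fit_harmonic_weights(target, thetas, order=2)
print(np.round(w, 3))   # recovers the generating coefficients
```

An asymmetric target simply puts nonzero weight on the sine terms; the fitted weights then feed the linear system that determines the beamforming filter.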
|