NOISE REDUCTION FOR DUAL-MICROPHONE MOBILE PHONES EXPLOITING POWER LEVEL DIFFERENCES

Marco Jeub, Christian Herglotz, Christoph Nelke, Christophe Beaugeant, and Peter Vary

Institute of Communication Systems and Data Processing, RWTH Aachen University, Germany
Intel Mobile Communications, Sophia-Antipolis, France
{jeub,herglotz,nelke,vary}@ind.rwth-aachen.de, christophe.beaugeant@intel

ABSTRACT

This paper discusses the application of noise reduction algorithms for dual-microphone mobile phones. An analysis of the acoustical environment based on recordings with a dual-microphone mock-up phone mounted on a dummy head is given. Motivated by the recordings, a novel dual-channel noise reduction algorithm is proposed. The key components are a noise PSD estimator and an improved spectral weighting rule which both explicitly exploit the Power Level Differences (PLD) of the desired speech signal between the microphones. Experiments with recorded data show that this low-complexity system performs well and is beneficial for integration into future mobile communication devices.

Index Terms — Noise reduction, noise estimation, speech enhancement, dual-channel, power level difference.

1. INTRODUCTION

Mobile phone conversations can take place in nearly every acoustical situation. Since the listener at the far end usually suffers from unwanted background noise if the talker is located in an adverse acoustical situation, most mobile phones have integrated algorithms to enhance the speech quality, cf. [1]. The algorithms aim to reduce unwanted background noise while ensuring that the occurring speech distortions are inaudible to the greatest possible extent. For such algorithms, the computational complexity and algorithmic delay are of significant importance. Besides, the algorithm should be able to converge fast in changing noise conditions.
In this contribution, we discuss the application of noise reduction algorithms for dual-microphone mobile phones. In order to employ such algorithms, a secondary microphone can be placed either next to the common primary microphone on the bottom of the device or on top of the device (see Fig. 1). In the first part of this paper, an analysis of the acoustical environment is given, which is entirely based on recordings taken with a dual-microphone mock-up phone in typical acoustical situations. Based on these observations, in the second part, a novel algorithm is proposed which exploits the Power Level Differences (PLD) of the different signal components and has a very low computational complexity.

2. ANALYSIS OF THE ACOUSTICAL ENVIRONMENT

Common mobile phones use a single microphone for capturing the speech signal. This primary microphone is usually mounted on the bottom of the device in order to allow for a short acoustic path between mouth and microphone, which ensures a high direct path energy and less reverberation. Depending on the phone design, a secondary microphone can be placed either on the bottom next to the primary microphone, or on top of the device in order to capture the speech signal with a lower sound pressure level (SPL).

[Fig. 1: Illustration of the mobile phone with the considered microphone positions x_1(k) and x_2(k) (dimensions in the drawing: 10 cm, 3 cm); (left) front side, (right) rear side.]

[Fig. 2: Dual-channel signal model relating the clean speech s(k), the speech components s_1(k), s_2(k), the noise components n_1(k), n_2(k), the microphone signals x_1(k), x_2(k), and the transfer function H_12(e^{jΩ}).]

In the remainder of this paper, the dual-channel microphone configuration according to Fig. 1 is considered. A primary microphone is placed at the bottom and a secondary microphone on the top rear side of the device. The two microphone signals x_1(k) and x_2(k) are related to the clean speech s(k) and the additive background noise signals n_m(k) by the signal model shown in Fig. 2, with m = 1, 2 and discrete time index k.
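As a quick illustration of this signal model, the following NumPy sketch builds x_1(k) and x_2(k) from a clean signal, a one-tap stand-in for H_12, and uncorrelated noise of roughly equal power at both microphones. The white-noise "speech", the single-tap transfer function, and all numerical values are illustrative assumptions, not the recorded signals from Section 2:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                            # sampling rate (Hz), as in Table 1

s1 = rng.standard_normal(fs)          # stand-in for clean speech at the primary mic
h12 = np.array([0.3])                 # hypothetical one-tap H_12 (about -10 dB)
s2 = np.convolve(s1, h12)[:len(s1)]   # speech at the secondary mic: s2 = h12 * s1

n1 = 0.1 * rng.standard_normal(fs)    # roughly equal-power noise at both mics
n2 = 0.1 * rng.standard_normal(fs)    # (homogeneous noise field assumption)

x1 = s1 + n1                          # primary (bottom) microphone signal
x2 = s2 + n2                          # secondary (top) microphone signal
```

With the chosen tap, the speech component at the secondary microphone is attenuated by roughly 10 dB relative to the primary one, matching the level difference reported in Section 2.2.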
The acoustic transfer function of the desired speech signal between the two microphones is denoted by H_12(e^{jΩ}).

The following background noise analysis is based on measurements inside an acoustic chamber using the standardized multi-loudspeaker procedure described in [2] to generate realistic noise fields. Here, we restrict the analysis to two important noise types: car and babble noise from [2]. The recording system consists of a HEAD acoustics HMS II.3 artificial head which includes a mouth simulator. A mock-up phone was mounted on the artificial head in the flat handset position. This procedure allows recording speech (taken from [3]) and noise separately, which is usually not possible in real acoustic environments.

978-1-4673-0046-9/12/$26.00 ©2012 IEEE, ICASSP 2012

[Fig. 3: PSD of babble noise captured by the two microphones (bottom/top); power in dB over frequency 0–8 kHz.]

[Fig. 4: PSD of the speech signal from the artificial mouth captured by the two microphones (bottom/top); power in dB over frequency 0–8 kHz.]

2.1. Analysis of Background Noise

Important acoustical quantities are the power spectral densities (PSD) recorded at the positions of the two microphones for both speech and noise. Figure 3 exemplarily shows the PSD of babble noise for the two microphones. It can be seen that both signals have roughly the same PSD and hence a homogeneous noise field exists, as confirmed by the investigation of further noise types. A further coherence evaluation of the background noise showed a good match between the theoretical coherence using the free-field diffuse model, cf. [4], with the corresponding inter-microphone distances and the recorded data. All experiments with noise-only conditions have also been verified with the same mock-up phone, which was placed outside in crowded places and on a busy street.

2.2. Analysis of Speech

The attenuation of the desired speech signal from the mouth to the possible microphone locations is of significant importance. Figure 4 shows the PSDs of the speech signals picked up by the two microphones (noise-free case), where a power level difference of ≈ 10 dB is measured between the bottom and top microphone for all frequencies.

3. NOISE REDUCTION SYSTEM

The novel speech enhancement system, which operates in the short-time Fourier domain, is depicted in Fig. 5. The system can be divided into two novel components: a dual-channel noise PSD estimator as well as a dual-channel spectral weighting rule. Each of the two components works independently and can be incorporated in any related speech enhancement system. The enhanced spectrum Ŝ(λ,μ) is given by multiplying the primary input X_1(λ,μ) with the spectral weighting gains G(λ,μ). Discrete frequency bin and frame index are denoted by μ and λ. The required estimate of the noise PSD is denoted by Φ̂_nn(λ,μ). The enhanced time domain signal ŝ(k) is obtained by using the IFFT and overlap-add.

[Fig. 5: Block diagram of the proposed dual-channel noise reduction system. Both inputs x_1(k) and x_2(k) pass through segmentation, windowing, and FFT; the spectra X_1(λ,μ) and X_2(λ,μ) feed the noise PSD estimation (Φ̂_nn(λ,μ)) and the spectral gain calculation (G(λ,μ)); the enhanced spectrum Ŝ(λ,μ) is transformed back via IFFT and overlap-add to ŝ(k).]

3.1. Noise PSD Estimation (PLDNE Algorithm)

The motivation for the novel PLD-based noise PSD estimator, termed Power Level Difference Noise Estimator (PLDNE), is given by the preceding measurements. Two important assumptions are the existence of a homogeneous diffuse noise field, i.e., Φ_n1n1(λ,μ) = Φ_n2n2(λ,μ) = Φ_nn(λ,μ), as well as a sufficient attenuation of the desired speech signal between the two microphones of, e.g., 10 dB.
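The analysis-synthesis chain of Fig. 5 (segmentation, Hann windowing, zero-padded FFT, IFFT, overlap-add) can be sketched with the parameters later listed in Table 1. This is a minimal perfect-reconstruction skeleton under the stated frame settings, not the authors' implementation; the spectral gains are omitted, so the chain is identity processing:

```python
import numpy as np

fs, L, M = 16000, 320, 512           # Table 1: 16 kHz, 20 ms frames, 512-point FFT
hop = L // 2                          # 50 % frame overlap
win = np.hanning(L)                   # Hann analysis window

x = np.random.default_rng(1).standard_normal(fs)   # one second of test signal

# Analysis: segmentation, windowing, zero-padded FFT (one spectrum per frame)
frames = [x[i:i + L] * win for i in range(0, len(x) - L, hop)]
X = np.array([np.fft.rfft(f, M) for f in frames])  # shape: (frames, bins)

# ... the spectral gains G(lambda, mu) would be applied to X here ...

# Synthesis: IFFT and overlap-add (Hann at 50 % overlap sums to ~1)
y = np.zeros(len(x))
for idx, spec in enumerate(X):
    y[idx * hop : idx * hop + L] += np.fft.irfft(spec, M)[:L]
```

Away from the signal edges, y reproduces x up to the small deviation of the symmetric Hann window from the exact constant-overlap-add condition.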
In a first step, the normalized power spectral density difference 0 ≤ ΔΦ_PLDNE(λ,μ) ≤ 1 of the noisy input is calculated for every frequency bin μ by

  ΔΦ_PLDNE(λ,μ) = | Φ_x1x1(λ,μ) − Φ_x2x2(λ,μ) | / ( Φ_x1x1(λ,μ) + Φ_x2x2(λ,μ) ),   (1)

where Φ_x1x1(λ,μ) and Φ_x2x2(λ,μ) represent the auto-PSDs of x_1(k) and x_2(k), respectively. The cross-PSD is denoted by Φ_x1x2(λ,μ). All PSD values are calculated by recursive smoothing over time with constant α_1.

The idea behind the subsequent noise PSD estimation is as follows. During background noise-only periods, ΔΦ_PLDNE(λ,μ) will be close to zero as the input power levels are almost equal. If the value lies below a threshold φ_min, the noise PSD estimate is determined directly from the input signal x_1(k) by

  Φ̂_nn(λ,μ) = α_2 · Φ̂_nn(λ−1,μ) + (1 − α_2) · |X_1(λ,μ)|²,  if ΔΦ_PLDNE(λ,μ) < φ_min.   (2)

If the value exceeds an upper threshold φ_max, speech is assumed to be dominant and the last estimate is kept:

  Φ̂_nn(λ,μ) = Φ̂_nn(λ−1,μ),  if ΔΦ_PLDNE(λ,μ) > φ_max.   (3)

In between these two extremes, a noise estimate based on x_2(k) is used as an approximation according to

  Φ̂_nn(λ,μ) = α_3 · Φ̂_nn(λ−1,μ) + (1 − α_3) · |X_2(λ,μ)|²,   (4)

since the highly attenuated speech components in x_2(k) can be neglected. In situations with babble noise, it is beneficial to combine the PLDNE algorithm with further single- or dual-channel noise PSD estimators, e.g., [5, 6, 7], instead of keeping the last estimate in Eq. (3).

3.2. Noise Reduction (PLD Algorithm)

The second component of the novel noise reduction system comprises the calculation of the spectral weighting gains G(λ,μ). The method is motivated by the PLD algorithm initially proposed in [8]. Here, an alternative calculation of the spectral gains and an additional smoothing are proposed.
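The three-branch PLDNE update of Eqs. (1)–(4) in Section 3.1 can be sketched per frame as follows. The function name, state-dictionary layout, and the small numerical floor in the denominator are illustrative choices, not part of the paper:

```python
import numpy as np

alpha1, alpha2, alpha3 = 0.9, 0.9, 0.8   # smoothing constants (Table 1)
phi_min, phi_max = 0.2, 0.8              # PLDNE decision thresholds (Table 1)

def pldne_update(X1, X2, state):
    """One PLDNE step for the spectra X1, X2 (complex arrays over bins).

    `state` holds the recursively smoothed auto-PSDs and the running
    noise PSD estimate."""
    # Recursive smoothing of the input auto-PSDs with constant alpha1
    state["phi_x1"] = alpha1 * state["phi_x1"] + (1 - alpha1) * np.abs(X1) ** 2
    state["phi_x2"] = alpha1 * state["phi_x2"] + (1 - alpha1) * np.abs(X2) ** 2

    # Normalized power level difference, Eq. (1); lies in [0, 1]
    delta = np.abs(state["phi_x1"] - state["phi_x2"]) / (
        state["phi_x1"] + state["phi_x2"] + 1e-12)   # floor avoids 0/0

    noise_only = delta < phi_min                          # Eq. (2): track from x1
    in_between = (delta >= phi_min) & (delta <= phi_max)  # Eq. (4): track from x2
    # Eq. (3): for delta > phi_max (speech dominant) the estimate is kept

    phi_nn = state["phi_nn"]
    phi_nn = np.where(noise_only,
                      alpha2 * phi_nn + (1 - alpha2) * np.abs(X1) ** 2, phi_nn)
    phi_nn = np.where(in_between,
                      alpha3 * phi_nn + (1 - alpha3) * np.abs(X2) ** 2, phi_nn)
    state["phi_nn"] = phi_nn
    return phi_nn
```

Fed with equal-power spectra at both inputs (the noise-only case), the estimate converges to the input power; in speech-dominant bins the previous estimate is frozen, as intended.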
It is again assumed that the power levels are equal for noise, whereas speech results in a higher PSD at microphone x_1(k). The auto-PSDs of the inputs are given by

  Φ_x1x1(λ,μ) = Φ_s1s1(λ,μ) + Φ_n1n1(λ,μ),   (5)
  Φ_x2x2(λ,μ) = Φ_s2s2(λ,μ) + Φ_n2n2(λ,μ).   (6)

By introducing a transfer function of the desired speech signal between the microphones (see Fig. 2), the auto-PSD at the secondary microphone can be expressed by

  Φ_x2x2(λ,μ) = |H_12(λ,μ)|² · Φ_s1s1(λ,μ) + Φ_n2n2(λ,μ).   (7)

Two difference equations for the auto-PSDs of the noisy input and the noise-only signals are introduced as

  ΔΦ_PLD(λ,μ) = Φ_x1x1(λ,μ) − Φ_x2x2(λ,μ),   (8)
  ΔΦ_nn(λ,μ) = Φ_n1n1(λ,μ) − Φ_n2n2(λ,μ).   (9)

The power level difference of the noisy input signal can thus be expressed as

  ΔΦ_PLD(λ,μ) = Φ_s1s1(λ,μ) · (1 − |H_12(λ,μ)|²) + ΔΦ_nn(λ,μ).   (10)

Due to the assumption of a homogeneous noise field, the difference ΔΦ_nn(λ,μ) can be neglected, i.e., ΔΦ_nn(λ,μ) ≈ 0. Hence, the equation for the PLD reads

  ΔΦ_PLD(λ,μ) = (1 − |H_12(λ,μ)|²) · Φ_s1s1(λ,μ).   (11)

The final spectral weighting rule is the Wiener filter equation

  G(λ,μ) = Φ_s1s1(λ,μ) / ( Φ_s1s1(λ,μ) + Φ_n1n1(λ,μ) ).   (12)

By expanding both numerator and denominator by 1 − |H_12(λ,μ)|² as in [8] and inserting Eq. (11), the weighting rule reads

  G(λ,μ) = ΔΦ_PLD(λ,μ) / ( ΔΦ_PLD(λ,μ) + γ · (1 − |H_12(λ,μ)|²) · Φ_nn(λ,μ) ),   (13)

with a noise overestimation factor denoted by γ. In the case of speech absence, ΔΦ_PLD(λ,μ) will be zero and hence the gains will be zero, too. For pure speech, the right part of the denominator of Eq. (13) will be zero, so the gains G(λ,μ) will tend to one. The required transfer function H_12(λ,μ) is derived from the cross-PSD of the noisy input, Φ_x1x2(λ,μ).
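A per-bin evaluation of the weighting rule of Eq. (13) might look as follows. The sketch clips the power level difference at zero (as the practical implementation in Section 3.2 does) and guards against a zero denominator; the function name and that guard are our additions:

```python
import numpy as np

gamma = 4.0   # noise overestimation factor (Table 1)

def pld_gains(phi_x1, phi_x2, phi_nn, h12_mag_sq):
    """Spectral gains of Eq. (13) with the PLD clipped at zero.

    All arguments are per-bin arrays; h12_mag_sq is |H_12|^2."""
    delta_pld = np.maximum(phi_x1 - phi_x2, 0.0)            # clipped PLD
    den = delta_pld + gamma * (1.0 - h12_mag_sq) * phi_nn   # Eq. (13) denominator
    gains = np.zeros_like(den)
    np.divide(delta_pld, den, out=gains, where=den > 0.0)   # guard: 0 where den == 0
    return gains
```

In a speech-dominant bin (large PLD, small noise PSD) the gain approaches one, while a bin with no level difference is fully attenuated, matching the limiting cases discussed above.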
In [8], the cross-PSD is expressed by

  Φ_x1x2(λ,μ) = H_12(λ,μ) · Φ_x1x1(λ,μ) + Φ_n1n2(λ,μ),   (14)

and the transfer function is given by

  H_12(λ,μ) = ( Φ_x1x2(λ,μ) − Φ_n1n2(λ,μ) ) / ( Φ_x1x1(λ,μ) − Φ_nn(λ,μ) ).   (15)

Table 1. Main simulation parameters.
  Sampling frequency:     f_s = 16 kHz
  Frame length:           L = 320 (20 ms)
  FFT length:             M = 512 (including zero-padding)
  Frame overlap:          50 % (Hann window)
  Smoothing factors:      α_1 = 0.9, α_2 = 0.9, α_3 = 0.8, α_nn = 0.9
  Smoothing threshold:    f_0 = 1 kHz
  PLDNE thresholds:       φ_min = 0.2, φ_max = 0.8
  Overestimation factor:  γ = 4

The required cross-PSD of the background noise, Φ_n1n2(λ,μ), is calculated in [8] from the first 400 ms, where no speech activity is assumed. In contrast to Eq. (14), in our implementation the cross-PSD is correctly expressed by

  Φ_x1x2(λ,μ) = H_12(λ,μ) · Φ_s1s1(λ,μ) + Φ_n1n2(λ,μ).   (16)

By incorporating the coherence of the noise field, Γ_n1n2(μ), and using Φ_s1s1(λ,μ) = Φ_x1x1(λ,μ) − Φ_nn(λ,μ), the cross-PSD reads

  Φ_x1x2(λ,μ) = H_12(λ,μ) · ( Φ_x1x1(λ,μ) − Φ_nn(λ,μ) ) + Γ_n1n2(μ) · Φ_nn(λ,μ).   (17)

Hence, the proposed transfer function is given by

  H_12(λ,μ) = ( Φ_x1x2(λ,μ) − Γ_n1n2(μ) · Φ_nn(λ,μ) ) / ( Φ_x1x1(λ,μ) − Φ_nn(λ,μ) ).   (18)

With Eq. (18), the computation of the transfer function no longer requires an additional calculation of the noise cross-PSD and, compared to [8], allows the algorithm to cope with non-stationary noise and changing SNR conditions. In the practical implementation, the power level difference is calculated by

  ΔΦ_PLD(λ,μ) = max( Φ_x1x1(λ,μ) − Φ_x2x2(λ,μ), 0 ),   (19)

which prevents speech distortions if the assumption of a homogeneous noise field is violated, e.g., due to an interfering talker. In order to reduce the amount of musical tones, a smoothing over frequency using the approach of [9] is employed for frequencies above f_0.

4. PERFORMANCE EVALUATION

The experimental section is separated into an evaluation of the proposed noise estimator and of the complete noise reduction system using PLDNE and the PLD-based weighting rule. The input signals are taken from recordings using the same experimental setup with a dual-microphone mock-up phone as for the acoustical analysis carried out in Section 2. Speech and noise were recorded separately and mixed together at different SNR conditions, ensuring the same power level difference of the speech signal. We investigate the PLD algorithm assuming an ideal diffuse noise field and use the following coherence model in Eq. (18):

  Γ_n1n2(f) = sinc( 2π f d_mic / c ),   (20)

with microphone distance d_mic = 0.1 m and sound velocity c = 340 m/s. Further simulation parameters are listed in Table 1.

4.1. Noise Estimation Accuracy

The PLDNE algorithm (Proposed) is compared to the generalized dual-channel coherence-based noise PSD estimator [7] (GCoh). It has to be mentioned that the estimator presented in [7] was mainly developed for binaural hearing aids with a larger inter-microphone spacing of 0.15–0.2 m.

[Fig. 6: Simulation results over input SNR (0–25 dB) for the noisy input, PLD (Proposed), PLD (Original), DMSS, single-channel Wiener filter, PLDNE (Prop.), GCoh, MS, and MMSE: (left) noise estimation accuracy (logErr in dB), (middle) noise suppression performance (NA-SA in dB), (right) influence on intelligibility (SII). NA-SA: noise attenuation minus speech attenuation; SII: speech intelligibility index.]

Besides, the two single-channel approaches Minimum Statistics (MS) [5] and the MMSE-based noise tracker (MMSE) [6], which work on the primary signal x_1(k) only, are used as state-of-the-art references.
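The diffuse-field coherence model of Eq. (20) can be evaluated for every FFT bin as follows. Note that NumPy's `np.sinc` is the normalized sinc, sin(πx)/(πx), so the factor π is absorbed into the argument:

```python
import numpy as np

d_mic, c = 0.10, 340.0       # 10 cm microphone spacing, speed of sound (m/s)
fs, M = 16000, 512           # sampling rate and FFT length from Table 1

f = np.arange(M // 2 + 1) * fs / M           # center frequency of each FFT bin
# Eq. (20): Gamma(f) = sinc(2*pi*f*d_mic/c); with np.sinc(x) = sin(pi x)/(pi x),
# the argument becomes 2*f*d_mic/c
gamma_n1n2 = np.sinc(2.0 * f * d_mic / c)
```

The coherence is one at DC and has its first zero at f = c/(2 d_mic) = 1.7 kHz, which is why the noise components at the two microphones decorrelate quickly above the low-frequency range.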
The performance is rated in terms of the symmetric segmental logarithmic estimation error between the ideal noise PSD Φ_nn(λ,μ) and the estimated noise PSD Φ̂_nn(λ,μ):

  logErr = (1 / (K·M)) · Σ_{λ=1}^{K} Σ_{μ=1}^{M} | 10·log10( Φ_nn(λ,μ) / Φ̂_nn(λ,μ) ) |,   (21)

with total number of frames K. The ideal noise PSD is obtained using the true noise periodograms smoothed over time λ with smoothing factor α_nn. The averaged results for babble and traffic noise are depicted in Fig. 6 (left). It can be seen that the novel algorithm outperforms all related approaches and is nearly independent of the input SNR.

4.2. Noise Reduction Performance

The performance of the PLD weighting rule (Proposed) using Eq. (18) is compared with the original implementation of [8] (Original) using Eq. (15) and with a single-channel (SC) Wiener filter that uses the decision-directed approach for the a priori SNR calculation. All algorithms use the PLDNE noise PSD estimator. Besides, a dual-channel spectral subtraction algorithm [10] (DMSS) is evaluated. In a first step, two common spectral subtraction approaches provide a rough speech and noise estimate for each channel by using the other channel, respectively. In a following step, these estimates are used by a third spectral subtraction stage which yields the enhanced output.

The noise reduction performance is determined by means of the noise attenuation minus speech attenuation (NA-SA) measure, where higher values indicate an improvement. Besides, the speech intelligibility index (SII) [11] was calculated for the noisy as well as the enhanced signal. An SII higher than 0.75 indicates a good communication system, and values below 0.45 correspond to a poor system. The averaged results for babble and traffic noise are shown in Fig. 6 (middle/right). From the plots, we can conclude that the proposed noise reduction system outperforms related approaches in terms of noise suppression performance and increase in speech intelligibility.
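The accuracy measure of Eq. (21) is straightforward to implement; this sketch adds a small floor (our addition) to avoid taking the logarithm of zero:

```python
import numpy as np

def log_err(phi_nn_true, phi_nn_est, eps=1e-12):
    """Symmetric segmental log estimation error of Eq. (21) in dB.

    Both inputs are (frames, bins) arrays of noise PSDs; the mean over
    all K*M entries is returned."""
    ratio = (phi_nn_true + eps) / (phi_nn_est + eps)
    return np.mean(np.abs(10.0 * np.log10(ratio)))
```

A perfect estimate yields 0 dB, and the absolute value makes over- and underestimation count equally, e.g., a uniform factor-of-two error in either direction gives about 3 dB.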
The modifications to the original PLD implementation also result in a high performance gain. All results are consistent with the subjective listening impression, where the highest amount of musical tones was observed for the DMSS algorithm. Since for babble noise the major frequency components of the noise signal lie in the same regions as those of the desired speech signal, this scenario can be seen as the most difficult one. However, all experiments have also been conducted with train station noise, where the same tendency has been observed.

5. CONCLUSIONS

We propose a noise reduction system which is suitable for speech enhancement in dual-microphone mobile phones. A novel noise PSD estimator as well as a modified spectral weighting rule are presented, which both exploit the power level differences of the desired speech signal between the microphones. The algorithms have a low computational complexity and can efficiently be implemented using first-order IIR filters for the auto- and cross-PSD estimation. Experiments have shown that the novel system is capable of reducing unwanted background noise and increasing intelligibility in terms of the SII measure.

6. REFERENCES

[1] L. Watts, “Advanced noise reduction for mobile telephony,” IEEE Computer Magazine, vol. 41, no. 8, pp. 90–92, 2008.
[2] ETSI 202 396-1, Speech and multimedia Transmission Quality (STQ); Part 1: Background noise simulation technique and background noise database, V1.2.3, March 2009.
[3] P. Kabal, “TSP speech database,” Tech. Rep., Department of Electrical & Computer Engineering, McGill University, Montreal, Quebec, Canada, 2002.
[4] H. Kuttruff, Room Acoustics, Spon Press, Oxon, 2009.
[5] R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504–512, 2001.
[6] R.C. Hendriks, R. Heusdens, and J. Jensen, “MMSE based noise PSD tracking with low complexity,” in Proc.
IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, USA, 2010.
[7] M. Jeub, C.M. Nelke, H. Krüger, C. Beaugeant, and P. Vary, “Robust dual-channel noise power spectral density estimation,” in Proc. European Signal Processing Conference (EUSIPCO), Barcelona, Spain, 2011.
[8] N. Yousefian, A. Akbari, and M. Rahmani, “Using power level difference for near field dual-microphone speech enhancement,” Applied Acoustics, vol. 70, pp. 1412–1421, 2009.
[9] T. Esch and P. Vary, “Efficient musical noise suppression for speech enhancement systems,” in Proc. IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, 2009.
[10] H. Gustafsson, I. Claesson, S. Nordholm, and U. Lindgren, “Dual microphone spectral subtraction,” Tech. Rep., Department of Telecommunications and Signal Processing, University of Karlskrona/Ronneby, Sweden, 2000.
[11] ANSI S3.5-1997, Methods for the Calculation of the Speech Intelligibility Index, ANSI, r2007 edition, 2007.