This article is part of the series Advances in Subspace-Based Techniques for Signal Processing and Communications.

Open Access Research Article

A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

Kris Hermus*, Patrick Wambacq and Hugo Van hamme

Author Affiliations

Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven, Leuven-Heverlee 3001, Belgium


EURASIP Journal on Advances in Signal Processing 2007, 2007:045821 doi:10.1155/2007/45821


The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2007/1/045821


Received: 24 October 2005
Revisions received: 7 March 2006
Accepted: 30 April 2006
Published: 13 September 2006

© 2007 Hermus et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition relies on the assumption of a low-rank model for speech and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound on the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing signal information that is essential for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
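To make the decomposition described above concrete, the following minimal Python sketch builds a Hankel matrix from one frame of noisy samples, computes its singular value decomposition, attenuates each singular value with a minimum-variance-style gain, and recovers a signal by averaging the anti-diagonals. It is an illustration under simplifying assumptions, not the authors' implementation: the noise is taken to be white with a known variance (the paper also treats coloured noise, which this sketch does not), and the function names, the matrix width `n_cols=20` and the synthetic two-sinusoid test signal are all hypothetical choices made for the example.

```python
import numpy as np


def hankel_matrix(x, n_cols):
    """Arrange a 1-D frame into a Hankel matrix with H[i, j] = x[i + j]."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - n_cols + 1
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return x[idx]


def average_antidiagonals(H):
    """Map a (generally non-Hankel) matrix back to a signal by averaging
    all entries that correspond to the same sample index i + j."""
    n_rows, n_cols = H.shape
    total = np.zeros(n_rows + n_cols - 1)
    count = np.zeros(n_rows + n_cols - 1)
    for j in range(n_cols):
        total[j:j + n_rows] += H[:, j]
        count[j:j + n_rows] += 1
    return total / count


def subspace_filter(noisy, noise_var, n_cols=20):
    """Weight the singular values of the noisy Hankel matrix.

    Assumption: white noise of variance noise_var, so that every squared
    noise singular value is roughly n_rows * noise_var.
    """
    H = hankel_matrix(noisy, n_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    noise_level = H.shape[0] * noise_var
    # Gain per component; directions dominated by noise are driven towards
    # zero by the weighting itself, without choosing an explicit rank.
    gains = np.maximum(0.0, 1.0 - noise_level / np.maximum(s**2, 1e-12))
    H_hat = (U * (s * gains)) @ Vt
    return average_antidiagonals(H_hat)


# Synthetic demonstration: two sinusoids (a low-rank signal) in white noise.
rng = np.random.default_rng(0)
t = np.arange(400)
clean = np.sin(2 * np.pi * 0.05 * t) + np.sin(2 * np.pi * 0.12 * t)
noise_var = 0.25
noisy = clean + rng.normal(0.0, np.sqrt(noise_var), size=t.size)
enhanced = subspace_filter(noisy, noise_var)

mse_in = np.mean((noisy - clean) ** 2)
mse_out = np.mean((enhanced - clean) ** 2)
print(f"MSE before: {mse_in:.4f}, after: {mse_out:.4f}")
```

The sketch applies a soft weighting to all singular values rather than truncating the matrix to a fixed rank, in the spirit of the observation above that explicit rank reduction can remove information the recogniser needs. In practice the noise variance would have to be estimated, for example from speech pauses, rather than assumed known.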
