We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones
in automatic speech recognition. NAM microphones are special acoustic sensors that
are attached behind the talker's ear and capture not only normal (audible) speech
but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones
can be applied in automatic speech recognition systems when privacy is desired in
human-machine communication. Moreover, NAM microphones are robust against noise,
and they might be used in special systems (speech recognition, speech transformation, etc.)
for sound-impaired people. Using adaptation techniques and a small amount of training
data, we achieved for a 20 k dictation task a
word accuracy for nonaudible murmur recognition in a clean environment. In this paper,
we also investigate nonaudible murmur recognition in noisy environments and the effect
of the Lombard reflex on nonaudible murmur recognition. Finally, we propose three methods
for integrating audible speech and nonaudible murmur recognition using a stethoscope
NAM microphone, with very promising results.