Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis

Audio-visual speech recognition (AVSR), which combines the audio modality with the visual modality, can improve speech recognition performance, particularly in noisy environments. The audio modality is easily corrupted by ambient noise, and this causes difficulty in...


Bibliographic Details
Main Author:	Thum, Wei Seong
Format:	Thesis
Language:	English
Published:	2018
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://umpir.ump.edu.my/id/eprint/27969/ http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf

id	ump-27969
recordtype	eprints
spelling	ump-27969 2020-02-25T04:17:26Z http://umpir.ump.edu.my/id/eprint/27969/ Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis Thum, Wei Seong TK Electrical engineering. Electronics Nuclear engineering 2018-12 Thesis NonPeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf Thum, Wei Seong (2018) Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis. Masters thesis, Universiti Malaysia Pahang.
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Malaysia Pahang
building	UMP Institutional Repository
collection	Online Access
language	English
topic	TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	TK Electrical engineering. Electronics Nuclear engineering Thum, Wei Seong Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
description	Audio-visual speech recognition (AVSR), which combines the audio modality with the visual modality, can improve speech recognition performance, particularly in noisy environments. The audio modality is easily corrupted by ambient noise, which makes it difficult to distinguish the actual speech signal from the noise signal. The signal-to-noise ratio (SNR) is a fundamental measure, defined as the ratio of signal power to noise power and expressed in decibels (dB). One of the best-known SNR estimation techniques is waveform amplitude distribution analysis (WADA), which assumes that the amplitudes of speech and noise follow gamma and Gaussian distributions, respectively. It has been used as a benchmark for result comparison in several research works. However, there is no clear instruction on how to build its look-up table. This work proposes rebuilding the look-up table using the author's own database corrupted with general white noise as the noise reference. The resulting technique, known as waveform amplitude distribution analysis-white (WADA-W), enhances SNR estimation by referring to the reconstructed WADA-W look-up table instead of the general precomputed WADA look-up table. The proposed WADA-W SNR estimation technique was evaluated by developing an AVSR system that used mel-frequency cepstral coefficient (MFCC) features and shape-based visual features from two speech databases: LUNA-V and CUAVE. The experimental results showed that, by referring to the WADA-W look-up table, the technique performs consistent SNR estimation with more accurate and less biased results than the original WADA technique under four types of noise from the NOISEX-92 dataset: white, babble, factory1, and factory2. The overall deviation of the SNR estimation for the LUNA-V database using the proposed WADA-W technique was approximately 9.6 dB, whereas the deviations of the NIST and WADA techniques were approximately 42.3 dB and 67.3 dB, respectively. For the CUAVE database, the overall deviation of the SNR estimation using the same proposed technique was 13.3 dB, whereas the deviations of the NIST and WADA techniques were 50.6 dB and 62.3 dB, respectively. Classification was performed using a multi-stream hidden Markov model (MSHMM) with leave-one-out cross-validation (LOOCV). The experiments showed that the proposed AVSR system achieved its highest accuracy under clean conditions: 96.6% on the LUNA-V database and 95.2% on the CUAVE database. In conclusion, the proposed WADA-W SNR estimator improved on the original WADA technique by 4.5% and 12.7% on the LUNA-V and CUAVE databases, respectively.
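The abstract defines SNR as the ratio of signal power to noise power in dB and describes corrupting a speech database with white noise as the noise reference. The Python sketch below illustrates only that definition and one common way to mix Gaussian white noise into a signal at a chosen SNR. It is not the thesis's implementation: the function names (`power`, `snr_db`, `add_white_noise`), the 220 Hz sine used as a stand-in for speech, and the 16 kHz sampling rate are all assumptions made for illustration, and the WADA look-up table itself is not reproduced here.

```python
import math
import random


def power(x):
    """Mean power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def add_white_noise(signal, target_snr_db, rng):
    """Mix Gaussian white noise into `signal` at `target_snr_db`.

    Hypothetical helper: returns (noisy_signal, scaled_noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Choose the scale so that power(signal) / power(scale * noise)
    # equals 10 ** (target_snr_db / 10).
    ratio = 10.0 ** (target_snr_db / 10.0)
    scale = math.sqrt(power(signal) / (power(noise) * ratio))
    noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


# Stand-in "clean" signal (assumption): one second of a 220 Hz sine at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 220.0 * n / 16000.0) for n in range(16000)]

noisy, noise = add_white_noise(clean, 10.0, rng)
achieved = snr_db(clean, noise)
print(f"target 10.0 dB, achieved {achieved:.3f} dB")
```

Because the true SNR of each mixture is known by construction, mixtures generated this way at a range of target SNRs could serve as labelled references against which an estimator's deviation is measured.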
format	Thesis
author	Thum, Wei Seong
author_facet	Thum, Wei Seong
author_sort	Thum, Wei Seong
title	Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
title_short	Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
title_full	Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
title_fullStr	Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
title_full_unstemmed	Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
title_sort	development on snr estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
publishDate	2018
url	http://umpir.ump.edu.my/id/eprint/27969/ http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf
first_indexed	2023-09-18T22:43:51Z
last_indexed	2023-09-18T22:43:51Z
_version_	1777417093627510784


Similar Items