Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment

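Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to the wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and convex hull, and performs classification using a Hidden Markov Model. The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.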


Bibliographic Details
Main Authors: M. Z. Ibrahim, D. J. Mulvaney, M. F. Abas
Format: Article
Language: English
Published: Asian Research Publishing Network (ARPN) 2015
Subjects: TK Electrical engineering. Electronics. Nuclear engineering
Online Access: http://umpir.ump.edu.my/id/eprint/12890/
http://umpir.ump.edu.my/id/eprint/12890/1/jeas_1215_3203.pdf
http://umpir.ump.edu.my/id/eprint/12890/7/fkee-2015-zamri-Feature-Fusion%20based%20Audio-Visual.pdf
id ump-12890
recordtype eprints
spelling ump-12890 2018-03-20T06:51:39Z http://umpir.ump.edu.my/id/eprint/12890/
M. Z. Ibrahim, D. J. Mulvaney and M. F. Abas (2015) Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment. ARPN Journal of Engineering and Applied Sciences, 10 (23). pp. 17521-17527. ISSN 1819-6608 http://www.arpnjournals.org/jeas/research_papers/rp_2015/jeas_1215_3203.pdf
Asian Research Publishing Network (ARPN) 2015-12-19 Article PeerReviewed
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Pahang
building UMP Institutional Repository
collection Online Access
language English
topic TK Electrical engineering. Electronics. Nuclear engineering
description Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to the wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and convex hull, and performs classification using a Hidden Markov Model. The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
format Article
author M. Z. Ibrahim
D. J. Mulvaney
M. F. Abas
title Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment
publisher Asian Research Publishing Network (ARPN)
publishDate 2015
url http://umpir.ump.edu.my/id/eprint/12890/
http://umpir.ump.edu.my/id/eprint/12890/1/jeas_1215_3203.pdf
http://umpir.ump.edu.my/id/eprint/12890/7/fkee-2015-zamri-Feature-Fusion%20based%20Audio-Visual.pdf