Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah

Bibliographic Details
Main Author:	Hamzah, Raseeda
Format:	Thesis
Language:	English
Published:	2016
Online Access:	http://ir.uitm.edu.my/id/eprint/17786/ http://ir.uitm.edu.my/id/eprint/17786/1/TP_RASEEDA%20HAMZAH%20CS%2016_5.pdf

id	uitm-17786
recordtype	eprints
spelling	uitm-17786 2018-09-24T06:57:24Z http://ir.uitm.edu.my/id/eprint/17786/ Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah Hamzah, Raseeda 2016 Thesis NonPeerReviewed text en http://ir.uitm.edu.my/id/eprint/17786/1/TP_RASEEDA%20HAMZAH%20CS%2016_5.pdf Hamzah, Raseeda (2016) Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah. PhD thesis, Universiti Teknologi MARA.
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Teknologi MARA
building	UiTM Institutional Repository
collection	Online Access
language	English
description	Automatic speech recognition (ASR) of spontaneous speech poses additional challenges compared with read speech, because spontaneous speech contains varied speaking rates, poor phonation and disfluencies. Studies have shown that the filled pause is one of the most common disfluencies in spontaneous speech and causes considerable problems for ASR performance. A recurring obstacle in filled pause studies is that filled pauses are often recognized as short words that carry semantic meaning; for example, 'um' may be recognized as 'thumb' or 'arm'. The problem becomes especially pertinent when the vowel of a normal word is relatively long, a phenomenon known as elongation, which can occur at any position in an utterance, both within and between words. Because filled pauses and elongations have similar acoustic feature patterns, elongated normal words may be falsely detected as filled pauses. Misclassifying elongation as filled pause degrades ASR performance, since removing normal words from recognition may change the intended meaning of an utterance. The main aim of this research is therefore to classify filled pause and elongation into separate classes by constructing a discriminative classification model from extracted acoustic features. Many signal features have been applied to the problem of discriminating filled pause from elongation. This research uses several well-established features: Formant Frequency (FF), Fundamental Frequency (F0), Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR) and Short Time Energy (STE). These features were chosen to emphasize signal characteristics that differ between filled pause and elongation.
In most speech research, extracting the speech energy feature remains a challenging task, because energy typically varies greatly, both with loudness and between phonemes containing vowel and/or consonant sounds. One way of distinguishing vowels from consonants is by their energy level. Besides the common approach of quantifying speech energy by summing the energy over a short interval centered on each analysis point, we propose a new technique, Local Maxima Energy (LM-E), to exploit the energy characteristics of filled pause and elongation. Experimentally, this is done by measuring the amplitude transition from one frame to the next, with a threshold set on the height difference between peaks of the speech signal. Compared with the other acoustic features, LM-E classifies elongation better because it detects the expressive contour of elongation caused by its consonant-to-vowel transition. A rigorous feature performance evaluation shows that LM-E significantly improves classification performance when fused with ZCR. These two features are therefore incorporated into a discriminative Naive Bayes model for filled pause and elongation classification. Compared with single-feature classification, the combined LM-E and ZCR model reduced the error rate by 7% and increased accuracy by an average of 7%. The model can further be used to improve disfluency detection and thereby ASR performance.
format	Thesis
author	Hamzah, Raseeda
spellingShingle	Hamzah, Raseeda Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
author_facet	Hamzah, Raseeda
author_sort	Hamzah, Raseeda
title	Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
title_short	Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
title_full	Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
title_fullStr	Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
title_full_unstemmed	Discriminative classification model of filled pause and elongation for Malay language spontaneous speech / Raseeda Hamzah
title_sort	discriminative classification model of filled pause and elongation for malay language spontaneous speech / raseeda hamzah
publishDate	2016
url	http://ir.uitm.edu.my/id/eprint/17786/ http://ir.uitm.edu.my/id/eprint/17786/1/TP_RASEEDA%20HAMZAH%20CS%2016_5.pdf
first_indexed	2023-09-18T22:59:04Z
last_indexed	2023-09-18T22:59:04Z
_version_	1777418050412216320
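The description field above contrasts LM-E with the common way of measuring speech energy, Short Time Energy (STE), and pairs LM-E with the Zero Crossing Rate (ZCR). As a rough illustration only, the Python sketch below computes both baseline features frame by frame using one common textbook definition of each. The rectangular window, the 400-sample frame length and 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and all function names are assumptions for illustration, not settings taken from the thesis.

```python
from typing import List, Sequence, Tuple


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in one frame (rectangular window assumed)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counted as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(signal: Sequence[float], frame_len: int = 400,
                   hop: int = 160) -> List[Tuple[float, float]]:
    """Per-frame (STE, ZCR) pairs; defaults assume 25 ms frames and a 10 ms hop at 16 kHz."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frames(signal, frame_len, hop)]
```

For a frame of alternating +1/-1 samples every adjacent pair changes sign, so its ZCR is 1.0. Voiced vowel segments tend to show low ZCR and high STE, while unvoiced consonants such as fricatives tend to show the opposite, which is why these features are candidates for separating sound types.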

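The description field above explains LM-E only briefly: it measures amplitude transitions between frames, with a threshold on the height difference between peaks of the speech signal. The Python sketch below is one hypothetical reading of that description, not the thesis's algorithm: it builds a per-frame peak-amplitude contour, finds its local maxima, and reports successive maxima whose height difference meets a threshold. The function names, the use of peak absolute amplitude, the frame sizes, and the default threshold of 0.2 (which assumes a signal normalized to [-1, 1]) are all assumptions.

```python
from typing import List, Sequence, Tuple


def frame_peak_contour(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Peak absolute amplitude of each frame: a coarse amplitude contour."""
    return [max(abs(x) for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def local_maxima(contour: Sequence[float]) -> List[int]:
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] < contour[i] > contour[i + 1]]


def lm_e_transitions(signal: Sequence[float], frame_len: int = 400, hop: int = 160,
                     threshold: float = 0.2) -> List[Tuple[int, float]]:
    """HYPOTHETICAL LM-E-style feature (one reading of the abstract, not the thesis code).

    Returns (frame_index, height_difference) for each pair of successive local maxima
    of the frame-peak contour whose height difference is at least `threshold`.
    A positive difference is a rise, e.g. a quieter consonant followed by a louder vowel.
    """
    contour = frame_peak_contour(signal, frame_len, hop)
    peaks = local_maxima(contour)
    transitions = []
    for prev, cur in zip(peaks, peaks[1:]):
        diff = contour[cur] - contour[prev]
        if abs(diff) >= threshold:
            transitions.append((cur, diff))
    return transitions
```

Comparing peaks rather than summed energy is what makes such a feature sensitive to contour shape: a rise from a low peak to a high one is reported, while a flat or slowly varying contour yields no transitions.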
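The description field above states that LM-E and ZCR are fused in a Naive Bayes model. To illustrate what fusing two features in such a classifier looks like, the Python sketch below implements a minimal Gaussian Naive Bayes over two-dimensional feature vectors, such as one (LM-E, ZCR) pair per segment. The Gaussian likelihood, the variance floor, the class labels, and the class name `GaussianNaiveBayes` are assumptions for illustration; the abstract does not specify the likelihood form. The training values in the usage example are arbitrary synthetic numbers, are not data or results from the thesis, and do not imply which class has larger feature values.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are treated as independent given the class."""

    def __init__(self, var_floor: float = 1e-9) -> None:
        self.var_floor = var_floor  # avoids zero variance for constant features
        self.priors: Dict[str, float] = {}
        self.stats: Dict[str, List[Tuple[float, float]]] = {}  # per-feature (mean, variance)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        self.priors, self.stats = {}, {}  # reset so refitting does not keep stale classes
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [max(sum((v - m) ** 2 for v in col) / len(col), self.var_floor)
                         for col, m in zip(columns, means)]
            self.stats[label] = list(zip(means, variances))
        return self

    def log_posterior(self, x: Sequence[float], label: str) -> float:
        """Unnormalised log P(label | x): log prior plus per-feature Gaussian log-likelihoods."""
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(x, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, x: Sequence[float]) -> str:
        """Return the class with the highest unnormalised log posterior."""
        return max(self.priors, key=lambda label: self.log_posterior(x, label))


if __name__ == "__main__":
    # Arbitrary synthetic (LM-E, ZCR) pairs; NOT data from the thesis.
    X = [(1.0, 0.05), (1.2, 0.04), (0.9, 0.06), (3.0, 0.15), (3.2, 0.12), (2.8, 0.18)]
    y = ["filled_pause"] * 3 + ["elongation"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict((1.1, 0.05)))  # filled_pause
```

Summing per-feature log-likelihoods is where the independence assumption enters: each feature contributes its own evidence, so a feature that separates one class well can compensate for another that does not.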

Similar Items