Grammar-based prosody modification for explicit control Malay language storytelling speech synthesis / Muhammad Izzad Ramli

Bibliographic Details
Main Author: Ramli, Muhammad Izzad
Format: Book Section
Language: English
Published: Institute of Graduate Studies, UiTM 2018
Online Access: http://ir.uitm.edu.my/id/eprint/22099/
http://ir.uitm.edu.my/id/eprint/22099/1/ABS_MUHAMMAD%20IZZAD%20RAMLI%20TDRA%20VOL%2014%20IGS%2018.pdf
Description
Summary: Storytelling speech synthesis is the process of converting written text into spoken speech in a storytelling style. It has attracted considerable interest in digital storytelling and in storytelling humanoid robots for children in learning environments. Reviews have shown that storytelling speech synthesis can be developed using an implicit control, explicit control or playback approach. The literature indicates that each approach has its own drawbacks, which need to be addressed to produce better-quality synthesized speech. In this thesis, explicit control is selected because it is commonly used in storytelling speech synthesis and has been shown to produce good intelligibility and reasonably natural speech. However, prosody modification in the explicit control approach remains a problem, as it may degrade speech quality through over-exaggeration of the speech. Furthermore, perceptual evaluation showed that the similarity score between natural and synthesized speech can still be improved. Therefore, this research aims to introduce a new prosody modification technique that reduces over-exaggeration while improving the similarity between natural and synthesized speech. Three narrative children's short stories were recorded in neutral and storytelling styles by nine storytellers. A total of 522 speech sentences, 5,238 words and 12,294 syllables were collected as experimental datasets and for prosody analysis. Based on the prosody analysis, grammar-based prosody modification rules are proposed by integrating grammatical structure…