Building CMU Sphinx language model for the Holy Quran using simplified Arabic phonemes

Bibliographic Details
Main Authors: El Amrani, Mohamed Yassine, Rahman, M.M. Hafizur, Wahiddin, Mohamed Ridza, Shah, Asadullah
Format: Article
Language: English
Published: Elsevier 2016
Online Access: http://irep.iium.edu.my/53574/
http://irep.iium.edu.my/53574/1/EIJ_Pub.pdf
http://irep.iium.edu.my/53574/7/53574_building%20CMU%20Sphinx%20language_scopus.pdf
Description
Summary: This paper investigates the use of a simplified set of Arabic phonemes in an Arabic speech recognition system applied to the Holy Quran. CMU Sphinx 4 was used to train and evaluate a language model for the Hafs narration of the Holy Quran. The language model was built from a simplified list of Arabic phonemes, rather than the commonly used Romanized set, in order to simplify its generation. With a very small set of audio files, the experiments reached a Word Error Rate (WER) as low as 1.5% when all of the audio data was used for both training and testing. However, when only 90% and 80% of the data were used for training, the WER rose to 50.0% and 55.7%, respectively.
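For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition using a word-level edit distance. It is not the scoring code used in the paper, the function name `word_error_rate` is hypothetical, and the example strings are placeholder tokens rather than Quranic text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion (reference word missed)
                curr[j - 1] + 1,                   # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens: one substituted word out of four reference words.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3 w4"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.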