Phonetically rich and balanced text and speech corpora for Arabic language

Bibliographic Details
Main Authors: Abushariah, Mohammad Abd-Alrahman Mahmoud, Ainon, Raja Noor, Zainuddin, Roziati, Elshafei, Moustafa, Khalifa, Othman Omran
Format: Article
Language: English
Published: Springer 2011
Online Access: http://irep.iium.edu.my/10572/
http://irep.iium.edu.my/10572/4/Phonetically_rich_Irep_ID10572.pdf
Description
Summary: This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The corpus contains 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). Of these, 367 sentences are considered phonetically rich and balanced and are used for training Arabic Automatic Speech Recognition (ASR) systems. "Rich" means that the set must contain all phonemes of the Arabic language, whereas "balanced" means that it must preserve the phonetic distribution of the Arabic language. The remaining 48 sentences are created for