The development of an integrated corpus for Malay language

Bibliographic Details
Main Author: Awang Abu Bakar, Normi Sham
Other Authors: Alfred, Rayner
Format: Conference or Workshop Item
Language: English
Published: Springer Singapore 2020
Online Access: http://irep.iium.edu.my/75243/
http://irep.iium.edu.my/75243/1/75243_The%20development%20of%20an%20integrated%20corpus.pdf
http://irep.iium.edu.my/75243/7/75243_The%20development%20of%20an%20integrated%20corpus_scopus.pdf
Description
Summary: A corpus serves as a source of data for many types of research, and a number of Malay corpora have been developed to support researchers' needs. However, these corpora are scattered and not integrated, so some words are missing from some of them. This paper focuses on developing an integrated corpus that combines the four most comprehensive Malay corpora. The aim is to provide comprehensive coverage of Malay corpora, which would benefit any related work.