Towards a Malay derivational lexicon: learning affixes using expectation maximization
We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the...
Main Authors: | Sulaiman, Suriani; Gasser, Michael; Kubler, Sandra |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2011 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://irep.iium.edu.my/32082/ http://irep.iium.edu.my/32082/1/W11-3005.pdf |
id |
iium-32082 |
---|---|
recordtype |
eprints |
spelling |
iium-32082 2013-12-26T03:12:22Z http://irep.iium.edu.my/32082/ Towards a Malay derivational lexicon: learning affixes using expectation maximization Sulaiman, Suriani Gasser, Michael Kubler, Sandra QA75 Electronic computers. Computer science We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the model with EM to find the segmentation that maximizes the probability of each morpheme. We extract the set of affix patterns produced by our algorithm and evaluate them against two references: a list of affix patterns extracted from our hand-segmented derivational wordlist and a derivational history produced by a stemmer. 2011 Conference or Workshop Item PeerReviewed application/pdf en http://irep.iium.edu.my/32082/1/W11-3005.pdf Sulaiman, Suriani and Gasser, Michael and Kubler, Sandra (2011) Towards a Malay derivational lexicon: learning affixes using expectation maximization. In: 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), IJCNLP 2011, 8th-13th Nov. 2011, Chiang Mai, Thailand. http://aclweb.org/anthology//W/W11/W11-3005.pdf |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
International Islamic University Malaysia |
building |
IIUM Repository |
collection |
Online Access |
language |
English |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Sulaiman, Suriani Gasser, Michael Kubler, Sandra Towards a Malay derivational lexicon: learning affixes using expectation maximization |
description |
We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the model with EM to find the segmentation that maximizes the probability of each morpheme. We extract the set of affix patterns produced by our algorithm and evaluate them against two references: a list of affix patterns extracted from our hand-segmented derivational wordlist and a derivational history produced by a stemmer. |
format |
Conference or Workshop Item |
author |
Sulaiman, Suriani Gasser, Michael Kubler, Sandra |
author_facet |
Sulaiman, Suriani Gasser, Michael Kubler, Sandra |
author_sort |
Sulaiman, Suriani |
title |
Towards a Malay derivational lexicon: learning affixes using expectation maximization |
title_short |
Towards a Malay derivational lexicon: learning affixes using expectation maximization |
title_full |
Towards a Malay derivational lexicon: learning affixes using expectation maximization |
title_fullStr |
Towards a Malay derivational lexicon: learning affixes using expectation maximization |
title_full_unstemmed |
Towards a Malay derivational lexicon: learning affixes using expectation maximization |
title_sort |
towards a malay derivational lexicon: learning affixes using expectation maximization |
publishDate |
2011 |
url |
http://irep.iium.edu.my/32082/ http://irep.iium.edu.my/32082/ http://irep.iium.edu.my/32082/1/W11-3005.pdf |
first_indexed |
2023-09-18T20:46:17Z |
last_indexed |
2023-09-18T20:46:17Z |
_version_ |
1777409696441827328 |
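
The description field in this record outlines EM training over candidate segmentations. As a rough illustration only, the sketch below runs EM with a unigram morpheme model: the E-step weights each candidate segmentation by the product of its morpheme probabilities, and the M-step re-estimates those probabilities from the weighted counts. The unigram model, the function name `em_segment`, and the toy candidate lists are all assumptions made for this example; the record does not specify the paper's actual language model or analyzer output.

```python
from collections import defaultdict
from math import prod


def em_segment(candidates, iterations=20):
    """Toy EM over candidate segmentations (hypothetical helper).

    candidates: dict mapping each word to a list of segmentations,
    where a segmentation is a tuple of morpheme strings.
    Returns (morpheme probabilities, best segmentation per word).
    """
    # Start from a uniform distribution over every morpheme seen.
    vocab = {m for segs in candidates.values() for seg in segs for m in seg}
    prob = {m: 1.0 / len(vocab) for m in vocab}

    for _ in range(iterations):
        counts = defaultdict(float)
        # E-step: posterior weight of each segmentation of each word.
        for segs in candidates.values():
            scores = [prod(prob[m] for m in seg) for seg in segs]
            total = sum(scores)
            for seg, score in zip(segs, scores):
                for m in seg:
                    counts[m] += score / total
        # M-step: renormalise expected counts into probabilities.
        norm = sum(counts.values())
        prob = {m: counts.get(m, 0.0) / norm for m in vocab}

    # Pick the highest-scoring segmentation under the trained model.
    best = {
        word: max(segs, key=lambda seg: prod(prob[m] for m in seg))
        for word, segs in candidates.items()
    }
    return prob, best


# Hypothetical candidates, as a naive analyzer might over-generate them.
toy = {
    "makanan": [("makan", "an"), ("maka", "nan")],
    "minuman": [("minum", "an"), ("minu", "man")],
    "pakaian": [("pakai", "an"), ("paka", "ian")],
}
probs, best = em_segment(toy)
```

Because the suffix `an` recurs across all three toy words, it accumulates probability mass over the iterations and pulls each word toward the segmentation that contains it. Note that a plain unigram product favours segmentations with fewer morphemes, so this sketch only compares candidates of equal length.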