Pashto language stemming algorithm
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Penerbit Universiti Kebangsaan Malaysia 2015 |
| Online Access: | http://journalarticle.ukm.my/8852/ http://journalarticle.ukm.my/8852/1/7048-23719-1-PB.pdf |
| Summary: | This paper presents a stemming algorithm for the morphological analysis of a minor, less widely studied language such as Pashto. Pashto lacks resources and tools that can be applied in document indexing, clustering, language processing, text analysis, database search systems, information retrieval, and linguistic applications. The literature review shows that only a few morphological studies have been conducted on Pashto, and that automatic stemming has not yet been fully analysed. In addition, no stemming algorithm has been proposed for extracting Pashto root words from a Pashto corpus for the above-mentioned applications. Therefore, the objective of this study is to develop a rule-based stemming algorithm for the Pashto language. The Pashto corpus is used directly as the input, and the stemming algorithm handles both inflectional and derivational morphemes. The output is a meaningful root word without affixes. The accuracy and strength of the proposed algorithm are evaluated using a word count method. To validate the algorithm, two native speakers of Pashto were recruited to evaluate its accuracy and strength. The results show that the proposed algorithm has an accuracy of 87%. This study contributes to Pashto language processing by extracting root words useful for purposes including data indexing, information retrieval, and linguistic applications. It also lays the groundwork for further studies on Pashto language analysis. |
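The summary describes a rule-based stemmer that strips affixes and is scored by counting correctly stemmed words. A minimal sketch of that general idea follows. It is not the authors' implementation: the affix lists, the Latin-transliterated sample words, the minimum stem length, and the function names `stem` and `accuracy` are all assumptions made for illustration, and the record does not give the paper's actual rule set.

```python
# Placeholder affix lists in Latin transliteration (assumed for illustration,
# NOT the rules from the paper).
PREFIXES = ["be", "na"]
SUFFIXES = ["una", "ano", "an"]
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def stem(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_len=MIN_STEM_LEN):
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def accuracy(pairs):
    """Fraction of (word, reference_root) pairs that the stemmer gets right."""
    correct = sum(1 for word, root in pairs if stem(word) == root)
    return correct / len(pairs)


# Hypothetical reference roots, standing in for judgments by human evaluators.
sample = [
    ("kitabuna", "kitab"),
    ("dostano", "dost"),
    ("bekitab", "kitab"),
    ("insan", "insan"),  # over-stemmed by the toy rules, so counted as an error
]
print(accuracy(sample))
```

In this sketch the score is simply correct stems divided by total words, which is one plausible reading of a word count evaluation. The last sample pair shows how a naive suffix rule can over-stem a word that merely ends in an affix-like string.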