Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman

Clustering is an overlapping method found in many areas such as data mining, machine learning, pattern recognition, bioinformatics and information retrieval. The goal of clustering is to group any similar objects into a cluster, while the other objects that are not similar in the different clusters....

Full description

Bibliographic Details
Main Author:	Seman, Ali
Format:	Thesis
Language:	English
Published:	2013
Subjects:	Malaysia
Online Access:	http://ir.uitm.edu.my/id/eprint/15171/ http://ir.uitm.edu.my/id/eprint/15171/1/TP_ALI%20SEMAN%20CS%2013_5.pdf

id	uitm-15171
recordtype	eprints
spelling	uitm-151712016-10-25T04:17:00Z http://ir.uitm.edu.my/id/eprint/15171/ Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman Seman, Ali Malaysia Clustering is an overlapping method found in many areas such as data mining, machine learning, pattern recognition, bioinformatics and information retrieval. The goal of clustering is to group any similar objects into a cluster, while the other objects that are not similar in the different clusters. Meanwhile, Y-Short Tandem Repeats (YSTR) is the tandem repeats on Y-Chromosome. The Y-STR data is now being utilized for distinguishing lineages and their relationships applied in many applications such as genetic genealogy, forensic genetic and anthropological genetic applications. This research tends to partition the Y-STR data into groups of similar genetic distances. The genetic distance is measured by comparing the allele values and their modal haplotypes. Nevertheless, the distances among the Y-STR data are typically found similar or very similar to each other. They are characterized by the higher degree of similarity of objects in intra-classes and also inter-classes. In some cases, they are quite distant and sparseness. This uniqueness of Y-STR data has become problematic in partitioning the data using the existing partitional clustering algorithms. The main problem was essentially caused by the mode mechanism (problem P?) which was unable to handle the characteristics of Y-STR data, thus producing poor clustering results. The problem has become worst when the initial centroid selection which is also known as problem Po failed to obtain good centroids. These conditions have led the existing partitional algorithms to local minima and empty clusters problems. As a result, a new idea of problem P2 using the objects (medoid) themselves was introduced. The idea was incorporated into a new algorithm called, k-Approximate Modal Haplotypes (&-AMH) algorithm. Six Y-STR data sets were used as a benchmark to evaluate the performances of the algorithm against the other eight partitional clustering algorithms. Out of six data sets, the &-AMH algorithm obtained the highest mean accuracy scores for the five data sets and one data set was at equal performance. For the overall performances which were based on the six data sets, the &-AMH algorithm recorded the highest mean accuracy scores of 0.93 as compared to the other algorithms: the ^-Population (0.91), the &-Modes-RVF (0.81), the New Fuzzy &-Modes (0.80), A:-Modes (0.76), &-Modes-HI (0.76), £-Modes- HII (0.75), Fuzzy £-Modes (0.74) and £-Modes-UAVM (0.70). A One-Way ANOVA test also indicated that the clustering accuracy scores of &-AMH algorithm was significantly different as compared to the other eight partitional algorithms. In addition, the algorithm was also efficient in terms of time complexity which was recorded as O (km(n-k) and considered as linear. Thus, the &-AMH algorithm has been bounded with good characteristics of a desired algorithm - scalability 2013 Thesis NonPeerReviewed text en http://ir.uitm.edu.my/id/eprint/15171/1/TP_ALI%20SEMAN%20CS%2013_5.pdf Seman, Ali (2013) Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman. PhD thesis, Universiti Teknologi MARA.
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Teknologi MARA
building	UiTM Institutional Repository
collection	Online Access
language	English
topic	Malaysia
spellingShingle	Malaysia Seman, Ali Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
description	Clustering is an overlapping method found in many areas such as data mining, machine learning, pattern recognition, bioinformatics and information retrieval. The goal of clustering is to group any similar objects into a cluster, while the other objects that are not similar in the different clusters. Meanwhile, Y-Short Tandem Repeats (YSTR) is the tandem repeats on Y-Chromosome. The Y-STR data is now being utilized for distinguishing lineages and their relationships applied in many applications such as genetic genealogy, forensic genetic and anthropological genetic applications. This research tends to partition the Y-STR data into groups of similar genetic distances. The genetic distance is measured by comparing the allele values and their modal haplotypes. Nevertheless, the distances among the Y-STR data are typically found similar or very similar to each other. They are characterized by the higher degree of similarity of objects in intra-classes and also inter-classes. In some cases, they are quite distant and sparseness. This uniqueness of Y-STR data has become problematic in partitioning the data using the existing partitional clustering algorithms. The main problem was essentially caused by the mode mechanism (problem P?) which was unable to handle the characteristics of Y-STR data, thus producing poor clustering results. The problem has become worst when the initial centroid selection which is also known as problem Po failed to obtain good centroids. These conditions have led the existing partitional algorithms to local minima and empty clusters problems. As a result, a new idea of problem P2 using the objects (medoid) themselves was introduced. The idea was incorporated into a new algorithm called, k-Approximate Modal Haplotypes (&-AMH) algorithm. Six Y-STR data sets were used as a benchmark to evaluate the performances of the algorithm against the other eight partitional clustering algorithms. Out of six data sets, the &-AMH algorithm obtained the highest mean accuracy scores for the five data sets and one data set was at equal performance. For the overall performances which were based on the six data sets, the &-AMH algorithm recorded the highest mean accuracy scores of 0.93 as compared to the other algorithms: the ^-Population (0.91), the &-Modes-RVF (0.81), the New Fuzzy &-Modes (0.80), A:-Modes (0.76), &-Modes-HI (0.76), £-Modes- HII (0.75), Fuzzy £-Modes (0.74) and £-Modes-UAVM (0.70). A One-Way ANOVA test also indicated that the clustering accuracy scores of &-AMH algorithm was significantly different as compared to the other eight partitional algorithms. In addition, the algorithm was also efficient in terms of time complexity which was recorded as O (km(n-k) and considered as linear. Thus, the &-AMH algorithm has been bounded with good characteristics of a desired algorithm - scalability
format	Thesis
author	Seman, Ali
author_facet	Seman, Ali
author_sort	Seman, Ali
title	Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
title_short	Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
title_full	Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
title_fullStr	Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
title_full_unstemmed	Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman
title_sort	partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / ali seman
publishDate	2013
url	http://ir.uitm.edu.my/id/eprint/15171/ http://ir.uitm.edu.my/id/eprint/15171/1/TP_ALI%20SEMAN%20CS%2013_5.pdf
first_indexed	2023-09-18T22:53:16Z
last_indexed	2023-09-18T22:53:16Z
_version_	1777417685405007872

Partitional clustering algorithms for highly similar and sparseness y-short tandem repeat data / Ali Seman

Similar Items