Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz

Heavy reliance on computers in everyday life leads to a continuous increase in large data applications in textual form. The data are stored in secondary storage for future use. A relational database (RDB) is therefore most commonly used as a backbone in most application software for or...

Bibliographic Details
Main Author:	Yafooz, Wael Mohamed Shaher
Format:	Thesis
Language:	English
Published:	2014
Subjects:	Electronic digital computers; Database management
Online Access:	http://ir.uitm.edu.my/id/eprint/28040/ http://ir.uitm.edu.my/id/eprint/28040/1/TP_WAEL%20MOHAMED%20SHAHER%20YAFOOZ%20CS%2014_5.pdf

id	uitm-28040
recordtype	eprints
spelling	uitm-28040 2020-02-03T07:00:58Z http://ir.uitm.edu.my/id/eprint/28040/ Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz Yafooz, Wael Mohamed Shaher Electronic digital computers Database management 2014 Thesis NonPeerReviewed text en http://ir.uitm.edu.my/id/eprint/28040/1/TP_WAEL%20MOHAMED%20SHAHER%20YAFOOZ%20CS%2014_5.pdf Yafooz, Wael Mohamed Shaher (2014) Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz. PhD thesis, Universiti Teknologi MARA.
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Teknologi MARA
building	UiTM Institutional Repository
collection	Online Access
language	English
topic	Electronic digital computers Database management
spellingShingle	Electronic digital computers Database management Yafooz, Wael Mohamed Shaher Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz
description	Heavy reliance on computers in everyday life leads to a continuous increase in large data applications in textual form. The data are stored in secondary storage for future use. A relational database (RDB) is therefore most commonly used as the backbone of application software for organising such data into structured form. The RDB has robust and powerful structures for managing, organising, and retrieving the data. However, the database can still contain large amounts of unstructured textual data. Dealing with unstructured textual data raises three basic issues: users have difficulty finding useful information, information retrieval is inaccurate, and query processing performance is insufficient. Attempts have been made to resolve these issues using several methods, such as full-text searching, text indexing, database schema management, database data models, and query-based techniques. However, a front-end approach, in the form of software applications, is still needed to organise the unstructured textual information in the RDB. This study proposes a Textual Virtual Schema Model (TVSM) as a back-end approach to reorganising textual data inside relational databases while performing automatic semantic linking and cluster assignment. When new unstructured textual data are stored in a database, all words are extracted to uncover the underlying meaning of the data. The named entities and most frequent terms are selected as the factors used in cluster assignment. The model is tested and evaluated by embedding it as a component-based package in a relational database's internal structure. Three experiments have been conducted on the Reuters text corpus and the Classic and WAP datasets. The clustering results have been validated using the F-measure, Entropy, and Purity measures and compared with two common methods, information extraction and textual document clustering, for example K-means, Frequent Item-Set, Hierarchical Clustering Algorithms, and Oracle Text. The results show linkages between structured textual data and unstructured information, improved quality in textual document clustering with accurate clusters, and high query processing performance. Thus, the proposed technique can increase retrieval performance and produce highly accurate textual data clusters. The model offers a useful approach for various domains that involve big textual data, such as document clustering, topic detection and tracking, information integration, personal data management, and information retrieval.
format	Thesis
author	Yafooz, Wael Mohamed Shaher
title	Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz
title_sort	application of clustering in managing unstructured textual data in relational database / wael mohamed shaher yafooz
publishDate	2014
url	http://ir.uitm.edu.my/id/eprint/28040/ http://ir.uitm.edu.my/id/eprint/28040/1/TP_WAEL%20MOHAMED%20SHAHER%20YAFOOZ%20CS%2014_5.pdf
first_indexed	2023-09-18T23:19:29Z
last_indexed	2023-09-18T23:19:29Z
_version_	1777419334802472960
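
The description in this record names Purity and Entropy among the measures used to validate the clusters. The record does not give the thesis's exact formulas, so the following is only a minimal sketch of the commonly used definitions, assuming each document has one predicted cluster id and one reference class label. The function names `purity` and `entropy` and the sample data are illustrative, not taken from the thesis.

```python
import math
from collections import Counter


def _class_counts_per_cluster(clusters, labels):
    """Group reference labels by predicted cluster id."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of documents in the majority class of their cluster (higher is better)."""
    by_cluster = _class_counts_per_cluster(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy, base 2 (lower is better).

    Assumption: no normalisation by the number of classes; some texts
    divide by log2(number of classes) to scale the result to [0, 1].
    """
    by_cluster = _class_counts_per_cluster(clusters, labels)
    total = len(labels)
    result = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        result += (size / total) * h
    return result


# Hypothetical toy data: six documents, two predicted clusters, two classes.
predicted = [0, 0, 0, 1, 1, 1]
reference = ["a", "a", "b", "b", "b", "b"]
print(purity(predicted, reference))
print(entropy(predicted, reference))
```

A clustering that matches the reference classes exactly gives a purity of 1 and an entropy of 0 under these definitions.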
