Evaluation of XML documents queries based on native XML database

Bibliographic Details
Main Author: Lazim, Raghad Yaseen
Format: Undergraduates Project Papers
Language: English
Published: 2016
Online Access: http://umpir.ump.edu.my/id/eprint/18104/
http://umpir.ump.edu.my/id/eprint/18104/1/Evaluation%20of%20XML%20documents%20queries%20based%20on%20native%20XML%20database-Table%20of%20contents.pdf
http://umpir.ump.edu.my/id/eprint/18104/7/Evaluation%20of%20XML%20documents%20queries%20based%20on%20native%20XML%20database-Abstract.pdf
http://umpir.ump.edu.my/id/eprint/18104/8/Evaluation%20of%20XML%20documents%20queries%20based%20on%20native%20XML%20database-Chapter%201.pdf
http://umpir.ump.edu.my/id/eprint/18104/17/Evaluation%20of%20XML%20documents%20queries%20based%20on%20native%20XML%20database-References.pdf
Description
Summary: As the amount of data available on the Internet grows rapidly, more and more of it is semi-structured. The Extensible Markup Language (XML), as a format for semi-structured data, has become a standard for the representation and exchange of data over the Internet. Early in the history of XML there was debate about whether XML differs enough from other data formats to require a database of its own. The popularity and widespread use of XML among a diverse set of organizations has prompted a rethinking of storage and retrieval practices for data. Most early XML storage practices relied on mappings and transformations between XML data trees and tables in a relational database. Although relational databases can represent nested data structures by using tables with foreign keys, it is still difficult to search these structures for objects at an unknown depth of nesting; by contrast, such searches are a potential advantage of XML. Also, the nested and repeating elements in XML documents can quite easily result in an unmanageable number of tables. Furthermore, it is usually very difficult to change the relational schema after insertion in response to XML schema changes. The limitations of relational approaches are now well known. Moreover, a local update to a document should not cause drastic changes to the whole storage system. Therefore, the design of the storage system should trade off query performance against update costs. This study aims to evaluate the performance of a Native XML Database (NXD) in comparison with an XML-Enabled Database (XED); then to enhance the Entity Relationship (ER) algorithm of the relational schema to improve Insert, Delete, Update and Search operations on XML documents (XML files with a large number of elements); and finally to validate the algorithm in NXD and compare the performance of XED and NXD by implementing the same command and control data model.
Five datasets of different sizes were used (65.8, 101, 117, 127 and 183 MB). Benchmarking techniques were used to measure performance. XMark and XMark-1, two main benchmark tools in the research field, were used for the datasets. The performance of a system can be measured by using datasets of varying sizes and different documents with different features. The size of the XML documents and the number of elements were determined by the factor of the main generation driver. The results of this study show that XED performs better for datasets <= 117 MB. The performance of XED begins to decline as the size of the XML data increases (> 127 MB), while NXD performs better for data >= 127 MB. NXD produced better results in the reporting section, which implies that NXD XQuery gains performance from query optimization. Most of the figures show that XED starts out better but becomes worse as the data size grows. The difference becomes more obvious as the queries become more complicated.
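
The summary's point about searching for objects at an unknown depth of nesting can be illustrated with a minimal sketch. It is not taken from the study: the sample document, its element names, and the helper find_item_ids are hypothetical, and Python's standard xml.etree.ElementTree module stands in for a native XML database query engine. A single descendant path expression finds every matching element however deeply it is nested, whereas a relational mapping would typically need one join per nesting level.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only
# and are not the datasets used in the study.
DOC = """
<site>
  <regions>
    <asia>
      <item id="i1"><name>lamp</name></item>
    </asia>
    <europe>
      <group>
        <item id="i2"><name>chair</name></item>
      </group>
    </europe>
  </regions>
  <item id="i3"><name>desk</name></item>
</site>
"""

def find_item_ids(xml_text):
    """Return the id of every <item>, regardless of nesting depth."""
    root = ET.fromstring(xml_text)
    # ".//item" selects item elements at any depth below the root,
    # in document order.
    return [el.get("id") for el in root.findall(".//item")]

if __name__ == "__main__":
    print(find_item_ids(DOC))
```

The same descendant search corresponds to the path expression //item in XPath or XQuery, the query languages a native XML database would typically expose.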