MS (Master of Science)
Computer and Information Science
Date of Award
Committee Chair or Co-Chairs
Phillip E. Pfeiffer IV
Christopher D. Wallace, Donald B. Sanderson
The rise of XML as a de facto standard for document and data exchange has created a need to store and query XML documents in relational databases, today's de facto standard for data storage. Two common strategies for storing XML documents in relational databases, a process known as document shredding, are Interval encoding and ORDPATH encoding. Interval encoding, which uses a fixed mapping for shredding XML documents, tends to favor selection queries, at a potential cost of O(N) for supporting insertion queries. ORDPATH encoding, which uses a looser mapping for shredding XML, supports fixed-cost insertions, at a potential cost of longer-running selection queries. Experiments conducted for this research suggest that the breakeven point between the two algorithms occurs when users issue an average of 1 insertion for every 5.6 queries, for documents between 1.5 MB and 4 MB in size. However, heterogeneous tests of varying mixes of selects and inserts indicate that Interval always outperforms ORDPATH for mixes ranging from 76% selects to 88% selects. The queries and sample documents for these experiments were drawn from the XMark benchmark suite.
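The O(N) insertion cost attributed to Interval encoding above can be illustrated with a minimal sketch. It is not taken from the thesis: the row layout `(tag, start, end)`, the function names `shred_interval`, `descendants`, and `insert_last_child`, and the tiny sample document are all assumptions made for illustration, and an in-memory list stands in for a relational table. Each element receives a start and end number from a single depth-first counter, so a descendant test reduces to two comparisons, while adding a node forces every later number to shift.

```python
import xml.etree.ElementTree as ET


def shred_interval(xml_text):
    """Shred a document into (tag, start, end) rows using one DFS counter.

    Hypothetical layout: a real schema would also store text, attributes, etc.
    """
    root = ET.fromstring(xml_text)
    rows = []
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1                      # number assigned on entering the node
        row = [elem.tag, counter, None]
        rows.append(row)
        for child in elem:
            visit(child)
        counter += 1                      # number assigned on leaving the node
        row[2] = counter

    visit(root)
    return [tuple(r) for r in rows]


def descendants(rows, ancestor):
    """Selection: a node is a descendant if its interval nests inside the ancestor's."""
    _, a_start, a_end = ancestor
    return [r for r in rows if a_start < r[1] and r[2] < a_end]


def insert_last_child(rows, parent, tag):
    """Insert a leaf as the last child of `parent`.

    Every start/end value at or after the parent's end shifts by 2, so the
    number of rows touched can grow with document size (the O(N) case).
    Returns the new rows and the count of existing rows that were renumbered.
    """
    _, _, p_end = parent
    new_rows = []
    touched = 0
    for t, s, e in rows:
        ns = s + 2 if s >= p_end else s
        ne = e + 2 if e >= p_end else e
        if (ns, ne) != (s, e):
            touched += 1
        new_rows.append((t, ns, ne))
    new_rows.append((tag, p_end, p_end + 1))  # new leaf takes the freed slot
    return new_rows, touched


# Illustrative document only; element names are assumptions, not thesis data.
doc = "<site><people><person/><person/></people><regions/></site>"
rows = shred_interval(doc)
people = rows[1]
new_rows, touched = insert_last_child(rows, people, "person")
```

In this sketch the selection is a pure range comparison, which is why such a mapping can favor queries, whereas the insert rewrites the rows for `site`, `people`, and `regions`. An ORDPATH-style label is not modeled here; the abstract's point is that it avoids this renumbering at the price of slower selections.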
Thesis - Open Access
Leonard, Jonathan Lee, "Strategies for Encoding XML Documents in Relational Databases: Comparisons and Contrasts." (2006). Electronic Theses and Dissertations. Paper 2213. https://dc.etsu.edu/etd/2213
Copyright by the authors.