Degree Name

MS (Master of Science)

Program

Computer and Information Science

Date of Award

5-2006

Committee Chair or Co-Chairs

Phillip E. Pfeiffer IV

Committee Members

Christopher D. Wallace, Donald B. Sanderson

Abstract

The rise of XML as a de facto standard for document and data exchange has created a need to store and query XML documents in relational databases, today's de facto standard for data storage. Storing XML documents in relational tables is known as document shredding, and two common strategies for it are interval encoding and ORDPATH encoding. Interval encoding, which uses a fixed mapping for shredding XML documents, tends to favor selection queries, but an insertion can cost O(N) because existing node labels may have to be renumbered. ORDPATH encoding, which uses a looser mapping, supports fixed-cost insertions at the potential cost of longer-running selection queries. Experiments conducted for this research suggest that the breakeven point between the two algorithms occurs when users issue, on average, one insertion for every 5.6 queries, for documents between 1.5 MB and 4 MB in size. However, heterogeneous tests with varying mixes of selects and inserts indicate that interval encoding always outperforms ORDPATH for mixes ranging from 76% to 88% selects. The queries and sample documents for these experiments were drawn from the XMark benchmark suite.
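The contrast in insertion cost described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the in-memory `Node` class, the helper names, and the toy tree are hypothetical; interval encoding is shown in its common start/end form; and ORDPATH careting is simplified to siblings whose labels end in a single odd component. Inserting one element renumbers most existing nodes under interval encoding, while ORDPATH gives the new node a label between its neighbors and leaves every other label unchanged.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory XML element, for illustration only."""
    tag: str
    children: list[Node] = field(default_factory=list)
    start: int = 0                    # interval encoding: entry position
    end: int = 0                      # interval encoding: exit position
    ordpath: tuple[int, ...] = ()     # ORDPATH label as integer components


def all_nodes(root: Node):
    """Yield every node in document (pre-)order."""
    yield root
    for child in root.children:
        yield from all_nodes(child)


# ---------- Interval encoding: fixed (start, end) mapping ----------
def interval_encode(root: Node) -> None:
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        node.start = counter          # counter bumped on entering...
        counter += 1
        for child in node.children:
            visit(child)
        node.end = counter            # ...and again on leaving the element
        counter += 1

    visit(root)


def interval_is_ancestor(a: Node, d: Node) -> bool:
    return a.start < d.start and d.end < a.end


def interval_insert(root: Node, parent: Node, index: int, new: Node) -> int:
    """Insert `new`, renumber the whole document, and return how many
    existing nodes changed labels (O(N) in the worst case)."""
    before = {id(n): (n.start, n.end) for n in all_nodes(root)}
    parent.children.insert(index, new)
    interval_encode(root)
    return sum(1 for n in all_nodes(root)
               if id(n) in before and before[id(n)] != (n.start, n.end))


# ---------- ORDPATH: odd initial labels, even "caret" on insert ----------
def ordpath_encode(node: Node, label: tuple[int, ...] = (1,)) -> None:
    node.ordpath = label
    for i, child in enumerate(node.children):
        ordpath_encode(child, label + (2 * i + 1,))   # 1, 3, 5, ...


def ordpath_is_ancestor(a: tuple[int, ...], d: tuple[int, ...]) -> bool:
    return len(a) < len(d) and d[: len(a)] == a


def ordpath_insert(parent: Node, index: int, new: Node) -> int:
    """Simplified careting: assumes each existing sibling has a single odd
    local component. No existing label changes, so the cost is constant."""
    sibs, p = parent.children, len(parent.ordpath)
    left = sibs[index - 1].ordpath[p] if index > 0 else None
    right = sibs[index].ordpath[p] if index < len(sibs) else None
    if left is None and right is None:
        local = (1,)
    elif right is None:
        local = (left + 2,)           # append after the last sibling
    elif left is None:
        local = (right - 2,)          # prepend (may become negative)
    elif right - left > 2:
        local = (left + 2,)           # a free odd value exists between them
    else:
        local = (left + 1, 1)         # caret in: 1.2.1 between 1.1 and 1.3
    new.ordpath = parent.ordpath + local
    sibs.insert(index, new)
    return 0


def build() -> Node:
    # Made-up fragment using XMark-style element names.
    return Node("site", [Node("regions"),
                         Node("people", [Node("person"), Node("person")]),
                         Node("open_auctions")])


doc = build()
interval_encode(doc)
print(interval_insert(doc, doc, 1, Node("categories")))  # 5

doc2 = build()
ordpath_encode(doc2)
print(ordpath_insert(doc2, 1, Node("categories")), doc2.children[1].ordpath)  # 0 (1, 2, 1)
```

Both schemes answer ancestor tests from the labels alone (a range check versus a prefix check); in this sketch the difference lies in how many existing labels a single insertion disturbs.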

Document Type

Thesis - unrestricted

Copyright

Copyright by the authors.