In this paper we will
discuss strategies for transferring data between XML documents and
relational databases according to two mappings (a table-based mapping
and an object-based mapping) commonly used to map DTDs to relational
databases. Although the discussion largely focuses on the difference
between using SAX- and DOM-based tools to transfer data, it also
discusses a number of strategies for traversing both the XML and
database hierarchies and the tradeoffs among them.
IMPORTANT:
The table-based mapping and the object-based mapping are not discussed
in this paper. However, you must understand them before reading this
paper. For more information, see Mapping DTDs to Databases.
WARNING:
This is not the best paper I have ever written, largely due to lack of
time to research all of the issues involved. If you find factual errors
-- I'm sure there are more than what have already been found -- please
contact me. Thanks.
The
table-based mapping views an XML document as a serialized table. That
is, the structure of the document must be as follows, where the
outermost (<Tables>) element is needed only if the document
contains more than one table:
<Tables> <Table> <Row> <Column_1>...</Column_1> ... <Column_n>...</Column_n> </Row> <Row> <Column_1>...</Column_1> ... <Column_n>...</Column_n> </Row> ... </Table> </Tables>
The
table-based mapping has the advantage of simplicity, which makes it
useful as a model for writing data transfer tools. However, because the
model applies only to a limited set of XML documents, the tools are of
limited use. They are most useful for using XML to transfer data
between relational databases or for providing single-table access to
the database.
For a complete discussion of the table-based mapping, see see Mapping DTDs to Databases".
The code to transfer data from XML to the database follows a common pattern, regardless of whether it uses SAX or DOM:
- Table element start: prepare an INSERT statement
- Row element start: clear INSERT statement parameters
- Column elements: buffer PCDATA and set INSERT statement parameters
- Row element end: execute INSERT statement
- Table element end: close INSERT statement
The
code does not make any assumptions about the names of the tags. In
fact, it uses the name of the table-level tag to build the INSERT
statement and the names of the column-level tags to identify parameters
in the INSERT statement. Thus, these names could correspond exactly to
the names in the database or could be mapped to names in the database
using a configuration file.
Here is the code using SAX for a document containing a single table:
int state = UNKNOWN; PreparedStatement stmt; StringBuffer data; public void startElement(String uri, String name, String qName, Attributes attr) { if (state == UNKNOWN) { stmt = getInsertStmt(name); state = TABLE; } else if (state == TABLE) { state = ROW; stmt.clearParameters(); } else if (state == ROW) { state = COLUMN; data = new StringBuffer(); } else { // if (state == COLUMN) throw new SAXException("XML document nested too deep."); } } public void characters (char[] chars, int start, int length) { if (state == COLUMN) data.append(chars, start, length); } public void endElement(String uri, String name, String qName) { if (state == TABLE) { stmt.close(); state = UNKNOWN; } else if (state == ROW) { stmt.executeUpdate(); state = TABLE; } else if (state == COLUMN) { setParameter(stmt, name, data.toString()); state = ROW; } else { // if (state == UNKNOWN) throw new SAXException("Invalid program state."); } }
And here is the code using DOM for a document containing a single table:
void processTable(Document doc) { Element table = doc.getDocumentElement(); PreparedStatement stmt = getInsertStmt(table.getNodeName()); Element row = (Element)table.getFirstChild(); while (row != null) { stmt.clearParameters(); processRow(stmt, row); row = row.getNextSibling(); } } void processRow(PreparedStatement stmt, Element row) { Element column = (Element)row.getFirstChild(); while (column != null) { processColumn(stmt, column); column = column.getNextSibling(); } stmt.executeUpdate(); } void processColumn(PreparedStatement stmt, Element column) { String data = getTextChildren(column); setParameter(stmt, column.getNodeName(), data); }
As you can see, the code is functionally identical, the only difference being the use of SAX or DOM.
Download