One
night five developers, all of whom wore very thick glasses and had
recently been hired by Elephants, Inc., the world's largest purveyor of
elephants and elephant supplies, were familiarizing themselves with the
company's order processing system when they stumbled into a directory
full of XML documents on the main server. "What's this?" the team
leader asked excitedly. None of them had ever heard of XML before so
they decided to split up the files between them and try to figure out
just what this strange and wondrous new technology actually was.
The
first developer, who specialized in optimizing Oracle databases,
printed out a stack of FMPXMLRESULT documents generated by the
FileMaker database where all the orders were stored, and began poring
over them. "So this is XML! Why, it's nothing novel. As anyone can see
who's able, an XML document is nothing but a table!"
"What do
you mean, a table?" replied the second programmer, well versed in
object oriented theory and occupied with a collection of XMI documents
that encoded UML diagrams for the system. "Even a Visual Basic
programmer could see that XML documents aren't tables. Duplicates
aren't allowed in a table relation, unless this is truly some strange
mutation. Classes and objects are what these documents are. Indeed, it
should be obvious on the very first pass. An XML document is an object
and a DTD is a class."
"Objects? A strange kind of object,
indeed!" said the third developer, a web designer of some renown, who
had loaded the XHTML user documentation for the order processing system
into Mozilla. "I don't see any types at all. If you think this is an
object, then it's your software I refuse to install. But with all those
stylesheets there, it should be clear to anyone not sedated, that XML
is just HTML updated!"
"HTML? You must be joking," said the
fourth, a computer science professor on sabbatical from MIT, who was
engrossed in an XSLT stylesheet that validated all the other documents
against a Schematron schema. "Look at the clean nesting of hierarchical
structures, each tag matching its partner as it should. I've never seen
HTML that looks this good. What we have here is S-expressions, which is
certainly nothing new. Babbage invented this back in 1882!"
"An S-expression?" queried the technical writer, who was occupied with
documentation for the project written in DocBook. "Maybe that means
something to those in your learned profession. But to me, this looks
just like a FrameMaker MIF file. However, locating the GUI is taking me
a while."
And so they argued into the night, none of them
willing to give an inch, all of them presenting still more examples to
prove their points, none of them bothering to look at the others'
examples. Indeed, they're probably still arguing today. You can even
hear their shouts from time to time on xml-dev. Their mistake, of
course, was in trying to force XML into the patterns of technologies
they were already familiar with rather than taking it on its own terms.
XML can store data, but it is not a database. XML can serialize
objects, but an XML document is not an object. Web pages can be written
in XML, but XML is not HTML. Functional (and other) programming
languages can be written in XML, but XML is not a programming language.
Books are written in XML, but that doesn't make XML desktop publishing
software.
XML is something truly new in the world of computing. There have been
precursors to it, and there are always fanatics who insist on seeing XML
through database (or object, or functional, or S-expression) colored
glasses. But XML is none of these things, and it can only be understood
when you're willing to accept it on its own terms, rather than forcing
it into yesterday's pigeonholes.
There are a lot of tools, APIs, and applications in the world that try
to pretend XML is something more familiar to programmers: that it's
just a funny kind of database, or just like an object, or just like
remote procedure calls. These APIs are occasionally useful in very
restricted and predictable environments, but they are not suitable for
processing XML in its most general form. They fail when presented with
XML that steps outside the artificial boundaries they've defined. XML
was designed to be extensible, but it's a sad fact that many of the
tools designed for XML aren't nearly as extensible as XML itself.
This book is going to show you how
to handle XML in its full generality. It pulls no punches. It does not
pretend that XML is anything except XML, and it shows you how to design
your programs so that they handle real XML in all its messiness: valid
and invalid, mixed and unmixed, typed and untyped, and both all and
none of these at the same time. To that end, this book focuses on those
APIs that don't try to hide the XML. In particular, there are three
major Java APIs that correctly model XML, as opposed to modeling a
particular class of XML documents or some narrow subset of XML. These
are:
- SAX, the Simple API for XML
- DOM, the Document Object Model
- JDOM, a Java native API
These APIs are the core of this book. In addition, I cover a number of
preliminaries and supplements to the basic APIs including:
- XML syntax
- DTDs, schemas, and validity
- XPath
- XSLT and the TrAX API
- JAXP, a combination of SAX, DOM, and TrAX with a few factory classes
And,
since we're going to need a few examples of XML applications to
demonstrate the APIs with, I also cover XML-RPC, SOAP, and RSS in some
detail. However, the techniques this book teaches are hardly limited to
just those three applications.
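As a rough first taste of the JAXP factory classes mentioned above, the
following sketch parses the same small document twice: once into a DOM
tree and once as a stream of SAX events. It is only an illustration
under stated assumptions, not an example from later chapters. The class
name JaxpSketch, the two helper methods, and the sample order document
are all invented here, and the code assumes a reasonably current JDK
with JAXP bundled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of any library.
public class JaxpSketch {

    // DOM: build the whole document in memory, then ask for the root element's name.
    public static String rootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    // SAX: receive callbacks as the parser reads, counting each start-tag.
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with mixed content (text and an element inside <item>).
        String xml = "<order><item>African <b>elephant</b>, large</item></order>";
        System.out.println(rootName(xml));      // prints "order"
        System.out.println(countElements(xml)); // prints 3
    }
}
```

Neither method assumes the document is a table, an object, or anything
other than XML. The mixed content inside the item element passes through
both parsers untouched.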
This
book is written for experienced Java programmers who want to integrate
XML into their systems. Java is the ideal language for processing XML
documents. Its strong Unicode support in particular made it the
preferred language for many early implementers. Consequently, more XML
tools have been written in Java than in any other language. More open
source XML tools are written in Java than in any other language. More
programmers process XML in Java than in any other language.
Processing XML with Java will teach you how to:
- Save XML documents from applications written in Java
- Read XML documents produced by other programs
- Search, query, and update XML documents
- Convert legacy flat data into hierarchical XML
- Communicate with network servers that send and receive XML data
- Validate documents against DTDs, schemas, and business rules
- Combine functional XSLT transforms with traditional imperative Java code
This
book is meant for Java programmers who need to do anything with XML. It
teaches the fundamentals and advanced topics, leaving nothing out. It
is a comprehensive course in processing XML with Java that takes
developers from little knowledge of XML to designing sophisticated XML
applications and parsing complicated documents. The examples cover a
wide range of possible uses including file formats, data exchange,
document transformation, database integration, and more.
This
is not an introductory book with respect to either Java or XML. I
assume you have substantial prior experience with Java and preferably
some experience with XML. On the Java side, I will freely use advanced
features of the language and its class library without explanation or
apology. Among other things, I assume you are thoroughly familiar with:
- Object oriented programming including inheritance and polymorphism
- Packages and the CLASSPATH. You should not be surprised by classes that do not have main() methods or that are not in the default package.
- I/O including streams, readers, and writers. You should understand that System.out is a horrible example of what really goes on in Java programs.
- The Java Collections API including hash tables, maps, sets, iterators, and lists.
In
addition, in one or two places in this book I'm going to use some SQL
and JDBC. However, these sections are relatively independent of the
rest of the book; and chances are if you aren't already familiar with
SQL, then you don't need the material in these sections anyway.
XML
is deliberately architecture, platform, operating system, GUI, and
language agnostic (in fact, more so than Java). It works equally well
on Mac OS, Windows, Linux, OS/2, various flavors of Unix, and more. It
can be processed with Python, C++, Haskell, ECMAScript, C#, Perl,
Visual Basic, Ruby, and of course Java. No byte order issues need
concern you if you switch between PowerPC, x86, or other architectures.
Almost everything in this book should work equally well on any platform
that's capable of running Java.
Most of the material in this
book is relatively independent of the specific Java version. Java 1.4
bundles SAX, DOM, and a few other useful classes into the core JDK.
However, these are easily installed in earlier JVMs as open source
libraries from the Apache XML Project and other vendors. For the most
part, I used Java 1.3 and 1.4 when testing the examples; and it's
possible that a few classes and methods have been used that are not
available in earlier versions. In most cases, it should be fairly
obvious how to backport them. All of the basic XML APIs except TrAX
should work in Java 1.1 and later. TrAX requires Java 1.2 or later.