If you have decided to learn about XML, you are probably
already quite familiar with the concepts behind HTML (HyperText Markup
Language). So let's start from there.
HTML, as its name
implies, is a markup language. As such, it is used to mark up text. But
what exactly does it mean to mark up text?
Abstractly, marking
up text is a methodology for encoding data with information about
itself. Examples of markups (encoded data) are ubiquitous in the real
world.
For example, back when you were slogging through high
school, you probably used a bright yellow highlighter pen to
highlight sentences in your schoolbooks (or at least you knew someone
who did!). You did so because you thought that the highlighted
sentences would be useful to review around exam time and you wanted a
quick way to skim through the important points. Just like you,
thousands of kids around the world did the exact same thing for the
exact same reason.
By highlighting certain bits of text, you
were effectively "marking up" the data. Essentially, you specified that
certain sentences (data) were important by marking them in yellow.
These sentences became encoded with the fact that they were important.
And
what's more, since everyone followed the same standard of marking up,
you could easily pick up a used textbook and get a good idea, just from
reading the highlighted sections, of what the core points of the book were.
There
are two crucial points to take away from this example. For markups to
transmit useful information about data to a pool of users:
- A standard must be in place to define what a valid markup is. In the
example above, a markup is defined as a bit of yellow ink atop text. In
HTML, a markup is a tag.
- A standard must be in place to define what a markup means. In the
example above, a yellow highlight means the highlighted text represents
an important point. In HTML, each tag communicates its own layout or
formatting meaning.
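To make the two requirements concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the name MARKUP_STANDARD, the two helper functions, and the tiny set of markups it contains are assumptions, not part of HTML or any real standard. The keys of the table play the role of the first requirement (what counts as a valid markup) and the values play the role of the second (what each markup means).

```python
# A hypothetical, deliberately tiny "standard" for illustration only.
# Its keys define which markups are valid; its values define what each means.
MARKUP_STANDARD = {
    "yellow highlight": "this text is an important point",
    "b": "display this text in bold",
    "center": "center this text on the line",
}


def is_valid_markup(markup):
    """Requirement 1: a markup is valid only if the standard defines it."""
    return markup.lower() in MARKUP_STANDARD


def meaning_of(markup):
    """Requirement 2: look up what a markup means (None if it is undefined)."""
    return MARKUP_STANDARD.get(markup.lower())


print(is_valid_markup("B"))            # True
print(meaning_of("yellow highlight"))  # this text is an important point
print(meaning_of("pink underline"))    # None
```

A reader who does not share the table cannot tell what a pink underline is supposed to mean, which is exactly why both standards have to be agreed on in advance.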
Markups
are also ubiquitous in the world of computers. They are used by word
processors to specify formatting and layout, by communications programs
to express the meaning of data sent over the wires, by database
applications that must associate meaning and relationships with the
data they serve, and by multimedia processing programs which must
express meta-data about images or sound.
As data is sent
through dumb computers and programs, it is essential that the data
carry with it the information necessary to communicate what the data
means and/or what the receiver should do with that data.
Data with no context is meaningless, just as an unhighlighted book is bad news around exam time!
HTML
is one of the more famous computer markup systems. HTML defines a set
of tags that associate formatting rules with bits of text. Documents
which have been marked up (which contain plain text as well as the tags
that specify the rules for formatting that text) are read by an HTML
processing application (a web browser for example) that knows how to
display the text according to the rules.
For example, the
<B> tag specifies a rule which instructs an HTML processing
application to bold a specific bit of text. Similarly, the
<CENTER> tag instructs the HTML processing application to center
the text.
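As a rough illustration of what an "HTML processing application" does with such tags, the sketch below uses the html.parser module from Python's standard library to record which formatting rules are in effect for each bit of text. The names TinyRenderer and render are hypothetical, and a real web browser does far more than this; the sketch only tracks open tags and does not draw anything.

```python
from html.parser import HTMLParser


class TinyRenderer(HTMLParser):
    """Toy processor: notes which tags are open around each bit of text."""

    def __init__(self):
        super().__init__()
        self.active = []    # tags currently open (html.parser lowercases names)
        self.rendered = []  # (text, sorted list of rules applied to it)

    def handle_starttag(self, tag, attrs):
        self.active.append(tag)

    def handle_endtag(self, tag):
        # Close the tag if it is open; stray end tags are ignored.
        if tag in self.active:
            self.active.remove(tag)

    def handle_data(self, data):
        # Skip whitespace-only runs; record everything else with its rules.
        if data.strip():
            self.rendered.append((data, sorted(self.active)))


def render(markup):
    """Return a list of (text, active formatting rules) pairs."""
    parser = TinyRenderer()
    parser.feed(markup)
    parser.close()
    return parser.rendered


print(render("<CENTER><B>BOLD</B></CENTER>"))
# [('BOLD', ['b', 'center'])]
```

The output says that the text BOLD falls under both the bold rule and the center rule, which is the information a browser needs before it can draw the text.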
Thus <CENTER><B>BOLD</B></CENTER> would be displayed by an HTML processing application as the word BOLD in bold type, centered on the line.
You might imagine a client contact list which could look like the following bit of HTML code:
The above HTML-encoded data would be displayed by an HTML processing application as: