XML, extensible and universal data format
XML, the Extensible Markup Language, is like HTML a descendant of SGML, but more generic: tag names are chosen to describe the data they enclose, and since new tags can be defined freely, its descriptive capacity is unlimited.
Presentation is independent of the content and is given by another document, an XSLT stylesheet. The rules for creating tags and checking their validity are defined by yet another document: either a DTD (Document Type Definition), which describes the grammar of the tags, or a Schema, another validation format that is itself written in XML.
XML, like HTML, can be processed in many programming languages, most notably JavaScript, through the Document Object Model (DOM). The document is loaded entirely into memory and its structure is stored as a tree, which gives access to any tag or set of tags through built-in search functions.
For larger documents, SAX mode is used instead: the XML code is read sequentially from the file, and each tag is processed as soon as it has been read, without keeping the whole document in memory.
XML is characterized by meaningful tags, but the meaning of a tag depends on the tool that parses the XML document. Tag names are chosen for the readability of the document; their role depends entirely on the tools that will access it.
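The SAX mode can be sketched with the xml.sax module from Python's standard library. This is only an illustration: the handler class TagCounter, the function count_tags and the sample string are hypothetical names chosen for the example. The parser calls startElement once for each opening tag as it reads the input, so no tree is ever built.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts elements by name as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    """Stream through xml_text and return {tag name: occurrences}."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


SAMPLE = "<invoice><item>pen</item><item>ink</item></invoice>"
print(count_tags(SAMPLE))
```

For a real file, xml.sax.parse(filename, handler) would be used in the same way, which is where the memory saving matters.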
Example of code, storing an invoice in XML:
<?xml version="1.0" ?>
<!-- Invoice from Scriptol.com -->
<invoice>
  <order>000156</order>
  <date timezone="Greenwich">
    Jan 1, 2003 14:30:00
  </date>
  <address>
    <firstName>Sherlock</firstName>
    <lastName>Holmes</lastName>
    <street>5 Baker St.</street>
    <city>London</city>
    <state>England</state>
    <zip>75004</zip>
  </address>
  <amount> 270 </amount>
</invoice>
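As an illustration of how a tool gives these tags a role, here is a minimal sketch that reads a shortened, well-formed version of the invoice with Python's xml.etree.ElementTree. The function name summarize and the keys of the returned dictionary are assumptions made for the example; nothing in XML itself says that amount is a number, the program decides that.

```python
import xml.etree.ElementTree as ET

INVOICE = """<?xml version="1.0" ?>
<invoice>
  <order>000156</order>
  <date timezone="Greenwich">Jan 1, 2003 14:30:00</date>
  <address>
    <firstName>Sherlock</firstName>
    <lastName>Holmes</lastName>
    <city>London</city>
  </address>
  <amount> 270 </amount>
</invoice>"""


def summarize(xml_text):
    """Extract a few fields from an invoice document."""
    root = ET.fromstring(xml_text)
    first = root.findtext("address/firstName")
    last = root.findtext("address/lastName")
    return {
        "order": root.findtext("order"),
        "customer": f"{first} {last}",
        "timezone": root.find("date").get("timezone"),
        # The tool, not the markup, decides that the amount is an integer.
        "amount": int(root.findtext("amount").strip()),
    }


print(summarize(INVOICE))
```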
Separate documents are used for presentation and for validity checking. Browsers support several doctypes that describe different versions of XHTML or HTML, and can apply the CSS presentation language to each version.
Beyond textual documents
XML is only a semantic language with a basic syntax; it is the tool reading it that makes it "talk", that is, converts words into actions. XML is not meant only to hold text.
Start by looking at some applications of the SVG language. They are striking: vector graphics that can even be animated. Yet SVG is nothing more than an XML vocabulary associated with an API. Tags become rectangles or other shapes, and attributes become the parameters that are varied to obtain motion. SVG is a language understood by the browser (or another SVG rendering tool) to represent scalable images.
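To make this concrete, the following sketch builds a small SVG document with Python's xml.etree.ElementTree. The function name make_svg and all coordinates and colors are arbitrary choices for the example. The rect tag becomes a rectangle, and the animate child varies its x attribute to produce motion when a browser renders the result.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg(width, height):
    """Return an SVG document (as text) with one animated rectangle."""
    # Serialize SVG elements without a namespace prefix.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg", {"width": str(width), "height": str(height)}
    )
    rect = ET.SubElement(
        svg,
        f"{{{SVG_NS}}}rect",
        {"x": "10", "y": "10", "width": "80", "height": "40", "fill": "steelblue"},
    )
    # The attribute "x" is the parameter that varies to obtain motion.
    ET.SubElement(
        rect,
        f"{{{SVG_NS}}}animate",
        {"attributeName": "x", "from": "10", "to": "100",
         "dur": "2s", "repeatCount": "indefinite"},
    )
    return ET.tostring(svg, encoding="unicode")


print(make_svg(200, 100))
```

Saving the printed text to a file with the .svg extension and opening it in a browser should show the rectangle sliding to the right.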
Another example is the RSS format. Once a role has been assigned to each tag, a list of links and descriptions becomes a press review.
In the XHTML dialect, every tag has a layout role. It is an XML vocabulary semantically equivalent to HTML that tells the browser how to present multimedia content.
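A minimal sketch of this idea, using Python's xml.etree.ElementTree: the feed below is a hypothetical RSS 2.0 document with made-up titles and example.com links, and press_review is a name invented for the illustration. The program assigns a role to the item, title and link tags and turns them into a list of headlines.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, for illustration only.
FEED = """<rss version="2.0">
  <channel>
    <title>Example news</title>
    <link>https://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First story</title>
      <link>https://example.com/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def press_review(rss_text):
    """Return (title, link) pairs for every item of the feed."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in press_review(FEED):
    print(f"{title}: {link}")
```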
XML or JSON?
Web pages could also be expressed as JSON files. This would reduce file size, but it would probably have slowed the development of the Web, since HTML code is much more accessible to non-programmers.
For an application, the choice of format is detailed in the article JSON or XML, which format to choose? But do we really need to choose? These are two ways to present the same structured content, and converting from one format to the other is not complicated. In fact, once the content is loaded into memory and translated into objects and attributes, serializing it as an XML or a JSON file is just a matter of convenience.
The purpose of that article is mainly to decide when one format or the other is better suited to storing data, depending on the language or system that uses it.
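The conversion can be sketched in a few lines of Python with the standard json and xml.etree.ElementTree modules. This is a deliberately simplified illustration: the helper names element_to_data and data_to_element are invented here, and the sketch assumes elements without attributes, without mixed content and without repeated sibling tags. A general converter would need a convention for each of those cases.

```python
import json
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Turn an element into a nested dict, or a string for a leaf."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    # Assumption: sibling tag names are unique, so a dict is enough.
    return {child.tag: element_to_data(child) for child in children}


def data_to_element(tag, data):
    """Inverse operation: rebuild an element from the nested data."""
    elem = ET.Element(tag)
    if isinstance(data, dict):
        for name, value in data.items():
            elem.append(data_to_element(name, value))
    else:
        elem.text = str(data)
    return elem


XML_TEXT = (
    "<address><city>London</city>"
    "<name><first>Sherlock</first><last>Holmes</last></name></address>"
)

root = ET.fromstring(XML_TEXT)
data = {root.tag: element_to_data(root)}   # in-memory objects
as_json = json.dumps(data)                 # serialized as JSON
back = data_to_element("address", json.loads(as_json)["address"])
as_xml = ET.tostring(back, encoding="unicode")  # serialized as XML again

print(as_json)
print(as_xml)
```

Once the content is held in plain dictionaries and strings, the output format is only a choice of serializer.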
XML tools
Several classes of tools are used to access the data, to change the document, or to convert it into another format.
Parsers
There are two types of parsers. A tree parser loads the whole XML document into memory, and you can then access its contents through the Document Object Model, typically with methods such as getElementsByTagName.
An event-driven parser, by contrast, follows the SAX API: it reads the content progressively and reports each element as it is encountered, so the program can store all the data or only the parts it asks for.
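As a small illustration of the tree approach, Python's standard xml.dom.minidom module exposes the same getElementsByTagName method. The sample document and the helper text_of are made up for the example, and the helper assumes that the matched elements contain only text.

```python
from xml.dom.minidom import parseString

DOC = (
    "<address><firstName>Sherlock</firstName>"
    "<lastName>Holmes</lastName><city>London</city></address>"
)


def text_of(document, tag_name):
    """Return the text of every element with the given tag name.

    Assumes each matching element holds a single text node.
    """
    return [
        node.firstChild.data
        for node in document.getElementsByTagName(tag_name)
        if node.firstChild is not None
    ]


dom = parseString(DOC)          # the whole document is now a tree in memory
print(text_of(dom, "city"))
print(text_of(dom, "lastName"))
```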
- Pugixml. Lightweight and very fast library written in C++, integrated into a program as source or binary.
- Xerces. A Java or C++ parser for XML, distributed by the Apache group, which also provides several other XML tools.
- LibXML. C library supporting the DOM and SAX APIs.
- Expat. Library for building an event-driven parser (used by Scriptol and XCheck).
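Expat is also available from Python through the standard xml.parsers.expat module, which makes it easy to sketch an event-driven parser that stores only the data it is asked for. The function collect and its parameter wanted are hypothetical names for this illustration, and the sketch assumes that the wanted elements contain only text, not nested wanted elements.

```python
import xml.parsers.expat


def collect(xml_text, wanted):
    """Stream through xml_text, keeping only the text of elements in `wanted`."""
    found = []   # [name, text] pairs for the elements that were asked for
    stack = []   # names of the elements currently open

    def start(name, attrs):
        stack.append(name)
        if name in wanted:
            found.append([name, ""])

    def end(name):
        stack.pop()

    def chars(data):
        # Text may arrive in several chunks, so it is accumulated.
        if stack and stack[-1] in wanted:
            found[-1][1] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)   # True: this is the final chunk of input
    return [(name, text.strip()) for name, text in found]


SAMPLE = (
    "<invoice><order>000156</order><amount> 270 </amount>"
    "<city>London</city></invoice>"
)
print(collect(SAMPLE, {"order", "amount"}))
```

Everything outside the requested tags is read and discarded immediately, which is what keeps memory use low on large inputs.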
In the JavaScript + Node + HTML environment no separate XML parser is needed: parsing is built into the browser. These parsers are therefore useful mainly for C++, Java, or other languages.
If you just want to check whether an XML document is well-formed, XCheck, a code checker for Windows, is available for download.
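A well-formedness check can also be sketched in a few lines of Python. This is not how XCheck works internally, only an illustration of the idea using the standard xml.etree.ElementTree module, whose parser raises ParseError on malformed input; the function name is_well_formed is an assumption of the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first error.
        return False, str(err)
    return True, ""


print(is_well_formed("<a><b/></a>"))
print(is_well_formed("<a><b></a>"))
```

Note that this only checks that the document is well-formed; validating it against a DTD or a Schema is a separate step that this sketch does not perform.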
XQuery
XQuery is a query language for XML data, whether it is held in a file or in a database with a tree structure similar to that of XML, such as Apache's Xindice. It lets you create an XML database and query it. A GNU implementation is available.
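Python's standard library has no XQuery engine, so the following is only an analogy: it uses the limited XPath subset supported by xml.etree.ElementTree to give the flavor of querying XML data by a condition on its content. The sample data and the function orders_for_city are invented for the illustration, and the city value is assumed not to contain quote characters.

```python
import xml.etree.ElementTree as ET

# Made-up data set, for illustration only.
INVOICES = """<invoices>
  <invoice><order>000156</order><city>London</city><amount>270</amount></invoice>
  <invoice><order>000157</order><city>Paris</city><amount>120</amount></invoice>
  <invoice><order>000158</order><city>London</city><amount>90</amount></invoice>
</invoices>"""


def orders_for_city(xml_text, city):
    """Return the order numbers of the invoices whose <city> equals `city`."""
    root = ET.fromstring(xml_text)
    # [city='...'] keeps the invoices having a <city> child with that text.
    matches = root.findall(f"./invoice[city='{city}']")
    return [invoice.findtext("order") for invoice in matches]


print(orders_for_city(INVOICES, "London"))
```

A real XQuery expression can additionally sort, join and construct new XML, which this subset cannot do.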
XSLT
The XSL language is made of transformation rules. XSLT converts an XML document into another format such as HTML, and can also be used to access data. Xalan is a processor that transforms an XML document into HTML; there are Java and C++ versions.
Editors
Doctored. Simplified editor for XML and DocBook 5 that shows the code without brackets and supports multiple schemas. Open source on GitHub.
Virtual machine
Xmlvm is a sort of bytecode expressed in XML. It can be compiled to Objective-C, JavaScript, or Java bytecode.