Plasma GitLab Archive

The Extensible Markup Language, or XML for short, is a configurable language for structured documents and structured data. XML is related to the well-known HTML language because it can be applied in a similar way (HTML documents can easily be transformed to XML), but it is more general because it is not restricted to browsers and the World Wide Web. XML uses the same tag notation as HTML (such as "<tag>"), but unlike in HTML the set of tags is not fixed; it can be declared in a so-called DTD (document type definition). Almost every kind of text document or collection of text data fields can be expressed as an XML instance: for example, the documents written with your favourite word processor[1], documents describing both the contents and the overall behaviour of Web applications, or data records that are interchanged between databases or data processing systems.
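To make the role of the DTD a little more concrete, here is a minimal OCaml sketch. It does not use PXP. It models an XML instance as a plain tree and reduces the DTD to a list of declared element names, which is a strong simplification because real DTDs also declare content models and attributes. The element names `memo`, `to`, `from`, `body` and `b` are made up for illustration.

```ocaml
(* An XML instance modelled as a tree (not PXP's representation). *)
type node =
  | Element of string * node list   (* tag name and children *)
  | Text of string                  (* character data *)

(* Hypothetical "DTD": nothing but the list of declared element names. *)
let declared = [ "memo"; "to"; "from"; "body" ]

(* Names of all elements in the tree that [declared] does not contain,
   in document order. *)
let rec undeclared declared = function
  | Text _ -> []
  | Element (name, children) ->
      let here = if List.mem name declared then [] else [ name ] in
      here @ List.concat (List.map (undeclared declared) children)

let doc =
  Element ("memo",
    [ Element ("to", [ Text "Alice" ]);
      Element ("body", [ Text "Hello "; Element ("b", [ Text "world" ]) ]) ])

let () =
  match undeclared declared doc with
  | [] -> print_endline "all elements declared"
  | names -> print_endline ("undeclared: " ^ String.concat ", " names)
```

Running it prints `undeclared: b`. `b` is an HTML-style tag, but this hypothetical DTD does not declare it, so in this document type it is not allowed.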

What XML makes possible is not new. Before XML was invented, there was already a standard for structured documents: SGML. The definition of SGML is rather complicated, mainly because SGML tries to achieve too many goals at once. For example, on the one hand SGML demands that document instances match exactly the format declared in the DTD, i.e. it is designed to be processed by machines, but on the other hand it allows lots of abbreviations so that instances can be entered more easily by human beings. In fact, XML is a subset of SGML that adopts only the features that are really needed; XML is a lightweight version of SGML. For example, the SGML rule that end tags can be omitted in some situations was dropped: every element must have a start tag and an end tag, or be written as a single empty-element tag such as "<br/>". Because of such simplifications, XML parsers are much simpler to implement and need far fewer lines of code.
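The mandatory end tag is one reason an XML parser stays simple. To check that elements nest properly, the parser needs only a stack of open element names and never has to consult the DTD. The OCaml sketch below assumes the input has already been split into start tags, end tags and character data. The `token` type is hypothetical, and the sketch ignores attributes, empty-element tags, comments and everything else a real parser handles.

```ocaml
(* Hypothetical, already-tokenized input. *)
type token =
  | Start of string   (* <name>  *)
  | End of string     (* </name> *)
  | Chars of string   (* character data *)

(* Because end tags may not be omitted, proper nesting can be checked
   with a stack of open element names alone; no DTD is consulted. *)
let well_nested tokens =
  let rec go stack = function
    | [] -> stack = []                         (* every element was closed *)
    | Start n :: rest -> go (n :: stack) rest
    | End n :: rest ->
        (match stack with
         | top :: below when top = n -> go below rest
         | _ -> false)                         (* end tag without matching start *)
    | Chars _ :: rest -> go stack rest
  in
  go [] tokens

let () =
  let ok = [ Start "p"; Chars "a "; Start "b"; Chars "bold"; End "b"; End "p" ] in
  let omitted = [ Start "p"; Chars "one"; Start "p"; Chars "two"; End "p" ] in
  Printf.printf "%b %b\n" (well_nested ok) (well_nested omitted)
```

It prints `true false`. The second token list is the kind of input SGML can accept when the DTD declares the end tag of `p` as omissible, as the HTML 4 DTD does. Under the XML rule, the same input is simply not well-formed.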

These simplifications do not mean that XML has fewer applications than SGML; it only follows a different philosophy. XML specifies only the minimal core shared by all applications; you can customize XML by adding further constraints or by using pre- and postprocessors. There are currently many groups working on specific XML formats, for example for chemical formulas or bookkeeping, by defining DTDs with additional constraints[2]. The simplicity makes XML more widely available than SGML ever was; there are already XML parsers for many programming languages.
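A DTD can state that an element contains character data, but not what that data must look like. As a hedged illustration of such an additional constraint, the OCaml sketch below checks the text of an `amount` field in a hypothetical bookkeeping format. Both the element and the rule are assumptions for illustration and do not come from any real DTD. The rule is: digits, optionally followed by a point and one or two more digits.

```ocaml
(* True if [s] is non-empty and consists only of ASCII digits. *)
let all_digits s =
  let ok = ref (s <> "") in
  String.iter (fun c -> if c < '0' || c > '9' then ok := false) s;
  !ok

(* Hypothetical extra rule for the text of an <amount> element, which a
   DTD could only declare as #PCDATA, i.e. arbitrary character data. *)
let valid_amount s =
  match String.index_opt s '.' with
  | None -> all_digits s
  | Some i ->
      let int_part = String.sub s 0 i in
      let frac = String.sub s (i + 1) (String.length s - i - 1) in
      all_digits int_part && all_digits frac && String.length frac <= 2

let () =
  List.iter
    (fun s -> Printf.printf "%-8s %b\n" s (valid_amount s))
    [ "120"; "12.50"; "12,50"; "12." ]
```

A postprocessor would apply such a check after the parser has validated the document against the DTD. Here `120` and `12.50` pass, while `12,50` and `12.` are rejected.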

The document you are currently reading is itself an XML document. A postprocessor transforms it to HTML so that every browser can display it. Using XML for the original document makes many features possible that HTML lacks: a footnote mechanism collects the footnotes embedded in the text; a generator creates the nice headlines automatically; all URLs I refer to are defined in a single file; and many more. (This processor will be made available once it is stable enough.)
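The footnote mechanism described in the paragraph above can be sketched as a tree transformation. This is not the author's processor, only an OCaml illustration under assumed names. The document is a plain tree, and footnotes are elements named `footnote`. Each footnote is replaced by a numbered marker, and its body is collected for output at the end. Footnotes nested inside footnotes are not handled.

```ocaml
(* A document tree; element names are arbitrary strings. *)
type node =
  | Element of string * node list
  | Text of string

(* Walk the tree left to right. [n] counts the footnotes seen so far and
   [notes] holds their bodies, most recent first. Every <footnote> element
   is replaced by the marker "[n]"; its body is not walked. *)
let rec walk n notes = function
  | Text _ as t -> (t, n, notes)
  | Element ("footnote", body) ->
      let n = n + 1 in
      (Text (Printf.sprintf "[%d]" n), n, body :: notes)
  | Element (name, children) ->
      let rev_children, n, notes =
        List.fold_left
          (fun (acc, n, notes) child ->
            let child', n, notes = walk n notes child in
            (child' :: acc, n, notes))
          ([], n, notes) children
      in
      (Element (name, List.rev rev_children), n, notes)

(* Return the transformed document and the footnote bodies in order. *)
let collect_footnotes doc =
  let doc', _, notes = walk 0 [] doc in
  (doc', List.rev notes)

(* Concatenate all character data, ignoring markup. *)
let rec text_of = function
  | Text s -> s
  | Element (_, children) -> String.concat "" (List.map text_of children)

let doc =
  Element ("p",
    [ Text "XML supersedes RTF";
      Element ("footnote", [ Text "Announced by vendors." ]);
      Text " and needs a DTD";
      Element ("footnote", [ Text "Plus extra rules." ]) ])

let () =
  let body, notes = collect_footnotes doc in
  print_endline (text_of body);
  List.iteri
    (fun i note ->
      Printf.printf "[%d] %s\n" (i + 1) (String.concat "" (List.map text_of note)))
    notes
```

The first line printed is `XML supersedes RTF[1] and needs a DTD[2]`, and the two collected notes follow it. The counter is threaded through `List.fold_left`, which visits the children from left to right, so the numbering follows document order.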

[1] This has been announced by the manufacturers of such software; XML will supersede RTF as the vendor-independent exchange format.
[2] A DTD can only declare the overall structure of documents, not the exact format of every data field. Because of this, an XML format usually consists of a DTD and additional rules derived from the intended semantics.
This web site is published by Informatikbüro Gerd Stolpmann