An introduction to markup

2. What are markup languages?

Markup languages are ways of annotating an electronic document. Markup usually specifies either how something should be displayed or what it means. The term originates in typesetting, where proofs were marked up with instructions about their visual appearance; it has since broadened to include the semantic perspective that we’re interested in here.
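
To make the distinction concrete, here is a small illustrative sketch of the same sentence marked up in both ways. The labels in angle brackets are the markup; the sentence and the element names (examples, sentence, i, title) are chosen purely for this illustration and are not taken from any particular standard.

```xml
<examples>
  <!-- Presentational markup: records how the words should look (italic) -->
  <sentence>I have just finished reading <i>Bleak House</i>.</sentence>

  <!-- Semantic markup: records what the words are (the title of a work) -->
  <sentence>I have just finished reading <title>Bleak House</title>.</sentence>
</examples>
```

The first version tells a reader only that the words are in italics. The second says why they are singled out, which would allow a program to, say, list every title mentioned in a text.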

The names of the most popular languages usually end in “Markup Language” and so are abbreviated to something-ML, for example:

  • HTML – Hypertext Markup Language
  • KML – Keyhole Markup Language
  • MathML – Mathematical Markup Language
  • SGML – Standard Generalized Markup Language
  • XHTML – eXtensible Hypertext Markup Language
  • XML – eXtensible Markup Language

In this course we are going to concentrate on XML because, as its name suggests, it is extensible: you can adapt it to your own needs and focus when marking up texts. The list above may look a bit intimidating, but if you learn XML you will be well equipped to tackle the others (they all descend from SGML, either directly or by way of XML, so there are strong family resemblances).
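
As a rough sketch of what extensibility means in practice, the fragment below marks up a short letter. Every element name in it (letter, sender, recipient, date, body, paragraph, place) is invented for this illustration, as is the letter itself; XML does not predefine any of them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Invented vocabulary for illustration: XML itself defines none of these names -->
<letter>
  <sender>Anna</sender>
  <recipient>Ben</recipient>
  <date>3 March 1850</date>
  <body>
    <paragraph>I arrived in <place>York</place> this morning.</paragraph>
  </body>
</letter>
```

A project with a different focus could choose entirely different element names for the same text. What XML itself requires is only that elements are properly opened, closed and nested inside one another.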

HTML is the best-known markup language on the list and it is also the most forgiving. It has only a limited number of basic elements, so we’ll use it as a gentle introduction to XML. If you are already confident with HTML, feel free to skip the rest of this handbook, but do take a quick look at section 3.5, as we will be returning to the sample text later in this module.