Markup Schemes

2. Text Encoding Initiative

TEI (the Text Encoding Initiative) has been around since the 1980s, so it is a mature project. A number of high-profile text collections use TEI; the one historians are most likely to encounter is perhaps EEBO-TCP. We’ll talk about EEBO-TCP later in the module, because the transparency of its procedures makes it useful for learning TEI for historical texts.

The TEI guidelines structure texts differently from the way in which we’ve been dealing with the PCMs or our letter example. Because the TEI can be applied to any type of text, its structural units are pretty generic. Here is the first level of a TEI document:

 

<TEI>

<teiHeader></teiHeader>

<text>

<front></front>

<body></body>

<back></back>

</text>

</TEI>

 

By now you will recognise TEI as the root element.

teiHeader contains the metadata: the bibliographic information about the text being encoded, as well as the name of the person doing the encoding. Here are three examples of its titleStmt element, taken from the TEI’s own documentation:

<titleStmt>

  <title>Capgrave's Life of St. John Norbert:  a

         machine-readable transcription</title>

  <respStmt> <resp>compiled by</resp> <name>P.J. Lucas</name> </respStmt>

</titleStmt>

<titleStmt>

  <title>Two stories by Edgar Allen Poe: electronic version</title>

  <author>Poe, Edgar Allen (1809-1849)</author>

  <respStmt>

    <resp>compiled by</resp> <name>James D. Benson</name>

  </respStmt>

</titleStmt>

<titleStmt>

  <title>Yogadar&sacute;anam (arth&amacr;t

         yogas&umacr;trap&macr;&tdot;ha&hdot;):

         a machine readable transcription.</title>

  <title>The Yogas&umacr;utras of Pata&ntilde;jali:

         a machine readable transcription.</title>

  <funder>Wellcome Institute for the History of Medicine</funder>

  <principal>Dominik Wujastyk</principal>

  <respStmt><name>Wieslaw Mical</name>

        <resp>data entry and proof correction</resp>

  </respStmt>

  <respStmt><name>Jan Hajic</name>

            <resp>conversion to TEI-conformant markup</resp></respStmt>

</titleStmt>

 

This might look a bit fearsome, but it is easier to grapple with when you are working with a specific, known example.
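
The titleStmt examples above are fragments. As a rough sketch of where one sits, a minimal teiHeader nests titleStmt inside fileDesc, alongside publicationStmt and sourceDesc. The title, name and descriptive text below are invented placeholders, not taken from a real edition:

```xml
<teiHeader>
  <fileDesc>
    <!-- titleStmt: the title of the electronic text and who is responsible for it -->
    <titleStmt>
      <title>Letters of a Hypothetical Merchant: an electronic transcription</title>
      <respStmt>
        <resp>encoded by</resp>
        <name>A. N. Encoder</name>
      </respStmt>
    </titleStmt>
    <!-- publicationStmt: how the electronic file is published or distributed -->
    <publicationStmt>
      <p>Unpublished teaching example.</p>
    </publicationStmt>
    <!-- sourceDesc: the source from which the transcription was made -->
    <sourceDesc>
      <p>Transcribed from a placeholder manuscript source.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```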

text normally wraps the rest of the entire file, but – because it’s not the root element – you can have as many text elements as you like: you could have multiple occurrences for a collection of texts, if you wanted to.
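
One way the TEI provides for a collection is the group element, which gathers several text elements inside an outer text. The following is only a sketch, assuming a collection of two texts that share front and back matter; the elements are left empty in the same way as in the skeleton above:

```xml
<TEI>
  <teiHeader></teiHeader>
  <text>
    <!-- front matter for the collection as a whole -->
    <front></front>
    <group>
      <!-- first text in the collection -->
      <text>
        <body></body>
      </text>
      <!-- second text in the collection -->
      <text>
        <body></body>
      </text>
    </group>
    <!-- back matter for the collection as a whole -->
    <back></back>
  </text>
</TEI>
```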

front and back contain any front matter or back matter, such as a preface or an index, whereas the main body of the text goes in, well, body. front and back can be omitted if not needed. Sometimes what constitutes front and back matter will be a matter of personal choice, but if you break with convention too much (for example, by putting an editor’s note in body), you will be diluting the usefulness of marking up in TEI in the first place.

Within front, back and body, the main unit used is div. This represents a broad division, and it can be either numbered or unnumbered. If numbered, div2 nests inside div1, and so on all the way down to div7 (if you want this level of hierarchy, which you probably won’t).

<body>

<div></div>

<div></div>

</body>

If div is unnumbered, you can nest one div inside another. This can become confusing, but it is certainly allowed within TEI.

Or, using numbered divs:

<body>

<div1>

<div2>

<div3>

</div3>

</div2>

</div1>

</body>
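
For comparison, the same three levels of nesting can be sketched with unnumbered divs, where the depth comes from the nesting itself rather than from the element name. The comments are only there to show the correspondence:

```xml
<body>
  <div>        <!-- same level as div1 -->
    <div>      <!-- same level as div2 -->
      <div>    <!-- same level as div3 -->
      </div>
    </div>
  </div>
</body>
```

Whichever style you pick, use it consistently: the TEI guidelines do not allow numbered and unnumbered divs to be mixed within the same front, body or back.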

 

In a book of letters we wouldn’t use letter as an element name. Instead we would put it in the value of the type attribute, which is available for divs:

 

<text>

<front>

<div1 type="introduction"></div1>

</front>

<body>

<div1 type="letter"></div1>

<div1 type="letter"></div1>

<div1 type="letter"></div1>

<div1 type="letter"></div1>

</body>

</text>

 

We might also use this simple structure for a book that just contained chapters, with each chapter becoming <div1 type="chapter">.

A play is a good example of a text with a hierarchical structure. It has two levels of hierarchy, act and scene:

<body>

<div1 type="act" n="1">

<div2 type="scene" n="1"></div2>

<div2 type="scene" n="2"></div2>

</div1>

</body>
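
To give a feel for what goes inside these divisions, here is a rough sketch of a single scene with some content. The head, stage, sp, speaker and p elements are standard TEI elements that have not been introduced in this section, and the text inside them is placeholder wording rather than a real play:

```xml
<body>
  <div1 type="act" n="1">
    <head>Act 1</head>
    <div2 type="scene" n="1">
      <head>Scene 1</head>
      <!-- stage: a stage direction -->
      <stage>Enter two placeholder characters.</stage>
      <!-- sp: one speech, with the speaker's name and the words spoken -->
      <sp>
        <speaker>First Speaker</speaker>
        <p>Placeholder speech.</p>
      </sp>
    </div2>
  </div1>
</body>
```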