Digital Tools Glossary
A
assign
assign to assign a variable is to
give it a value
|
|
Attribute
Attribute a container for
information about the contents of an element,
held in the opening tag, separated from
the element name
by a space and followed by an equals sign and a quoted value,
eg <date style="new style">; an element
can have multiple attributes
|
|
Attribute value
Attribute value the name of an attribute is
fixed, but its value can vary in each instance; the value
follows the equals sign and must be within quotation
marks, eg <person id="123456">
|
|
B
Boolean
Boolean a data type in
Python that can only have the
values True or False.
More generally, Boolean operators create logical
structures by combining statements
with and, or and not.
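As a minimal sketch of the three operators in Python (the variable names are made up for illustration):

```python
# Two Boolean values; the names are hypothetical.
is_latin = True
has_date = False

both = is_latin and has_date    # True only if both are True
either = is_latin or has_date   # True if at least one is True
negated = not has_date          # flips False to True

print(both, either, negated)
```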
|
|
C
Character data
Character data text
that is not parsed by the XML editor or
other software; this
means that characters such as & or < are
treated literally and need not be rendered as entities
|
|
Closing tag
Closing
tag the second part of
an element,
denoted by a forward slash after the opening angle
bracket, eg </body>
|
|
codepoint
codepoint a unique number
assigned to a character in a character set such as Unicode; the number
of bytes needed to store it depends on the encoding used
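A small Python sketch, using the character é as an arbitrary example: ord() returns the codepoint, and the byte count depends on the encoding chosen.

```python
# The character e-acute, written as an escape so the example is encoding-safe.
char = "\u00e9"

codepoint = ord(char)                 # the character's number in Unicode
utf8_bytes = char.encode("utf-8")     # the same character encoded as UTF-8
latin1_bytes = char.encode("latin-1") # ... and encoded as Latin-1

print(codepoint, len(utf8_bytes), len(latin1_bytes))
```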
|
|
concatenation
concatenation the joining
together of multiple things: strings, variables, files,
etc.
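For strings in Python, concatenation might look like the following sketch (the example values are illustrative only):

```python
first = "Early English"
second = "Books Online"

# The + operator joins strings end to end.
title = first + " " + second

# str.join concatenates a whole list of strings with a separator.
words = ["Sepultus", "erat"]
phrase = " ".join(words)

print(title)
print(phrase)
```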
|
|
Crowdsourcing
Crowdsourcing the use of
volunteers to contribute to a research project
|
|
D
DocBook
DocBook an
XML format for encoding books and papers
|
|
DTD
DTD Document
Type Definition, a declaration which specifies
which elements
and attributes are allowed in an XML file, and how they
can be used
|
|
E
EAD
EAD (Encoded
Archival Description) an XML markup scheme for encoding
archival finding aids
|
|
EEBO
EEBO (Early
English Books Online) a project to photograph all books
published in England or in English between the beginning
of printing and 1700. Originally a microfilm
product, EEBO is
now published on the web by ProQuest.
|
|
EEBO-TCP
EEBO-TCP a
project to take a selection of page
images from EEBO and
produce lightly encoded TEI-conformant XML
transcriptions of the texts
|
|
Element
Element a
discrete piece of markup, usually consisting of an
opening and closing tag
|
|
Entity reference
Entity
reference an encoding for a
particular character that begins with & and ends
with a semicolon, eg &amp;. Used in HTML and XML for
reserved characters, they also have wider applications.
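As an illustration, Python's standard html module can convert reserved characters to entity references and back; the sample text is made up.

```python
import html

raw = "Fish & chips < 5"

# Replace reserved characters with entity references.
escaped = html.escape(raw)

# Convert the entity references back to the original characters.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```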
|
|
J
Join
Join a database manipulation
technique to combine multiple tables into a new table
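A minimal sketch of a join using Python's built-in sqlite3 module and an in-memory database; the table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE books (title TEXT, author_id INTEGER)")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Donne"), (2, "Milton")])
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("Devotions", 1), ("Areopagitica", 2), ("Paradise Lost", 2)])

# Combine the two tables wherever the author ids match.
rows = conn.execute(
    "SELECT authors.name, books.title "
    "FROM authors JOIN books ON books.author_id = authors.id "
    "ORDER BY books.title"
).fetchall()
conn.close()

print(rows)
```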
|
|
L
Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation
(LDA) a topic-modelling algorithm that groups the words
in a collection of documents into topics
on the basis of probability (using
the Dirichlet distribution)
|
|
list comprehension
list comprehension a compact,
readable syntax
provided by Python for creating lists
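For example, the following sketch builds new lists from an existing one; the list of years is invented for illustration.

```python
years = [1609, 1642, 1700, 1755]

# Keep only the years up to and including 1700.
up_to_1700 = [year for year in years if year <= 1700]

# Build a list of strings from the same numbers.
labels = [str(year) + " AD" for year in years]

print(up_to_1700)
print(labels)
```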
|
|
M
markup
markup embedded annotations to a
text which provide instructions on how elements of
it should be presented, structured or interpreted
|
|
N
Nest
Nest an element that
opens and closes inside another element (its parent)
is nested within it, so date is nested within lang
here: <lang name="Latin">Sepultus erat
<date value="3-10-1609">tertio die
Octobris</date></lang>
|
|
node
node an element in
a data structure
which is linked to other nodes, often hierarchically as
in a tree diagram
|
|
O
object serialisation
object serialisation the
conversion of a complex data structure
into a series of bytes
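One way to do this in Python is the standard pickle module, shown here as a sketch with a made-up record; other formats, such as JSON, serve a similar purpose.

```python
import pickle

# A hypothetical nested data structure.
record = {"title": "Devotions", "years": [1624, 1627], "digitised": True}

# Serialise the structure into bytes ...
blob = pickle.dumps(record)

# ... and rebuild an equal structure from those bytes.
restored = pickle.loads(blob)

print(type(blob).__name__, restored == record)
```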
|
|
Opening tag
Opening tag the first part of
an element,
bounded by angle brackets, eg <p>
|
|
P
Parse
Parse to read the
structure of an XML document, element by element;
any XML-aware software needs to
parse the document before acting upon it
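As a sketch, Python's standard xml.etree.ElementTree module can parse a small fragment (the lang/date example from the Nest entry) and then read its elements and attributes:

```python
import xml.etree.ElementTree as ET

xml_text = ('<lang name="Latin">Sepultus erat '
            '<date value="3-10-1609">tertio die Octobris</date></lang>')

# Parsing turns the text into a tree of elements.
root = ET.fromstring(xml_text)

root_name = root.tag            # name of the outermost element
language = root.get("name")     # value of its name attribute
date = root.find("date")        # the nested date element
date_value = date.get("value")
date_text = date.text

print(root_name, language, date_value, date_text)
```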
|
|
Parsed character data
Parsed character data text
that is read by the
XML editor or other software; this
means that any characters which are part of XML syntax,
such as & or < will need to be rendered as
entities if they are to be represented literally
|
|
Plain text
Plain text text without
any markup. Note that text in word processors, such as
Word, does have markup – you just can’t see it.
|
|
Processing instruction
Processing
instruction a piece of markup that
takes the form <? … ?>, which calls
upon software to
act, for example by referring to another file, such as
a stylesheet
|
|
Q
Quantifier
Quantifier a symbol in
a DTD specifying how many
times an element may occur: ?
= none or one; + = one or
many; * = none, one or many.
|
|
R
RelaxNG
RelaxNG an alternative
rules file
format to a DTD or XML
Schema; although we don’t cover it in this course there
is plenty of information about it on the web.
|
|
Root element
Root element everything
in an XML file, apart from the declaration and other
header
information, must go inside one element which
wraps all other elements
|
|
Rules file
Rules file a generic term for
a DTD, XML Schema
or other format specifying the rules of an XML document
|
|
Running text
Running text text in paragraphs
or other long units of narrative, as opposed to text in
tables, lists, headings etc.
|
|
S
scripting language
scripting language a programming
language that does not need to be compiled before it is
run
|
|
stop word
stop word a common word (such as
the or of) that is deliberately ignored when analysing
text, creating indexes, etc
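A minimal sketch of stop-word filtering in Python; the stop-word list and sample sentence are invented, and real projects usually use a much longer list.

```python
# A tiny, hypothetical stop-word list.
stop_words = {"the", "of", "and", "a"}

text = "the beginning of printing and the end of the century"
tokens = text.split()

# Keep only the words that are not in the stop-word list.
content_words = [token for token in tokens if token not in stop_words]

print(content_words)
```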
|
|
string
string a data type in
Python, entered in quotation marks and treated as
literal text; for example the string "21" cannot be
divided by 2, whereas the integer 21 can be
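The difference can be seen in a short sketch: dividing the string raises a TypeError, while the integer, or the string converted with int(), divides normally.

```python
text_value = "21"    # a string
number_value = 21    # an integer

try:
    text_value / 2
    string_division_failed = False
except TypeError:
    # Python refuses to divide a string by a number.
    string_division_failed = True

half = number_value / 2
converted_half = int(text_value) / 2   # convert the string first

print(string_division_failed, half, converted_half)
```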
|
|
T
Tag
Tag part of an element,
bounded by angle brackets, eg <h1>
|
|
Text file
Text file a file that can
be read by any
text editor, usually having the file extension .txt
|
|
tuple
tuple a sequence of any number of
values (the name is formed from the suffix of words
like quintuple). In Python, once a
tuple has been created it cannot be changed
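A brief sketch of a tuple in Python, with made-up values: items can be read by position, but an attempt to change one raises a TypeError.

```python
# Day, month and year held together in one tuple.
date_parts = (3, 10, 1609)

year = date_parts[2]   # reading an item is fine

try:
    date_parts[0] = 4  # changing an item is not allowed
    change_rejected = False
except TypeError:
    change_rejected = True

print(year, change_rejected)
```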
|
|
U
Unicode
Unicode a standard for encoding
characters in the world’s writing systems
|
|
V
Valid
Valid an XML document is valid
if it follows the rules specified by the rules file to
which it is linked; additionally a document must be
well formed in order to be valid.
|
|
variable
variable a name for a value; for
example, in Python myage = 21 assigns the value
21 to the variable myage
|
|
W
Web scraping
Web scraping automated collection
of content from web pages
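As a rough sketch of the extraction step only, the standard html.parser module can collect the links from a page. The page content below is a hypothetical string; a real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page content standing in for a downloaded web page.
page = ('<html><body><a href="/texts/1.xml">One</a> '
        '<a href="/texts/2.xml">Two</a></body></html>')

collector = LinkCollector()
collector.feed(page)
links = collector.links

print(links)
```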
|
|
Well-formed
Well-formed conforming to the
structural rules of XML, i.e. properly
nested elements,
opening and closing tags that match in case,
and quoted attribute values
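One way to test this in Python, sketched with the standard xml.etree.ElementTree module: the parser raises ParseError when a document breaks these rules. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = is_well_formed("<p><b>text</b></p>")
bad_nesting = is_well_formed("<p><b>text</p></b>")
bad_case = is_well_formed("<P>text</p>")
unquoted = is_well_formed("<p id=1>text</p>")

print(good, bad_nesting, bad_case, unquoted)
```

Note that this checks well-formedness only; checking validity against a rules file needs a validating parser.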
|
|
X
XML
XML (Extensible Markup
Language) a markup language which gives
great flexibility to its users in defining content and
structure
|
|
XML declaration
XML
declaration a declaration, written like a processing
instruction, that goes at the top of an
XML file; the most minimal form is <?xml
version="1.0"?>
|
|
XML Schema
XML Schema a
schema language: an expression of the rules for a
particular XML document, written in XML itself, and
following the syntax specified by the W3C
|
Last modified: Monday, 13 April 2020, 5:52 AM