Overview of the TEI
Julia Flanders
2008-07-21/24
Motives for Text Encoding
We’ve been talking so far about the kinds of responsibility that document modelling
and text encoding express concerning the text itself: to represent it adequately in
an intellectual sense.
How about the larger context? Why do we need text encoding, socially? What function
does it serve in the larger ecology of scholarship?
There are several other important functions text encoding can perform:
- To store information for the long term, in a format that is not vulnerable to changes
in hardware and software
- served by using a non-proprietary, open data format like XML
- To exchange information meaningfully with colleagues and other projects, and to publish
it for future use.
- to achieve this, you need not just an adequate conception of how you want to model
your textual information: you need to agree on this conception with anyone you plan
to share data with
- in other words, you need some sort of lingua franca, a common standard that expresses
what you all agree are the important concepts and structures
- for this, you also need some sort of infrastructure for developing and maintaining
the markup system and, even more importantly, its documentation, so that people who
want to use it have a place to go to find it and learn about it.
- you might be able to come up with a perfectly good encoding system all by yourself;
if you lived on a desert island, you wouldn’t have any motive to do otherwise
- but insofar as text encoding is a community-oriented activity, inventing your own
system from scratch can be a very solipsistic activity
This is ultimately why the TEI exists: to provide a long-term, detailed, analytically
rich markup system that is understood by an entire community and can be used to create
sharable, durable representations of the textual objects that community cares about.
What is the TEI?
Technically: The TEI is a standards organization that exists to create, maintain,
and disseminate a standard for humanities text encoding
- a common language for encoding humanities documents of all sorts, typically for
research or archival purposes
- internationally developed and used
- widely supported and used within the academy, libraries, museums, anywhere people
have important humanities data
Organizationally: The TEI is an international consortium whose members are institutions
that want the TEI to continue to exist
Socially: The TEI is a community of people and projects who use text encoding in a
wide variety of ways, and who communicate with one another about their research and
the practical problems associated with it.
The TEI is also, importantly, the set of guidelines and XML specifications that make
up the TEI Guidelines.
- first published in 1990; a major release in 1994 (P3) which was the first version
to be widely used
- an XML version published in 2001 (P4)
- a fully revised version (P5), released in 2007, which uses schemas rather than DTDs
and also adds a number of new features
The TEI Guidelines
The TEI Guidelines are a flexible specification:
- Not intended to be difficult or burdensome to use
- Not intended to require uniformity from all users
- Intended to be adapted and customized
- Not unlike a human language: has idiomatic usage, dialects, local usage
Areas of Usage
- Digital libraries and digital archives
- Literary and cultural materials
- Scholarly editions
- Manuscript collections and descriptions
- Dictionaries
- Language corpora
- Historical documents
- Anthropology and social sciences
- Authoring
- Many other areas…
Customization
It’s important to note that the TEI is not a fixed tag set written in stone:
- it is intended to be customized: both for users to select the subset of the TEI that
they really need, and for users to add elements for particular features in their texts
In fact, the TEI is never really used directly in its raw form. In all cases, what is
used is a customized view of the TEI, or a customization.
When users want to create TEI schemas, they create a customization file that lists the
modules they would like to use, the specific elements they would like to add or delete,
the attributes they want to change, and so on.
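To make the idea of a customization file concrete, here is a minimal sketch in Python. It embeds a small TEI ODD fragment as a string and reads back what it selects. The schema name `seminar_demo`, the helper `summarize_customization`, and the particular choices (deleting `mentioned`, making `type` required on `div`) are illustrative assumptions rather than any project's actual practice, and adding new elements is left out for brevity.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A hypothetical customization: four modules, one element deleted,
# and one attribute changed. Names and choices are illustrative only.
ODD = """\
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="seminar_demo" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="mentioned" module="core" mode="delete"/>
  <elementSpec ident="div" module="textstructure" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req"/>
    </attList>
  </elementSpec>
</schemaSpec>
"""


def summarize_customization(odd_xml):
    """Return the modules, deleted elements, and changed attributes listed in an ODD fragment."""
    root = ET.fromstring(odd_xml)

    def q(name):
        # Qualify a local name with the TEI namespace for ElementTree lookups.
        return f"{{{TEI_NS}}}{name}"

    modules = [m.get("key") for m in root.iter(q("moduleRef"))]
    specs = list(root.iter(q("elementSpec")))
    deleted = [e.get("ident") for e in specs if e.get("mode") == "delete"]
    changed = {
        e.get("ident"): [a.get("ident") for a in e.iter(q("attDef"))]
        for e in specs
        if e.get("mode") == "change"
    }
    return {"modules": modules, "deleted": deleted, "changed": changed}


summary = summarize_customization(ODD)
print(summary)
```

In real use, a file like this would be processed by the TEI's own tooling to generate a schema and its documentation; the sketch only inspects the file to show what kinds of decisions it records.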
As a result, in actual practice there is both a common core of usage that is more
or less universal among TEI projects (the stuff that everyone agrees on) and, beyond
it, a thinning penumbra of specialized uses and extensions that express the needs of
particular groups and projects.
There is no single orthodox TEI practice: there are greater and lesser degrees of
adherence to a set of central principles and usages.
For projects in which consistency and use of a common standard are very important (for
instance, digital library collections), there’s greater emphasis on best practices
and a tendency to discourage idiosyncrasy; for projects and individuals where
the need for local expressiveness is much greater, specialized TEI methods are very
common and arguably essential to good practice.
International Use of the TEI
The TEI is intended to serve a wide international community:
- Broad range of methods and approaches
- Participation from member institutions around the world
- Support for multilingual versions of the TEI Guidelines
Future Tendencies for the TEI
Several directions for the TEI in coming years, now that P5 is done:
- More and better documentation: for different audiences, for more specific purposes;
note that some of this will not come directly from the TEI itself, but from projects
using the TEI, like the WWP
- More use (and support for use) by individuals
- More discipline-specific customizations
Already there are several examples of customizations and detailed documentation being
written by particular groups, or by projects that represent the encoding work of
those groups:
- DALF
- WWP
- California Digital Libraries
- TEI in Libraries
- Model Editions Partnership
There are also many others (which are listed at the web site for the seminar). It’s
worth looking at these examples; they often explain things in more detail, and also
give advice that is more directly aimed at the kind of encoding you’re trying to do
or understand.
Other encoding possibilities
The TEI is by no means the only encoding language available for humanities scholarship,
though it is the most widely used and the most broadly adapted; there are several
others that are worth noting because they offer distinctive and useful representational
features:
- Historical Event Markup Language (HEML): provides a way of representing historical
event information so that it can serve as the basis for various kinds of geographical
and temporal representations
- Music Markup Language: not a finished product, but useful to know about
- Multi-Element Code System (MECS): developed by the Wittgenstein Archive;
non-hierarchical, non-XML; designed for representing very complex manuscript materials
in a very detailed way
- Writing your own XML language is not as far-fetched as it sounds: the reasons you
might not choose to do it are actually more social than technical
- Using your own language is potentially isolating: it means you also have to write
your own tools to some extent, and you can’t ask for advice as easily
- But it also means that you aren’t constrained by other people’s ideas of what is
interesting or useful to say
- HEML was written, essentially, by an individual scholar who had an idea about
representing historical information
Sources of Information
The most important takeaway message here concerning the TEI is that it is adaptable;
that it is expressive but requires some thought (in other words, it won’t do the
analytical work for you); and that it’s a tool that has arisen out of certain strands
of humanities thought and carries with it certain assumptions that are worth probing.
This grant program (the seminars and the supporting services that accompany them)
is intended to encourage faculty and students to learn more about the TEI, and to
give them the kinds of information that they will find legible and relevant to their
interests.