Overview of the TEI

Motives for Text Encoding

So far we’ve been talking about the responsibility that document modelling and text encoding bear toward the text itself: to represent it adequately in an intellectual sense.

How about the larger context? Why do we need text encoding, socially? What function does it serve in the larger ecology of scholarship?

There are several other important functions text encoding can perform:

To store information for the long term, in a format that is not vulnerable to changes in hardware and software
served by using a non-proprietary, open data format like XML
To exchange information meaningfully with colleagues and other projects, and to publish it for future use.
to achieve this, you need not just an adequate conception of how you want to model your textual information: you need to agree on this conception with anyone you plan to share data with
in other words, you need some sort of lingua franca, a common standard that expresses what you all agree are the important concepts and structures
for this, you also need some sort of infrastructure for developing and maintaining the markup system and, even more importantly, its documentation, so that people who want to use it have a place to find it and learn about it
you might be able to come up with a perfectly good encoding system all by yourself; if you lived on a desert island, you wouldn’t have any motive to do otherwise
but insofar as text encoding is a community-oriented activity, inventing your own system from scratch can be a very solipsistic activity

This is ultimately why the TEI exists: to provide a long-term, detailed, analytically rich markup system that is understood by an entire community and can be used to create sharable, durable representations of the textual objects that community cares about.
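
To make the durability and exchange points concrete, here is a minimal sketch showing that a TEI file is ordinary XML text that general-purpose tools can read without any TEI-specific software. The sample document is invented for illustration, and the sketch assumes the P5 form of the TEI (a TEI root element in the TEI namespace); the function name read_title_and_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents (assumption: P5-style encoding).
TEI_NS = "http://www.tei-c.org/ns/1.0"

# A deliberately tiny, made-up document, for illustration only.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Poem</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for this sketch.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>First line of the stanza,</l>
        <l>second line of the stanza.</l>
      </lg>
    </body>
  </text>
</TEI>"""


def read_title_and_lines(xml_text):
    """Return the title and the verse lines using only generic XML parsing."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=ns
    )
    lines = [line.text for line in root.iterfind(".//tei:l", ns)]
    return title, lines


title, lines = read_title_and_lines(SAMPLE)
print(title)
for line in lines:
    print(line)
```

Nothing here depends on a particular vendor or application: any XML parser, in any language, could do the same job. That independence is what makes the format suitable for long-term storage and for exchange between projects.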

What is the TEI?

Technically: The TEI is a standards organization that exists to create, maintain, and disseminate a standard for humanities text encoding

a common language for encoding humanities documents of all sorts, typically for research or archival purposes
internationally developed and used
widely supported and used within the academy, libraries, museums, anywhere people have important humanities data

Organizationally: The TEI is an international consortium whose members are institutions that want the TEI to continue to exist

Socially: The TEI is a community of people and projects who use text encoding in a wide variety of ways, and who communicate with one another about their research and the practical problems associated with it.

The TEI is also, importantly, the set of guidelines and XML specifications that make up the TEI Guidelines.

first published in 1990; a major release in 1994 (P3) which was the first version to be widely used
an XML version published in 2001 (P4)
a fully revised version is now being prepared, which will use schemas rather than DTDs and will add a number of new features; it will be released next year as P5

The TEI Guidelines

The TEI Guidelines are a flexible specification:

Not intended to be difficult or burdensome to use
Not intended to require uniformity from all users
Intended to be adapted and customized
Not unlike a human language: has idiomatic usage, dialects, local usage

Areas of Usage

Digital libraries and digital archives
Literary and cultural materials
Scholarly editions
Manuscript collections and descriptions
Dictionaries
Language corpora
Historical documents
Anthropology and social sciences
Authoring
Many other areas…

Customization

It’s important to note that the TEI is not a fixed tag set that is written in stone

it is intended to be customized: both for users to select a subset of the TEI that they really need, and for users to add elements for particular features in their texts

In fact, the TEI is never really used directly in its raw form. In every case, what is used is a customized view of the TEI, or a customization.

When users want to create TEI schemas, they create a customization file that lists the modules they would like to use, the specific elements they would like to add or delete, the attributes they would like to change, and so on.
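
As a rough sketch of what such a customization file contains, the following builds one programmatically. It assumes the ODD vocabulary used for customizations in TEI P5 (<schemaSpec>, <moduleRef>, <elementSpec>, <attList>, <attDef>), which the text above does not name explicitly. The helper build_schema_spec, the identifier myProject, and the particular choices of what to delete are all hypothetical. Adding new elements is left out, since that also requires declaring a content model and class memberships.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def q(name):
    """Qualify an element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def build_schema_spec(ident, modules, deleted=(), changed_attrs=None):
    """Build a <schemaSpec> from a list of modules plus local changes.

    deleted:       iterable of (element name, module it belongs to) to remove
    changed_attrs: {element name: (module, [attribute names to delete])}
    """
    spec = ET.Element(q("schemaSpec"), {"ident": ident, "start": "TEI"})
    # 1. The modules the project wants to use.
    for key in modules:
        ET.SubElement(spec, q("moduleRef"), {"key": key})
    # 2. Elements the project wants to delete from those modules.
    for name, module in deleted:
        ET.SubElement(
            spec, q("elementSpec"),
            {"ident": name, "module": module, "mode": "delete"},
        )
    # 3. Elements whose attributes are changed (here: attributes removed).
    for name, (module, attrs) in (changed_attrs or {}).items():
        el = ET.SubElement(
            spec, q("elementSpec"),
            {"ident": name, "module": module, "mode": "change"},
        )
        att_list = ET.SubElement(el, q("attList"))
        for att in attrs:
            ET.SubElement(att_list, q("attDef"), {"ident": att, "mode": "delete"})
    return spec


# Hypothetical project: verse texts, no <mentioned>, no anchored attribute on <note>.
spec = build_schema_spec(
    ident="myProject",
    modules=["tei", "header", "core", "textstructure", "verse"],
    deleted=[("mentioned", "core")],
    changed_attrs={"note": ("core", ["anchored"])},
)
if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
    ET.indent(spec)
print(ET.tostring(spec, encoding="unicode"))
```

In practice a customization like this is usually written by hand as an XML file and then processed by TEI tooling to generate a schema and documentation. Building it in code here only serves to show the three kinds of statement involved: module selection, element deletion, and attribute changes.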

As a result, in actual practice there is both a common core of usage that is more or less universal among TEI projects—the stuff that everyone agrees on—and also beyond it a thinning penumbra of specialized uses and extensions that express the needs of particular groups and projects

There is no single orthodox TEI practice: there are greater and lesser degrees of adherence to a set of central principles and usages

For projects in which consistency and use of a common standard are very important (for instance, digital library collections), there’s greater emphasis on best practices and a tendency to discourage idiosyncrasy. For projects and individuals whose need for local expressiveness is much greater, specialized TEI methods are very common and arguably essential to good practice.

International Use of the TEI

The TEI is intended to serve a wide international community:

Broad range of methods and approaches
Participation from member institutions around the world
Support for multilingual versions of the TEI Guidelines

Future Tendencies for the TEI

Several directions for the TEI in the coming years, once P5 is complete:

More and better documentation: for different audiences, for more specific purposes; note that some of this will not come directly from the TEI itself, but from projects using the TEI, like the WWP
More use (and support for use) by individuals
More discipline-specific customizations

Already there are several examples of customizations and detailed documentation being written by particular groups, or by projects that represent the encoding work of those groups:

DALF
WWP
California Digital Library
TEI in Libraries
Model Editions Partnership

There are also many others (which are listed at the web site for the seminar). It’s worth looking at these examples; they often explain things in more detail, and also give advice that is more directly aimed at the kind of encoding you’re trying to do or understand

Other encoding possibilities

The TEI is by no means the only encoding language available for humanities scholarship, though it is the most widely used and the most broadly adapted; there are several others that are worth noting because they offer distinctive and useful representational features

Historical Event Markup Language (HEML): provides a way of representing historical event information so that it can serve as the basis for various kinds of geographical and temporal representations
Music Markup Language: not a finished product, but useful to know about
Multi-Element Code System (MECS): developed by the Wittgenstein Archives; non-hierarchical and non-XML, designed for representing very complex manuscript materials in a very detailed way
Writing your own XML language is not as far-fetched as it sounds: the reasons you might not choose to do it are actually more social than technical
Using your own language is potentially isolating: it means you also have to write your own tools to some extent, and you can’t ask for advice as easily
But it also means that you aren’t constrained by other people’s ideas of what is interesting or useful to say
HEML was written, essentially, by an individual scholar who had an idea about representing historical information

Sources of Information

The most important takeaway concerning the TEI is that it is adaptable; that it is expressive but requires some thought (in other words, it won’t do the analytical work for you); and that it is a tool that has arisen out of certain strands of humanities thought and carries with it certain assumptions that are worth probing

This grant program (the seminars and the supporting services that accompany them) is intended to encourage faculty and students to learn more about the TEI, and to give them the kinds of information that they will find legible and relevant to their interests