Name keys: encoding unique name references

name regularization proper name phrase-level encoding
key persName reg placeName rs

Use of the key attribute on persName to uniquely identify individuals

If you want to record not only that a given reference is a name but also which unique individual is being named, the TEI key attribute is the best method. Providing a name key offers a number of benefits: it allows you to search for individuals regardless of how any particular name reference is spelled; it enables you to disambiguate between individuals who share a name; and it allows you to link a name to an external record with additional information (such as a name authority record).
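As a rough illustration of the first two benefits, the Python sketch below groups persName references by their key value, so that differently worded references to the same person are retrieved together. The sample passage and the keys csmith.aaa and wwordswor.bbb are invented for this example; they are not real keys from any project.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample passage; the key values are hypothetical.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName key="csmith.aaa">Charlotte Smith</persName> wrote to
  <persName key="wwordswor.bbb">Mr. Wordsworth</persName>; later
  <persName key="csmith.aaa">Mrs. Smith</persName> replied to
  <persName key="wwordswor.bbb">W. Wordsworth</persName>.
</p>"""


def references_by_key(xml_text):
    """Map each key to the list of name strings encoded with it."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for el in root.iter(f"{{{TEI_NS}}}persName"):
        key = el.get("key")
        if key is not None:
            # Collect the text of the name as it appears in the source.
            found[key].append("".join(el.itertext()).strip())
    return dict(found)


refs = references_by_key(SAMPLE)
print(refs["csmith.aaa"])  # ['Charlotte Smith', 'Mrs. Smith']
```

A search on the key finds both "Charlotte Smith" and "Mrs. Smith", which a plain text search on either form of the name would not.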

1. Categories of name

There are several different categories of name which are significant in this context, because they may merit different treatment:

  • names of historical figures who have a broader cultural existence and are likely to appear in multiple texts: Julius Caesar, Charlotte Smith, William Wordsworth, Boudicca, Queen Anne, and also publishers and printers
  • names of common mythological or fictional figures with a cultural existence independent of any particular text: Juliet, Andromeda, Maid Marian, Clarissa Harlowe, Robin Goodfellow, Gulliver
  • names of fictional figures who only operate within a single text (e.g. characters in minor novels)
  • names of historical individuals who have no broader cultural existence and are not likely to appear in multiple texts: family members mentioned in poems
  • names of project participants and other modern figures who are not likely to be search terms

Keys make the most sense for the first two categories, since these are the most likely to be used as search terms and are also the most likely candidates for linking to external resources such as authority records or biographical databases. They also offer the greatest opportunity for linking between texts in a collection, or across collections.

Keying fictional characters who appear in only one or two texts provides much smaller benefits, since you are unlikely to link out to any external source. Retrieval will not improve significantly either, since spelling and modes of reference within a given piece of fiction are likely to be fairly consistent. However, name keys might still be useful for certain kinds of research.

If your project involves historical documents which include names of ordinary people who are mentioned in passing and are unlikely to appear in multiple texts, keying these names is probably unnecessary. However, keying can be very valuable in historical collections where such names recur and need disambiguation: for instance, census data, wills, genealogical records, and county records.

It may also be sensible to provide keys for project participants, to allow for a simple and effective way of identifying people in workflow documentation and change logs.

2. The key value itself

The TEI provides a key attribute on all elements that have to do with naming (name, rs, persName, roleName, etc.). The value of this attribute can be anything, but it will work best if you come up with a consistent, scalable system for generating keys. By scalable we mean a system that will continue to work well even if you end up needing a large number of values (so using someone’s initials as a key is not a good choice, since many people share the same initials). One good option is some portion of the first and/or last name, plus a series of extra letters or digits to distinguish between people with the same name. Each key needs to be unique: that is, it must identify exactly one individual, so whatever scheme you use needs to provide enough values to cover the number of individuals you are working with.
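One way such a scheme might look is sketched below in Python. It is only an illustration: the stem rule (first initial plus last name, cut to nine letters), the three-digit suffix, and the function names are all assumptions made for this example, not part of the TEI or of any project's practice.

```python
import re


def stem_for(first, last, max_len=9):
    """Hypothetical stem: first initial plus last name, lowercased.

    Anything other than the letters a-z is dropped, so accented
    characters would need transliterating first in real use.
    """
    letters = re.sub(r"[^a-z]", "", (first[:1] + last).lower())
    return letters[:max_len]


def assign_key(first, last, used):
    """Return an unused key for this person and record it in `used`."""
    stem = stem_for(first, last)
    n = 1
    # Step through numeric suffixes until one is free.
    while f"{stem}.{n:03d}" in used:
        n += 1
    key = f"{stem}.{n:03d}"
    used.add(key)
    return key


used_keys = set()
a = assign_key("Richard", "Jones", used_keys)       # rjones.001
b = assign_key("Robert", "Jones", used_keys)        # rjones.002
c = assign_key("Margaret", "Cavendish", used_keys)  # mcavendis.001
print(a, b, c)
```

The numeric suffix leaves room for 999 people per stem; a project expecting more namesakes than that would want a longer suffix.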

If you have human beings transcribing and encoding your documents, keys that are human-readable (for instance, keys that are based on a person’s name) will be easier to use than entirely random keys (e.g. HGF27FH4), will tend to produce fewer transcription errors, and will be easier to check by eye. (For instance, the key mcavendis.neu is unlikely to be the correct key for Edward I.) If your keys are being entered and checked automatically then this is less of a concern.

The Women Writers Project uses a system in which each key consists of the first initial plus last name up to a total of nine characters, followed by a period, a randomly generated character, and two checksum characters. The checksum characters are derived computationally from the rest of the key, and allow the accuracy of the key to be checked automatically (by reversing the computation), to catch transcription errors. This system may be more involved than is necessary for most projects, but it works well.
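The general idea of a checked key can be sketched in Python as follows. The actual computation used by the Women Writers Project is not described here, so the checksum below (a position-weighted sum reduced to two letters) is an invented stand-in, and it should not be expected to validate real keys such as sbauman.emt. The function names are likewise hypothetical.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def checksum(payload):
    """Hypothetical checksum: position-weighted sum mapped to two letters."""
    total = sum((i + 1) * ord(ch) for i, ch in enumerate(payload))
    total %= 26 * 26
    return ALPHABET[total // 26] + ALPHABET[total % 26]


def make_key(stem, rng=random):
    """Build stem + '.' + one random letter + two checksum letters."""
    salt = rng.choice(ALPHABET)
    return f"{stem}.{salt}{checksum(stem + salt)}"


def is_valid(key):
    """Recompute the checksum and compare it with the one in the key."""
    stem, sep, tail = key.rpartition(".")
    if not sep or not stem or len(tail) != 3:
        return False
    return checksum(stem + tail[0]) == tail[1:]


key = make_key("sbauman")
typo = key.replace("sbauman", "sbuaman")  # two letters transposed
print(key, is_valid(key))    # the generated key passes the check
print(typo, is_valid(typo))  # the mistyped key fails it
```

Weighting each character by its position means that swapping two adjacent, different letters changes the checksum, which is the kind of slip a human transcriber is most likely to make.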

3. Providing keys for other kinds of names

Although key is most useful for names of persons, it can also be used for place names or other types of names for which it is important to provide a unique identification. For place names, it can be useful to distinguish between all of the different Springfields or Nortons that might be mentioned in a large collection of texts. For organizational names, it might be useful to distinguish between identically named organizations in different places (e.g. the Town Councils of various towns) in documents which don’t specify the location because it is obvious from the local context. Keying place names, organization names, and any other type of name works according to the same basic principles as keying personal names.

Examples

  • Syd Bauman: sbauman.emt
  • Richard Jones: rjones.neq
  • Moses: moses.znq