Regularization: orig and reg

In addition to the silent forms of regularization discussed in Regularization: silent, there are several kinds of information which may call for more explicit treatment. In P4, the TEI provides the orig and reg elements, which mirror one another in their function:

Using orig gives primary weight to the source reading; the reg element gives primary weight to the regularized reading. These elements may be used for old spellings, typography, reference formats, representation of numbers, and other textual features which it is useful to represent in a regularized form. In cases where only the regularization is desired, you can use the reg element without the orig attribute to indicate that the reading in question has been regularized, perhaps referring the reader to your documentation for details.
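As a minimal sketch of the mirrored forms described above, the following fragments encode the same word in three ways. The word "beeing" and its regularization are invented for illustration and do not come from any particular source text.

```xml
<!-- Source reading as content; regularized reading in the reg attribute -->
<orig reg="being">beeing</orig>

<!-- The mirror image: regularized reading as content; source reading in the orig attribute -->
<reg orig="beeing">being</reg>

<!-- Regularization only: the source reading is not recorded -->
<reg>being</reg>
```

In the first form, a display or index built from element content preserves the source reading, while the regularized form remains available from the attribute for searching.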

For the kinds of archival projects to which this Guide is addressed, we recommend using orig rather than reg, since it gives emphasis to an accurate transcription of the original text as a source of historical evidence. However, we also regard the use of reg as an important way of providing valuable extra information, which can be used to improve searching and to provide additional display options for readers.

Some particular forms of regularization that may be useful for projects using this Guide:

  1. We recommend regularizing old-style typography, in which i/j, u/v, and w/vv are interchanged. In these cases, the original reading is encoded as the content of orig, and the regularized version is encoded as the value of the reg attribute. We tag the smallest applicable unit of regularization (usually the letter). For more details of the encoding of early typography, see Early typography and letter substitutions.
  2. Modernization of spelling follows the same logic as above, but because it must typically be carried out on a much larger scale it poses special problems. Projects may choose to modernize the use of i/j, u/v, and w/vv without also modernizing spelling in general, since the former is a smaller and more manageable task and can be (in part) automated. If you do wish to modernize spelling, you should use the orig or reg element as above; we recommend orig.
  3. For projects dealing with literary texts, we recommend regularizing bibliographic references to texts for which a standard reference system exists (e.g. Homer, the Bible, Virgil, etc.). Such encoding not only produces better search and analysis possibilities, but also makes it possible to link out to digital versions of these sources in the future. You can use orig with a reg attribute to encode and standardize these citations. The reg attribute should contain a citation in a standardized format, e.g. Gen_1:13. Although it would be desirable to use a widely shared format for such citations, if one can be ascertained, the consistency of your format matters more than your use of any particular existing format. It is comparatively easy to convert from one consistent format to another.
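The three recommendations above might be encoded along the following lines. This is only a sketch: the sample words and the printed form of the citation are invented for illustration, and the Gen_1:13 value simply follows the example format given above rather than any established standard.

```xml
<!-- 1. Early typography: tag the smallest unit, usually the letter -->
<p><orig reg="u">v</orig>pon a time ... lo<orig reg="v">u</orig>e ...
   <orig reg="w">vv</orig>ith ... <orig reg="j">i</orig>oy</p>

<!-- 2. Modernized spelling: the whole word is the unit of regularization -->
<p>the sound of <orig reg="music">musick</orig></p>

<!-- 3. Standardized citation: the reg attribute holds the consistent reference form -->
<p>as it is written in <orig reg="Gen_1:13">Genesis i. 13</orig></p>
```

Because every citation value follows one pattern, a later conversion to another reference scheme can be handled as a single mechanical transformation of the reg values.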