Representation of errors in the source, using choice with sic and corr.

The treatment of errors in the source document is an issue in which editorial and encoding strategy are closely intertwined. The encoding is not simply a facet of the transcription: it is a reflection of your approach to the source document and the evidence it contains.

Encoding typographical and other kinds of errors in TEI involves two separate activities: identifying the error and, optionally, supplying a correction.

1. Identifying error

The identification of error requires a theory of what counts as an error in your editorial universe, and this may vary considerably from project to project. For some projects, particularly those dealing with more recent texts, it may be easy to treat any departure from convention as an error and to say with confidence what the corrected reading should be. For other projects, the text may be full of idiosyncrasies that either are acceptable contemporary variants or cannot be corrected with any confidence.

If you do wish to identify an error in the source, the simplest method is the TEI sic element, which marks its content as reproduced exactly as it appears in the source, even though it is apparently incorrect. This encoding can be applied at the letter, word, or phrase level; it is therefore not appropriate for marking large-scale structural errors such as misordered pages, a missing section, or similar problems.
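A minimal sketch of sic used on its own, flagging an apparent error without offering any correction. The sentence is hypothetical; the misspelling is the ihside case discussed under older texts.

```xml
<!-- The source reading is kept exactly as printed and flagged as apparently wrong;
     no corrected reading is supplied. -->
She waited <sic>ihside</sic> the house.
```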

There is value in identifying an error or idiosyncrasy even if you do not plan to supply a correction. For one thing, it tells the reader that an odd reading is a faithful transcription of the source rather than a transcription error, eliminating one source of confusion or distrust. This can be signalled inconspicuously (e.g. with a mouse-over note visible only to a curious reader) or conspicuously (e.g. with a bracketed [sic] displayed in the text). These display details can be controlled by a stylesheet.
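As one possible approach, here is a minimal XSLT 1.0 sketch. It assumes TEI P5 input in the TEI namespace and HTML output; the class and title values are arbitrary choices, not part of any standard. It renders a standalone sic (one not paired with corr inside choice) conspicuously, with the inconspicuous mouse-over variant kept as a commented-out alternative.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Conspicuous option: output the source reading followed by " [sic]". -->
  <xsl:template match="tei:sic[not(parent::tei:choice)]">
    <xsl:apply-templates/>
    <xsl:text> [sic]</xsl:text>
  </xsl:template>

  <!-- Inconspicuous alternative (use instead of the template above):
       wrap the reading in a span whose title appears on mouse-over.
  <xsl:template match="tei:sic[not(parent::tei:choice)]">
    <span class="sic" title="[sic] reproduced as in the source">
      <xsl:apply-templates/>
    </span>
  </xsl:template>
  -->
</xsl:stylesheet>
```

Everything else falls through to XSLT's built-in templates, which simply copy text; a real project stylesheet would handle the rest of the TEI markup as well.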

2. Supplying a correction

If you wish to supply a corrected reading, encode it with corr. If your approach to transcription involves silently correcting obvious typographical errors, you may choose to use corr on its own (as illustrated in Example 4). This suppresses the original reading altogether, while still recording in the markup that a correction has been made. To preserve both the error and the corrected reading, nest sic and corr together within a choice element, which indicates that they are alternatives (and, by implication, that the content of sic is the original reading in the source).

There are some specific cases which deserve particular attention, in which sic is useful but no corr value seems necessary. These include:

The sic and corr elements should be applied to the smallest relevant stretch of text: usually a single letter, but sometimes a whole word. See Tagging at the letter or word level for details on some of the factors affecting this decision.

In older texts, where spelling and printing conventions may be highly variable, it is better to be cautious in marking errors, and even more cautious in supplying corrections. If you wish to mark old spellings or modernize the text, orig and reg are a more appropriate encoding. We recommend a phonetic rule of thumb: if the spelling plausibly represents the word's pronunciation, it should be regarded as typographically correct for encoding purposes. Thus queane is simply a period spelling of queen, but ihside (for inside) would be treated as a typographical error. In modern-spelling texts (typically those from the mid-18th century on), spelling irregularities can be treated as errors unless they are well-attested alternative spellings, either elsewhere in the same text or in a relevant historical dictionary such as the OED.
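Applying the phonetic rule of thumb to the two words just discussed gives two different encodings. The sketch below shows both as isolated fragments; in a real transcription each would sit inside its surrounding sentence.

```xml
<!-- Period spelling: it sounds like "queen", so regularize rather than correct. -->
<choice><orig>queane</orig><reg>queen</reg></choice>

<!-- Not a plausible spelling of "inside": treat it as a typographical error. -->
<choice><sic>ihside</sic><corr>inside</corr></choice>
```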

Additional information

The sic and corr elements can carry a resp attribute, which indicates who takes responsibility for supplying a corrected reading or for identifying an error as such. Its value is typically a pointer to the identifier (xml:id) of an entry in a list of editors, transcribers, and other responsible parties stored in the TEI header.
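A minimal sketch of how the pointer and its target might fit together. The person's name, the xml:id value "JD", and the title are hypothetical placeholders. A project might equally keep this list elsewhere in the header.

```xml
<!-- In the teiHeader: declare a responsible party with an identifier. -->
<titleStmt>
  <title>Sample transcription</title>
  <respStmt>
    <resp>transcription and correction</resp>
    <persName xml:id="JD">Jane Doe</persName>
  </respStmt>
</titleStmt>

<!-- In the text: resp points to that identifier, prefixed with "#". -->
<choice>
  <sic resp="#JD">ihside</sic>
  <corr resp="#JD">inside</corr>
</choice>
```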

If you want to treat different kinds of errors differently—for instance, displaying the corrected value for certain kinds of errors and the original reading for others—you can classify them using the type attribute.
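A minimal sketch of classifying errors, with the type recorded on corr. The values "typo" and "dittography" are hypothetical project-defined categories, not a TEI-prescribed list.

```xml
<!-- A stylesheet could, for example, display corr for "typo"
     but sic for "dittography". -->
<choice><sic>ihside</sic><corr type="typo">inside</corr></choice>
<choice><sic>the the</sic><corr type="dittography">the</corr></choice>
```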


Example 1.

Encoding error at the letter level
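A minimal illustration using the ihside/inside case (the sentence itself is hypothetical). Only the mistyped letter falls inside sic and corr:

```xml
<!-- The source prints "h" where "n" is intended; the rest of the word is untagged. -->
She waited i<choice><sic>h</sic><corr>n</corr></choice>side the house.
```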


Example 2.

Encoding error at the word level
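The same hypothetical sentence, with the whole word treated as the unit of error:

```xml
<!-- The entire misspelled word is enclosed, paired with its corrected form. -->
She waited <choice><sic>ihside</sic><corr>inside</corr></choice> the house.
```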


Example 3.

Encoding whole-word errors

She walked into <choice><sic>the the</sic><corr>the</corr></choice> garden.

Example 4.

Omitting the original reading
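A minimal illustration of the silent-correction approach, again using a hypothetical sentence. Only the corrected reading is recorded:

```xml
<!-- corr on its own: the markup shows a correction was made,
     but the source's "ihside" is not preserved. -->
She waited <corr>inside</corr> the house.
```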