Encoding of errors in the document source using <sic>; situations where corr= is and is not used; distinguishing between error and old spelling


1. The WWP encodes errors in the original text using <sic>, with the corrected reading recorded in the corr= attribute. The corr= attribute is required in WWP encoding, except in a small group of cases where <sic> is used but no correction is supplied. At present this group contains only the following:

--errors in catchwords (that is, discrepancies between the catchword and the corresponding word in the main flow of text): corr= is not used.

--cases where the reading is anomalous enough to require <sic> as a way of indicating that the transcription is accurate, but where we do not know what correction to supply: corr= is not used.
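
Because corr= is required everywhere outside these two cases, one way to review a file is to list every <sic> that lacks the attribute and confirm by hand that each belongs to one of the exceptions. The following is only a sketch of that idea, not a WWP tool: it assumes the text is well-formed XML without namespaces, the function name sic_without_corr is hypothetical, and the sample string (including the reading "qzx") is invented for illustration.

```python
import xml.etree.ElementTree as ET


def sic_without_corr(xml_text):
    """Return the text of every <sic> element that has no corr= attribute.

    Hypothetical helper. An empty corr="" counts as present, since an empty
    correction is a deliberate value rather than a missing one.
    """
    root = ET.fromstring(xml_text)
    return [
        "".join(el.itertext())
        for el in root.iter("sic")
        if "corr" not in el.attrib
    ]


# Invented sample: one corrected error and one <sic> with no correction.
sample = '<p>a p<sic corr="i">u</sic>ckle and a <sic>qzx</sic> word</p>'
flagged = sic_without_corr(sample)
print(flagged)
```

Each item the function returns still needs a human decision: the check cannot tell a catchword discrepancy or an uncorrectable reading from a correction that was simply forgotten.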

2. The <sic> tag should be applied to the smallest relevant span of text: usually a single letter, but sometimes a whole word. Some examples:

p<sic corr="i">u</sic>ckle

He went to the <sic corr="">the</sic> store.

Notice in the second example that the corrected value is empty, not a space: this indicates that nothing, not even a blank, should be present.
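
The effect of the two examples above can be illustrated by deriving the original and the corrected reading from the same encoded fragment. This is a minimal sketch under stated assumptions: the fragment is treated as well-formed XML, the <seg> wrapper and the names readings and _walk are hypothetical, and nothing here is part of the WWP's own processing.

```python
import xml.etree.ElementTree as ET


def _walk(elem, original, corrected):
    """Collect text from elem into both readings, substituting corr= values."""
    if elem.tag == "sic":
        text = "".join(elem.itertext())
        original.append(text)
        # If corr= is absent, fall back to the original reading;
        # if it is present but empty, nothing at all is emitted.
        corrected.append(elem.get("corr", text))
    else:
        if elem.text:
            original.append(elem.text)
            corrected.append(elem.text)
        for child in elem:
            _walk(child, original, corrected)
    if elem.tail:
        original.append(elem.tail)
        corrected.append(elem.tail)


def readings(fragment):
    """Return (original, corrected) plain-text readings of an encoded fragment."""
    # The <seg> wrapper is only there to give the parser a single root element.
    root = ET.fromstring("<seg>" + fragment + "</seg>")
    original, corrected = [], []
    _walk(root, original, corrected)
    return "".join(original), "".join(corrected)


print(readings('p<sic corr="i">u</sic>ckle'))
print(readings('He went to the <sic corr="">the</sic> store.'))
```

In the second case the corrected reading contains no trace of the repeated word, but the spaces transcribed on either side of the <sic> element both survive, because the sketch does not normalize whitespace.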

3. Since many of our texts use archaic spellings which were correct at the time they were written, care must be taken not to encode these as typographical errors. In general, in old-spelling texts, if the spelling is a phonetically plausible rendering of the word, it should be regarded as correct for our purposes. Thus “queane” is simply a period spelling of “queen”, but “ibside” (for “inside”) would be treated as a typographical error. If in doubt, check the OED to see whether the form is listed as a possible spelling. However, absence from the OED does not mean that the spelling is incorrect; if it is phonetically plausible we do not encode it as an error. In modern-spelling texts (typically those from the mid-18th century on), spelling irregularities are treated as errors unless they are clearly well-attested alternate spellings.
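
The rule in this paragraph can be restated as a small decision function. This is only an illustrative restatement, not a procedure the WWP runs: the function name is_error and its parameters are hypothetical, and all three inputs are judgments the encoder makes, not values a program could compute.

```python
def is_error(old_spelling_text, phonetically_plausible, attested_spelling):
    """Decide whether an irregular spelling should be encoded with <sic>.

    Hypothetical restatement of the rule; every argument is an encoder judgment.
      old_spelling_text: the text is being treated as an old-spelling text
      phonetically_plausible: the spelling plausibly renders the word's sound
      attested_spelling: the OED lists the form (old-spelling texts), or it is
          a clearly well-attested alternate spelling (modern-spelling texts)
    """
    if old_spelling_text:
        # Either test is enough to accept the spelling; absence from the OED
        # alone never makes a phonetically plausible spelling an error.
        return not (phonetically_plausible or attested_spelling)
    # Modern-spelling texts: irregularities are errors unless well attested.
    return not attested_spelling


# A phonetically plausible period spelling (like "queane") in an old-spelling text.
print(is_error(True, True, False))
# An implausible, unattested form (like "ibside") in an old-spelling text.
print(is_error(True, False, False))
```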

4. We do not use the cert= attribute on <sic>. We also do not use the <corr> element, an alternative element for error correction that encodes the same information as <sic> but gives emphasis to the corrected reading rather than to the original.
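
The two exclusions in this paragraph lend themselves to a mechanical check. The following sketch assumes well-formed XML without namespaces; the function name disallowed_constructs and the message strings are hypothetical and not part of any WWP validation.

```python
import xml.etree.ElementTree as ET


def disallowed_constructs(xml_text):
    """Report, in document order, uses of <corr> and of cert= on <sic>."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter():
        if el.tag == "corr":
            problems.append("<corr> element used")
        elif el.tag == "sic" and "cert" in el.attrib:
            problems.append("cert= used on <sic>")
    return problems


# Invented sample containing both disallowed constructs.
sample = '<p>p<sic corr="i" cert="high">u</sic>ckle and <corr>inside</corr></p>'
print(disallowed_constructs(sample))
```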

See chapter 18 of the TEI Guidelines for more detailed information.

5. The WWP also uses <sic type="seq"> for errors in sequencing (page numbers, signatures, etc.). For more information, see the entry on sequencing errors.

