Regularization: <orig> [143]

Abstract

Explicit regularization using <orig>

Discussion

The WWP performs several kinds of regularization. The most explicit of these is the use of the <orig> element, which we use for the following categories of phenomena:

1. old-style typography, in which i/j, u/v, and w/vv are interchanged. In these cases, the original reading is encoded as the content of <orig>, and the regularized version is encoded as the value of the reg= attribute. We tag the smallest applicable unit of regularization (usually the letter). For more details of the WWP’s encoding of early typography, see entry 108 on Typography: I, J, U, V.

2. bibliographic references to texts for which a standard reference system exists (e.g. Homer, the Bible, Virgil). We use <orig> with a reg= attribute to encode and standardize these citations. However, we do not encode these <orig> elements by hand. Instead, we mark every citation of this sort with the marker element <regMe>, whose content is the unregularized form of the citation. The <regMe> element should go inside <bibl> where appropriate (it does not replace <bibl>). Within <regMe> no further elements are needed. A script will then convert these unregularized citations to a regularized form encoded with <orig> and reg=. Within <regMe> we apply the same criteria we ordinarily use for spacing between words: if the normal spacing practices of the text seem to be violated in any particular case, check to see whether the line is especially cramped or loose.
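
The two uses above can be sketched in a few lines of Python. This is only an illustration, not the WWP's actual conversion script, which this entry does not describe. The element and attribute names (<orig>, <regMe>, <bibl>, reg=) come from the entry; the function names, the lookup table, and the regularized citation form "Genesis 1:2" are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table. The real regularized citation forms are not
# given in this entry, so this value is a placeholder for illustration.
REGULAR_FORMS = {
    "Gen. i. 2.": "Genesis 1:2",
}


def convert_regme(xml_text, table=REGULAR_FORMS):
    """Turn each <regMe> into <orig reg="..."> when a regular form is known."""
    root = ET.fromstring(xml_text)
    # Collect the elements first so renaming tags cannot disturb iteration.
    for elem in list(root.iter("regMe")):
        citation = (elem.text or "").strip()
        regular = table.get(citation)
        if regular is None:
            continue  # unknown citation: leave the <regMe> marker in place
        elem.tag = "orig"
        elem.set("reg", regular)
    return ET.tostring(root, encoding="unicode")


def regularized_reading(xml_text):
    """Return the text with every <orig reg="..."> replaced by its reg value."""
    root = ET.fromstring(xml_text)
    parts = []

    def walk(elem):
        if elem.tag == "orig" and "reg" in elem.attrib:
            parts.append(elem.get("reg"))  # use the regularized form
        else:
            parts.append(elem.text or "")
            for child in elem:
                walk(child)
        parts.append(elem.tail or "")  # text following the element

    walk(root)
    return "".join(parts)


# Citation marked with <regMe> inside <bibl>, as described in item 2.
sample = (
    "<p>See <bibl><regMe>Gen. i. 2.</regMe></bibl> and "
    "<bibl><regMe>Iliad, B. 3</regMe></bibl>.</p>"
)
converted = convert_regme(sample)
print(converted)

# Old-style typography from item 1: only the single letter is tagged.
print(regularized_reading('<w>lo<orig reg="v">u</orig>e</w>'))
```

In this sketch a citation with no entry in the table keeps its <regMe> marker, so it remains visible for later review rather than receiving a guessed reg= value.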

For information on other forms of regularization, see entry 144 on silent regularization.
