Typographic detail

General notes on what aspects of typography the WWP does and does not capture

For encoding projects that are interested in the details of physical document appearance, and in capturing the characters of the text with exactness, it is often difficult to be sure how much detail is enough. Each project will need to make its own decision, but we have found the guidelines below sufficient and sustainable for most research purposes.

We do not recommend capturing the details of typographic design, including the appearance of a particular font or of particular letters in a font (e.g. swash letters, variant letter forms, ligatures between kerned characters). It is difficult to capture this kind of variation accurately enough to be useful, and the value of the information is low in proportion to the effort required to capture it. However, we do recommend preserving some aspects of earlier typographical practice which carry interpretive meaning or are of interest to modern scholars. In particular:

Characters of text should be transcribed using the corresponding ASCII character, or with the appropriate entity reference. The appearance of the original character should never be used as a motivation for choosing a different ASCII character (for instance, the accidental resemblance of a small c to an inverted comma, or of the numeral one to the letter I). Even in the case of the letters n and u, which are frequently printed upside-down and resemble one another closely, we attempt to determine which letter was actually used.

We treat long s as a distinct character and we capture it using an entity reference (&s;). However, we have not found any persuasive evidence that this information is useful to scholars, except as a curiosity.

We do not recommend transcribing ligatures, which are letter combinations joined together for convenience because of their frequent use. Examples include st, ct, fl, and other letter combinations involving long s or f. These letters should be transcribed as if they were not joined.

However, digraphs or ligatures which represent the Greek/Latin ae and oe letter combinations are an exception to this rule. These are characters which express a different phonetic quantity than the two letters taken separately, and hence need to be preserved. They are transcribed using entity references: &aelig; (ae ligature), &AElig; (AE ligature), &oelig; (oe ligature), and &OElig; (OE ligature).
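The character rules above can be sketched as a small normalization routine. This is only an illustration, not WWP tooling: the function name and mapping tables are hypothetical, the input is assumed to be a Unicode draft transcription, and the entity names for the ae and oe digraphs are assumed to be the standard ISO names (&aelig;, &AElig;, &oelig;, &OElig;). Only &s; for long s is taken directly from the guidelines.

```python
# Illustrative sketch only: names and tables here are hypothetical,
# not part of any WWP software.

LONG_S = "\u017f"  # the long s character

# Typographic ligatures: transcribed as if the letters were not joined.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": LONG_S + "t",  # long s + t ligature; the long s is kept for now
    "\ufb06": "st",
}

# ae/oe digraphs: preserved as entity references.
# Assumption: the standard ISO entity names are the ones intended.
DIGRAPHS = {
    "\u00e6": "&aelig;",
    "\u00c6": "&AElig;",
    "\u0153": "&oelig;",
    "\u0152": "&OElig;",
}


def normalize_transcription(text: str) -> str:
    """Apply the ligature, digraph, and long-s rules to a draft transcription."""
    # 1. Split typographic ligatures into their component letters.
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    # 2. Replace ae/oe digraphs with entity references.
    for digraph, entity in DIGRAPHS.items():
        text = text.replace(digraph, entity)
    # 3. Record long s with its own entity reference (done last so that
    #    a long s released from a ligature in step 1 is also caught).
    return text.replace(LONG_S, "&s;")
```

Under these assumptions, a draft reading such as "Cæſar" would come out as "C&aelig;&s;ar", while a joined fl in "flower" would simply be transcribed as the two letters f and l. Distinguishing a turned n from a u, or a numeral one from a letter, still depends on the transcriber's judgment and cannot be automated this way.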