Special characters: brevigraphs and diacritical marks [141]


Using entity references to transcribe brevigraphs and characters with diacritical marks


Brevigraphs and diacritical marks often resemble each other, and in some cases the same mark means different things in different contexts. For instance, a macron or acute accent may function as a diacritical mark in some texts, but in others it may indicate an omitted “n” or “m”. The WWP encodes these features as follows:

1. Letters with diacritical marks indicating pronunciation. We encode these with an entity reference. For instance, an “e” with an acute accent is encoded as &eacute; (a full list of these entities is included in the ISO Latin entity set and can also be found on the WWP training page).

2. Letters with associated marks (which may resemble diacritical marks, small attached letters, or small flourishes or squiggles attached to the letter) which indicate the omission of letters or an abbreviated form of a word or syllable. These appear almost exclusively in our earliest texts, in which the typography attempts to imitate the letterforms and abbreviations common in manuscript writing (for instance, “y” with a superscripted “t” attached, “p” with a curly hook, etc.). We encode these with an entity reference for the character itself, enclosed in an <abbr> element whose expan= value gives the expanded form. The following is the proper encoding for the word “whom” in which the final “m” is indicated only by a macron over the “o”:

     wh<abbr expan="om">&omacr;</abbr>

Similarly, the following is the encoding for a “y” with a small attached, superscripted “t” which is an old abbreviation for the word “that”:

     <abbr expan="that">&ysupt;</abbr>

Although this encoding may seem redundant, its utility becomes clear if we consider the case of a brevigraph which can have more than one possible expansion (for instance &ysupu;, which may stand for either “thou” or “you”). In such a case, the expansion could not be handled adequately simply by expanding the entity references; the expan= value is crucial. The entity reference, in turn, records what the original character was, and allows it to be printed if necessary.
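The two uses of this encoding can be sketched in a few lines of Python. This is only an illustration, not part of the WWP toolchain: the function name `render` and the regular expression are assumptions made for the example, and the sketch further assumes that each <abbr> element carries a single expan= attribute and is never nested.

```python
import re

# Matches <abbr expan="...">...</abbr>.
# Assumption: one expan attribute, double-quoted, and no nested <abbr> elements.
ABBR = re.compile(r'<abbr expan="([^"]*)">(.*?)</abbr>')

def render(text, expand=True):
    """Replace each <abbr> element with its expansion (expand=True)
    or with its original content, entity reference included (expand=False)."""
    group = 1 if expand else 2
    return ABBR.sub(lambda m: m.group(group), text)

# The two examples from this section, joined by a space.
sample = 'wh<abbr expan="om">&omacr;</abbr> <abbr expan="that">&ysupt;</abbr>'

print(render(sample))                # whom that
print(render(sample, expand=False))  # wh&omacr; &ysupt;
```

The expanded output depends only on the expan= values, while the unexpanded output keeps the entity references so that the original characters can still be printed.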

These special brevigraphs should not be confused with simple superscripted characters. For information on encoding superscripted characters, see 194.

A list of brevigraphs for which the WWP has created entities:

&ysupe; y with superscripted e: usually for the abbreviation of “the”

&ysupt; y with superscripted t: usually for the abbreviation of “that”

&ysupu; y with superscripted u: usually for the abbreviation of “thou” or “you”

&wsupt; w with superscripted t: usually for the abbreviation of “with”

&wsupch; w with superscripted ch: usually for the abbreviation of “which”

A number of other, less frequent brevigraphs do not yet have entities; we will be creating entities for them.
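As a rough sketch of how the entity list above might be used when keying text, the following Python helper builds the <abbr> encoding from an entity name. The table `USUAL_EXPANSIONS` and the function `encode` are hypothetical names invented for this example; the expansions are only the usual ones listed above, so an encoder must still check each case against the text.

```python
# Usual expansions taken from the list above. More than one item means the
# expansion depends on context and must be supplied by the encoder.
USUAL_EXPANSIONS = {
    "ysupe": ("the",),
    "ysupt": ("that",),
    "ysupu": ("thou", "you"),
    "wsupt": ("with",),
    "wsupch": ("which",),
}

def encode(entity, expan=None):
    """Return the <abbr> encoding for a brevigraph entity name (hypothetical helper).

    If expan is omitted, the usual expansion is used, but only when the
    list gives exactly one; otherwise a ValueError is raised.
    """
    options = USUAL_EXPANSIONS.get(entity)
    if options is None:
        raise KeyError(f"no entity defined for {entity!r}")
    if expan is None:
        if len(options) != 1:
            raise ValueError(f"&{entity}; is ambiguous: choose one of {options}")
        expan = options[0]
    return f'<abbr expan="{expan}">&{entity};</abbr>'

print(encode("ysupt"))          # <abbr expan="that">&ysupt;</abbr>
print(encode("ysupu", "thou"))  # <abbr expan="thou">&ysupu;</abbr>
```

Refusing to guess for &ysupu; reflects the point made earlier in this section: where a brevigraph has more than one possible expansion, only the expan= value records which one the text intends.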
