Special characters: entity references [182]


Use of entity references for special characters, boilerplate, and decorative features of the text


SGML and XML files are encoded using the standard ASCII character set (the characters found on a standard keyboard); as a result, any character not found in this set must be encoded with an entity reference. An entity reference is a character string that stands in for a single character or a chunk of text; it must begin with an ampersand and end with a semicolon, which delimit it from the surrounding text. When the SGML or XML file is processed, the entities are resolved and the corresponding character or text string is substituted. Thus the entity reference &aacute; refers to an “a” with an acute accent, while the entity reference &boilerplate; might refer to standard text that is always included but is stored in one place for easy modification (for instance, licensing information).
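The resolution step described above can be sketched in a few lines of Python. This is an illustration only, not the WWP's processing chain: the entity table is hypothetical. The names `aacute` and `shy` follow the standard ISO entity sets, while `boilerplate` and its replacement text are invented placeholders; a real SGML or XML processor takes these definitions from the DTD.

```python
import re

# Hypothetical entity table; a real parser reads these definitions from the DTD.
ENTITIES = {
    "aacute": "\u00e1",  # "a" with an acute accent
    "shy": "\u00ad",     # soft hyphen
    "boilerplate": "Licensing information goes here.",  # invented placeholder text
}

# An entity reference begins with "&", ends with ";", and has a name in between.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")


def resolve_entities(text: str, table: dict = ENTITIES) -> str:
    """Replace each entity reference with its replacement text from the table."""
    def substitute(match):
        name = match.group(1)
        if name not in table:
            # An undefined entity is an error, as it would be for a validating parser.
            raise KeyError(f"undefined entity: &{name};")
        return table[name]

    return ENTITY_REF.sub(substitute, text)


print(resolve_entities("Am&aacute;lia"))  # prints "Amália"
```

Keeping text like &boilerplate; in a single table entry is what makes the "store it once, modify it in one place" approach work: changing the table changes every document that references the entity.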

The WWP uses entity references for several classes of textual features:

1. characters with diacritical marks

2. other special characters, such as curly quotes (also known as “smart” quotes), digraphs (ae or oe, for instance), ellipses, dashes, and the long s. See 001 for more details.

3. boilerplate text in the TEI header

4. characters which are reserved for markup (such as angle brackets, ampersands, etc.). See 114 for more details.

5. ornaments and ruled lines. See the entry on these items for more details.

6. soft hyphens, which are encoded with &shy;. See the entry on soft hyphens for more details.

The WWP does not use entity references for ordinary marks of punctuation, including parentheses, periods, commas, semicolons, apostrophes, brackets, hyphens (except for soft hyphens), etc. The only exception is in contexts where the punctuation mark itself could be mistaken for a markup character. These cases are described in 114.
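The contrast between reserved markup characters and ordinary punctuation can be sketched with Python's standard-library `xml.sax.saxutils.escape`, which replaces only "&", "<", and ">" in character data. This sketch assumes those three are the only characters at risk in running text; the function name `encode_transcription` is hypothetical, and the precise WWP rules are the ones given in 114.

```python
from xml.sax.saxutils import escape


def encode_transcription(text: str) -> str:
    """Escape only characters reserved for markup; leave ordinary punctuation as typed."""
    # escape() turns "&", "<", and ">" into &amp;, &lt;, and &gt;.
    # Parentheses, periods, commas, semicolons, apostrophes, and hyphens pass through unchanged.
    return escape(text)


sample = "Wit & Learning (1688); the <Preface>, or 'Epistle'."
print(encode_transcription(sample))
# prints: Wit &amp; Learning (1688); the &lt;Preface&gt;, or 'Epistle'.
```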
