Hyphens, soft and hard [103]


Encoding of hard and soft hyphens, including guidelines for determining when a line-end hyphen is soft


The WWP encodes hard hyphens using an ordinary hyphen character: “-”.

The WWP encodes soft hyphens (hyphens which mark the division of an unhyphenated word across a line break) using an entity reference: “&shy;”. This entity reference is used for any character that indicates a word break, whatever it looks like. In some texts it may look like an equals sign (=); in others it may resemble a dot or an underline character.

This entity reference should be used for all line-ending hyphens (and other characters indicating word breaks) except where it is clear that the hyphen is a hard hyphen: i.e. that the word would be hyphenated even without the line break. Evidence for this would be a capital letter at the beginning of the second half of the word, or other instances of the word appearing hyphenated *within the same text*. Hyphenation practices within our period vary a great deal, so usage in one text cannot tell you whether a word is hyphenated or treated as a single word in another.

If the text in question has any other instances of the same word with a hard hyphen, then the line-ending hyphen in question should be encoded using a hard hyphen.

If all other instances of the word in the same text are unhyphenated, then the word should be encoded using a soft hyphen.

If there are no other instances of the same word within the text and you are in doubt, encode it using a soft hyphen.
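
The rules above can be sketched as a small decision procedure. This is only an illustration, not a WWP tool: the function name, the `Hyphen` enumeration, and the idea of passing in a list of the other word tokens from the same text are all assumptions made for the example, and in practice the encoder's judgment still applies.

```python
from enum import Enum


class Hyphen(Enum):
    HARD = "-"
    SOFT = "&shy;"


def classify_line_end_hyphen(first_half, second_half, other_words_in_text):
    """Suggest an encoding for a hyphen that falls at a line end.

    first_half, second_half: the word fragments before and after the break.
    other_words_in_text: the other word tokens in the same text
    (hypothetical input; how they are collected is left to the encoder).
    """
    # A capital letter opening the second half is evidence of a hard hyphen.
    if second_half[:1].isupper():
        return Hyphen.HARD

    hyphenated = f"{first_half}-{second_half}".lower()
    joined = f"{first_half}{second_half}".lower()
    others = {word.lower() for word in other_words_in_text}

    # Any other instance with a hard hyphen in the same text: hard hyphen.
    if hyphenated in others:
        return Hyphen.HARD
    # All other instances unhyphenated: soft hyphen.
    if joined in others:
        return Hyphen.SOFT
    # No other instances in the text: when in doubt, soft hyphen.
    return Hyphen.SOFT
```

For instance, `classify_line_end_hyphen("with", "out", ["without"])` yields `Hyphen.SOFT`, while the same call with `["with-out"]` as the other instances yields `Hyphen.HARD`.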

Hyphens in catchwords should always be encoded as hard hyphens: a catchword is not part of a line, so it is never involved in relineation and its hyphen will never need to be removed.
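
To illustrate why the hard/soft distinction matters for relineation, here is a minimal sketch that joins transcribed lines into running text. It assumes, purely for the example, that each line is a plain string and that the soft hyphen appears as the literal text `&shy;` at the end of a line; the `relineate` function is hypothetical and does not describe how WWP software actually processes encoded files.

```python
SOFT_HYPHEN = "&shy;"  # entity reference as typed in the transcription


def relineate(lines):
    """Join transcribed lines into running text.

    A line ending in a soft hyphen is joined to the next line with the
    hyphen dropped; a line ending in a hard hyphen is joined with the
    hyphen kept; any other line is followed by a space.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if line.endswith(SOFT_HYPHEN):
            # Soft hyphen: the word is unhyphenated once the break is gone.
            parts.append(line[: -len(SOFT_HYPHEN)])
        elif line.endswith("-"):
            # Hard hyphen: the word stays hyphenated.
            parts.append(line)
        else:
            parts.append(line + " ")
    return "".join(parts).strip()
```

With the input `["she sat at the tea-", "table with&shy;", "out a word"]`, the hard hyphen in "tea-table" survives while "without" is rejoined as a single word.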

