Punctuation: hyphens
For most transcriptional purposes it is useful to distinguish between hard hyphens and soft hyphens:
- hard hyphens are those that remain in the text regardless of where the line breaks occur, as for instance in hyphenated names such as Rimsky-Korsakov, or hyphenated words such as cross-reference. Hard hyphens should be encoded with a standard keyboard hyphen character.
- soft hyphens are those that result from line breaks, and they disappear if the text is relineated, as for instance in the word hy-phen broken across two lines. For these, we recommend using the standard ISO entity reference &shy;. This approach allows the text to be relineated flexibly (by removing the <lb/> and &shy;) and also makes it easier to include soft-hyphenated words in searching, since the soft hyphen can be suppressed (i.e. the entity reference can be resolved to nothing) when the text is indexed.
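The relineation and indexing behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the soft hyphen entity has already been resolved to the character U+00AD, line breaks are encoded as empty lb elements, and the function names relineate and index_form are hypothetical, not part of any TEI tool.

```python
import re

# Assumption: the soft hyphen entity has been resolved to the character U+00AD.
SOFT_HYPHEN = "\u00ad"

# Assumption: line breaks are empty <lb/> elements, possibly with attributes,
# with optional whitespace (such as a newline) on either side.
LB = r"\s*<lb\b[^>]*/>\s*"


def relineate(text: str) -> str:
    """Remove line breaks, dropping soft hyphens and keeping hard ones."""
    # Soft hyphen before a line break: the word continues, so drop both.
    text = re.sub(SOFT_HYPHEN + LB, "", text)
    # Hard hyphen before a line break: keep the hyphen, add no space.
    text = re.sub("-" + LB, "-", text)
    # Any other line break falls between words, so it becomes a space.
    return re.sub(LB, " ", text)


def index_form(text: str) -> str:
    """Return a searchable form in which soft hyphens resolve to nothing."""
    return relineate(text).replace(SOFT_HYPHEN, "")
```

With this sketch, a soft-hyphenated "hy" + "phen" is indexed as the single word "hyphen", while a line-end "cross-" followed by "reference" keeps its hard hyphen.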
In cases where it is difficult to tell whether a line-end hyphen is soft or hard, the encoder will need to look through the rest of the text (or the rest of the author’s oeuvre, or some other relevant set of documents with which the example is assumed to be consistent) for other instances of the word in question. If no other examples can be found, the assumption should be that the hyphen is hard.
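The decision procedure above can be approximated mechanically as a first pass before the encoder checks by hand. The following is a minimal sketch, not a prescribed method: the function name classify_line_end_hyphen is hypothetical, the reference text is assumed to be plain text containing only mid-line instances (line-end instances are themselves ambiguous), and the case-insensitive exact matching is an assumption that may not suit texts with variable spelling.

```python
import re
from collections import Counter


def classify_line_end_hyphen(first: str, second: str, reference_text: str) -> str:
    """Guess whether a line-end hyphen between two word parts is hard or soft.

    reference_text stands for the rest of the text, or whatever other set of
    documents the example is assumed to be consistent with.
    """
    # Count tokens, treating internal hyphens as part of the word.
    words = Counter(re.findall(r"[\w-]+", reference_text.lower()))
    hyphenated = words[f"{first}-{second}".lower()]
    joined = words[f"{first}{second}".lower()]
    # Soft only when the unhyphenated form is better attested elsewhere.
    if joined > hyphenated:
        return "soft"
    # No other examples (or a tie): assume the hyphen is hard.
    return "hard"
```

Resolving ties in favor of a hard hyphen is a design choice of this sketch; it simply extends the default given above for the case where no other examples are found.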
The TEI provides a method for describing the project’s approach with respect to hyphenation: the <hyphenation> element, which is placed within <editorialDecl> in the TEI header.
Hyphens in catchwords should always be encoded as hard hyphens, since a catchword is never involved in relineation (not being part of a line), and so will never need to be unhyphenated.