Punctuation: general [113]

Abstract

Transcription of punctuation, including treatment of hard and soft hyphens.

Discussion

In general, the WWP transcribes punctuation using standard keyboard characters. However, where punctuation serves as a delimiter that might be altered with changes in display, as for instance after a speaker’s name in a dramatic text, it is encoded using an entity reference within a rend= attribute:

<speaker rend="post(&colon;)">Hamlet</speaker>

Hyphens are encoded using a standard keyboard character for hard hyphens and the standard ISO entity reference &shy; for soft hyphens. Hard hyphens are those which remain in the text regardless of where line breaks occur, as for instance in hyphenated names such as DeBoer-Langworthy. Soft hyphens are those which result from line breaks and which would disappear if the text were relineated, as for instance in the word “hy-phen” if it were broken across two lines. Where it is difficult to tell whether a line-end hyphen is soft or hard, the encoder will need to look through the rest of the text for other instances of the word in question. If it seems more likely to be a hard hyphen, it should be encoded as

<unknown desc="&shy;">-</unknown>

whereas if it is more likely to be a soft hyphen, it should be encoded as

<unknown desc="-?">&shy;</unknown>
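The search through the rest of the text for other instances of the word can be sketched in Python. This is only an illustrative aid, not a WWP tool: the function name classify_line_end_hyphen is hypothetical, and treating a simple count of hyphenated versus unhyphenated occurrences as evidence of "more likely" is an assumption made here. The final decision remains the encoder's.

```python
import re

def classify_line_end_hyphen(first_half, second_half, text):
    """Suggest an encoding for an ambiguous line-end hyphen.

    Hypothetical helper: counts occurrences of the hyphenated form
    ("first-second") and the joined form ("firstsecond") in `text`,
    ignoring case, and returns the matching <unknown> encoding.
    """
    hyphenated = re.escape(first_half) + "-" + re.escape(second_half)
    joined = re.escape(first_half + second_half)
    hard_count = len(re.findall(r"\b" + hyphenated + r"\b", text, re.IGNORECASE))
    soft_count = len(re.findall(r"\b" + joined + r"\b", text, re.IGNORECASE))
    if hard_count > soft_count:
        # More likely a hard hyphen (assumption: majority of other instances)
        return '<unknown desc="&shy;">-</unknown>'
    if soft_count > hard_count:
        # More likely a soft hyphen
        return '<unknown desc="-?">&shy;</unknown>'
    return None  # no evidence either way; left to the encoder's judgment
```

Here `text` is assumed to be the rest of the transcription as a plain string. The ambiguous instance itself is not counted, because its two halves are separated by a line break and so match neither pattern.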

Hyphens in catchwords should always be encoded as hard hyphens, since a catchword is never involved in relineation (not being part of a line), and so will never need to be unhyphenated.
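To make concrete what "unhyphenated" means on relineation, the following minimal Python sketch joins transcribed lines, dropping soft hyphens and keeping hard ones. The function name relineate is hypothetical, and the sketch assumes each line is a plain string in which a soft hyphen appears literally as &shy; at the end of the line; it does not parse markup.

```python
def relineate(lines):
    """Join transcribed lines into one string, resolving line-end hyphens.

    A line ending in "&shy;" (soft hyphen) is joined to the next line
    with the hyphen removed. A line ending in a hard hyphen "-" is joined
    with the hyphen kept. Any other line is followed by a space.
    """
    out = []
    for line in lines:
        line = line.strip()
        if line.endswith("&shy;"):
            out.append(line[:-len("&shy;")])  # drop the soft hyphen
        elif line.endswith("-"):
            out.append(line)  # hard hyphen stays in the text
        else:
            out.append(line + " ")
    return "".join(out).strip()
```

For instance, the lines "the word hy&shy;" and "phen is split" join as "the word hyphen is split", while "by DeBoer-" and "Langworthy" join as "by DeBoer-Langworthy".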

The WWP does not attempt to distinguish the different linguistic functions of the various marks of punctuation (for instance, periods as abbreviation marks versus sentence delimiters).
