Encoding verse: general notes [066]


General discussion of encoding poetry, including the use of <text>, <div>, and <lg> to encode basic poetic structures


The WWP encodes poetry with several overarching goals in mind: to enable poetry to be identified and distinguished from the other genres in the textbase (for analytical purposes); to enable us to extract entire poems together with their headings and associated apparatus; and to represent the internal structure of each poem with some precision.

Although the TEI recommends that poems be encoded with <text> to acknowledge their literary wholeness, the WWP differs from this practice. Because our encoding is documentary rather than literary in nature, we prefer to represent the poem as an inextricable part of the larger document in which it was published. As a result we treat individual poems as subdivisions of that document, and we encode them with <div type="poem.foo">, where “foo” identifies the basic type of poem in question. For a list of values for the type= attribute on this outermost element, see 188. Within this wrapper, internal line groupings are tagged if they are separated by white space, as described in 186 and 187. Poems which occur in a context where <div> is not allowed (for instance, within running prose) should be encoded either within <quote> (if the poem is being quoted) or simply within <lg>. The WWP allows <lg> to appear both within and between <p> elements, and the encoder must decide which is more appropriate based on context.
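As a rough illustration of the two cases just described, the sketch below shows a poem encoded as a subdivision of its document, followed by a quoted poem embedded in running prose. The “foo” suffix is the placeholder used in this entry; the type= values on <lg> are stand-ins and are not taken from the WWP's fixed list (see 148 and 186 for the real values); all text content is invented filler.

```xml
<!-- Case 1: a poem treated as a subdivision of the larger document -->
<div type="poem.foo">
  <head>Heading of the poem</head>
  <!-- "stanza" is a hypothetical value, used here only for illustration -->
  <lg type="stanza">
    <l>First line of the first group</l>
    <l>Second line of the first group</l>
  </lg>
  <!-- a second group, set off from the first by white space in the source -->
  <lg type="stanza">
    <l>First line of the second group</l>
    <l>Second line of the second group</l>
  </lg>
</div>

<!-- Case 2: a quoted poem in running prose, where <div> is not allowed -->
<p>Prose leading up to the quotation
  <quote>
    <!-- "couplet" is likewise a hypothetical value -->
    <lg type="couplet">
      <l>First quoted line</l>
      <l>Second quoted line</l>
    </lg>
  </quote>
  and prose continuing afterwards.</p>
```

In the second case the <lg> sits inside the <p>; placing it between two <p> elements instead is the alternative the encoder may choose based on context.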

Within a long poem, any major subdivisions are encoded with <div type="part">, or with a more specific value for type= if the poem specifies one (e.g. “canto”, “book”, etc.).
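A minimal sketch of a long poem with major subdivisions might look like the following. It assumes the poem itself calls its divisions cantos, so type="canto" is used in place of the generic type="part"; the “foo” suffix, the type= value on <lg>, and all headings and lines are placeholders.

```xml
<div type="poem.foo">
  <head>Heading of the whole poem</head>
  <!-- each major subdivision is its own <div>, nested inside the poem -->
  <div type="canto">
    <head>Canto I</head>
    <!-- "stanza" is a hypothetical value, not taken from the WWP list -->
    <lg type="stanza">
      <l>First line of the first canto</l>
      <l>Second line of the first canto</l>
    </lg>
  </div>
  <div type="canto">
    <head>Canto II</head>
    <lg type="stanza">
      <l>First line of the second canto</l>
      <l>Second line of the second canto</l>
    </lg>
  </div>
</div>
```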

Within short poems, the WWP uses <lg> with a type= attribute to encode different groupings of metrical lines. Groupings are usually determined by intervening white space, indentation, or some other graphical indicator (such as an ornament). The type= attribute on <lg> is used to indicate the formal structure of the line group. The WWP has created a fixed list of values for this attribute; see 148 and 186.

Headings should be encoded within the element to which they refer. A heading for the entire poem should be encoded within the outermost <div>, not within the first <lg>. Headings to individual stanzas (including stanza numbers) should be encoded within the <lg> surrounding the stanza. Similarly, speaker names (e.g. the speaker of a particular stanza) should be encoded within the line group they modify. In cases where a line group is divided between two speakers (e.g. where one speaker speaks the first line of a couplet and a second speaker speaks the second line), the line group should be fragmented and the part= attribute should be used on each fragment.
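The placement rules above can be sketched as follows, under several assumptions: the type= values are placeholders, the text content is invented, and the speakers are tagged with <label> as this entry prescribes for poetry that is not part of a drama. The part= values "I" (initial) and "F" (final) are the standard TEI values for the first and last fragments of a divided element.

```xml
<div type="poem.foo">
  <!-- heading for the entire poem: inside the outermost <div> -->
  <head>Heading of the whole poem</head>

  <!-- heading for a single stanza: inside that stanza's <lg> -->
  <lg type="stanza">
    <head>Heading of this stanza</head>
    <l>First line of the stanza</l>
    <l>Second line of the stanza</l>
  </lg>

  <!-- one couplet divided between two speakers: the line group is
       fragmented, and part= marks the initial and final fragments -->
  <lg type="couplet" part="I">
    <label>First speaker</label>
    <l>First line of the couplet</l>
  </lg>
  <lg type="couplet" part="F">
    <label>Second speaker</label>
    <l>Second line of the couplet</l>
  </lg>
</div>
```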

Values for n= (for instance, a stanza number recorded on <lg>) should be arabic numerals without punctuation, regardless of the format or delimiters of the actual stanza number in the text.
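For instance, a stanza that the source numbers with a roman numeral and a period might be encoded roughly as below. The type= value is a placeholder, the lines are filler, and recording the printed number in <head> follows the rule that stanza numbers are encoded within the surrounding <lg>.

```xml
<!-- the source prints "III."; n= carries the bare arabic numeral -->
<lg type="stanza" n="3">
  <head>III.</head>
  <l>First line of the third stanza</l>
  <l>Second line of the third stanza</l>
</lg>
```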

Speaker names in poetry which is *not* part of a drama should be encoded with <label>; this provides a way of encoding speakers without requiring a who= value and a <castlist>. In addition, other information which is associated with the <lg> or with individual lines (such as line numbers) should be encoded with <label>.
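A small sketch of both uses of <label> follows. The type= value and the text are placeholders, and the position of the line-number <label> immediately before the line it belongs to is an assumption made for illustration, not a documented WWP rule.

```xml
<lg type="stanza">
  <!-- speaker of the stanza in a non-dramatic poem: no who= value needed -->
  <label>Name of the speaker</label>
  <l>First line of the stanza</l>
  <l>Second line of the stanza</l>
  <l>Third line of the stanza</l>
  <l>Fourth line of the stanza</l>
  <!-- printed line number, placed before its line (assumed placement) -->
  <label>5</label>
  <l>Fifth line of the stanza</l>
</lg>
```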

