Quotations: direct speech
Encoding of quotations, distinction between use of said and quote, treatment of quotation marks
The TEI provides the said element to encode direct speech and reported thought. This element carries several attributes which may be useful for projects that wish to do detailed analysis of direct speech.
- the direct attribute indicates whether the speech is direct or indirect.
- the who attribute indicates who the speaker is, in a manner very similar to the who attribute on sp in drama. As in drama, the who attribute is a URI pointing to the id of a person element (e.g. in the particDesc in the TEI header, or in a separate personography). This would allow you to create a list of all of the speakers in the text, and track how often and how much each one speaks.
- the aloud attribute allows you to indicate whether the utterance was spoken aloud (true) or not (false).
For a basic encoding, we do not recommend that most projects use any of these attributes.
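For projects that do want this level of detail, the attributes described above might be combined roughly as in the following sketch. The identifier spk01, the speaker's name, and the sample sentences are all invented for illustration; only the element and attribute names come from the TEI.

```xml
<!-- In the TEI header (or a separate personography).
     The id "spk01" and the name are hypothetical. -->
<particDesc>
  <listPerson>
    <person xml:id="spk01">
      <persName>Hypothetical Speaker</persName>
    </person>
  </listPerson>
</particDesc>

<!-- In the text: both utterances are direct speech attributed to spk01;
     the first is spoken aloud, the second is unspoken thought. -->
<p><said who="#spk01" direct="true" aloud="true">I will go at once,</said> she said.
<said who="#spk01" direct="true" aloud="false">But why must it be today?</said> she wondered.</p>
```

Because who is a pointer, its value is the person's xml:id preceded by a # sign rather than the bare id.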
As with quote, it may sometimes be necessary to break a single said into multiple XML elements to avoid overlap with other XML elements, such as verse lines. To indicate that these multiple elements are really part of the same quotation, you can link them with the next and prev attributes. In general we only recommend using the next and prev attributes in cases where the element is artificially broken to avoid overlap, not in cases where a quotation is interrupted by the text itself (for instance, with she said or other interventions). If you are preparing the text for a detailed analysis of quotation (for instance, involving counting the number of quotations present, or assessing their length) you will need to come up with a consistent method of handling these interventions so that you can identify whatever you decide are truly the boundaries of each quotation. Using next and prev may be the most effective method; see Overlapping and fragmented elements.
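As a minimal sketch of the linking mechanism, a speech that runs across a verse line boundary could be encoded as two said elements that point at each other. The verse text and the ids said01a and said01b are invented for illustration.

```xml
<lg>
  <!-- One quotation, split only because it crosses a line boundary.
       next points forward to the continuation; prev points back. -->
  <l>She turned and cried, <said xml:id="said01a" next="#said01b">I will not stay</said></l>
  <l><said xml:id="said01b" prev="#said01a">another hour beneath this roof.</said></l>
</lg>
```

Software that counts quotations can then treat any chain of said elements joined by next and prev as a single utterance.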
Quoted speech in the early modern period may be marked in a number of ways, or may even be left unmarked. In some cases this makes it difficult to be certain where a given quotation begins and ends. In addition, the conventions for signalling direct and indirect speech have changed over the centuries and there exist transitional forms which may be hard to assign to one category or the other. If your documents present this range of materials, we recommend a strategy which emphasizes certainty and simplicity:
- Encode all quoted speech which is renditionally distinct, regardless of whether it is direct or indirect speech. Rendition in this case includes the use of quotation marks, as well as the use of distinctive fonts (all caps, small caps, italics, black letter).
- Also encode all instances of direct speech, whether renditionally distinct or not. Direct speech here means any quoted speech which occurs in the first person singular or plural (I will…), not reported speech in the form she said that she would…
This approach respects the documents’ own representation of quoted speech (signalled by renditional distinction) while also catching instances which a modern reader recognizes as direct speech.
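The two rules above might play out as in the following sketch. The sentences are invented, and the rend value slant(italic) is an assumption modelled on the pre(…)post(…) style used in Example 1; substitute whatever rendition vocabulary your project uses.

```xml
<!-- Indirect speech, but set in italics in the source:
     encoded because it is renditionally distinct. -->
<p>She told them <said rend="slant(italic)">that she would return by nightfall</said>.</p>

<!-- Direct speech (first person) with no quotation marks or font change:
     encoded because it is direct speech. -->
<p>Then she answered, <said>I will return by nightfall</said>, and left the room.</p>
```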
As with quote, in cases where you are not sure exactly where the quoted material begins or ends, we recommend encoding the minimum text about whose quotedness you are certain. The rationale here is that for most purposes false negatives are less awkward and misleading than false positives; if a user is searching for material within a quotation, he or she is better served by getting only those results which are certain to match the criteria. If precision is essential, you may also use the TEI’s provision for encoding certainty and responsibility, described in the chapter on certainty, precision, and responsibility of the TEI Guidelines, P5, but for most purposes this encoding is excessive.
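For projects that do need to record this kind of doubt, one possible approach is to encode the minimum span and attach a TEI certainty element to it, as in the sketch below. The sentence, the id said02, and the wording of the note are invented for illustration.

```xml
<!-- Only the words that are certainly quoted are placed inside said. -->
<p>He replied that <said xml:id="said02">I never saw the letter</said> and had no wish to see it.</p>

<!-- locus="end" with cert="low" records doubt about where the quotation stops. -->
<certainty target="#said02" locus="end" cert="low">
  <desc>The quotation may extend to the end of the sentence.</desc>
</certainty>
```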
Examples
Example 1. Direct speech encoded with said:
<said rend="pre(“)post(”)">Bless me!</said> he said, looking about him, <said rend="pre(“)post(”)">I never did.</said>