Phrase-level encoding: Overview
An outline view of the topics covered within the discussion of phrase-level encoding
Phrase-level encoding refers to elements which are used within paragraphs and other similar elements: that is, encoding which identifies specific words or phrases. Encoding at this level of detail often serves to mark particular features of the document’s verbal texture: for instance, words which are emphasized for some reason (such as irony or emphatic contrast), or words in a foreign language or in a distinctive dialect or linguistic register. It may also mark particular aspects of the document’s referential content (such as names, titles, dates, and events) and allow them to be linked to authority records and external sources, or to be regularized if necessary. Even if there’s no desire to perform such linking, phrase-level markup can also be used simply to capture local shifts in rendition (such as italics or quotation marks) and to identify the textual feature which motivates each shift (such as a foreign word or a quotation).
There are three basic kinds of motivation for using phrase-level encoding:
- to capture renditional features of the document, such as italicized words or passages marked by quotation marks;
- to mark specific content features that are of analytical interest, such as dates, names of people or places, or words in foreign languages;
- and to enable linking of various sorts: for instance, to link a word to a footnote that comments on it, to link a cross-reference to its target, to align phrases which are related or equivalent (as in a translation).
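As a rough illustration of these three motivations, the fragment below marks up a single invented sentence. The sample text, the identifiers (#smith_j, #n1), and the rend values are assumptions made for this sketch, not recommended practice; the element and attribute names are standard TEI.

```xml
<div>
  <p>On
    <!-- content feature: a date, with a regularized value in when -->
    <date when="1792-03-14">the 14th of March, 1792</date>,
    <!-- content feature: a personal name, pointing to a hypothetical authority record -->
    <name type="person" ref="#smith_j">Mrs. Smith</name>
    declared the whole affair a
    <!-- renditional feature (italics) and the content feature that motivates it (a French phrase) -->
    <foreign xml:lang="fr" rend="italic">fait accompli</foreign><!-- linking: a reference mark pointing to its footnote --><ref target="#n1">*</ref>.</p>
  <note xml:id="n1">An invented footnote commenting on the phrase.</note>
</div>
```

A single element can serve more than one motive at once: here foreign records both the italics and the reason for them.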
In general, the same elements are used regardless of the motive. However, if marking rendition is important to your project and marking specific content features is not, then instead of using elements such as name, date, foreign, and term, you can simply use hi to indicate renditional shifts and omit all other phrase-level markup. We do not recommend this path; our experience is that the local texture of the document’s content is likely to be very important for scholarly work, and that for the most part it is not much more difficult to encode a feature with name or date than to use hi. However, we recognize that for some projects the methodological emphasis may lie elsewhere, and that the challenges of identifying content features unambiguously may be significant.
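The difference between the two approaches can be sketched with one invented sentence encoded both ways. The sample text, the type value "ship", and the rend value "italic" are assumptions chosen for illustration only.

```xml
<div>
  <!-- rendition only: hi records that the words are italicized, but not why -->
  <p>We sailed on the <hi rend="italic">Mary Anne</hi> in
    <hi rend="italic">June</hi>.</p>

  <!-- content features: the same italics, plus the feature that motivates them -->
  <p>We sailed on the <name type="ship" rend="italic">Mary Anne</name> in
    <date rend="italic">June</date>.</p>
</div>
```

In the second version the rendition is still recorded, but a later search for names or dates can find these phrases without having to guess what each italicized span represents.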
The entries below on phrase-level encoding discuss the more minute structures of the text, and the encoding that applies to very local phenomena of words and phrases.
Encoding names
- General notes on names
- Names of humans: general notes
- Names of humans: detailed discussion
- Names of places
- Names of non-human creatures and things
- Names of collectivities
- Name keys and unique name references
- Abbreviated names
- Nesting names of different types
- Problems of multiple reference
Dates and times
- Simple encoding of dates
- Dates: dealing with different calendars
- Dates: format for the value and when attributes
- Date ranges
- Errors in dates
- Complex encoding of dates
- Encoding time references
Other types of phrase-level encoding
- Emphasis
- Special terminology, irony, and other forms of highlighting
- Abbreviations
- Names of authors
- Titles of other works
- Foreign-language words and phrases
- Simple highlighting
- Referencing strings
- Measures
- Interactions between different kinds of emendation
- Encoding unknown features