Principles of transcription: general


General principles of transcription, including details of what is and is not captured, and the order in which it is represented


Some general principles of transcription:

We transcribe all the printed marks from the main text block.

We transcribe all the information from the forme work (page numbers, catchwords, signatures, etc.) but not the delimiters or punctuation.

We transcribe any handwriting that dates from the period of the text’s original production, but not modern handwriting (e.g. that of recent owners or librarians).

We transcribe the text in what we think of as “reading order”. The principle to follow here is the reading order of the text elements themselves, rather than a strict top-to-bottom, left-to-right reading. For instance, we frequently encounter chunks of text (often in the closers of letters or in cast lists) which are printed in separate text blocks with internal line breaks, and which would violate the sense of the text if transcribed strictly from top to bottom and from left to right. (See example 1.) In these cases, we determine the order of the chunks using the principle of “reading order” outlined above, and then transcribe each chunk as a whole (not interrupted by pieces of other chunks). Where there are large differences in font size between chunks, the chunk whose baseline is higher up on the page is considered to be closer to the “top” and hence is read before a chunk with a baseline further down, even if the top of the latter chunk is higher up.
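The baseline rule for chunks of differing font size can be sketched in a few lines of Python. This is only an illustration, not part of any actual transcription tooling: the `Chunk` structure, its field names, the choice of the first line's baseline as the one compared, and the sample text are all assumptions made for the example. It also covers only the geometric tie-break; deciding the reading order of the text elements themselves still requires the encoder's judgment.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One printed text block, kept whole (hypothetical structure)."""
    text: str          # full text of the chunk, internal line breaks included
    top_y: float       # top edge of the chunk, measured down from the top of the page
    baseline_y: float  # baseline of the chunk's first line, same units (an assumption)


def order_by_baseline(chunks):
    """Order chunks by baseline rather than top edge; smaller y is higher on the page."""
    return sorted(chunks, key=lambda c: c.baseline_y)


def transcribe(chunks):
    """Emit each chunk whole, never interleaved with lines from another chunk."""
    return [c.text for c in order_by_baseline(chunks)]


# Invented sample: a large display word whose top edge is higher on the page,
# but whose baseline is lower, than a small line printed beside it.
display = Chunk(text="LETTERS", top_y=10.0, baseline_y=60.0)
small = Chunk(text="A Collection of", top_y=20.0, baseline_y=30.0)

print(transcribe([display, small]))  # ['A Collection of', 'LETTERS']
```

Sorting on `top_y` instead would put the display word first, which is the outcome the baseline rule is meant to prevent.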

The principle of reading order also applies to multi-column layouts, where we encode the text within the division or paragraph as it flows, with the columns encoded in the order in which they would be read. Similarly, it governs the encoding of scripts which are read from right to left rather than from left to right: within a <quote> element, a Hebrew quotation (if printed right to left) would be transcribed from right to left, with the actual direction of the script recorded in a renditional attribute or in some other manner.
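For the multi-column case, the difference between reading order and a strict top-to-bottom, left-to-right pass can be shown with a small Python sketch. The tuple layout `(column, y, text)` and both function names are hypothetical, and the sketch assumes the encoder has already numbered the columns in the order in which they would be read.

```python
def column_reading_order(lines):
    """Read each column from top to bottom, taking columns in their assigned order."""
    return [text for column, y, text in sorted(lines, key=lambda l: (l[0], l[1]))]


def strict_row_order(lines):
    """Strict top-to-bottom pass across the whole page, ignoring column boundaries."""
    return [text for column, y, text in sorted(lines, key=lambda l: (l[1], l[0]))]


# Invented sample: two columns of two lines each; y increases down the page.
page = [
    (0, 1, "col. 1, line 1"),
    (1, 1, "col. 2, line 1"),
    (0, 2, "col. 1, line 2"),
    (1, 2, "col. 2, line 2"),
]

print(column_reading_order(page))
print(strict_row_order(page))
```

The first function keeps each column's text together, as the text flows; the second interleaves lines from the two columns and so breaks the sense of the text.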

There are occasionally anomalous cases where strict top-to-bottom, left-to-right transcription does not yield an appropriate reading order; in some cases this is obvious and in others it may be merely a suspicion (if the text’s meaning is unclear). For example, a sign reading:

[Image of the sign not reproduced: “Chicken Crossing” printed above “SLOW” in larger capitals.]

would, under the strict rule, be transcribed <p>Chicken Crossing</p><p rend="case(allcaps)">Slow</p>, but the word “Slow” is probably meant to be read first, since it is the largest and most important to the sense. In cases of this kind the encoder may need to use judgment to determine the correct transcription order. However, where the meaning of the text is completely unclear, the regular top-to-bottom, left-to-right rule should be followed to avoid ambiguity.

