Page breaks and page numbering
Encoding of page breaks and page numbering using the pb element and its n attribute, including guidelines for creating idealized page number sequences
Page breaks are encoded using the TEI pb element. This is an empty element: it has no content. By convention, the pb element is understood to mark the start of a new page, so each page of text should be preceded by a pb element, even the first page. The pb element goes before any other information about the page, including collation, forme work, etc.
The page number is encoded in two ways. The actual printed page number is part of the forme work and is encoded using fw type="pageNum". An idealized page number is also captured on the n attribute of the pb element.
Idealization of the page number means correcting errors in sequencing, omitting casual variations in the way page numbers are printed (brackets, etc.), and supplying page numbers which are not printed on the page. The idealized page numbers will usually have the same form (arabic numbers, roman numerals, etc.) as the actual page numbers, unless there are overriding reasons to do otherwise. If the document has a separate numbering system for the front or back matter, the idealized numbers should do the same.
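As a minimal sketch of the two encodings side by side, the fragment below shows a printed page number in brackets and a misprinted page number, each with its idealized value on pb. The page numbers and the paragraph text are invented for illustration and do not come from any particular source text.

```xml
<!-- Hypothetical fragment: numbers and text are invented for illustration -->

<!-- The page prints "[12]"; the idealized number drops the brackets -->
<pb n="12"/>
<fw type="pageNum">[12]</fw>
<p>Text of the twelfth page.</p>

<!-- The page prints "31" in error; the idealized number corrects the sequence -->
<pb n="13"/>
<fw type="pageNum">31</fw>
<p>Text of the thirteenth page.</p>
```

In both cases the fw element records what is actually printed, while the n attribute carries the corrected, regularized value.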
Every page in the document, including the title page, should have an idealized page number, recorded on the pb for that page. Page numbering of this sort should start with the first page of the text, which will usually be the title page but might be a frontispiece or some other page before the title page. It is up to the individual project to decide how to treat preliminary blank pages. If you do plan to include all blank pages, then when transcribing from microfilm or photocopies it’s important to be alert for omitted blank pages that will need to be included in the idealized pagination, so that the book itself is accurately represented.
Special cases:
- Pages that are not numbered, but are accounted for in the explicit numbering of the text (i.e., there simply is no ink on the page, e.g., 1, 2, 3, , 5, 6, ...) will not have an fw type="pageNum", but will have the appropriate n on the pb. (In this case, the blank page’s pb would be n="4".)
- Pages that are not numbered and are not counted in the page number reference system (e.g., 1, 2, 3, 4, , , 5, 6 ...) will, where possible, be referred to by the word facing and their facing page’s number. (In this case, n="facing 4" and n="facing 5".) Typically these will be extra leaves tipped in, such as illustrations or sheets of errata. Note that unnumbered pages of this sort will come in pairs (i.e., the two sides of the tipped-in leaf) and will start with an odd number.
- For tipped-in pages which occur within unnumbered page sequences (e.g., unnumbered front matter), it may be impossible to tell (especially if working from a microfilm or photocopy) that the pages are tipped in, unless you can examine the source text or a detailed description thereof. It is up to the individual project to decide how careful to be about accounting for these sequences; for some projects, it may be unimportant, and in these cases there is no need to use the designation facing x, etc. When working with texts where the details of physical bibliography are important, the facing x designation may still be useful.
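The first two special cases might be encoded roughly as follows. This is a sketch only: it reuses the page sequences from the list above, and everything other than the pb and fw elements is omitted.

```xml
<!-- Case 1: page 4 is blank but counted in the sequence (1, 2, 3, , 5, 6) -->
<pb n="3"/>
<fw type="pageNum">3</fw>
<!-- No fw here, because nothing is printed, but n still records 4 -->
<pb n="4"/>
<pb n="5"/>
<fw type="pageNum">5</fw>

<!-- Case 2: an unnumbered leaf tipped in between pages 4 and 5 -->
<pb n="4"/>
<fw type="pageNum">4</fw>
<!-- The two sides of the tipped-in leaf, named by the pages they face -->
<pb n="facing 4"/>
<pb n="facing 5"/>
<pb n="5"/>
<fw type="pageNum">5</fw>
```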
Examples
Example 1.
A work which has frontmatter numbered in lowercase roman numerals, followed by a body numbered in arabic numbers: the frontmatter should be numbered as follows:
<pb n="i"/>, <pb n="ii"/>, etc.,
followed by the body numbered
<pb n="1"/>, <pb n="2"/>, etc.
Example 2.
A work in which the frontmatter does not have numbers printed on the page, followed by a body numbered in arabic numbers: the frontmatter should be numbered
<pb n="i"/>, <pb n="ii"/>, etc.,
followed by the body numbered
<pb n="1"/>, <pb n="2"/>, etc.
Similarly, if the body does not have numbers printed on the page, the page numbering should still be recorded in arabic numerals on the n attribute of the pb element:
n="1", n="2", n="3", etc.
Example 3.
A work in which no page numbers appear at all, which contains a title page, frontmatter, and a body: number the title page and frontmatter continuously with small roman numerals, and number the body 1, 2, 3, etc. If there is a frontispiece preceding the title page, the page numbering for the frontmatter should start with the frontispiece (or, if that is on a verso, with its hypothetical recto; see above).
Example 4.
A document which has separately numbered subsections such as plays (e.g., 1-30 for the first play, 1-25 for the second, 1-34 for the third...): the numbering for each section should be encoded as n="1", n="2", etc. It is not necessary that each n value be unique in the document.
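A rough sketch of Example 4 for a collection of two plays is given below. The div elements and the type="play" value are assumptions made for illustration; only the pb and fw encoding follows the guidelines described here.

```xml
<!-- Hypothetical structure: div type="play" is an illustrative assumption -->
<div type="play">
  <pb n="1"/>
  <fw type="pageNum">1</fw>
  <p>Opening of the first play.</p>
  <pb n="2"/>
  <fw type="pageNum">2</fw>
  <p>The first play continues.</p>
</div>
<div type="play">
  <!-- Numbering restarts, so n="1" appears a second time in the document -->
  <pb n="1"/>
  <fw type="pageNum">1</fw>
  <p>Opening of the second play.</p>
</div>
```

Because n records an idealized page number rather than a unique identifier, the repeated values are expected.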