Recognizing subdivisions


For projects dealing with printed books, recognizing subdivisions is usually fairly straightforward. Any subdivisions which are important to the comprehension of the text have typically been clearly signalled by the printer, using headings, ornamental dividers, or other typographical signals. A crucial part of the document analysis is determining what signals are used.

However, there are at least three kinds of cases where recognizing subdivisions may be more complex:

In these situations, you might want to indicate alternative boundary points (for instance, to show that a given subdivision might begin in one of two possible places), or to indicate uncertainty about a boundary point. This kind of encoding is certainly possible, although using the information might prove more difficult: you would need to think carefully about what sort of behavior you want from your interface. Before undertaking encoding of this sort, consider what it would gain you and how, if at all, you plan to use it.
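
To make the interface question concrete, here is a minimal sketch in Python of how alternative or uncertain boundary points might be held in memory and queried. It is not tied to any particular markup scheme. The names `Boundary`, `Subdivision`, `preferred_start`, and `is_uncertain`, and the three certainty levels, are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical certainty levels, ordered from least to most confident.
CERTAINTY_ORDER = ["low", "medium", "high"]


@dataclass
class Boundary:
    line: int        # line of the transcription where the subdivision might begin
    certainty: str   # one of CERTAINTY_ORDER


@dataclass
class Subdivision:
    label: str
    # Every place the subdivision might plausibly begin (assumed non-empty).
    candidate_starts: List[Boundary] = field(default_factory=list)

    def preferred_start(self) -> Boundary:
        """Return the candidate with the highest certainty (first one wins ties)."""
        return max(
            self.candidate_starts,
            key=lambda b: CERTAINTY_ORDER.index(b.certainty),
        )

    def is_uncertain(self) -> bool:
        """True if there are rival candidates or the best one is not 'high'."""
        return (
            len(self.candidate_starts) > 1
            or self.preferred_start().certainty != "high"
        )


# A subdivision that might begin at either of two possible places.
chapter = Subdivision(
    "Chapter 2",
    [Boundary(line=120, certainty="medium"), Boundary(line=124, certainty="high")],
)
```

Even in this small sketch, the interface has to choose a behavior: `preferred_start` silently picks one candidate, while `is_uncertain` lets a display flag the subdivision so that a reader can inspect the alternatives. Which of these you want, if either, is the question to settle before investing in the encoding.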