Lists: general notes [130]


Encoding lists, including discussion of criteria for identifying lists


The WWP encodes all lists using <list>, and specifies a limited set of possible type attributes to designate certain particular kinds of list. The possible values for the type= attribute are “TOC”, “gloss”, “errata”, and “subscriber”. The default value is “simple” and does not need to be entered. See the separate entries elsewhere in this database for detailed information about each of these particular kinds of list. These are the only kinds of lists which are designated with type= attributes; no other type attributes should be used.

The WWP encodes labels using <label> inside the <item> element which they refer to. We only nest <p> inside <item> in cases where this is necessary: for instance, for multi-paragraph items, or for lists nested inside lists. We do not encode n= on <item>.
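As a rough illustration of the conventions above, the first sample list from Example 1 might be encoded along the following lines. This is a sketch rather than a prescribed template: the second list and its contents are invented here purely to show <p> nested inside a multi-paragraph <item>, and the indentation is for readability only.

```xml
<!-- A labelled list: type= is omitted because "simple" is the default,
     each label sits inside the <item> it refers to,
     and n= is not recorded on <item>. -->
<list>
  <item><label>1.</label> Beach gizzard</item>
  <item><label>2.</label> Floppy digger</item>
  <item><label>3.</label> Normal strobe</item>
</list>

<!-- Hypothetical list with one multi-paragraph item:
     <p> appears only where it is needed. -->
<list>
  <item>
    <p>First paragraph of a long item.</p>
    <p>Second paragraph of the same item.</p>
  </item>
  <item>A short item that needs no paragraph.</item>
</list>
```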

1. Disclaimers and Overview

First, we note that the correct identification of lists is not a matter of life-threatening urgency. If we get a few false positives and a few false negatives, it’s OK. So to a certain extent this proposal leaves the judgment up to the encoder’s common sense and intuitions. These guidelines will ensure broad consistency along the lines that matter most.

Second, we note that the basic nature of a list--to indicate a grouping or sequence of similar items--is quite general, so that list-like structures appear in a great many contexts. Not all of these seem important to encode as lists. For instance, the sequence of numbered chapters in a novel could in some sense be considered a list, but we wouldn’t want to encode it as one. This proposal treats a “list” as a fairly local phenomenon, not something normally used to structure an entire division or text.

2. Identifying Lists

There are essentially two criteria for identifying something as a list: formatting and labelling. If EITHER of these is present, the thing in question is a list. (See Exceptions below for a few caveats.) In addition to these two criteria, there is also the Common Sense Factor.


Formatting

If something is formatted like a list, it is a list. This means that there are line breaks between the items; they may be indented or overhung in a manner different from the ordinary paragraphs in the text. Separate paragraphs, ordinarily formatted, may also constitute a list if the Common Sense Factor applies.


Labelling

If a group of textual components is labelled in some way, it is a list, even if it is not formatted as a list. Labelling may indicate a sequence (1, 2, 3; A, B, C) or it may simply set off the individual items from the rest of the text (bullets, dashes, little pointing hands, etc.). Numeric labels must be actual numbers, not words like “First, Next, Then, Finally”. Numbers written out as words (“First, Second, Third”) count as numbering as long as they are consistent through the whole list (not degenerating into “Next....Next”). However, see the Common Sense Factor below.

An exception to the Labelling rule is that we do not encode a simple sequence of numbered paragraphs as a list. Early texts may number some or all of their paragraphs to indicate the sequence of ideas or to serve as a reference system, but this numbering does not indicate a list--it does not identify a group of similar things which are gathered together or sequenced within a certain rubric. However, if the numbered paragraphs are gathered together under some sort of rubric (“these are the reasons why he will go to Hell...”) then they may be treated as a list.

Common Sense Factor

Your intuitions about what is a list are probably fairly good. There may be clues in the text which tend to support or discourage the encoding of something as a list, and you should use these in combination with the above criteria to decide whether something is or isn’t a list. For instance, a phrase like “This is a list of ....” might incline you to encode something as a list even if the items are ordinary paragraphs starting “First, Second, Third”. If a numbered list begins with the word “First” and ends with an item labelled “Finally”, it can still be regarded as a list if in general it behaves like a list in other ways. Looking at a sequence of numbered paragraphs, you can probably tell whether there is some principle grouping them together (for instance, there may be an introductory sentence identifying them as a list) or whether they are just numbered for reference.

In general, we want to avoid encoding things as lists if they seem borderline. Don’t hesitate to NOT encode something as a list. But also remember that the cost of being wrong is very low, so don’t spend too much time agonizing about it either.

3. Sequencing Errors in Lists

We will correct all sequencing errors in lists just as we correct them when they occur in page numbering or act/scene numbering. The reasoning is the same (we don’t want anyone to think it’s our error). Since lists are usually fairly short, the likelihood of needing the “percent new error” criterion is low; we’ll probably just end up encoding the corrected value for each list item as appropriate.

4. Things Which Look Like Lists But Aren’t

--Sequences of endnotes (these are just a series of <note> elements)

--Castlists (these are encoded with <castList>)

--Bibliographies (these are just a series of <bibl> elements)

--Sequences of chapters or poems or any other kind of textual subdivision which is numbered to indicate its order within the collection. If a text is divided into confessions, or letters, or questions, or revelations, or any other kind of unit, the collection of these units is not a list.
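To make the first and third of these cases concrete, here is a minimal sketch of endnotes and bibliography entries encoded as plain series of elements with no <list> wrapper. The content of each element is a hypothetical placeholder, and whatever division would surround these elements in a real text is left out.

```xml
<!-- Endnotes: a series of <note> elements, not <item>s in a <list>. -->
<note>Hypothetical first endnote.</note>
<note>Hypothetical second endnote.</note>

<!-- Bibliography: a series of <bibl> elements, not <item>s in a <list>. -->
<bibl>Hypothetical first bibliographic entry.</bibl>
<bibl>Hypothetical second bibliographic entry.</bibl>
```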


Example 1. Some sample lists:

Things I need to buy when I go to Truro:
1. Beach gizzard
2. Floppy digger
3. Normal strobe

Points to cover in our next ad campaign:
--only the elderly will lose out
--this product obsolesces all others
--nobody but us can help

Even his mother thought him odd, for several reasons. 1. He was never so much as a second late for dinner. 2. He insisted on eating the floral centerpiece. 3. His television-watching habits were, to say the least, bizarre.

There are all sorts of things I hate to do on Saturdays, like
mowing the lawn
picking over gooseberries
throwing away old catalogues
fixing small appliances
but I do them anyway because otherwise the world would come to an end.
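The last two samples above might be encoded roughly as follows. This is only a sketch: placing the <list> inside the surrounding <p> is an assumption made for illustration, and any encoding of the line breaks themselves is omitted.

```xml
<!-- Labelled but not formatted as a list: the numbers alone make it a list. -->
<p>Even his mother thought him odd, for several reasons.
  <list>
    <item><label>1.</label> He was never so much as a second late for dinner.</item>
    <item><label>2.</label> He insisted on eating the floral centerpiece.</item>
    <item><label>3.</label> His television-watching habits were, to say the least, bizarre.</item>
  </list>
</p>

<!-- Formatted as a list but unlabelled: no <label> is supplied. -->
<p>There are all sorts of things I hate to do on Saturdays, like
  <list>
    <item>mowing the lawn</item>
    <item>picking over gooseberries</item>
    <item>throwing away old catalogues</item>
    <item>fixing small appliances</item>
  </list>
  but I do them anyway because otherwise the world would come to an end.</p>
```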

Example 2. A borderline case which is probably not a list. In the following case, the separate items are really just individual paragraphs; there’s no real reason to encode it as a list.

I have never known whether to consider him a werewolf, even though there are many signs which might indicate something amiss.

He never leaves his house before sunset, which is odd considering that he is a tax collector and this ought to hamper his work considerably. I certainly don’t know of any tax collectors who manage to do their work at night; it would be very irritating for the client (though perhaps that is the point).

He speaks in a low growl, and this frightens dogs and cats throughout the neighborhood. Even small children seem to sense a feral quality in his behavior; the kindest words from him still make them shrink back in terror.

His beard is very thick and covers his entire face. In fact, I’ve never seen his chin; he may not even have one.

Example 3. In the following case, the separate items are more simply treated as part of the ordinary syntax of the sentence; no <list> is necessary.

There are so many great foods in Maine: corned hake, New England boiled dinner, pickled tripe, jello pudding salad, and of course blueberry slump.

