Special characters: ordinary characters requiring special treatment [114]

Abstract

Further detail on ordinary characters which must be encoded with entity references, either in particular contexts or because they serve special functions in markup.

Discussion

Many characters which appear in WWP texts (e.g. dashes, digraphs, and accented characters) cannot be typed on a standard keyboard and hence must be treated specially, using an entity reference. These are described in separate entries. However, in addition to these there are characters which can be typed using a standard keyboard, but which still need special treatment because of their special use in markup. These fall into three categories:

1. Characters which must always be encoded using an entity reference, because they might otherwise be mistaken for SGML markup characters (either by an SGML parser or by some home-grown WWP software). These are:

"&", the ampersand, &amp; (which is the entity reference open character, ERO, and thus could be confused with the beginning of an entity reference)

"=", the equals sign, &equals; (which is the value indicator character, VI, and thus could be confused with the equals sign pairing an attribute with its value)

"<", the less than sign, &lt; (which is the start-tag open character, STAGO, and thus could be confused with the beginning of a tag; it is also the first character in the two-character delimiters for processing instruction (PIO, "<?"), markup declaration (MDO, "<!"), and end-tag (ETAGO, "</"))

">", the greater than sign, &gt; (which is the tag close character, TAGC, and thus could be confused with the end of a tag)

2. Characters appearing within attribute values which must be encoded with an entity reference, because they are part of the syntax of the attribute value. These include the characters listed above, plus:

    ‘"’, the straight quote, &quot; (which can be used as a literal, LIT, to delimit an attribute value)

    “'”, the apostrophe, &apos; (which can be used as an alternative literal, LITA, to delimit an attribute value)

Do not confuse these characters with the typographic quotation marks &ldquo;, &rdquo;, &lsquo;, or &rsquo;. Also note that the straight quote and the apostrophe do not need to be encoded with an entity reference when they occur within the body of the text. Thus in a possessive, “Orinda's”, the apostrophe should be typed in using the standard keyboard apostrophe.
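The first two categories can be illustrated with a short Python sketch. It is not WWP software: the function name `escape_markup`, the `in_attribute` flag, and the two mapping tables are hypothetical, and the entity names `amp`, `equals`, and `quot` are assumed from the ISO entity sets. The sketch escapes the four markup characters everywhere, and the straight quote and apostrophe only inside attribute values.

```python
# Illustrative sketch only: these names are hypothetical, not WWP software.
# Category 1: characters that are always encoded with an entity reference.
TEXT_ENTITIES = {"&": "&amp;", "=": "&equals;", "<": "&lt;", ">": "&gt;"}

# Category 2: inside attribute values the two literal delimiters are added.
ATTRIBUTE_ENTITIES = {**TEXT_ENTITIES, '"': "&quot;", "'": "&apos;"}


def escape_markup(text: str, in_attribute: bool = False) -> str:
    """Replace markup-sensitive characters with entity references."""
    table = ATTRIBUTE_ENTITIES if in_attribute else TEXT_ENTITIES
    # str.translate makes a single pass over the input, so the "&" that
    # begins an inserted entity reference is never escaped a second time.
    return text.translate(str.maketrans(table))


print(escape_markup("Orinda's x < y & z"))
# prints: Orinda's x &lt; y &amp; z
print(escape_markup("Orinda's x < y & z", in_attribute=True))
# prints: Orinda&apos;s x &lt; y &amp; z
```

In body text the apostrophe is left as typed, while the same string destined for an attribute value has it replaced with &apos;.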

3. Characters occurring within rendition ladders which must be encoded either using an escape character or an entity (but not both). These are:

    “(”, the left parenthesis, which should be encoded within rendition ladders as “&lpar;” or "\("

    “)”, the right parenthesis, which should be encoded within rendition ladders as “&rpar;” or "\)"

    “\”, the backslash, which should be encoded within rendition ladders as “\\”

Thus for a stage direction surrounded by parentheses, such as (Enter, stage left), the correct encoding would be either of the following:

<stage rend="pre(&lpar;) post(&rpar;)">Enter, stage left</stage>

<stage rend="pre(\() post(\))">Enter, stage left</stage>

Note that these characters do not need to be treated specially when they occur in the body of the text.
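The backslash form of the rendition ladder escaping can be sketched in Python as follows. The helpers `escape_ladder_text` and `rend_ladder` are hypothetical names introduced for illustration, not part of any WWP tool, and the sketch uses only the escape-character style (never mixing it with entity references).

```python
# Illustrative sketch only: these helpers are hypothetical, not WWP software.
def escape_ladder_text(text: str) -> str:
    """Backslash-escape the characters that are special inside a rendition ladder."""
    # Escape the backslash first, so the backslashes added for the
    # parentheses are not themselves doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def rend_ladder(**rungs: str) -> str:
    """Build a rend attribute value from keyword/value pairs, in the order given."""
    return " ".join(
        f"{name}({escape_ladder_text(value)})" for name, value in rungs.items()
    )


rend = rend_ladder(pre="(", post=")")
print(f'<stage rend="{rend}">Enter, stage left</stage>')
# prints: <stage rend="pre(\() post(\))">Enter, stage left</stage>
```

The text content of the element is passed through unchanged, since parentheses and backslashes need no special treatment in the body of the text.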
