path: root/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
author: Nik Clayton <nik@FreeBSD.org> 1999-04-20 20:59:59 +0000
committer: Nik Clayton <nik@FreeBSD.org> 1999-04-20 20:59:59 +0000
commit: c4ab126805b0ecf8d200daba8990ddaac649b18f
tree: 17449c394290549d27f07d8049ae25ae811bb03f /en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
parent: 8bdee18bd8c6ee8c3846276f21b651b1ed393420
My primer for people new to the Doc. Proj. Incomplete, but should be
enough for most people, and gets it into the repository, making it easier for others to add to as necessary. This has not yet been turned on in the upper level Makefile or listed on the web site; I want to get some more feedback from readers first. It should be "made visible" later this week.
Notes: svn path=/head/; revision=4718
Diffstat (limited to 'en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml')
-rw-r--r-- en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml 1554
1 files changed, 1554 insertions, 0 deletions
diff --git a/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml b/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
new file mode 100644
index 0000000000..c25bacf1f1
--- /dev/null
+++ b/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
@@ -0,0 +1,1554 @@
+<!-- Copyright (c) 1998, 1999 Nik Clayton, All rights reserved.
+
+ Redistribution and use in source (SGML DocBook) and 'compiled' forms
+ (SGML, HTML, PDF, PostScript, RTF and so forth) with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code (SGML DocBook) must retain the above
+ copyright notice, this list of conditions and the following
+ disclaimer as the first lines of this file unmodified.
+
+ 2. Redistributions in compiled form (transformed to other DTDs,
+ converted to PDF, PostScript, RTF and other formats) must reproduce
+ the above copyright notice, this list of conditions and the
+ following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+
+ THIS DOCUMENTATION IS PROVIDED BY NIK CLAYTON "AS IS" AND ANY EXPRESS OR
+ IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL NIK CLAYTON BE LIABLE FOR ANY DIRECT,
+ INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE
+ POSSIBILITY OF SUCH DAMAGE.
+-->
+
+<chapter id="sgml-primer">
+ <title>SGML Primer</title>
+
+ <para>The Documentation Project makes heavy use of the Standard Generalized
+ Markup Language (SGML). This chapter describes what SGML is, how to read
+ and understand markup, and some of the SGML tricks you will see used in
+ the FAQ, Handbook, and website.</para>
+
+ <para>Portions of this section were inspired by Mark Galassi's <ulink
+ url="http://nis-www.lanl.gov/~rosalia/mydocs/docbook-intro/docbook-intro.html">Get Going With DocBook</ulink>.</para>
+
+ <sect1>
+ <title>Overview</title>
+
+ <para>Way back when, electronic text was simple to deal with. Admittedly,
+ you had to know which character set your document was written in (ASCII,
+ EBCDIC, or one of a number of others) but that was about it. Text was
+ text, and what you saw really was what you got. No frills, no
+ formatting, no intelligence.</para>
+
+ <para>Inevitably, this was not enough. Once you have text in a
+ machine-usable format, you expect machines to be able to use it, and
+ manipulate it intelligently. You would like to indicate that certain
+ phrases should be emphasised, or added to a glossary, or be hyperlinks.
+ You might want filenames to be shown in a &ldquo;typewriter&rdquo; style
+ font for viewing on screen, but as &ldquo;italics&rdquo; when printed,
+ or any of a myriad of other options for presentation.</para>
+
+ <para>It was once hoped that Artificial Intelligence (AI) would make this
+ easy. Your computer would read in the document, and automatically
+ identify key phrases, filenames, text that the reader should type in,
+ examples, and more. Unfortunately, real life has not happened quite
+ like that, and our computers require some assistance before they can
+ meaningfully process our text.</para>
+
+ <para>More precisely, they need help identifying what is what. You or I
+ can look at
+
+ <blockquote>
+ <para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>
+
+ <para><command>rm /tmp/foo</command></para>
+ </blockquote>
+
+ and easily see which parts are filenames, which are commands to be typed
+ in, which parts are references to manual pages, and so on. But the
+ computer processing the document cannot. For this we need
+ markup.</para>
+
+ <para>&ldquo;Markup&rdquo; is commonly used to describe &ldquo;adding
+ value&rdquo; or &ldquo;increasing cost&rdquo;. The term takes on both
+ these meanings when applied to text. Markup is additional text included
+ in the document, distinguished from the document's content in some way,
+ so that programs that process the document can read the markup and use
+ it when making decisions about the document. Editors can hide the
+ markup from the user, so they are not distracted by it.</para>
+
+ <para>The extra information stored in the markup <emphasis>adds
+ value</emphasis> to the document. Adding the markup to the document
+ must typically be done by a person&mdash;after all, if computers could
+ recognise the text sufficiently well to add the markup then there would
+ be no need to add it in the first place. This <emphasis>increases the
+ cost</emphasis> of the document.</para>
+
+ <para>The previous example is actually represented in this document like
+ this:</para>
+
+ <programlisting><![ CDATA [
+<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>
+
+<para><command>rm /tmp/foo</command></para>]]></programlisting>
+
+ <para>As you can see, the markup is clearly separate from the
+ content.</para>
+
+ <para>Obviously, if you are going to use markup you need to define what
+ your markup means, and how it should be interpreted. You will need a
+ markup language that you can follow when marking up your
+ documents.</para>
+
+ <para>SGML is <emphasis>not</emphasis> a markup language. Instead, SGML
+ is <emphasis>the language in which you write markup
+ languages</emphasis>. There have been many markup languages written
+ using SGML. HTML and DocBook are two of these.</para>
+
+ <para>This is an important point to understand. Most of the time you are
+ not writing SGML documents. Instead, you are writing documents in a
+ particular markup language. The definition of the markup language you
+ are using is written in SGML.</para>
+
+ <para>Each language definition (which is written in SGML) is more properly
+ called a Document Type Definition (DTD). The DTD specifies the name of
+ the elements that can be used, what order they appear in (and whether
+ some markup can be used inside other markup) and related
+ information.</para>
+
+ <para id="sgml-primer-validating">A DTD is a <emphasis>complete</emphasis>
+ specification of all the elements that are allowed to appear, the order
+ in which they should appear, which elements are mandatory, which are
+ optional, and so forth. This makes it possible to write a
+ <emphasis>parser</emphasis> which reads in the DTD and a document which
+ claims to conform to the DTD. The parser can then confirm whether or
+ not all the elements required by the DTD are in the document in the
+ right order, and whether there are any errors in the markup. This is
+ normally referred to as <quote>validating the document</quote>.</para>
+
+ <note>
+ <para>This processing simply confirms that the choice of elements, their
+ ordering, and so on, conforms to that listed in the DTD. It does
+ <emphasis>not</emphasis> check that you have used
+ <emphasis>appropriate</emphasis> markup for the content. If you were
+ to try and mark up all the filenames in your document as function
+ names, the parser would not flag this as an error (assuming, of
+ course, that your DTD defines elements for filenames and functions,
+ and that they are allowed to appear in the same place).</para>
+ </note>
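+
+ <para>As a rough illustration, assuming a DocBook document in which
+ both elements are permitted inside a paragraph, the following
+ fragment would validate even though a filename has been marked up as
+ a function name. Only a human reader would notice the mistake.</para>
+
+ <programlisting><![ CDATA [
+<para>Edit <function>/etc/rc.conf</function> to change this.</para>]]></programlisting>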
+
+ <para>It is likely that most of your contributions to the Documentation
+ Project will consist of content marked up in either HTML or DocBook,
+ rather than alterations to the DTDs. For this reason this book will
+ not touch on how to write a DTD.</para>
+ </sect1>
+
+ <sect1 id="elements">
+ <title>Elements, tags, and attributes</title>
+
+ <para>All the DTDs written in SGML share certain characteristics. This is
+ hardly surprising, as the philosophy behind SGML will inevitably show
+ through. One of the most obvious manifestations of this philosophy is
+ that of <emphasis>content</emphasis> and
+ <emphasis>elements</emphasis>.</para>
+
+ <para>Your documentation (whether it is a single web page, or a lengthy
+ book) is considered to consist of content. This content is then divided
+ (and further subdivided) into elements. The purpose of adding markup is
+ to name and identify the boundaries of these elements for further
+ processing.</para>
+
+ <para>For example, consider a typical book. At the very top level, the
+ book is itself an element. This &ldquo;book&rdquo; element obviously
+ contains chapters, which can be considered to be elements in their own
+ right. Each chapter will contain more elements, such as paragraphs,
+ quotations, and footnotes. Each paragraph might contain further
+ elements, identifying content that was direct speech, or the name of a
+ character in the story.</para>
+
+ <para>You might like to think of this as &ldquo;chunking&rdquo; content.
+ At the very top level you have one chunk, the book. Look a little
+ deeper, and you have more chunks, the individual chapters. These are
+ chunked further into paragraphs, footnotes, character names, and so
+ on.</para>
+
+ <para>Notice how you can make this differentiation between different
+ elements of the content without resorting to any SGML terms. It really
+ is surprisingly straightforward. You could do this with a highlighter
+ pen and a printout of the book, using different colours to indicate
+ different types of content.</para>
+
+ <para>Of course, we don't have an electronic highlighter pen, so we need
+ some other way of indicating which element each piece of content belongs
+ to. In languages written in SGML (HTML, DocBook, et al) this is done by
+ means of <emphasis>tags</emphasis>.</para>
+
+ <para>A tag is used to identify where a particular element starts, and
+ where it ends. <emphasis>The tag is not part of the element
+ itself</emphasis>. Because each DTD was normally written to mark up
+ specific types of information, each one will recognise different
+ elements, and will therefore have different names for the tags.</para>
+
+ <para>For an element called <replaceable>element-name</replaceable> the
+ start tag will normally look like
+ <literal>&lt;<replaceable>element-name</replaceable>&gt;</literal>. The
+ corresponding closing tag for this element is
+ <literal>&lt;/<replaceable>element-name</replaceable>&gt;</literal>.</para>
+
+ <example>
+ <title>Using an element (start and end tags)</title>
+
+ <para>HTML has an element for indicating that the content enclosed by
+ the element is a paragraph, called <literal>p</literal>. This
+ element has both start and end tags.</para>
+
+ <programlisting>
+<![ CDATA [<p>This is a paragraph. It starts with the start tag for
+ the 'p' element, and it will end with the end tag for the 'p'
+ element.</p>
+
+<p>This is another paragraph. But this one is much shorter.</p>]]></programlisting>
+ </example>
+
+ <para>Not all elements require an end tag. Some elements have no content.
+ For example, in HTML you can indicate that you want a horizontal line to
+ appear in the document. Obviously, this line has no content, so just
+ the start tag is required for this element.</para>
+
+ <example>
+ <title>Using an element (start tag only)</title>
+
+ <para>HTML has an element for indicating a horizontal rule, called
+ <literal>hr</literal>. This element does not wrap content, so only has
+ a start tag.</para>
+
+ <programlisting>
+<![ CDATA [<p>This is a paragraph.</p>
+
+<hr>
+
+<p>This is another paragraph. A horizontal rule separates this
+ from the previous paragraph.</p>]]></programlisting>
+ </example>
+
+ <para>If it is not obvious by now, elements can contain other elements.
+ In the book example earlier, the book element contained all the chapter
+ elements, which in turn contained all the paragraph elements, and so
+ on.</para>
+
+ <example>
+ <title>Elements within elements; <sgmltag>em</sgmltag></title>
+
+ <programlisting>
+<![ CDATA [<p>This is a simple <em>paragraph</em> where some
+ of the <em>words</em> have been <em>emphasised</em>.</p>]]></programlisting>
+ </example>
+
+ <para>The DTD will specify the rules detailing which elements can contain
+ other elements, and exactly what they can contain.</para>
+
+ <important>
+ <para>People often confuse the terms tags and elements, and use the terms
+ as if they were interchangeable. They are not.</para>
+
+ <para>An element is a conceptual part of your document. An element has
+ a defined start and end. The tags mark where the element starts and
+ ends.</para>
+
+ <para>When this document (or anyone else knowledgeable about SGML) refers
+ to &ldquo;the &lt;p&gt; tag&rdquo; they mean the literal text
+ consisting of the three characters <literal>&lt;</literal>,
+ <literal>p</literal>, and <literal>&gt;</literal>. But the phrase
+ &ldquo;the &lt;p&gt; element&rdquo; refers to the whole element.</para>
+
+ <para>This distinction <emphasis>is</emphasis> very subtle. But keep it
+ in mind.</para>
+ </important>
+
+ <para>Elements can have attributes. An attribute has a name and a value,
+ and is used for adding extra information to the element. This might be
+ information that indicates how the content should be rendered, or might
+ be something that uniquely identifies that occurrence of the element, or
+ it might be something else.</para>
+
+ <para>An element's attributes are written <emphasis>inside</emphasis> the
+ start tag for that element, and take the form
+ <literal><replaceable>attribute-name</replaceable>="<replaceable>attribute-value</replaceable>"</literal>.</para>
+
+ <para>In sufficiently recent versions of HTML, the <sgmltag>p</sgmltag>
+ element has an attribute called <literal>align</literal>, which suggests
+ an alignment (justification) for the paragraph to the program displaying
+ the HTML.</para>
+
+ <para>The <literal>align</literal> attribute can take one of four defined
+ values, <literal>left</literal>, <literal>center</literal>,
+ <literal>right</literal> and <literal>justify</literal>. If the
+ attribute is not specified then the default is
+ <literal>left</literal>.</para>
+
+ <example>
+ <title>Using an element with an attribute</title>
+
+ <programlisting>
+<![ CDATA [<p align="left">The inclusion of the align attribute
+ on this paragraph was superfluous, since the default is left.</p>
+
+<p align="center">This may appear in the center.</p>]]></programlisting>
+ </example>
+
+ <para>Some attributes will only take specific values, such as
+ <literal>left</literal> or <literal>justify</literal>. Others will
+ allow you to enter anything you want. If you need to include quotes
+ (<literal>"</literal>) within an attribute then use single quotes around
+ the attribute value.</para>
+
+ <example>
+ <title>Single quotes around attributes</title>
+
+ <programlisting>
+<![ CDATA [<p align='right'>I'm on the right!</p>]]></programlisting>
+ </example>
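+
+ <example>
+ <title>Single quotes around a value containing double quotes</title>
+
+ <para>A minimal sketch, assuming the <literal>title</literal>
+ attribute that HTML 4.0 permits on <sgmltag>p</sgmltag>. The value
+ itself contains double quotes, so it is wrapped in single
+ quotes.</para>
+
+ <programlisting>
+<![ CDATA [<p title='The "right" paragraph'>I'm on the right!</p>]]></programlisting>
+ </example>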
+
+ <para>Sometimes you do not need to use quotes around attribute values at
+ all. However, the rules for doing this are subtle, and it is far simpler
+ just to <emphasis>always</emphasis> quote your attribute values.</para>
+
+ <sect2>
+ <title>For you to do&hellip;</title>
+
+ <para>In order to run the examples in this document you will need to
+ install some software on your system and ensure that an environment
+ variable is set correctly.</para>
+
+ <procedure>
+ <step>
+ <para>Download and install <filename>textproc/docproj</filename>
+ from the FreeBSD ports system. This is a
+ <emphasis>meta-port</emphasis> that should download and install
+ all of the programs and supporting files that are used by the
+ Documentation Project.</para>
+ </step>
+
+ <step>
+ <para>Add lines to your shell startup files to set
+ <envar>SGML_CATALOG_FILES</envar>.</para>
+
+ <example id="sgml-primer-envars">
+ <title><filename>.profile</filename>, for &man.sh.1; and
+ &man.bash.1; users</title>
+
+ <programlisting>
+SGML_ROOT=/usr/local/share/sgml
+SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
+SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
+SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
+SGML_CATALOG_FILES=${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES
+export SGML_CATALOG_FILES</programlisting>
+ </example>
+
+ <example>
+ <title><filename>.login</filename>, for &man.csh.1; and
+ &man.tcsh.1; users</title>
+
+ <programlisting>
+setenv SGML_ROOT /usr/local/share/sgml
+setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
+setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
+setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
+setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES</programlisting>
+ </example>
+
+ <para>Then either log out, and log back in again, or run those
+ commands from the command line to set the variable values.</para>
+ </step>
+ </procedure>
+
+ <procedure>
+ <step>
+ <para>Create <filename>example.sgml</filename>, and enter the
+ following text:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+
+<html>
+ <head>
+ <title>An example HTML file</title>
+ </head>
+
+ <body>
+ <p>This is a paragraph containing some text.</p>
+
+ <p>This paragraph contains some more text.</p>
+
+ <p align="right">This paragraph might be right-justified.</p>
+ </body>
+</html>]]></programlisting>
+ </step>
+
+ <step>
+ <para>Try and validate this file using an SGML parser.</para>
+
+ <para>Part of <filename>textproc/docproj</filename> is the
+ &man.nsgmls.1; <link linkend="sgml-primer-validating">validating
+ parser</link>. Normally, &man.nsgmls.1; reads in a document
+ marked up according to an SGML DTD and returns a copy of the
+ document's Element Structure Information Set (ESIS, but that is
+ not important right now).</para>
+
+ <para>However, when <option>-s</option> is passed as a parameter to
+ it, &man.nsgmls.1; will suppress its normal output, and just print
+ error messages. This makes it a useful way to check to see if your
+ document is valid or not.</para>
+
+ <para>Use &man.nsgmls.1; to check that your document is
+ valid:</para>
+
+ <screen>&prompt.user; <userinput>nsgmls -s example.sgml</userinput></screen>
+
+ <para>As you will see, &man.nsgmls.1; returns without displaying any
+ output. This means that your document validated
+ successfully.</para>
+ </step>
+
+ <step>
+ <para>See what happens when required elements are omitted. Try
+ removing the <sgmltag>title</sgmltag> and <sgmltag>/title</sgmltag>
+ tags, and re-run the validation.</para>
+
+ <screen>&prompt.user; <userinput>nsgmls -s example.sgml</userinput>
+nsgmls:example.sgml:5:4:E: character data is not allowed here
+nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished</screen>
+
+ <para>The error output from &man.nsgmls.1; is organised into
+ colon-separated groups, or columns.</para>
+
+ <informaltable frame="none">
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Meaning</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>1</entry>
+ <entry>The name of the program generating the error. This
+ will always be <literal>nsgmls</literal>.</entry>
+ </row>
+
+ <row>
+ <entry>2</entry>
+ <entry>The name of the file that contains the error.</entry>
+ </row>
+
+ <row>
+ <entry>3</entry>
+ <entry>Line number where the error appears.</entry>
+ </row>
+
+ <row>
+ <entry>4</entry>
+ <entry>Column number where the error appears.</entry>
+ </row>
+
+ <row>
+ <entry>5</entry>
+ <entry>A one letter code indicating the nature of the
+ message. <literal>I</literal> indicates an informational
+ message, <literal>W</literal> is for warnings,
+ <literal>E</literal> is for errors<footnote>
+ <para>It is not always the fifth column either.
+ <command>nsgmls -sv</command> displays
+ <literal>nsgmls:I: SP version "1.3"</literal>
+ (depending on the installed version). As you can see,
+ this is an informational message.</para>
+ </footnote>, and <literal>X</literal> is for
+ cross-references. As you can see, these messages are
+ errors.</entry>
+ </row>
+
+ <row>
+ <entry>6</entry>
+ <entry>The text of the error message.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ <para>Simply omitting the <sgmltag>title</sgmltag> tags has generated
+ two different errors.</para>
+
+ <para>The first error indicates that content (in this case,
+ characters, rather than the start tag for an element) has occurred
+ where the SGML parser was expecting something else. In this case,
+ the parser was expecting to see one of the start tags for elements
+ that are valid inside <sgmltag>head</sgmltag> (such as
+ <sgmltag>title</sgmltag>).</para>
+
+ <para>The second error is because <sgmltag>head</sgmltag> elements
+ <emphasis>must</emphasis> contain a <sgmltag>title</sgmltag>
+ element. Because it does not, &man.nsgmls.1; considers that the
+ element has not been properly finished. However, the closing tag
+ indicates that the element has been closed before it has been
+ finished.</para>
+ </step>
+
+ <step>
+ <para>Put the <literal>title</literal> element back in.</para>
+ </step>
+ </procedure>
+ </sect2>
+ </sect1>
+
+ <sect1 id="doctype-declaration">
+ <title>The DOCTYPE declaration</title>
+
+ <para>The beginning of each document that you write must specify the name
+ of the DTD that the document conforms to. This is so that SGML parsers
+ can determine the DTD and ensure that the document does conform to
+ it.</para>
+
+ <para>This information is generally expressed on one line, in the DOCTYPE
+ declaration.</para>
+
+ <para>A typical declaration for a document written to conform with version
+ 4.0 of the HTML DTD looks like this:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">]]></programlisting>
+
+ <para>That line contains a number of different components.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>&lt;!</literal></term>
+
+ <listitem>
+ <para>Is the <emphasis>indicator</emphasis> that this is an SGML
+ declaration. This line is declaring the document type.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>DOCTYPE</literal></term>
+
+ <listitem>
+ <para>Shows that this is an SGML declaration for the document
+ type.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>html</literal></term>
+
+ <listitem>
+ <para>Names the first <link linkend="elements">element</link> that
+ will appear in the document.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>PUBLIC "-//W3C//DTD HTML 4.0//EN"</literal></term>
+
+ <listitem>
+ <para>Lists the Formal Public Identifier (FPI) for the DTD that this
+ document conforms to. Your SGML parser will use this to find the
+ correct DTD when processing this document.</para>
+
+ <para><literal>PUBLIC</literal> is not a part of the FPI, but
+ indicates to the SGML processor how to find the DTD referenced in
+ the FPI. Other ways of telling the SGML parser how to find the DTD
+ are shown <link linkend="fpi-alternatives">later</link>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>&gt;</literal></term>
+
+ <listitem>
+ <para>Returns to the document.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <sect2>
+ <title>Formal Public Identifiers (FPIs)</title>
+
+ <note>
+ <para>You don't need to know this, but it's useful background, and
+ might help you debug problems when your SGML processor can't locate
+ the DTD you are using.</para>
+ </note>
+
+ <para>FPIs must follow a specific syntax. This syntax is as
+ follows:</para>
+
+ <programlisting>
+"<replaceable>Owner</replaceable>//<replaceable>Keyword</replaceable> <replaceable>Description</replaceable>//<replaceable>Language</replaceable>"</programlisting>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable>Owner</replaceable></term>
+
+ <listitem>
+ <para>This indicates the owner of the FPI.</para>
+
+ <para>If this string starts with &ldquo;ISO&rdquo; then this is an
+ ISO owned FPI. For example, the FPI <literal>"ISO
+ 8879:1986//ENTITIES Greek Symbols//EN"</literal> lists
+ <literal>ISO 8879:1986</literal> as being the owner for the set
+ of entities for Greek symbols. ISO 8879:1986 is the ISO number
+ for the SGML standard.</para>
+
+ <para>Otherwise, this string will either look like
+ <literal>-//<replaceable>Owner</replaceable></literal> or
+ <literal>+//<replaceable>Owner</replaceable></literal> (notice
+ the only difference is the leading <literal>+</literal> or
+ <literal>-</literal>).</para>
+
+ <para>If the string starts with <literal>-</literal> then the
+ owner information is unregistered, with a <literal>+</literal>
+ it identifies it as being registered.</para>
+
+ <para>ISO 9070:1991 defines how registered names are generated; it
+ might be derived from the number of an ISO publication, an ISBN
+ code, or an organisation code assigned according to ISO 6523. In
+ addition, a registration authority could be created in order to
+ assign registered names. The ISO council delegated this to the
+ American National Standards Institute (ANSI).</para>
+
+ <para>Because the FreeBSD Project hasn't been registered, the
+ owner string is <literal>-//FreeBSD</literal>. And as you can
+ see, the W3C are not a registered owner either.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>Keyword</replaceable></term>
+
+ <listitem>
+ <para>There are several keywords that indicate the type of
+ information in the file. Some of the most common keywords are
+ <literal>DTD</literal>, <literal>ELEMENTS</literal>,
+ <literal>ENTITIES</literal>, and <literal>TEXT</literal>.
+ <literal>DTD</literal> is used only for DTD files,
+ <literal>ELEMENTS</literal> is usually used for DTD fragments
+ that contain only entity or element declarations.
+ <literal>TEXT</literal> is used for SGML content (text and
+ tags).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>Description</replaceable></term>
+
+ <listitem>
+ <para>Any description you want to supply for the contents of this
+ file. This may include version numbers or any short text that is
+ meaningful to you and unique for the SGML system.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>Language</replaceable></term>
+
+ <listitem>
+ <para>This is an ISO two-character code that identifies the native
+ language for the file. <literal>EN</literal> is used for
+ English.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
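+
+ <para>As a worked example, the FPI from the DOCTYPE declaration
+ shown earlier breaks down into these four parts:</para>
+
+ <programlisting>
+"-//W3C//DTD HTML 4.0//EN"
+
+Owner        -//W3C      (unregistered, because of the leading -)
+Keyword      DTD
+Description  HTML 4.0
+Language     EN</programlisting>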
+
+ <sect3>
+ <title><filename>catalog</filename> files</title>
+
+ <para>If you use the syntax above and try and process this document
+ using an SGML processor, the processor will need to have some way of
+ turning the FPI into the name of the file on your computer that
+ contains the DTD.</para>
+
+ <para>In order to do this it can use a catalog file. A catalog file
+ (typically called <filename>catalog</filename>) contains lines that
+ map FPIs to filenames. For example, if the catalog file contained the
+ line:</para>
+
+ <programlisting>
+PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"</programlisting>
+
+ <para>The SGML processor would know to look up the DTD from
+ <filename>strict.dtd</filename> in the <filename>4.0</filename>
+ subdirectory of whichever directory held the
+ <filename>catalog</filename> file that contained that line.</para>
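+
+ <para>A catalog usually holds several such lines. The following is
+ only a sketch: the second entry, and the
+ <filename>4.0/loose.dtd</filename> filename in particular, are
+ assumptions made for illustration and may not match the catalog
+ installed on your system.</para>
+
+ <programlisting>
+PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"
+PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "4.0/loose.dtd"</programlisting>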
+
+ <para>Look at the contents of
+ <filename>/usr/local/share/sgml/html/catalog</filename>. This is the
+ catalog file for the HTML DTDs that will have been installed as part
+ of the <filename>textproc/docproj</filename> port.</para>
+ </sect3>
+
+ <sect3>
+ <title><envar>SGML_CATALOG_FILES</envar></title>
+
+ <para>In order to locate a <filename>catalog</filename> file, your
+ SGML processor will need to know where to look. Many of them feature
+ command line parameters for specifying the path to one or more
+ catalogs.</para>
+
+ <para>In addition, you can set <envar>SGML_CATALOG_FILES</envar> to
+ point to the files. This environment variable should consist of a
+ colon-separated list of catalog files (including their full
+ path).</para>
+
+ <para>Typically, you will want to include the following files:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><filename>/usr/local/share/sgml/docbook/3.0/catalog</filename></para>
+ </listitem>
+
+ <listitem>
+ <para><filename>/usr/local/share/sgml/html/catalog</filename></para>
+ </listitem>
+
+ <listitem>
+ <para><filename>/usr/local/share/sgml/iso8879/catalog</filename></para>
+ </listitem>
+
+ <listitem>
+ <para><filename>/usr/local/share/sgml/jade/catalog</filename></para>
+ </listitem>
+ </itemizedlist>
+
+ <para>You should <link linkend="sgml-primer-envars">already have done
+ this</link>.</para>
+ </sect3>
+ </sect2>
+
+ <sect2 id="fpi-alternatives">
+ <title>Alternatives to FPIs</title>
+
+ <para>Instead of using an FPI to indicate the DTD that the document
+ conforms to (and therefore, which file on the system contains the DTD)
+ you can explicitly specify the name of the file.</para>
+
+ <para>The syntax for this is slightly different:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html SYSTEM "/path/to/file.dtd">]]></programlisting>
+
+ <para>The <literal>SYSTEM</literal> keyword indicates that the SGML
+ processor should locate the DTD in a system specific fashion. This
+ typically (but not always) means the DTD will be provided as a
+ filename.</para>
+
+ <para>Using FPIs is preferred for reasons of portability. You don't want
+ to have to ship a copy of the DTD around with your document, and if
+ you used the <literal>SYSTEM</literal> identifier then everyone would
+ need to keep their DTDs in the same place.</para>
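+
+ <para>SGML also allows a declaration to carry both identifiers, with
+ the system identifier following the FPI. This is a sketch only,
+ reusing the placeholder filename from above; whether the system
+ identifier is consulted first or only as a fallback depends on how
+ your SGML processor is configured.</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" "/path/to/file.dtd">]]></programlisting>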
+ </sect2>
+ </sect1>
+
+ <sect1 id="sgml-escape">
+ <title>Escaping back to SGML</title>
+
+ <para>Earlier in this primer I said that SGML is only used when writing a
+ DTD. This is not strictly true. There is certain SGML syntax that you
+ will want to be able to use within your documents. For example,
+ comments can be included in your document, and will be ignored by the
+ parser. Comments are entered using SGML syntax. Other uses for SGML
+ syntax in your document will be shown later too.</para>
+
+ <para>Obviously, you need some way of indicating to the SGML processor
+ that the following content is not elements within the document, but is
+ SGML that the parser should act upon.</para>
+
+ <para>These sections are marked by <literal>&lt;! ... &gt;</literal> in
+ your document. Everything between these delimiters is SGML syntax as you
+ might find within a DTD.</para>
+
+ <para>As you may just have realised, the <link
+ linkend="doctype-declaration">DOCTYPE declaration</link> is an example
+ of SGML syntax that you need to include in your document&hellip;</para>
+ </sect1>
+
+ <sect1>
+ <title>Comments</title>
+
+ <para>Comments are an SGML construction, and are normally only valid
+ inside a DTD. However, as <xref linkend="sgml-escape"> shows, it is
+ possible to use SGML syntax within your document.</para>
+
+ <para>The delimiter for SGML comments is the string
+ &ldquo;<literal>--</literal>&rdquo;. The first occurrence of this string
+ opens a comment, and the second closes it.</para>
+
+ <example>
+ <title>SGML generic comment</title>
+
+ <programlisting>
+&lt;!-- test comment --></programlisting>
+
+ <programlisting><![ CDATA [
+<!-- This is inside the comment -->
+
+<!-- This is another comment -->
+
+<!-- This is one way
+ of doing multiline comments -->
+
+<!-- This is another way of --
+ -- doing multiline comments -->]]></programlisting>
+ </example>
+
+ <![ %output.print; [
+ <important>
+ <title>Use 2 dashes</title>
+
+ <para>There is a problem with producing the PostScript and PDF versions
+ of this document. The above example probably shows just one hyphen
+ symbol, <literal>-</literal> after the <literal>&lt;!</literal> and
+ before the <literal>&gt;</literal>.</para>
+
+ <para>You <emphasis>must</emphasis> use two <literal>-</literal>,
+ <emphasis>not</emphasis> one. The PostScript and PDF versions have
+ translated the two <literal>-</literal> in the original to a longer,
+ more professional <emphasis>em-dash</emphasis>, and broken this
+ example in the process.</para>
+
+ <para>The HTML, plain text, and RTF versions of this document are not
+ affected.</para>
+ </important>
+ ]]>
+
+ <para>If you have used HTML before you may have been shown different rules
+ for comments. In particular, you may think that the string
+ <literal>&lt;!--</literal> opens a comment, and it is only closed by
+ <literal>--&gt;</literal>.</para>
+
+ <para>This is <emphasis>not</emphasis> the case. A lot of web browsers
+ have broken HTML parsers, and will accept that as valid. However, the
+ SGML parsers used by the Documentation Project are much stricter, and
+ will reject documents that make that error.</para>
+
+ <example>
+ <title>Erroneous SGML comments</title>
+
+ <programlisting><![ CDATA [
+<!-- This is in the comment --
+
+ THIS IS OUTSIDE THE COMMENT!
+
+ -- back inside the comment -->]]></programlisting>
+
+ <para>The SGML parser will treat this as though it were actually:</para>
+
+ <programlisting>
+&lt;!THIS IS OUTSIDE THE COMMENT&gt;</programlisting>
+
+ <para>This is not valid SGML, and may give confusing error
+ messages.</para>
+
+ <programlisting>
+<![ CDATA [<!--------------- This is a very bad idea --------------->]]></programlisting>
+
+ <para>As the example suggests, <emphasis>do not</emphasis> write
+ comments like that.</para>
+
+ <programlisting>
+<![ CDATA [<!--===================================================-->]]></programlisting>
+
+ <para>That is a (slightly) better approach, but it is still potentially
+ confusing to people new to SGML.</para>
+ </example>
+
+ <sect2>
+ <title>For you to do&hellip;</title>
+
+ <procedure>
+ <step>
+ <para>Add some comments to <filename>example.sgml</filename>, and
+ check that the file still validates using &man.nsgmls.1;.</para>
+ </step>
+
+ <step>
+ <para>Add some invalid comments to
+ <filename>example.sgml</filename>, and see the error messages that
+ &man.nsgmls.1; gives when it encounters an invalid comment.</para>
+ </step>
+ </procedure>
+ </sect2>
+ </sect1>
+
+ <sect1>
+ <title>Entities</title>
+
+ <para>Entities are an SGML term. You might feel more comfortable thinking
+ of them as variables. There are two types of entity in SGML, general
+ entities and parameter entities.</para>
+
+ <sect2 id="general-entities">
+ <title>General Entities</title>
+
+ <para>General entities are a way of assigning names to chunks of text,
+ and reusing that text (which may contain markup) throughout your
+ document.</para>
+
+ <para>You cannot use general entities in an SGML context (although you
+ define them in one). They can only be used in your document. Contrast
+ this with <link linkend="parameter-entities">parameter
+ entities</link>.</para>
+
+ <para>Each general entity has a name. When you want to reference a
+ general entity (and therefore include whatever text it represents in
+ your document), you write
+ <literal>&amp;<replaceable>entity-name</replaceable>;</literal>. For
+ example, suppose you had an entity called
+ <literal>current.version</literal> which expanded to the current
+ version number of your product. You could write:</para>
+
+ <programlisting>
+<![ CDATA [<para>The current version of our product is
+ &current.version;.</para>]]></programlisting>
+
+ <para>When the version number changes you can simply change the
+ definition of the value of the general entity and reprocess your
+ document.</para>
+
+ <para>You can also use general entities to enter characters that you
+ could not normally include in an SGML document. For example, &lt; and
+ &amp; can not normally appear in an SGML document. Normally, when the
+ SGML processor sees a &lt; symbol it assumes that a tag (either a start
+ tag or an end tag) is about to appear, and when it sees a &amp; symbol
+ it assumes the next text will be the name of an entity.</para>
+
+ <para>Fortunately, you can use the two general entities &amp;lt; and
+ &amp;amp; whenever you need to include one or other of these
+ characters.</para>
+
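+ <para>As a small illustration (the sentence itself is made up), this is
+ how a paragraph containing both characters might be written in an HTML
+ document:</para>
+
+ <programlisting>
+<![ CDATA [<p>In C, the test (a &lt; b &amp;&amp; b &lt; c) is true when b lies
+ strictly between a and c.</p>]]></programlisting>
+
+ <para>After processing, the reader sees <literal>a &lt; b &amp;&amp; b
+ &lt; c</literal>.</para>
+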
+ <para>A general entity can only be defined within an SGML context.
+ Typically, this is done immediately after the DOCTYPE
+ declaration.</para>
+
+ <example>
+ <title>Defining general entities</title>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY current.version "3.0-RELEASE">
+<!ENTITY last.version "2.2.7-RELEASE">
+]>]]></programlisting>
+
+ <para>Notice how the DOCTYPE declaration has been extended by adding a
+ square bracket at the end of the first line. The two entities are
+ then defined over the next two lines, before the square bracket is
+ closed, and then the DOCTYPE declaration is closed.</para>
+
+ <para>The square brackets are necessary to indicate that we are
+ extending the DTD indicated by the DOCTYPE declaration.</para>
+ </example>
+ </sect2>
+
+ <sect2 id="parameter-entities">
+ <title>Parameter entities</title>
+
+ <para>Like <link linkend="general-entities">general entities</link>,
+ parameter entities are used to assign names to reusable chunks of
+ text. However, whereas general entities can only be used within your
+ document, parameter entities can only be used within an <link
+ linkend="sgml-escape">SGML context</link>.</para>
+
+ <para>Parameter entities are defined in a similar way to general
+ entities. However, instead of using
+ <literal>&amp;<replaceable>entity-name</replaceable>;</literal> to
+ refer to them, use
+ <literal>%<replaceable>entity-name</replaceable>;</literal><footnote>
+ <para><emphasis>P</emphasis>arameter entities use the
+ <emphasis>P</emphasis>ercent symbol.</para>
+ </footnote>. The definition also includes the <literal>%</literal>
+ between the <literal>ENTITY</literal> keyword and the name of the
+ entity.</para>
+
+ <example>
+ <title>Defining parameter entities</title>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY % param.some "some">
+<!ENTITY % param.text "text">
+<!ENTITY % param.new "%param.some; more %param.text;">
+
+<!-- %param.new now contains "some more text" -->
+]>]]></programlisting>
+ </example>
+
+ <para>This may not seem particularly useful. It will be.</para>
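+
+ <para>One modest use, sketched here with made-up entity names, is to
+ keep a value in one parameter entity and build several general entities
+ from it. This relies on parameter entity references being expanded
+ inside the quoted text of an entity declaration, as in the example
+ above:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY % release.number "3.0">
+<!ENTITY current.version "%release.number;-RELEASE">
+<!ENTITY current.branch "%release.number;-STABLE">
+
+<!-- &current.version; now expands to "3.0-RELEASE" -->
+]>]]></programlisting>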
+ </sect2>
+
+ <sect2>
+ <title>For you to do&hellip;</title>
+
+ <procedure>
+ <step>
+ <para>Add a general entity to
+ <filename>example.sgml</filename>.</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
+<!ENTITY version "1.1">
+]>
+
+<html>
+ <head>
+ <title>An example HTML file</title>
+ </head>
+
+ <!-- You might well have some comments in here as well -->
+
+ <body>
+ <p>This is a paragraph containing some text.</p>
+
+ <p>This paragraph contains some more text.</p>
+
+ <p align="right">This paragraph might be right-justified.</p>
+
+ <p>The current version of this document is: &version;</p>
+ </body>
+</html>]]></programlisting>
+ </step>
+
+ <step>
+ <para>Validate the document using &man.nsgmls.1;.</para>
+ </step>
+
+ <step>
+ <para>Load <filename>example.sgml</filename> into your web browser
+ (you may need to copy it to <filename>example.html</filename>
+ before your browser recognises it as an HTML document).</para>
+
+ <para>Unless your browser is very advanced, you won't see the entity
+ reference <literal>&amp;version;</literal> replaced with the
+ version number. Most web browsers have very simplistic parsers
+ which don't do proper SGML<footnote>
+ <para>This is a shame. Imagine all the problems and hacks (such
+ as Server Side Includes) that could be avoided if they
+ did.</para>
+ </footnote>.</para>
+ </step>
+
+ <step>
+ <para>The solution is to <emphasis>normalise</emphasis> your
+ document. Normalising it involves converting all the entity
+ references to the values of those entities.</para>
+
+ <para>You can use &man.sgmlnorm.1; to do this.</para>
+
+ <screen>&prompt.user; <userinput>sgmlnorm example.sgml > example.html</userinput></screen>
+
+ <para>You should find a normalised (i.e., entity references
+ expanded) copy of your document in
+ <filename>example.html</filename>, ready to load into your web
+ browser.</para>
+ </step>
+
+ <step>
+ <para>If you look at the output from &man.sgmlnorm.1; you will see
+ that it does not include a DOCTYPE declaration at the start. To
+ include this you need to use the <option>-d</option>
+ option:</para>
+
+ <screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
+ </step>
+ </procedure>
+ </sect2>
+ </sect1>
+
+ <sect1>
+ <title>Using entities to include files</title>
+
+ <para>Entities (both <link linkend="general-entities">general</link> and
+ <link linkend="parameter-entities">parameter</link>) come into their own
+ when you realise they can be used to include other files.</para>
+
+ <sect2 id="include-using-gen-entities">
+ <title>Using general entities to include files</title>
+
+ <para>Suppose you have some content for an SGML book organised into
+ files, one file per chapter, called
+ <filename>chapter1.sgml</filename>,
+ <filename>chapter2.sgml</filename>, and so forth, with a
+ <filename>book.sgml</filename> file that will contain these
+ chapters.</para>
+
+ <para>In order to use the contents of these files as the values for your
+ entities, you declare them with the <literal>SYSTEM</literal> keyword.
+ This directs the SGML parser to use the contents of the named file as
+ the value of the entity.</para>
+
+ <example>
+ <title>Using general entities to include files</title>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
+<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
+<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
+<!-- And so forth -->
+]>
+
+<html>
+ <!-- Use the entities to load in the chapters -->
+
+ &chapter.1;
+ &chapter.2;
+ &chapter.3;
+</html>]]></programlisting>
+ </example>
+
+ <warning>
+ <para>When using general entities to include other files within a
+ document, the files being included
+ (<filename>chapter1.sgml</filename>,
+ <filename>chapter2.sgml</filename>, and so on) <emphasis>must
+ not</emphasis> start with a DOCTYPE declaration. This is a syntax
+ error.</para>
+ </warning>
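+
+ <para>For illustration, <filename>chapter1.sgml</filename> might
+ contain nothing more than the following. The content is invented; the
+ point is that the file starts directly with markup and has no DOCTYPE
+ declaration of its own:</para>
+
+ <programlisting>
+<![ CDATA [<h1>Chapter one</h1>
+
+<p>This is the first paragraph of chapter one.</p>]]></programlisting>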
+ </sect2>
+
+ <sect2>
+ <title>Using parameter entities to include files</title>
+
+ <para>Recall that parameter entities can only be used inside an SGML
+ context. Why then would you want to include a file within an SGML
+ context?</para>
+
+ <para>You can use this to ensure that you can reuse your general
+ entities.</para>
+
+ <para>Suppose that you had many chapters in your document, and you
+ reused these chapters in two different books, each book organising the
+ chapters in a different fashion.</para>
+
+ <para>You could list the entities at the top of each book, but this
+ quickly becomes cumbersome to manage.</para>
+
+ <para>Instead, place the general entity definitions inside one file,
+ and use a parameter entity to include that file within your
+ document.</para>
+
+ <example>
+ <title>Using parameter entities to include files</title>
+
+ <para>First, place your entity definitions in a separate file, called
+ <filename>chapters.ent</filename>. This file contains the
+ following:</para>
+
+ <programlisting>
+<![ CDATA [<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
+<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
+<!ENTITY chapter.3 SYSTEM "chapter3.sgml">]]></programlisting>
+
+ <para>Now create a parameter entity to refer to the contents of the
+ file. Then use the parameter entity to load the file into the
+ document, which will then make all the general entities available
+ for use. Then use the general entities as before:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!-- Define a parameter entity to load in the chapter general entities -->
+<!ENTITY % chapters SYSTEM "chapters.ent">
+
+<!-- Now use the parameter entity to load in this file -->
+%chapters;
+]>
+
+<html>
+ &chapter.1;
+ &chapter.2;
+ &chapter.3;
+</html>]]></programlisting>
+ </example>
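+
+ <para>A second book can then reuse the same definitions. As a sketch,
+ a hypothetical <filename>book2.sgml</filename> might load
+ <filename>chapters.ent</filename> in the same way and pull in the
+ chapters in a different order:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!-- The same entity file that the first book uses -->
+<!ENTITY % chapters SYSTEM "chapters.ent">
+%chapters;
+]>
+
+<html>
+ &chapter.3;
+ &chapter.1;
+</html>]]></programlisting>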
+ </sect2>
+
+ <sect2>
+ <title>For you to do&hellip;</title>
+
+ <sect3>
+ <title>Use general entities to include files</title>
+
+ <procedure>
+ <step>
+ <para>Create three files, <filename>para1.sgml</filename>,
+ <filename>para2.sgml</filename>, and
+ <filename>para3.sgml</filename>.</para>
+
+ <para>Put content similar to the following in each file:</para>
+
+ <programlisting>
+<![ CDATA [<p>This is the first paragraph.</p>]]></programlisting>
+ </step>
+
+ <step>
+ <para>Edit <filename>example.sgml</filename> so that it looks like
+ this:</para>
+
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY version "1.1">
+<!ENTITY para1 SYSTEM "para1.sgml">
+<!ENTITY para2 SYSTEM "para2.sgml">
+<!ENTITY para3 SYSTEM "para3.sgml">
+]>
+
+<html>
+ <head>
+ <title>An example HTML file</title>
+ </head>
+
+ <body>
+ <p>The current version of this document is: &version;</p>
+
+ &para1;
+ &para2;
+ &para3;
+ </body>
+</html>]]></programlisting>
+ </step>
+
+ <step>
+ <para>Produce <filename>example.html</filename> by normalising
+ <filename>example.sgml</filename>.</para>
+
+ <screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
+ </step>
+
+ <step>
+ <para>Load <filename>example.html</filename> into your web
+ browser, and confirm that the
+ <filename>para<replaceable>n</replaceable>.sgml</filename> files
+ have been included in <filename>example.html</filename>.</para>
+ </step>
+ </procedure>
+ </sect3>
+
+ <sect3>
+ <title>Use parameter entities to include files</title>
+
+ <note>
+ <para>You must have taken the previous steps first.</para>
+ </note>
+
+ <procedure>
+ <step>
+ <para>Edit <filename>example.sgml</filename> so that it looks like
+ this:</para>
+ <programlisting>
+<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+<!ENTITY % entities SYSTEM "entities.sgml"> %entities;
+]>
+
+<html>
+ <head>
+ <title>An example HTML file</title>
+ </head>
+
+ <body>
+ <p>The current version of this document is: &version;</p>
+
+ &para1;
+ &para2;
+ &para3;
+ </body>
+</html>]]></programlisting>
+ </step>
+
+ <step>
+ <para>Create a new file, <filename>entities.sgml</filename>, with
+ this content:</para>
+
+ <programlisting>
+<![ CDATA [<!ENTITY version "1.1">
+<!ENTITY para1 SYSTEM "para1.sgml">
+<!ENTITY para2 SYSTEM "para2.sgml">
+<!ENTITY para3 SYSTEM "para3.sgml">]]></programlisting>
+ </step>
+
+ <step>
+ <para>Produce <filename>example.html</filename> by normalising
+ <filename>example.sgml</filename>.</para>
+
+ <screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
+ </step>
+
+ <step>
+ <para>Load <filename>example.html</filename> into your web
+ browser, and confirm that the
+ <filename>para<replaceable>n</replaceable>.sgml</filename> files
+ have been included in <filename>example.html</filename>.</para>
+ </step>
+ </procedure>
+ </sect3>
+ </sect2>
+ </sect1>
+
+ <sect1>
+ <title>Marked sections</title>
+
+ <para>SGML provides a mechanism to indicate that particular pieces of the
+ document should be processed in a special way. These are termed
+ &ldquo;marked sections&rdquo;.</para>
+
+ <example>
+ <title>Structure of a marked section</title>
+
+ <programlisting>
+&lt;![ <replaceable>KEYWORD</replaceable> [
+ Contents of marked section
+]]&gt;</programlisting>
+ </example>
+
+ <para>As you would expect, being an SGML construct, a marked section
+ starts with <literal>&lt;!</literal>.</para>
+
+ <para>The first square bracket begins to delimit the marked
+ section.</para>
+
+ <para><replaceable>KEYWORD</replaceable> describes how this marked
+ section should be processed by the parser.</para>
+
+ <para>The second square bracket indicates that the content of the marked
+ section starts here.</para>
+
+ <para>The marked section is finished by closing the two square brackets,
+ and then returning to the document context from the SGML context with
+ <literal>&gt;</literal>.</para>
+
+ <sect2>
+ <title>Marked section keywords</title>
+
+ <sect3>
+ <title><literal>CDATA</literal>, <literal>RCDATA</literal></title>
+
+ <para>These keywords denote the marked section's <emphasis>content
+ model</emphasis>, and allow you to change it from the
+ default.</para>
+
+ <para>When an SGML processor is processing a document, it keeps track
+ of what is called the &ldquo;content model&rdquo;.</para>
+
+ <para>Briefly, the content model describes what sort of content the
+ parser is expecting to see, and what it will do with it when it
+ finds it.</para>
+
+ <para>The two content models you will probably find most useful are
+ <literal>CDATA</literal> and <literal>RCDATA</literal>.</para>
+
+ <para><literal>CDATA</literal> is for &ldquo;Character Data&rdquo;. If
+ the parser is in this content model then it is expecting to see
+ characters, and characters only. In this model the &lt; and &amp;
+ symbols lose their special status, and will be treated as ordinary
+ characters.</para>
+
+ <para><literal>RCDATA</literal> is for &ldquo;Entity references and
+ character data&rdquo;. If the parser is in this content model then it
+ is expecting to see characters <emphasis>and</emphasis> entities.
+ &lt; loses its special status, but &amp; will still be treated as
+ starting a general entity reference.</para>
+
+ <para>This is particularly useful if you are including some verbatim
+ text that contains lots of &lt; and &amp; characters. While you
+ could go through the text ensuring that every &lt; is converted to a
+ &amp;lt; and every &amp; is converted to a &amp;amp;, it can be
+ easier to mark the section as only containing CDATA. When the SGML
+ parser encounters this it will ignore the &lt; and &amp; symbols
+ embedded in the content.</para>
+
+ <!-- The nesting of CDATA within the next example is disgusting -->
+
+ <example>
+ <title>Using a CDATA marked section</title>
+
+ <programlisting>
+&lt;para>Here is an example of how you would include some text
+ that contained many &amp;lt; and &amp;amp; symbols. The sample
+ text is a fragment of HTML. The surrounding text (&lt;para> and
+ &lt;programlisting>) are from DocBook.&lt;/para>
+
+&lt;programlisting>
+ &lt;![ CDATA [ <![ CDATA [
+ <p>This is a sample that shows you some of the elements within
+ HTML. Since the angle brackets are used so many times, it's
+ simpler to say the whole example is a CDATA marked section
+ than to use the entity names for the left and right angle
+ brackets throughout.</p>
+
+ <ul>
+ <li>This is a listitem</li>
+ <li>This is a second listitem</li>
+ <li>This is a third listitem</li>
+ </ul>
+
+ <p>This is the end of the example.</p>]]>
+ ]]&gt;
+&lt;/programlisting></programlisting>
+
+ <para>If you look at the source for this document you will see this
+ technique used throughout.</para>
+ </example>
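+
+ <para>An <literal>RCDATA</literal> marked section is written the same
+ way, but entity references inside it are still expanded. The
+ following sketch assumes that a general entity called
+ <literal>current.version</literal> has been defined, as in the
+ earlier examples:</para>
+
+ <programlisting>
+&lt;![ RCDATA [
+ &lt;p>The &lt;b>current&lt;/b> version is &amp;current.version;.&lt;/p>
+]]&gt;</programlisting>
+
+ <para>Here the <literal>&lt;p&gt;</literal> and
+ <literal>&lt;b&gt;</literal> tags are treated as ordinary
+ characters, while <literal>&amp;current.version;</literal> is
+ replaced by the value of the entity.</para>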
+ </sect3>
+
+ <sect3>
+ <title><literal>INCLUDE</literal> and
+ <literal>IGNORE</literal></title>
+
+ <para>If the keyword is <literal>INCLUDE</literal> then the contents
+ of the marked section will be processed. If the keyword is
+ <literal>IGNORE</literal> then the marked section is ignored and
+ will not be processed. It will not appear in the output.</para>
+
+ <example>
+ <title>Using <literal>INCLUDE</literal> and
+ <literal>IGNORE</literal> in marked sections</title>
+
+ <programlisting>
+&lt;![ INCLUDE [
+ This text will be processed and included.
+]]&gt;
+
+&lt;![ IGNORE [
+ This text will not be processed or included.
+]]&gt;</programlisting>
+ </example>
+
+ <para>By itself, this isn't too useful. If you wanted to remove text
+ from your document you could cut it out, or wrap it in
+ comments.</para>
+
+ <para>It becomes more useful when you realise you can use <link
+ linkend="parameter-entities">parameter entities</link> to control
+ this. Remember that parameter entities can only be used in SGML
+ contexts, and the keyword of a marked section
+ <emphasis>is</emphasis> an SGML context.</para>
+
+ <para>For example, suppose that you produced a hard-copy version of
+ some documentation and an electronic version. In the electronic
+ version you wanted to include some extra content that wasn't to
+ appear in the hard-copy.</para>
+
+ <para>Create a parameter entity, and set its value to
+ <literal>INCLUDE</literal>. Write your document, using marked
+ sections to delimit content that should only appear in the
+ electronic version. In these marked sections use the parameter
+ entity in place of the keyword.</para>
+
+ <para>When you want to produce the hard-copy version of the document,
+ change the parameter entity's value to <literal>IGNORE</literal> and
+ reprocess the document.</para>
+
+ <example>
+ <title>Using a parameter entity to control a marked
+ section</title>
+
+ <programlisting>
+&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+&lt;!ENTITY % electronic.copy "INCLUDE">
+]&gt;
+
+...
+
+&lt;![ %electronic.copy; [
+ This content should only appear in the electronic
+ version of the document.
+]]&gt;</programlisting>
+
+ <para>When producing the hard-copy version, change the entity's
+ definition to:</para>
+
+ <programlisting>
+&lt;!ENTITY % electronic.copy "IGNORE"></programlisting>
+
+ <para>On reprocessing the document, the marked sections that use
+ <literal>%electronic.copy;</literal> as their keyword will be
+ ignored.</para>
+ </example>
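+
+ <para>The same idea extends to content that should appear
+ <emphasis>only</emphasis> in the hard-copy. The following is a
+ sketch that uses a second, hypothetical parameter entity called
+ <literal>hard.copy</literal>, which you would always set to the
+ opposite value:</para>
+
+ <programlisting>
+&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+&lt;!ENTITY % electronic.copy "INCLUDE">
+&lt;!ENTITY % hard.copy "IGNORE">
+]&gt;
+
+...
+
+&lt;![ %electronic.copy; [
+ This content appears only in the electronic version.
+]]&gt;
+
+&lt;![ %hard.copy; [
+ This content appears only in the hard-copy version.
+]]&gt;</programlisting>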
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>For you to do&hellip;</title>
+
+ <procedure>
+ <step>
+ <para>Create a new file, <filename>section.sgml</filename>, that
+ contains the following:</para>
+
+ <programlisting>
+&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
+&lt;!ENTITY % text.output "INCLUDE">
+]&gt;
+
+&lt;html>
+ &lt;head>
+ &lt;title>An example using marked sections&lt;/title>
+ &lt;/head>
+
+ &lt;body>
+ &lt;p>This paragraph &lt;![ CDATA [contains many &lt;
+ characters (&lt; &lt; &lt; &lt; &lt;) so it is easier
+ to wrap it in a CDATA marked section ]]&gt;&lt;/p>
+
+ &lt;![ IGNORE [
+ &lt;p>This paragraph will definitely not be included in the
+ output.&lt;/p>
+ ]]&gt;
+
+ &lt;![ <![ CDATA [%text.output]]> [
+ &lt;p>This paragraph might appear in the output, or it
+ might not.&lt;/p>
+
+ &lt;p>Its appearance is controlled by the <![CDATA[%text.output]]>
+ parameter entity.&lt;/p>
+ ]]&gt;
+ &lt;/body>
+&lt;/html></programlisting>
+ </step>
+
+ <step>
+ <para>Normalise this file using &man.sgmlnorm.1; and examine the
+ output. Notice which paragraphs have appeared, which have
+ disappeared, and what has happened to the content of the CDATA
+ marked section.</para>
+ </step>
+
+ <step>
+ <para>Change the definition of the <literal>text.output</literal>
+ entity from <literal>INCLUDE</literal> to
+ <literal>IGNORE</literal>. Re-normalise the file, and examine the
+ output to see what has changed.</para>
+ </step>
+ </procedure>
+ </sect2>
+ </sect1>
+</chapter>
+
+<!--
+ Local Variables:
+ mode: sgml
+ sgml-declaration: "../chapter.decl"
+ sgml-indent-data: t
+ sgml-omittag: nil
+ sgml-always-quote-attributes: t
+ sgml-parent-document: ("../book.sgml" "part" "chapter")
+ End:
+-->