A List Apart

Issue № 194

Validating a Custom DTD

Published in HTML

In his article in this issue, Peter-Paul Koch proposes adding custom attributes to form elements to allow triggers for specialized behaviors. The W3C validator won’t validate a document with these attributes, as they aren’t part of the XHTML specification.


This article will show you how to create a custom DTD that will add those custom attributes, and will show you how to validate documents that use those new attributes. Here is a sample of the HTML with the custom attributes that let us specify the maximum length of a text area and whether a form element is required or not:

<form>
<p>
  Name:
  <input type="text" name="yourName" size="40" />
</p>
<p>
  Email:
  <input type="text" name="email" size="40"
  required="true" />
</p>
<p>
  Comments:
  <textarea maxlength="300" required="false"
  rows="7" cols="50"></textarea>
</p>
<p>
  <input type="submit" value="Send Data" />
</p>
</form>

What’s a DTD?

A Document Type Definition (DTD) is a file that specifies which elements and attributes exist in a markup language and where they can appear. Thus, the XHTML DTD specifies that <p> is a valid element, and that it can appear inside a <div>, but not inside a <b>. The URL at the end of your DOCTYPE declaration points to a place where you will find the DTD for the flavor of HTML you’re using. Neither your browser nor the W3C Validator goes out to the web to find the DTD — they have a “wired-in” list of the valid DOCTYPEs and use the URL for identification purposes only. As you will see later, this will change when you make a custom DTD.

Specifying the attributes

Adding attributes to an existing DTD is easy. For each attribute, you need to specify which element it goes with, what the attribute name is, what type of values it may have, and whether the attribute is optional or not.  This information is specified in this model:

<!ATTLIST
  elementName attributeName type optionalStatus
>

To add the maxlength attribute to the <textarea> element, you write this:

<!ATTLIST textarea maxlength CDATA #IMPLIED>

The CDATA specification means that the attribute value can contain any old character data you please; thus maxlength="300" or maxlength="ten" will both be valid. For “open-ended” data, DTDs don’t let you get more specific. The #IMPLIED specification means that the attribute is optional. A required attribute would specify #REQUIRED.

When you have a list of possible values for an attribute, you may specify them in the DTD.  This is the case with the attribute named required, which has the values true and false. The values are case sensitive; in this example only the lowercase values are specified, so a value of TRUE would not be considered valid.

<!ATTLIST textarea required (true|false) #IMPLIED>

Confusion alert! This attribute is named “required,” but you don’t have to put it on every <textarea> element, so it’s an optional attribute.

The attribute named required should also be available to the <input> and <select> elements. All in all, the specifications to modify the DTD look like this:

<!ATTLIST textarea maxlength CDATA #IMPLIED>
<!ATTLIST textarea required (true|false) #IMPLIED>
<!ATTLIST input required (true|false) #IMPLIED>
<!ATTLIST select required (true|false) #IMPLIED>

Note: Adding new attributes to existing elements is easy; adding new elements is somewhat more difficult and beyond the scope of this article.
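To see how a validating parser treats these declarations, here is a minimal sketch in Java using the parser that ships with the JDK. It does not use the XHTML DTD at all: the class name AttlistDemo and the one-element miniature DTD are assumptions made for illustration, with just enough declared to parse a lone <textarea>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class AttlistDemo {
    // Hypothetical miniature DTD (NOT the XHTML DTD): one element, two attributes.
    static final String DOCTYPE =
        "<!DOCTYPE textarea [\n"
      + "  <!ELEMENT textarea (#PCDATA)>\n"
      + "  <!ATTLIST textarea maxlength CDATA #IMPLIED>\n"
      + "  <!ATTLIST textarea required (true|false) #IMPLIED>\n"
      + "]>\n";

    /** Parses DOCTYPE + body with validation on and returns the validity errors. */
    static List<String> validityErrors(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors arrive here
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException("Document could not be parsed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String[] samples = {
            "<textarea maxlength=\"ten\" required=\"false\"></textarea>", // CDATA takes any text
            "<textarea></textarea>",                                     // #IMPLIED: both optional
            "<textarea required=\"TRUE\"></textarea>",                   // wrong case
            "<textarea inquirer=\"false\"></textarea>"                   // undeclared attribute
        };
        for (String sample : samples) {
            List<String> errors = validityErrors(sample);
            System.out.println(sample + " -> " + (errors.isEmpty() ? "valid" : errors));
        }
    }
}
```

The first two samples should come back valid and the last two should each report an error, which matches the rules described above: CDATA accepts anything, #IMPLIED attributes may be left off, and enumerated values are case sensitive.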

Placing the attributes

Now that you’ve defined the custom attributes, how do you place them where a validator can find them?  The very best place to put them would be as the internal subset directly in your document:

<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[
  <!ATTLIST textarea maxlength CDATA #IMPLIED>
  <!ATTLIST textarea required (true|false) #IMPLIED>
  <!ATTLIST input required (true|false) #IMPLIED>
  <!ATTLIST select required (true|false) #IMPLIED>
]>

If you run such a file through the W3C validator, you find that it validates wonderfully well. If you download the sample files for this article and validate file internal.html, you can see this for yourself. Unfortunately, when you display the file in a browser, the ]> shows up on the screen.  There’s no way around this bug, so this approach is right out.

Modifying the DTD

An approach that does work requires you to obtain the XHTML transitional DTD and add your modifications to that file. The original version of the DTD is file xhtml1-transitional.dtd in directory dtd from this article’s sample files. You will also find three files with the .ent extension in that directory. These three files define all the entities that you use in HTML, such as &nbsp; and &ntilde;. You need to keep all these files together in the same directory.

The customized file, named xhtml1-custom.dtd, was created by opening file xhtml1-transitional.dtd and adding the new attribute specifications at the end of the file. When adding attributes, you want to add your customizations at the end of the DTD to ensure that everything they need to reference has already been defined.
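If you would rather script that step than edit the file by hand, something along these lines should do it. This is only a sketch: the class name MakeCustomDtd, the added comment line, and the default directory name dtd (taken from the sample files' layout) are assumptions, and the article's own custom DTD was made with a text editor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class MakeCustomDtd {
    // The four declarations from this article, appended after everything else
    // so that textarea, input and select are already declared.
    static final String CUSTOM_ATTRIBUTES = String.join("\n",
        "<!-- Custom form attributes -->",
        "<!ATTLIST textarea maxlength CDATA #IMPLIED>",
        "<!ATTLIST textarea required (true|false) #IMPLIED>",
        "<!ATTLIST input required (true|false) #IMPLIED>",
        "<!ATTLIST select required (true|false) #IMPLIED>",
        "");

    /** Returns the original DTD text with the custom declarations at the end. */
    static String appendTo(String originalDtd) {
        return originalDtd + "\n" + CUSTOM_ATTRIBUTES;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: the DTD and its .ent files live together in one directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : "dtd");
        byte[] original = Files.readAllBytes(dir.resolve("xhtml1-transitional.dtd"));
        String custom = appendTo(new String(original, StandardCharsets.UTF_8));
        Files.write(dir.resolve("xhtml1-custom.dtd"), custom.getBytes(StandardCharsets.UTF_8));
    }
}
```

Writing the result to a new file leaves the original DTD untouched, so you can always compare the two.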

Changing the DOCTYPE

You must now change the <!DOCTYPE> in your HTML file to indicate that you are now using this custom “flavor” of XHTML. Since the custom DTD isn’t one of the publicly registered ones, the DOCTYPE will not use the PUBLIC specifier. Instead, you use the keyword SYSTEM followed by the location of the custom DTD. This may be a relative or absolute path name, or, if your DTD is on a server, a URL.  The path must point to where your custom DTD really is! File custom.html in the sample files for this article uses a relative path name:

<!DOCTYPE html SYSTEM
   "dtd/xhtml1-custom.dtd">

When you try to use the W3C validator on custom.html, it rejects the document because you aren’t using one of the validator’s approved DTDs.
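A word on where a relative SYSTEM path ends up pointing: a validating parser resolves it against the location of the document that contains the DOCTYPE. The short Java sketch below shows that resolution with the standard java.net.URI class. The class name SystemIdDemo and the example.com address are made up for illustration.

```java
import java.net.URI;

class SystemIdDemo {
    /** Resolves the path in a SYSTEM declaration against the document's own URL. */
    static URI resolveDtd(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId);
    }

    public static void main(String[] args) {
        // Hypothetical location of custom.html on a server.
        String document = "http://example.com/site/custom.html";

        // A relative path is looked up next to the document:
        // prints http://example.com/site/dtd/xhtml1-custom.dtd
        System.out.println(resolveDtd(document, "dtd/xhtml1-custom.dtd"));

        // An absolute URL is used as given:
        // prints http://example.com/dtd/xhtml1-custom.dtd
        System.out.println(resolveDtd(document, "http://example.com/dtd/xhtml1-custom.dtd"));
    }
}
```

So if you move custom.html without moving the dtd directory along with it, the relative path stops pointing at your DTD.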

Using a different validator

The solution is to use a different validator, one that will actually go out to the URL you have specified and use the DTD it finds there to check whether your document is valid. Because the document you’re validating is XHTML, you can use any XML parser that does validation. This article uses the Xerces parser, available from xml.apache.org. This parser is written in Java™, so you will need to have Java installed on your system. When you unzip the Xerces download file, it will create a directory named xerces-2_6_2 (or whatever version is current). In the following text, the assumption is that you have unzipped it to the top level of the C: drive on Windows or to /usr/local on Linux.

One of the sample files that comes with Xerces is the Counter program. This program counts the number of elements, attributes, ignorable whitespaces, and characters appearing in an XML (or, in this case, XHTML) document. This program has an option to turn on validation as it parses the document, making it perfect for the task at hand. You run the Counter program (which is going to be your “validator”) from a batch file for Windows or a shell script for Linux. Here is the batch file, named validate.bat. It is all on one line, but shown here split across lines to fit on the page. Please note: there is a blank before the word dom and after the -v.

java -cp c:\xerces-2_6_2\xercesImpl.jar; »
c:\xerces-2_6_2\xmlParserAPIs.jar; »
c:\xerces-2_6_2\xercesSamples.jar dom/Counter -v »
%1 %2 %3 %4 %5 %6 %7 %8

Here is the Linux shell script, named validate.sh.

java -cp /usr/local/xerces-2_6_2/xercesImpl.jar:\
/usr/local/xerces-2_6_2/xmlParserAPIs.jar:\
/usr/local/xerces-2_6_2/xercesSamples.jar \
dom/Counter -v $1 $2 $3 $4 $5 $6 $7 $8

Of course, if you have unzipped Xerces to a different location, you will have to change the path names. Once this is all set up, you can validate the file custom.html by typing this on a Windows command line:

validate custom.html

Or this at a Linux shell prompt:

./validate.sh custom.html

If your file is valid, you will receive a message giving the filename and some statistics about the file, like this:

custom.html: 543;50;0 ms
  (15 elems, 20 attrs, 9 spaces, 43 chars)

If the file isn’t valid, you will get error messages as well. For example, if you try to validate a file named badfile.html which contains these errors:

<p>Email: <input type="text" name="email" size="40"
 required="yes" /></p>
<p>Comments:
<textarea maxlength="300" inquirer="false" rows="7" cols="50"></textarea>

You’ll get this output from the validator:

[Error] badfile.html:12:70: Attribute "required"
  with value "yes" must have a value from the
  list "true false ".
[Error] badfile.html:14:63: Attribute "inquirer"
  must be declared for element type "textarea"
badfile.html:
  611;82;0 ms (15 elems, 20 attrs, 9 spaces, 43 chars)
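If you would rather not depend on the Xerces sample classes, the parser built into the JDK can do the same kind of check. The following is a rough sketch, not a replacement for Counter: the class name DtdValidate and the layout of its messages are assumptions, and it reports only problems, not element counts or timings.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidate {
    /** Parses the source against the DTD named in its DOCTYPE; returns any problems. */
    static List<String> validate(InputSource source) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // same role as Counter's -v option
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add(describe("Error", e)); // validity error: keep parsing
                }
            });
            builder.parse(source);
        } catch (SAXParseException e) {
            // Well-formedness problems stop the parse; record the one that ended it.
            problems.add(describe("Fatal Error", e));
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the document", e);
        }
        return problems;
    }

    static String describe(String level, SAXParseException e) {
        return "[" + level + "] " + e.getSystemId() + ":" + e.getLineNumber()
            + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Passing the file as a URL lets a relative SYSTEM path resolve next to it.
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        List<String> problems = validate(source);
        problems.forEach(System.out::println);
        System.out.println(args[0] + ": "
            + (problems.isEmpty() ? "valid" : problems.size() + " problem(s)"));
    }
}
```

Compile it with javac DtdValidate.java and run it as java DtdValidate custom.html. The DTD and its .ent files still have to be where the DOCTYPE says they are.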

Another validation method

If you are using the jEdit editor, you may download the XML plugin. If you name your file with the extension .xhtml, jEdit will validate using your custom DTD as specified in the DOCTYPE.

Conclusion

It is easy to specify additional attributes for XHTML elements; with a little bit of work, you can set up a validator to check your files against your custom version of HTML.  Download all the sample files from this article and give it a whirl.
