In his article in this issue, Peter-Paul Koch proposes adding custom attributes to form elements to allow triggers for specialized behaviors. The W3C validator won’t validate a document with these attributes, as they aren’t part of the XHTML specification.
This article will show you how to create a custom DTD that will add those custom attributes, and will show you how to validate documents that use those new attributes. Here is a sample of the HTML with the custom attributes that let us specify the maximum length of a text area and whether a form element is required or not:
<form>
<p>
Name:
<input type="text" name="yourName" size="40" />
</p>
<p>
Email:
<input type="text" name="email" size="40"
required="true" />
</p>
<p>
Comments:
<textarea maxlength="300" required="false"
rows="7" cols="50"></textarea>
</p>
<p>
<input type="submit" value="Send Data" />
</p>
</form>
What’s a DTD?
A Document Type Definition (DTD) is a file that
specifies which elements and attributes exist in a markup language and
where they can appear. Thus, the XHTML DTD specifies that
<p> is a valid element, and that it can appear
inside a <div>, but not inside a <b>.
The URL at the end of your DOCTYPE declaration points
to a place where you will find the DTD for the flavor of HTML you’re
using. Neither your browser nor the W3C Validator goes out to the web to find
the DTD — they have a “wired-in” list of the valid
DOCTYPEs and use the URL for identification purposes only. As you will see
later, this will change when you make a custom DTD.
Specifying the attributes
Adding attributes to an existing DTD is easy. For each attribute, you need to specify which element it goes with, what the attribute name is, what type of values it may have, and whether the attribute is optional or not. This information is specified in this model:
<!ATTLIST
elementName attributeName type optionalStatus
>
To add the maxlength attribute to the
<textarea> element, you write this:
<!ATTLIST textarea maxlength CDATA #IMPLIED>
The CDATA specification means that the attribute value
can contain any old character data you please; thus
maxlength="300" or maxlength="ten" will both
be valid. For “open-ended” data, DTDs don’t let you
get more specific. The #IMPLIED specification means that
the attribute is optional. A required attribute would specify
#REQUIRED.
When you have a list of possible values for an attribute, you may specify
them in the DTD. This is the case with the attribute named
required,
which has the values true and false. The values
are case sensitive; in this example only the lowercase values are specified, so
a value of TRUE would not be considered valid.
<!ATTLIST textarea required (true|false) #IMPLIED>
Confusion alert! This attribute is named “required,”
but you don’t have to put it on every <textarea>
element, so it’s an optional attribute.
The attribute named required should also be available to the
<input> and <select> elements. All
in all, the specifications to modify the DTD look like this:
<!ATTLIST textarea maxlength CDATA #IMPLIED>
<!ATTLIST textarea required (true|false) #IMPLIED>
<!ATTLIST input required (true|false) #IMPLIED>
<!ATTLIST select required (true|false) #IMPLIED>
Note: Adding new attributes to existing elements is easy; adding new elements is somewhat more difficult and beyond the scope of this article.
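To see how a validating parser treats these two attribute types, here is a minimal Java sketch. It is an illustration under stated assumptions, not part of the article's sample files: the class name AttlistDemo and the miniature two-element DTD are hypothetical stand-ins for the full XHTML DTD, and the parser is the one bundled with the JDK rather than a separate Xerces download. The declarations are written inside the DOCTYPE itself so that the example needs no external files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class AttlistDemo {
    // Hypothetical miniature DTD (NOT the XHTML DTD), declared inline in the DOCTYPE.
    static final String DOCTYPE =
        "<!DOCTYPE form [\n"
      + "  <!ELEMENT form (textarea)>\n"
      + "  <!ELEMENT textarea (#PCDATA)>\n"
      + "  <!ATTLIST textarea maxlength CDATA #IMPLIED>\n"
      + "  <!ATTLIST textarea required (true|false) #IMPLIED>\n"
      + "]>\n";

    /** Parses DOCTYPE + body with DTD validation on and returns the validity errors. */
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<form><textarea maxlength=\"ten\"></textarea></form>",    // CDATA: any text passes
            "<form><textarea required=\"TRUE\"></textarea></form>",    // wrong case: not in the list
            "<form><textarea inquirer=\"false\"></textarea></form>"    // attribute never declared
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + validate(sample));
        }
    }
}
```

Only the first sample should come back with an empty error list: maxlength accepts any character data, while required is limited to the two lowercase values in its declaration.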
Placing the attributes
Now that you’ve defined the custom attributes, how do you place them where a validator can find them? The best place to put them would be in the internal subset, directly inside your document’s DOCTYPE declaration:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[
<!ATTLIST textarea maxlength CDATA #IMPLIED>
<!ATTLIST textarea required (true|false) #IMPLIED>
<!ATTLIST input required (true|false) #IMPLIED>
<!ATTLIST select required (true|false) #IMPLIED>
]>
If you run such a file through the W3C
validator, you find that it validates wonderfully well.
If you download the sample files for this article and validate
file internal.html, you can see this for yourself.
Unfortunately,
when you display the file in a browser, the ]>
shows up on the screen. There’s no way around this bug, so this
approach is right out.
Modifying the DTD
An approach that does work requires you to obtain the
XHTML transitional DTD and add your modifications to that file.
The original version of the DTD is file
xhtml1-transitional.dtd in directory dtd
from this article’s sample files. You will also find
three files with the .ent extension in that
directory. These three files
define all the entities that you use in HTML,
such as &rsquo; and &ntilde;. You
need to keep all these files together in the same directory.
The customized file, named xhtml1-custom.dtd, was
created by opening file xhtml1-transitional.dtd and
adding the new attribute specifications at the end of the file. When
adding attributes, you
want to add your customizations at the end of the DTD to
ensure that everything they need to reference
has already been defined.
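If you would rather script this step than edit the file by hand, the following Java sketch shows one way to do it. It is an assumption-laden illustration, not how the sample files were produced: the class name CustomDtdBuilder and the method buildCustomDtd are hypothetical. It copies the original DTD and appends the four declarations to the end of the copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

class CustomDtdBuilder {
    // The leading empty string writes a line break first, so the additions
    // never run into an unterminated last line of the original DTD.
    static final List<String> ADDITIONS = Arrays.asList(
        "",
        "<!-- Custom form attributes -->",
        "<!ATTLIST textarea maxlength CDATA #IMPLIED>",
        "<!ATTLIST textarea required (true|false) #IMPLIED>",
        "<!ATTLIST input required (true|false) #IMPLIED>",
        "<!ATTLIST select required (true|false) #IMPLIED>"
    );

    /** Copies the original DTD to 'custom' and appends the new declarations at the end. */
    static Path buildCustomDtd(Path original, Path custom) throws IOException {
        Files.copy(original, custom, StandardCopyOption.REPLACE_EXISTING);
        return Files.write(custom, ADDITIONS, StandardCharsets.UTF_8,
                           StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Assumes you run this from the directory that contains the dtd folder.
        Path result = buildCustomDtd(Paths.get("dtd", "xhtml1-transitional.dtd"),
                                     Paths.get("dtd", "xhtml1-custom.dtd"));
        System.out.println("Wrote " + result);
    }
}
```

Because the copy lands in the same directory as the original, the three .ent files remain alongside it.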
Changing the DOCTYPE
You must now change the <!DOCTYPE> in your HTML
file to indicate that you are now using this custom “flavor”
of XHTML.
Since the custom DTD isn’t one of the publicly registered ones,
the DOCTYPE will not use the PUBLIC specifier. Instead,
you use the keyword SYSTEM followed by the location of the
custom DTD. This may be a relative or absolute path name, or, if your
DTD is on a server, a URL. The path must point to where your
custom DTD really is!
File custom.html in the sample files for this article
uses a relative path name:
<!DOCTYPE html SYSTEM
"dtd/xhtml1-custom.dtd">
When you try to use the W3C validator on
custom.html, it rejects
the document because you aren’t using one of the validator’s
approved DTDs.
Using a different validator
The solution is to use a different validator, one that actually
fetches the DTD from the location you have specified and uses it to check
whether your document is valid or not.
Because the document you’re validating is XHTML,
you can use any XML parser that
does validation. This article uses the Xerces parser,
available from
xml.apache.org. This parser is written in
Java™, so you will need to have Java installed on your system.
When you unzip the Xerces download file, it will create a directory named
xerces-2_6_2 (or whatever version is current). In the
following text, the assumption is that you have unzipped it to the top
level of the
C: drive on Windows or to /usr/local on Linux.
One of the sample files that comes
with Xerces is the Counter program. This program
counts the number of elements,
attributes, ignorable whitespaces, and characters appearing in
an XML (or, in this case, XHTML) document. This program has an option
to turn on validation as it parses the document, making it perfect for
the task at hand.
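If you are curious what a validating parse looks like in code, here is a minimal Java sketch. It is not the Counter program and makes several assumptions: it uses the SAX parser bundled with the JDK instead of a separate Xerces download, the class name SystemDtdValidator is hypothetical, and writeSample creates a tiny stand-in DTD (mini-custom.dtd) and document in a temporary directory so the example runs without the article's sample files.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SystemDtdValidator {
    /** Parses the file with DTD validation on and returns "line:column: message" errors. */
    static List<String> validate(File document) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // load the DTD named in the DOCTYPE and check against it
        SAXParser parser = factory.newSAXParser();

        List<String> errors = new ArrayList<>();
        parser.parse(document, new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage());
            }
        });
        return errors;
    }

    /** Writes a stand-in DTD plus a document whose DOCTYPE points at it with a relative path. */
    static File writeSample(String requiredValue) throws Exception {
        Path dir = Files.createTempDirectory("dtd-demo");
        Files.createDirectory(dir.resolve("dtd"));
        String dtd =
            "<!ELEMENT form (input)>\n"
          + "<!ELEMENT input EMPTY>\n"
          + "<!ATTLIST input name CDATA #REQUIRED>\n"
          + "<!ATTLIST input required (true|false) #IMPLIED>\n";
        Files.write(dir.resolve("dtd").resolve("mini-custom.dtd"),
                    dtd.getBytes(StandardCharsets.UTF_8));

        String doc =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE form SYSTEM \"dtd/mini-custom.dtd\">\n"
          + "<form><input name=\"email\" required=\"" + requiredValue + "\"/></form>\n";
        Path file = dir.resolve("sample.xml");
        Files.write(file, doc.getBytes(StandardCharsets.UTF_8));
        return file.toFile();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("required=true -> " + validate(writeSample("true")));
        System.out.println("required=yes  -> " + validate(writeSample("yes")));
    }
}
```

The relative path in the SYSTEM identifier is resolved against the location of the document being parsed, which is why the dtd directory has to sit next to the file.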
You run the Counter program (which is going to be your
“validator”) from
a batch file for Windows or a shell script for Linux.
Here is the
batch file, named
validate.bat.
It is all on one line, but shown here split across lines to
fit on the page. Please note: there is a blank before the word
dom and after the -v.
java -cp c:\xerces-2_6_2\xercesImpl.jar; »
c:\xerces-2_6_2\xmlParserAPIs.jar; »
c:\xerces-2_6_2\xercesSamples.jar dom/Counter -v »
%1 %2 %3 %4 %5 %6 %7 %8
Here is the Linux shell script, named validate.sh.
java -cp /usr/local/xerces-2_6_2/xercesImpl.jar:\
/usr/local/xerces-2_6_2/xmlParserAPIs.jar:\
/usr/local/xerces-2_6_2/xercesSamples.jar \
dom/Counter -v $1 $2 $3 $4 $5 $6 $7 $8
Of course, if you have unzipped Xerces to a different location, you
will have to change the path names.
Once this is all set up, you can validate the file
custom.html by typing
this on a Windows command line:
validate custom.html
Or this at a Linux shell prompt:
./validate.sh custom.html
If your file is valid, you will receive a message giving the filename and some statistics about the file, like this:
custom.html: 543;50;0 ms
(15 elems, 20 attrs, 9 spaces, 43 chars)
If the file isn’t valid, you will get error messages as well.
For example, if you try to validate a file named badfile.html
which contains these errors:
<p>Email: <input type="text" name="email" size="40"
required="yes" /></p>
<p>Comments:
<textarea maxlength="300" inquirer="false"
rows="7" cols="50"></textarea>
You’ll get this output from the validator:
[Error] badfile.html:12:70: Attribute "required"
with value "yes" must have a value from the
list "true false ".
[Error] badfile.html:14:63: Attribute "inquirer"
must be declared for element type "textarea"
badfile.html:
611;82;0 ms (15 elems, 20 attrs, 9 spaces, 43 chars)
Another validation method
If you are using the
jEdit editor,
you may download the XML plugin. If you name your file with the
extension .xhtml, jEdit will validate using your custom
DTD as specified in the DOCTYPE.
Conclusion
It is easy to specify additional attributes for XHTML elements; with a little bit of work, you can set up a validator to check your files against your custom version of HTML. Download all the sample files from this article and give it a whirl.
