CheckHtmlEsis 1.0.2

CheckHtmlEsis is a Java program that reads the output of James Clark’s nsgmls program and checks whether the attributes comply with the HTML 4.0 specification. It can also check whether the HTTP URIs referenced by the document exist.

How to get CheckHtmlEsis

CheckHtmlEsis 1.0.2 is available to download in zip file format.

How to use CheckHtmlEsis

System Requirements

To use CheckHtmlEsis, you need to have JDK 1.1 and nsgmls installed and running on your system. CheckHtmlEsis may work with JDK 1.0 but I have not tested it. CheckHtmlEsis is run from the command line.

Note to Mac users: I have no idea how this would run on a Mac. Feedback is welcome.

Running CheckHtmlEsis

CheckHtmlEsis reads the output of nsgmls from standard input. I usually run it with a command line such as:

nsgmls -l \sgml\decl\HTML4.decl index.html | java -classpath e:\java11\lib\classes.zip;f:\CheckHtmlEsis\classes.zip oconnor.russell.html.CheckHtmlEsis -l -b http://www.undergrad.math.uwaterloo.ca/%7Eroconnor/

Error messages are written to standard error. If the -t option is given, the text of the document is written to standard output.

How you run CheckHtmlEsis will vary based on where you have your HTML declaration, where you have Java installed, where you install CheckHtmlEsis, and what OS you are running. Consult your local documentation for nsgmls, Java, and your OS for more information. You may find it useful to wrap this program in a batch file or shell script.

nsgmls should be run with the -l option so that CheckHtmlEsis has access to the line numbers of the original document. This will help you locate the errors in your document.
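To illustrate why the line numbers matter, here is a minimal sketch in modern Java of how a reader of nsgmls output could attach a source line to each attribute. It is not taken from the CheckHtmlEsis source, and it will not compile on JDK 1.1. It assumes the text format that nsgmls emits, in which each output line starts with a command character: A for an attribute and, when -l is given, L followed by a line number and optionally a file name. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tags each attribute in nsgmls output with a source line number. */
final class EsisLineTracker {

    /**
     * Returns one "line N: NAME" entry per attribute command.
     * N comes from the most recent L command, or stays 0 if none has been seen,
     * which is the situation when nsgmls is run without -l.
     */
    static List<String> attributeLocations(List<String> esisLines) {
        List<String> result = new ArrayList<>();
        int currentLine = 0;
        for (String line : esisLines) {
            if (line.isEmpty()) {
                continue;
            }
            char command = line.charAt(0);
            String rest = line.substring(1);
            if (command == 'L') {
                // "L<lineno>" or "L<lineno> <file>": keep only the number.
                currentLine = Integer.parseInt(rest.split(" ", 2)[0]);
            } else if (command == 'A') {
                // "A<name> <type> <value>": keep only the attribute name.
                result.add("line " + currentLine + ": " + rest.split(" ", 2)[0]);
            }
        }
        return result;
    }
}
```

Without the L commands every entry would report line 0, so an error message could name the offending attribute but not say where it is.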

With the -t option CheckHtmlEsis will output the text in the document. This can be used with programs like ispell. For example:

nsgmls -l index.html | java -classpath /usr/local/java/lib/classes.zip:/u/roconnor/classes.zip oconnor.russell.html.CheckHtmlEsis -t | ispell -l | sort -u > index.html.sperr

Options

CheckHtmlEsis takes a few command line options.

-l
Turn on link checking. With this option CheckHtmlEsis will make HTTP connections to see if linked resources are actually available, or have moved.
-b <base URI>
Resolves relative links using <base URI> as the base. This is only important if you use the -l option.
-w
Suppress warnings. Only error messages will be written to Standard Error.
-t
Write all text to Standard Out. Any element character data will be written to Standard Out, except for the data of the SCRIPT and STYLE elements, which I do not consider to be text. The values of attributes that hold text will also be written.
-h or -?
Displays help.
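
As an illustration of what -b and -l imply together, here is a minimal sketch in modern Java. It is not the CheckHtmlEsis implementation: java.net.URI does not exist in JDK 1.1, and the class name, method names, and the relative link used in the usage note are hypothetical. The sketch resolves a link against the base and then asks the server for the resource without following redirects, so a caller can tell an available resource from one that has moved.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical sketch of base resolution (-b) followed by a link check (-l). */
final class LinkSketch {

    /** Resolves a possibly relative link against the base URI given with -b. */
    static URI resolve(String base, String link) {
        return URI.create(base).resolve(link);
    }

    /**
     * Sends a HEAD request and returns the HTTP status code.
     * Redirects are not followed, so a moved resource shows up as a 3xx code
     * rather than as the status of its new location.
     * Assumes an http or https URI; other schemes make the cast below fail.
     */
    static int headStatus(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(false);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, resolving the hypothetical link pics/me.gif against the base http://www.undergrad.math.uwaterloo.ca/%7Eroconnor/ gives http://www.undergrad.math.uwaterloo.ca/%7Eroconnor/pics/me.gif, while a link that is already absolute is returned unchanged.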

BUGS

When CheckHtmlEsis dumps text to Standard Out, element data is split up at the ends of the elements. For example, M<SUP>lle</SUP> will be output as:

M
lle
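
The effect can be reproduced with a small sketch in modern Java. It assumes, without having checked the CheckHtmlEsis source, that each data command in the nsgmls output (a line starting with -) is written out on its own line; escape sequences inside the data are not decoded. The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch that reproduces the splitting described above. */
final class TextDumpSketch {

    /** Writes the data of every "-" command on a line of its own. */
    static String dumpPerChunk(List<String> esisLines) {
        StringBuilder out = new StringBuilder();
        for (String line : esisLines) {
            if (line.startsWith("-")) {
                // Drop the command character and end the line after each chunk.
                out.append(line.substring(1)).append('\n');
            }
        }
        return out.toString();
    }
}
```

Because the SUP start tag ends one data chunk and begins another, the input commands -M, (SUP, -lle, )SUP produce two lines of text rather than the single word a spelling checker would expect.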

CheckHtmlEsis flags “news” URIs as errors. It probably flags other, less common URIs as errors as well.

The “ARCHIVE” attribute of the “OBJECT” element should be a space-separated list of URIs. CheckHtmlEsis will incorrectly flag an error if it contains more than one URI.
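
A check that handles the list correctly would split the value before validating each entry. The following is a minimal sketch in modern Java, not a patch for CheckHtmlEsis; the class and method names are hypothetical, and java.net.URI is used only as a stand-in for whatever URI validation the program performs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: validates a space-separated list of URIs entry by entry. */
final class ArchiveListSketch {

    /** Returns the entries of the list that do not parse as URIs. */
    static List<String> invalidEntries(String archiveValue) {
        List<String> bad = new ArrayList<>();
        String trimmed = archiveValue.trim();
        if (trimmed.isEmpty()) {
            return bad;
        }
        // Split on runs of white space so each URI is checked separately.
        for (String entry : trimmed.split("\\s+")) {
            try {
                new URI(entry);
            } catch (URISyntaxException e) {
                bad.add(entry);
            }
        }
        return bad;
    }
}
```

Parsing the whole value as one URI fails as soon as it contains a space, which matches the symptom described above; splitting first means only the entries that are really malformed are reported.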

The source code is a bit messy, and has very little documentation.

CheckHtmlEsis uses data resources to check attribute types like %LanguageCode. Users can recompile these data files to pick up new additions, but this process still needs to be documented. Ambitious people can look through the code, starting with the “Main” routines of the CheckCharsetAttribute, CheckContentTypeAttribute, CheckLanguageCodeAttribute, CheckLinkTypesAttribute, and CheckMediaDescAttribute classes.

Future Development

I want to check for “context-sensitive” errors in the document, such as:

<OL>
<LI TYPE=disc>List Item
</OL>
and
<P>
<INS><DIV>...block-level content...</DIV></INS>
</P>
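
The first kind of error could be caught by validating the TYPE value of an LI against the list element that contains it. Here is a minimal sketch in modern Java, assuming the value sets that HTML 4 gives for unordered lists (disc, square, circle, compared case-insensitively) and ordered lists (1, a, A, i, I, compared case-sensitively). It is an illustration of the idea, not planned CheckHtmlEsis code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: checks LI TYPE against the enclosing list element. */
final class ListItemTypeSketch {

    private static final Set<String> UL_STYLES =
            new HashSet<>(Arrays.asList("disc", "square", "circle"));
    private static final Set<String> OL_STYLES =
            new HashSet<>(Arrays.asList("1", "a", "A", "i", "I"));

    /** Returns true if the TYPE value is acceptable for an LI inside the given parent. */
    static boolean isValidLiType(String parent, String type) {
        if (parent.equalsIgnoreCase("UL")) {
            // Bullet styles are case-insensitive.
            return UL_STYLES.contains(type.toLowerCase(Locale.ROOT));
        }
        if (parent.equalsIgnoreCase("OL")) {
            // Numbering styles are case-sensitive: "a" and "A" mean different things.
            return OL_STYLES.contains(type);
        }
        // Other parents are outside the scope of this sketch.
        return true;
    }
}
```

With this check, TYPE=disc is accepted inside UL but rejected inside OL. A real implementation would also need to track the stack of open elements while reading the nsgmls output so that the parent of each LI is known.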

Legal Stuff


Russell O’Connor: contact me