Introduction to mathinhtml

What it does

The perl script "mathinhtml" allows anyone to easily include mathematical content in web pages. For example, it will convert the string

 The graph of <EQ>y = {x^3 \over x-4}</EQ> looks like <CENTER><GR>plot [-18:13][-200:300] x**3/(x-4) </GR></CENTER> 

into the output:

The graph of [image of the equation] looks like [centered image of the graph]

More examples can be found in the examples of mathinhtml output.

How it all started

In Fall 2002, I wanted to provide my calculus students with web pages of homework problems. Faced with the daunting task of assembling (and then keeping track of) collections of dozens of images for each page, I looked about for some way to automate the process. I wanted to be able to write a web page, pretending that web browsers could interpret standard mathematical notation, and then have it translated into a web page that could really be interpreted by all browsers. Finding nothing that suited my needs, I decided to write such a translator myself.

How it works

Mathinhtml works by scanning a "mathinhtml source file" for special elements, and replacing those elements by text or a link to an image file. The current default elements and their actions include

• <EQ> . . . </EQ> The contents are run through TeX in "display style", and the resulting dvi file is converted to an image file. The element is replaced by a link to this image file.
• <GR> . . . </GR> The contents are run through gnuplot with some added options for graphing, then the resulting postscript file is converted to an image file. The element is replaced by a link to this image file.
• . . . The contents are run through the perl script gpdiagram, then the resulting postscript file is converted to an image file. The element is replaced by a link to this image file.
In addition, there are tags <TEX> . . . </TEX> and <GNUPLOT> . . . </GNUPLOT> that feed their contents directly to their respective programs.
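
The scan-and-replace step described above can be pictured with a short Python sketch. It is not the author's Perl code: the regular expression, the `replace_elements` function and the `placeholder` renderer are invented for illustration, and the pattern assumes that attribute values contain no `>` character, which the real script permits.

```python
import re

# Tag names mentioned on this page; the real script's rule table may differ.
SPECIAL_TAGS = ("EQ", "GR", "TEX", "GNUPLOT")

# Matches <TAG attributes>contents</TAG> for any special tag, across lines.
ELEMENT_RE = re.compile(
    r"<(?P<tag>" + "|".join(SPECIAL_TAGS) + r")\b(?P<attrs>[^>]*)>"
    r"(?P<contents>.*?)</(?P=tag)>",
    re.DOTALL | re.IGNORECASE,
)


def replace_elements(source, render):
    """Replace each special element with whatever render(tag, contents) returns."""
    def substitute(match):
        return render(match.group("tag").upper(), match.group("contents"))
    return ELEMENT_RE.sub(substitute, source)


def placeholder(tag, contents):
    # Stand-in for running TeX or gnuplot: emit an image link named after the tag.
    return '<IMG SRC="%s.png">' % tag.lower()


page = "The graph of <EQ>y = x^2</EQ> looks like <CENTER><GR>plot x**2</GR></CENTER>"
print(replace_elements(page, placeholder))
```

Ordinary html such as the <CENTER> tags passes through untouched; only the special elements are handed to the renderer.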

The script is designed for easy extensibility, so you can add additional elements as you see fit. Mathinhtml can drive any program that takes a text file as input and produces text or a postscript file as output.

Alternatives to mathinhtml

There are a number of different approaches to putting mathematical objects on web pages.

• If you prefer writing web pages in a "what you see is what you get" environment, then try OpenOffice.
• If you prefer writing in a markup language, but would rather write in LaTeX than in html, then see Steve Mayer's TeX Converter page.
• If you prefer to write in html, then mathinhtml will suit your needs.
• If you don't need to include graphs on your pages, then have a look at GladTeX also.

Installation and use

Requirements

This script has been tested only on Linux and Windows 98, but I'd like it to be as portable as possible. To that end, I intentionally wrote it to use software that is free (as in beer and as in speech) and available for a wide range of operating systems.

Mathinhtml requires perl (http://www.perl.com/pub/a/language/info/software.html) and Ghostscript (http://www.gnu.org/software/ghostscript/ghostscript.html) for its basic operations. To use the EQ element, you will need TeX (http://www.ams.org/tex/public-domain-tex.html). To use the GR element, you will need gnuplot (ftp://ftp.gnuplot.info/pub/gnuplot - see the README file for what it all means).

Installation

• If necessary, alter the beginning of the script to correspond with the program names on your system. Windows users will need to rename the file to mathinhtml.bat.
• Check that it all works by downloading test.mih into a temporary folder, running the command mathinhtml test.mih test.html and looking at test.html with your favorite browser.

Use

Mathinhtml is a "web page preprocessor". What this means is that you write a "mathinhtml source file" for your page, and mathinhtml translates this source file to the actual html file that you put on the web. Any editorial changes must be made to the source file, which must then be retranslated.

The mathinhtml source file contains everything (text, html markup, and "ordinary" images) that you want in your html page, except for the mathematical images (e.g. equations and graphs). In the places that these math images are to appear, the source file contains special elements that describe to mathinhtml what the math images should look like. For example, the element <EQ>y=x^2</EQ> tells mathinhtml that you want an image of the equation to appear at that point.

To get a feel for this, use your favorite text editor to make some changes to the test.mih file (that you downloaded to check your installation), then run "mathinhtml test.mih test.html", and look at test.html with your favorite browser to see the effect of those changes. Note in particular that (for efficiency) mathinhtml retranslates only those images that you have altered.
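
This page does not say how mathinhtml decides that an image is unchanged. One common way to get that behavior, shown here purely as an assumption, is to name each image after a hash of the element's contents and skip the rebuild when that file already exists. The names `image_name` and `ensure_image` are hypothetical.

```python
import hashlib
import os


def image_name(tag, contents):
    """Derive a stable file name from the element's tag and contents."""
    digest = hashlib.sha256((tag + "\n" + contents).encode("utf-8")).hexdigest()
    return "%s_%s.png" % (tag.lower(), digest[:12])


def ensure_image(tag, contents, directory, build):
    """Call build(path) only when no image for these contents exists yet."""
    path = os.path.join(directory, image_name(tag, contents))
    if not os.path.exists(path):
        build(path)  # e.g. run TeX or gnuplot and convert the result
    return path
```

With this scheme, editing one equation changes only that equation's hash, so every other image on the page is reused as it stands.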

Mathinhtml is copyright (C) 2003 Michael J Miller

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Stability and feedback

I am releasing this software under the "release early and often" mantra. Among other things, this means that this is currently beta software, that its specifications are not yet completed, and that it is likely to have bugs. While I will try to maintain backward compatibility, future versions may drift in ways as yet unpredictable.

I'd very much appreciate bug reports (and suggested solutions if you have them), requests for features, and notes of appreciation. While the script suits my needs, I am sure that there are many ways it can be improved, and your feedback will help that process greatly. I can be reached at millermj.mail.lemoyne.edu (replace the first dot with an @).

Security

The mathinhtml script should NOT be used in an insecure environment (for example, as part of an interactive web page where untrusted users can type in input to be translated by mathinhtml). The reason is that mathinhtml passes the contents of its elements to external programs such as TeX and gnuplot, so pathological input may allow a malicious user to compromise your machine.

Technical details

Specifications

A "mathinhtml source file" is an html file which includes "special elements". The specifications for special elements are below; everything else is copied unchanged to the output file (and so must be well-formed html).

To define terms, label the parts of the special element <EQ ALIGN="TOP">y=x^2</EQ> as follows.

• start tag: <EQ ALIGN="TOP">
• tag name: EQ
• attribute: ALIGN="TOP"
• attribute name: ALIGN
• attribute value: TOP
• contents: y=x^2
• end tag: </EQ>

For special elements, mathinhtml requires the following:

• All tags must be balanced (every start tag must have a corresponding end tag.)
• Tag names may not contain whitespace.
• All attribute values must be in quotes (either single-quotes or double-quotes), and the value itself may not contain the kind of quotes that delimit it.
• Contents and attribute values may contain any characters (including <, > and &) except the strings <P>, <BR>, <LI> and <TD> and the end tag of the element.
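
The quoting rule for attribute values can be illustrated with a small Python sketch. The pattern and the `parse_attributes` name are assumptions made for this example, not code from the script; the sketch simply shows why a value may contain the other kind of quote (and characters like >) but not the quote that delimits it.

```python
import re

# name="value" or name='value'; the value ends at the next matching quote,
# so it cannot contain the quote character that delimits it.
ATTR_RE = re.compile(
    r"""(?P<name>[^\s=]+)\s*=\s*(?P<quote>["'])(?P<value>.*?)(?P=quote)""",
    re.DOTALL,
)


def parse_attributes(text):
    """Return a dict of attribute name -> value from the inside of a start tag."""
    return {m.group("name"): m.group("value") for m in ATTR_RE.finditer(text)}


print(parse_attributes("""ALIGN="TOP" ALT='a "quoted" > b'"""))
```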

What mathinhtml really does

Mathinhtml reads your source file in chunks (each chunk ends with <P>, <BR>, <LI> or <TD>). It then scans each chunk for special elements, extracting the tag name, attributes and contents of each. As directed by the rules for that element (located at the beginning of the script), mathinhtml formats the contents according to the "input" rule(s), writes the formatted text to the file tempfile.n, then asks your operating system to run the programs specified in the "action" rule. If this creates a file named tempfile.ps, then the script converts this to an image file and replaces the special element by a link to that image file; otherwise, the script replaces the special element by the text output of your action rule.
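
The chunking step can be sketched in Python as follows. This is an illustration rather than the script's own code: `split_chunks` is a hypothetical name, and matching the break tags case-insensitively is an assumption. It also suggests why an element's contents may not include these strings: an element cut in two by a chunk boundary would presumably not be recognized in either half.

```python
import re

# A chunk ends at any of these tags (case-insensitive matching is an assumption).
BREAK_RE = re.compile(r"(<(?:P|BR|LI|TD)>)", re.IGNORECASE)


def split_chunks(source):
    """Split source into chunks, each ending with a break tag (except perhaps the last)."""
    # With one capturing group, split() returns text, tag, text, tag, ..., text.
    parts = BREAK_RE.split(source)
    chunks = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    if parts[-1]:
        chunks.append(parts[-1])  # trailing text after the final break tag
    return chunks


print(split_chunks("Intro<P>Let <EQ>y=x^2</EQ>.<BR>The end"))
```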

Html attributes (e.g. ALIGN="TOP") may be included in the element rules (e.g. as attrib => 'ALIGN="TOP"') or within an individual element's start tag (e.g. as <EQ ALIGN="TOP"> y=x^2 </EQ>). Attributes set for individual elements override those set by the rules, which override those set by the mathinhtml script itself.
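
The precedence order just described amounts to a three-layer merge in which later layers win. A minimal Python sketch, with a hypothetical function name and made-up example values:

```python
def effective_attributes(script_defaults, rule_attributes, element_attributes):
    """Merge attributes so that element settings beat rule settings beat script defaults."""
    merged = dict(script_defaults)
    merged.update(rule_attributes)     # rules override the script's defaults
    merged.update(element_attributes)  # an individual start tag overrides both
    return merged


# Illustrative values only; the script's actual defaults are not listed on this page.
print(effective_attributes(
    {"ALIGN": "MIDDLE", "BORDER": "0"},
    {"ALIGN": "TOP"},
    {"ALIGN": "BOTTOM"},
))
```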

If you frequently find yourself typing the same thing, you may want to define a new element. This can be done by cloning an existing element and then altering the "input" line accordingly. For an example of how to do this, look at the mathinhtml script to see how the rules for EQ are derived from the rules for TEX.

If you want to do something that cannot be easily accomplished from TeX or gnuplot, you may want to add a new program to mathinhtml's repertoire. To do this, you will need to find a program that can produce either a text or a postscript version of what you want in your html file. Create the rules for your new element in accordance with "What mathinhtml really does" (above).
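
To make the contract for a new program concrete, here is a rough Python sketch of the sequence described in "What mathinhtml really does": format the contents, write them to a temporary file, run a command, then check whether tempfile.ps appeared. Everything in it is an assumption for illustration, including the `run_element` name, the `CONTENTS` placeholder in the input template, and the temporary input file name `tempfile.txt`; the real rules are Perl data at the top of the script.

```python
import os
import subprocess


def run_element(contents, input_template, action, workdir):
    """Run one element through an external program and report what came back.

    input_template stands in for the "input" rule and action (an argument
    list) for the "action" rule; both are hypothetical simplifications.
    """
    source_path = os.path.join(workdir, "tempfile.txt")
    with open(source_path, "w") as handle:
        handle.write(input_template.replace("CONTENTS", contents))

    result = subprocess.run(action, cwd=workdir, capture_output=True, text=True)

    ps_path = os.path.join(workdir, "tempfile.ps")
    if os.path.exists(ps_path):
        return ("image", ps_path)   # would next be converted to an image file
    return ("text", result.stdout)  # text output replaces the element directly
```

A program qualifies as a candidate if it can be driven this way: text file in, and either text on standard output or a postscript file out.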

Improvements and workarounds

To do - new features

• Allow a "preprocess" rule for each element. This rule would be perl code which would be applied to the contents of the element before they are written to tempfile.
• Improve print quality.
• Decide how to present TeX errors.
• Compress images.

To do - fix bugs

• Get image baselines better aligned with text baselines.

Workarounds

• Ghostscript antialiasing bug
   • Symptom: "Unrecoverable error: range check in .putdeviceprops"
   • Cause: Old versions of ghostscript don't allow antialiasing in png files.
   • Workaround: Edit the mathinhtml script and set -dTextAlphaBits=1 and -dGraphicsAlphaBits=1 in the subroutine ps2png.
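
The ps2png subroutine itself is not reproduced on this page, so the following Python sketch only shows where the two options sit on a typical Ghostscript command line. The function name, the png16m device, the resolution and the executable name `gs` are assumptions (Windows installations usually use a different executable name); the two AlphaBits options are the ones named in the workaround, where a value of 1 turns antialiasing off.

```python
def ps2png_command(ps_path, png_path, antialias=True, resolution=100):
    """Build a Ghostscript argument list that renders a postscript file to png."""
    bits = 4 if antialias else 1  # 1 disables antialiasing (the workaround)
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=png16m",
        "-r%d" % resolution,
        "-dTextAlphaBits=%d" % bits,
        "-dGraphicsAlphaBits=%d" % bits,
        "-sOutputFile=" + png_path,
        ps_path,
    ]


# Command line with the workaround applied:
print(" ".join(ps2png_command("tempfile.ps", "image.png", antialias=False)))
```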