Converting LaTeX to HTML

Tags: software

Published on

Since I write almost everything in LaTeX these days, be it personal stuff (letters, essays, documentation) or academic things (papers, reports, my thesis), I was interested in how to publish them in other formats. While PDF and PostScript files are great for storage and printing, nothing beats HTML for simplicity and ubiquity.

So far I have tried two different programs, latex2html and tth. Both follow the same premise (converting a LaTeX document into HTML) but differ in their approach.

latex2html

latex2html is a rather old tool and apparently no longer maintained. Consequently, its output is somewhat peculiar, and it supports only HTML 4.0: no XHTML, and definitely no strict variant. Here is a loose list of my experiences while converting several LaTeX documents:

  • I ran into some problems with German umlauts, which I had entered using the babel shorthand "a (for ä). latex2html expects them to be written as \"{a} instead.
  • If latex2html encounters an unknown environment, it falls back to running LaTeX on it and inserts the result as an image. The same happens for mathematical formulas.
  • To my knowledge, there is no way to generate only the body of the document. This makes it harder to include LaTeX content in an existing website.
  • Theming the generated files is possible, albeit cumbersome.
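
Two of these issues can be worked around with a little pre- and post-processing. The following Python script is a minimal sketch and not part of latex2html; the file name `l2h_helpers.py` and the functions `expand_umlauts` and `extract_body` are hypothetical. It assumes that the LaTeX source uses the straight double quote only as the babel-style umlaut shorthand, and that the generated HTML contains an ordinary `<body>` element.

```python
import re
import sys
from pathlib import Path

# Hypothetical helper, not part of latex2html.
# Maps babel-style German shorthands to explicit accent commands, e.g. "a -> \"{a}.
UMLAUT_SHORTHANDS = {
    "a": r'\"{a}', "o": r'\"{o}', "u": r'\"{u}',
    "A": r'\"{A}', "O": r'\"{O}', "U": r'\"{U}',
    "s": r"{\ss}",  # "s is the babel shorthand for the sharp s
}


def expand_umlauts(tex):
    """Replace shorthands like "a with explicit commands like \\"{a}.

    Assumes " is used only as the umlaut shorthand. A quote preceded by a
    backslash (already explicit, as in \\"a) is left alone. Comments and
    verbatim environments are NOT skipped.
    """
    return re.sub(r'(?<!\\)"([aouAOUs])',
                  lambda m: UMLAUT_SHORTHANDS[m.group(1)], tex)


def extract_body(page):
    """Return what lies between <body ...> and </body>, or the input unchanged."""
    match = re.search(r"<body[^>]*>(.*)</body>", page, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else page


def main(args):
    mode, src, dst = args
    # latin-1 maps every byte to one character, so the file's real encoding
    # does not matter: only ASCII patterns are touched, all other bytes
    # are written back unchanged.
    text = Path(src).read_text(encoding="latin-1")
    result = expand_umlauts(text) if mode == "pre" else extract_body(text)
    Path(dst).write_text(result, encoding="latin-1")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1:])
```

Run `python l2h_helpers.py pre thesis.tex thesis-l2h.tex` before calling latex2html, and `python l2h_helpers.py body page.html fragment.html` on a generated page afterwards. A regular expression is crude, but it suffices here since only the outermost `<body>` element matters; for anything more involved, a real HTML parser is the safer choice.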

In short, latex2html did not meet my needs, so the search continued until I eventually arrived at tth.

tth

tth takes a novel approach: Instead of generating images for mathematical formulas, tth tries to generate HTML code that closely resembles the typeset formula.

This works quite well, even with complex documents. Here are my notes:

  • By default, tth generates a single HTML page, and it does so quite fast, even for larger documents.
  • tth is very tolerant of unknown commands: It tries to parse the whole document and simply ignores sections it cannot handle.
  • The layout is represented very well: Tables, sections, it is all there.
  • Short formulas are easily readable, but for longer formulas I find the output of tth tedious to read.
  • tth is very configurable: There is even an option for generating only the body of the document.
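
The body-only option makes it easy to drop tth's output into an existing site. The sketch below assumes that tth is installed and on the PATH, and that the option in question is `-r` ("raw" output without the HTML header and tail), which is what I believe tth's help text lists; check `tth -h` for your version. The page template and the function names `tex_to_fragment` and `embed_fragment` are hypothetical placeholders for your own site layout.

```python
import html
import subprocess
import sys
from pathlib import Path
from string import Template

# Hypothetical site layout; $title and $content are filled in by embed_fragment.
# string.Template is used so that braces (CSS, tth output) need no escaping.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<div id="content">
$content
</div>
</body>
</html>
""")


def tex_to_fragment(tex_path):
    """Run tth on a LaTeX file and return the HTML fragment it produces.

    Assumes `tth` is on the PATH and that -r omits the HTML header and tail.
    When reading from stdin, tth writes the HTML to stdout.
    """
    tex = Path(tex_path).resolve()
    with open(tex, "rb") as src:
        # Run in the document's directory so relative \input paths resolve.
        result = subprocess.run(["tth", "-r"], stdin=src, cwd=tex.parent,
                                capture_output=True, check=True)
    # latin-1 maps every byte to one character, so the bytes survive unchanged
    # when the page is written back with the same encoding.
    return result.stdout.decode("latin-1")


def embed_fragment(fragment, title):
    """Wrap an HTML fragment in the site template, escaping the title."""
    return PAGE_TEMPLATE.substitute(title=html.escape(title), content=fragment)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python tth_embed.py paper.tex paper.html
    tex_file, out_file = sys.argv[1:3]
    page = embed_fragment(tex_to_fragment(tex_file), title=Path(tex_file).stem)
    # Characters outside latin-1 (only possible in the title) become &#...; refs.
    Path(out_file).write_text(page, encoding="latin-1", errors="xmlcharrefreplace")
```

Since the fragment carries no `<html>` or `<head>` of its own, stylesheets and navigation live in one place, the site template, rather than in every converted document.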

Conclusion

In the end, tth proved sufficient for my purposes. Still, there seems to be a lack of publishing software for LaTeX sources. This is a pity, as publishing a document in several formats at once, without many manual adjustments, would be appealing. Another possibility would be to render PDF files inside the browser (the currently available plugins are rather disappointing, in my opinion), although I do not like this option, as it makes browsers even more bloated. Furthermore, compared to PDF, HTML still offers some advantages in readability, especially for readers with disabilities.