Converting LaTeX to HTML
Tags: software
Since I write almost everything in LaTeX these days, be it personal stuff (letters, essays, documentation) or academic things (papers, reports, my thesis), I was interested in how to publish these documents in other formats. While PDF and PostScript files are great for storage and printing, nothing beats HTML in its simplicity and ubiquity.
So far I have tried two different programs, latex2html and tth. Both
follow the same premise (converting a LaTeX document into a series of
HTML files) but differ in their approach.
latex2html is a rather old tool.
Apparently, it is not updated anymore. Consequently, its output is
rather peculiar and it supports HTML 4.0 only: no XHTML, and definitely
no strict variant. Here is a loose list of my experiences while trying
to convert several LaTeX documents:
- I ran into some problems concerning German umlauts (which I specified as "a, for example). latex2html expects these to be specified in a different notation.
- If latex2html encounters an unknown environment, it falls back to the LaTeX interpreter and generates a picture instead. This is also done when mathematical formulas are involved.
- There is no way (to my knowledge) to generate only the body of the document. This makes including LaTeX content in an existing website harder.
- Theming the generated files is possible, albeit cumbersome.
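Since the missing body-only output is what makes embedding hard, one workaround is to post-process the generated page. The following is a minimal sketch, not a feature of latex2html: the function name `extract_body` is hypothetical, and it assumes the generated file contains a single `<body>` element (in any letter case).

```python
import re

# Matches everything between the opening and closing body tag.
# IGNORECASE covers upper-case tags as older generators tend to emit them.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the contents of the <body> element, or the input if none is found."""
    match = BODY_RE.search(html)
    if match is None:
        # Nothing to strip: the input is probably a fragment already.
        return html
    return match.group(1).strip()
```

The result can then be pasted into the template of an existing site. A regular expression is enough for this one job; for anything more involved, a real HTML parser would be the safer choice.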
All in all, latex2html was not enough for my purposes. So the search
continued and I eventually arrived at tth.
tth has a very novel approach: Instead of generating
images for mathematical formulas, tth tries to generate HTML code that
mostly resembles the typeset formula. Some observations:
- By default, a single HTML page is generated. This is done quite fast, even for a larger document.
- tth is very resilient concerning unknown commands. It tries to parse the whole document and simply ignores erroneous sections.
- The layout is represented very well: Tables, sections, it is all there.
- Short formulas are easily readable. For longer formulas, I find the output of tth tedious to read.
- tth is very tunable: There is even an option for generating the body of the document only.
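To make the points above concrete, here is a rough sketch of how a tth run could be scripted. It rests on two assumptions that should be checked against the manual of the installed version: that tth reads LaTeX on standard input and prints HTML on standard output, and that the `-r` switch requests the body-only ("raw") output. The helper names `build_tth_command` and `convert` are made up for this example.

```python
import subprocess
from pathlib import Path


def build_tth_command(body_only=False):
    """Return the argument list for a tth run."""
    cmd = ["tth"]
    if body_only:
        # Assumption: -r asks tth for raw HTML without head and tail.
        cmd.append("-r")
    return cmd


def convert(tex_path, html_path, body_only=False):
    """Pipe a LaTeX file through tth and store the HTML it prints."""
    source = Path(tex_path).read_bytes()
    result = subprocess.run(
        build_tth_command(body_only),
        input=source,            # assumed: tth reads the document from stdin
        stdout=subprocess.PIPE,  # assumed: tth writes the HTML to stdout
        check=True,              # raise if tth exits with an error
    )
    Path(html_path).write_bytes(result.stdout)
```

A call such as `convert("thesis.tex", "thesis.html", body_only=True)` would then produce a fragment suitable for inclusion in an existing page, provided the assumptions hold.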
All in all, tth proved to be sufficient for my purposes. Yet, there seems to be a lack of
publishing software for LaTeX sources. This is a pity, as publishing documents in several formats at
once without many adjustments would be interesting. Another possibility would be to allow
rendering of PDF files inside the browser (the currently available plugins are rather disappointing,
in my opinion), although I do not like this option as it makes a browser even more bloated.
Furthermore, in comparison to PDF, HTML still offers some advantages in readability (especially for