What is the PDF format good for? Short version

Nothing. Use HTML and/or (compressed) Postscript instead.

What is the PDF format good for? Long version

PDF (Portable Document Format) is a document file format proposed by Adobe. It is a stripped-down version of Postscript, with some incompatible additions. Thanks to built-in compression, it is somewhat more compact than plain uncompressed Postscript, and it offers hypertext facilities. But is it actually useful?

Is it useful for on-screen reading?

No. The fixed formatting of a PDF document means that it never fits well on the screen. The division into pages makes little sense for on-screen reading, where scrolling is much more natural than leafing; and the aspect ratio (which is usually not 4:3) and size of the pages require an irritating combination of scrolling and leafing. Adobe has since noticed this, and its Acroread now shows the document as one big scrollable entity, but the page breaks are still disturbing, and other viewers (xpdf, gv) don't do this yet.

The attraction of PDF for authors is that the reader will supposedly see the document formatted in the same way as the author (however, incomplete PDFs often ruin that). This makes some sense for paper, where the size is standardized; but it does not make sense for screens, which come in many sizes and resolutions, from the 10" 640*480 notebook screen to the 21" 1600*1200 CRT, in monochrome or colour. (Actually, if we consider people with bad eyes, fixed formatting is not even a good idea for paper; it's just an artifact of mass reproduction techniques like the printing press.)

Authors should forget about complete control in such an environment, unless they want to inconvenience their readers. It is much better to describe the document structure, and leave the formatting to the browser, guided by the reader's preferences. That's the idea behind standard HTML. Try to get by with it. (Note that many Netscape "extensions" to HTML are contrary to this spirit, and indeed, documents employing them often look bad when displayed with a different window size, font, or colour depth than they were designed for, not to mention how they look on non-Netscape browsers.)

Is it useful for on-paper reading?

In this respect PDF offers no advantages (that I know of) over Postscript except a more compact representation when not compressed. However, this reverses itself when you use decent compression programs:

File size comparison

This subsection is long only because many people have made claims about file sizes, which led me to make some measurements; its length does not mean that I consider the topic particularly important.

As you can see below, gzipped Postscript is almost twice as compact as gzipped PDF and more than twice as compact as plain PDF.

                                    Size              File&Format
-rw-r--r--   1 anton    vip      1041321 Nov  9 11:45 intel-opt32-ap526.ps
-rw-r--r--   1 anton    vip       411050 Nov  8 18:30 intel-opt32-ap526.pdf
-rw-r--r--   1 anton    vip       329586 Nov  8 18:30 intel-opt32-ap526.pdf.gz
-rw-r--r--   1 anton    vip       181135 Nov  9 11:45 intel-opt32-ap526.ps.gz
In this example, the Postscript was generated from the PDF file, which contained no links. The conversion to Postscript was performed with xpdf 0.7. The original file is the Intel Architecture Optimizations Manual; you can get it here (but only in a newer version; I used 24281601). With xpdf 0.7a (encryption), the Postscript output and its compressed form are more bloated, but the .ps.gz is still much smaller than the .pdf:
-rw-r--r--   1 anton    vip      1353166 Jun 10 13:55 xxx.ps
-rw-r--r--   1 anton    vip       254710 Jun 10 13:55 xxx.ps.gz
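The ratios claimed for this first example can be checked by hand. The following is a minimal Python sketch that only redoes the arithmetic on the byte counts from the first listing above; the names `sizes` and `ratio` are made up for this illustration.

```python
# Byte counts copied from the first listing above (xpdf 0.7 conversion).
sizes = {
    "ps": 1041321,
    "pdf": 411050,
    "pdf.gz": 329586,
    "ps.gz": 181135,
}


def ratio(a: str, b: str) -> float:
    """Return how many times larger format `a` is than format `b`."""
    return sizes[a] / sizes[b]


# Gzipped PDF and plain PDF, each relative to gzipped Postscript.
print(f"pdf.gz / ps.gz = {ratio('pdf.gz', 'ps.gz'):.2f}")
print(f"pdf    / ps.gz = {ratio('pdf', 'ps.gz'):.2f}")
# Plain Postscript relative to plain PDF (the uncompressed case).
print(f"ps     / pdf   = {ratio('ps', 'pdf'):.2f}")
```

The first two ratios come out at about 1.82 and 2.27, which is where "almost twice" and "more than twice" come from; the last one (about 2.53) shows the advantage PDF has only as long as nothing is compressed.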
Another example is the files available here. These were converted from Postscript into PDF with ghostscript 5.01. The sums of the file sizes are:
.doc (Winword):			 2.885.849
.ps (with Winword from .doc):	 4.959.680
.ps.gz:				 1.346.021
.pdf (with gs-5.01 from .ps):	 1.704.794
.pdf.gz:			 1.441.079
Interestingly, here converting the PDF back into Postscript with xpdf 0.7a (encryption) results in .ps.gz files that are larger than the .pdf files (unfortunately, xpdf 0.7 does not work on these files):
.ps (with xpdf 0.7a from .pdf):	 9.726.203
.ps.gz:				 2.186.707
The following table shows more examples, all converted from PDF to Postscript using xpdf 0.7a (encryption) and compressed with gzip -9. I did not select these files for compressibility (except manual.ps, which I downloaded because its author claimed that it is an example where PDF is more compact than .ps.gz); they are simply all the PDF files that happened to be in my download directory (plus intel-opt32-ap526.pdf).
   .pdf .pdf.gz      .ps  .ps.gz  .ps.gz/.pdf  File
 243943  190754   656255  123315          51%  PRG.pdf
 846435  455866  1037500  280930          33%  ThinkingInPostScript.pdf
5996975 5894534 12689450 5443575          91%  allconnr.pdf
 306033  262771  1436313  270212          88%  manual.pdf
1986820 1605659  5184809  944873          48%  pem32b.pdf
1398720  547655  3108610  471592          34%  k6-2-optimization.pdf
Another example is one of my own papers, where the original ps.gz file takes 63484 bytes (298500 bytes uncompressed), whereas the PDF (converted by Peter Knaggs) takes 169520 bytes, which is larger by a factor of 2.67.
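Readers who want to repeat this kind of measurement on their own files could do so along the following lines. This is a minimal Python sketch, not the procedure used for the tables above: it assumes that the .ps and .pdf versions already exist on disk, and it uses Python's gzip module at level 9 as a stand-in for gzip -9 (the results can differ by a few header bytes). The function names are invented for this illustration.

```python
import gzip
from pathlib import Path


def gzip_size(data: bytes) -> int:
    """Size in bytes of `data` after gzip compression at level 9."""
    return len(gzip.compress(data, compresslevel=9))


def compare(ps_path: str, pdf_path: str) -> dict:
    """Collect the same columns as the table above for one document.

    `ps_path` and `pdf_path` are two renderings of the same document.
    """
    ps = Path(ps_path).read_bytes()
    pdf = Path(pdf_path).read_bytes()
    ps_gz = gzip_size(ps)
    return {
        ".pdf": len(pdf),
        ".pdf.gz": gzip_size(pdf),
        ".ps": len(ps),
        ".ps.gz": ps_gz,
        # Below 1.0 means gzipped Postscript beats plain PDF.
        ".ps.gz/.pdf": ps_gz / len(pdf),
    }
```

Calling compare("paper.ps", "paper.pdf") (file names hypothetical) returns one table row as a dictionary; the last entry corresponds to the percentage column.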

But I want to have a single source for on-screen and on-paper documents.

Then the answer is to write the document in a form that can be converted to formats suitable for on-screen and on-paper reading. An example is the texinfo format, which can be converted to the on-screen formats info and HTML, and to Postscript and other printer formats supported by TeX. Another example is LaTeX, which can be printed and/or converted to HTML by tth or latex2html. A third example is the Linuxdoc-SGML format, which can be converted to HTML, plain text, LaTeX and other formats. And if that's too involved for you, HTML also gives decent printouts for not-too-long documents.
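For the texinfo case, the single-source idea might look roughly like this. The Python sketch below only assembles the command lines and does not run anything; it assumes that GNU Texinfo (makeinfo, texi2dvi) and dvips are installed, and the file name doc.texi and the helper `build_commands` are hypothetical.

```python
from pathlib import Path


def build_commands(texi: str) -> dict:
    """Return one command line per output format for a texinfo source.

    Assumes the usual GNU Texinfo tools and dvips; nothing is executed here.
    """
    stem = Path(texi).stem
    return {
        # On-screen formats
        "info": ["makeinfo", texi],
        "html": ["makeinfo", "--html", texi],
        # On-paper formats: texinfo -> DVI via TeX, then DVI -> Postscript
        "dvi": ["texi2dvi", texi],
        "ps": ["dvips", f"{stem}.dvi", "-o", f"{stem}.ps"],
    }


for target, cmd in build_commands("doc.texi").items():
    print(f"{target:5s} {' '.join(cmd)}")
```

Each list could be handed to subprocess.run once the tools are known to be present; the point is merely that one source file feeds both the on-screen and the on-paper outputs.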

A good example of how to do it is the Supercomputing'96 proceedings CD. It contains all the papers in HTML and Postscript forms. (The only negative point I have noticed is that the table-of-contents does not work with my browsers.)

Other PDF Disadvantages

PDF readers and converters are not as widely available as HTML browsers or Postscript previewers or printers. And even if they are available, they often leave something to be desired.

For those of you who happen to have a PDF reader but not a Postscript reader, look at the Ghostscript Home Page for relief.


Maybe there are applications where PDF is really useful, but I have yet to see one. Before you publish a PDF file, please ponder the points discussed above and consider other formats like HTML and gzipped Postscript.

Anton Ertl