Quick Fixes

Free Software, Writing and other Stuff

Linux InsideSoftwareTutorials

Script-Fu: Converting PDFs to (pretty) Previews with ImageMagick

Say you publish articles, or books, or pamphlets. Say a lot of your work is tied up in hefty PDFs that don’t look good on the web. You can still showcase your work in an appealing way on your website by using ImageMagick’s seemingly infinite box of tricks.

If you write books or for magazines, or your artwork is featured in a publication, you will often find you are not at liberty to distribute your own work, since the distribution rights will belong to your publisher. However, you can still showcase your work in the shape of a preview. By using ImageMagick you can convert a PDF of your writing or art into an appealing image and display it on your website.

This method is also a good way to present your work using a really light file. I mean, check it out: the original PDF I use in this example weighs over 11 MBs, but the generated JPG you can see under these lines weighs less than 2, and everybody likes fast loading websites, right?

This is what we want to do: Automatically transform a PDF into an eye-catching, ultralight graphic to showcase your work.
This is what we want to do: Automatically transform a PDF into an eye-catching, ultralight graphic to showcase your work.

Pretty Pages

To get started, let’s take one of my own articles, a six page PDF, and feed it to ImageMagick’s convert and see what happens:

convert your.pdf yourimage.png

Turns out convert is clever enough to break the PDF into separate images, one image per page. So, the instruction above will give you…

yourimage-0.png
yourimage-1.png
yourimage-2.png
yourimage-3.png
yourimage-4.png
yourimage-5.png

That is, as many images as pages your original document contains.

This is a good first step in the right direction, but look how the first page of the original PDF and the first image generated by convert compare:

Converting from PDF to images is not so straightforward. Look at all those pixels!
Converting from PDF to images is not so straightforward. Look at all those pixels!

You’ll notice two things: (1) the resolution is shot to hell in the image (on the right — the PDF is on the left) and (2) the colours are all wrong.

The former you can solve with the -density option:

convert -density 300 your.pdf yourimage.png

ImageMagick takes 72 dpi as the default density for vector-based images (such as PDFs) when it converts them to bitmaps. This was okay back in the day when screens where 1024×768 or an image on the web could make your 56K modem choke, burst into flames, and die, but nowadays it’s way to low and results in chunky images.

By including the -density option as shown above (i.e., applying it to the PDF you want to convert), you tell convert to take your.pdf as having 300 dpi, which is what is passed on to the PNG images. You can see the big difference this makes below.

Increasing the pixel density gives you a much better quality bitmap image.
Increasing the pixel density gives you a much better quality bitmap image.

Colour me Maroon

Having solved the resolution issue, let’s tackle the colour problem. This is going to be a little trickier, but follow this section carefully and you’ll get there.

The problem with the colours is that, in the converted PNG, they look solarised and are in general ugly and flat. Here, let me show you another example from a different file:

On the left, the original PDF. On the right, the PNG with ugly colours.
On the left, the original PDF. On the right, the PNG with ugly colours.

The PDF with nice, realistic, and subtle colours is on the left. On the right is the PNG with colours way out of whack. Check out the differences in blues and greens and how they look garish in the converted picture.

You see, as long as your original PDF document (a) does not contain images, or (b) only contains black and white images, or (c) contains images in the RGB colour space, you’re going to be fine. But, the moment your document is (d), i.e., it contains images in the CMYK colour space, you are heading for trouble.

It turns out PDFs containing images in CMYK colour space are much more common than you may think. CMYK (which stands for Cyan, Magenta, Yellow and blacK) refers to the ink colours used in printing presses. PDF is the default document format in the printing industry. Do the math: it is very likely your PDFs will be rendered in the CMYK colour space, especially if you got them from your publisher.

That said, one way to find out for sure is by using ImageMagick’s identify tool. Use

identify your.pdf

and you’ll get some information about the PDF, but use

identify -verbose your.pdf

and you’ll get a whole bunch more, including the colour space.

But -verbose gives you nearly too much information, plus each page in the PDF is treated as a different image, and gets its own avalanche of information. Hence, if you have 6-page PDF like you do, you’ll get swamped by lines and lines of data. To pick out what you’re really interested in, you may want to use the -format option. The following line:

identify -format "%r\n" your.pdf

will output:

DirectClass CMYK
DirectClass CMYK
DirectClass CMYK
DirectClass CMYK
DirectClass CMYK
DirectClass CMYK

i.e, the colour space for each page contained in your PDF (assuming, of course, the colour space for your PDF is DirectClass CMYK).

Just to clarify, -format allows you to cherry pick the information identify shows, and how to show it. In this case %r lifts out the colour space information, and \n inserts a newline so that each instance for each page gets printed on its own line. There’s a whole list  of options you can use with -format on the ImageMagick documentation page.

The problem with the colour is that, although identify can extract the colour space string from the PDFs metadata, ImageMagick doesn’t really know what the colour space is or the exact correspondence of the CMYK colours to the RGB colour space. So, when asked to translate pixel colours from one space to another, it guesses. Or, more precisely, it takes each pixel’s colour value in the original colour space (in this case CMYK) and tries to calculate its equivalent in the target colour space (in this case RGB). The result, as you see above, is often not very good.

To take the element of chance out of colour conversion, you can add a colour profile to the original image using the -profile option. A colour profile is basically used to map the colours in a given image. You can then convert from CMYK to RGB using an RGB profile and colours will look much better because convert can match them better. This is possible because we know how the colours are mapped in each space. Your amended command will look something like this:

convert -density 300 your.pdf -profile [CMYK_profile.icc] -profile [RGB_profile.icc] yourimage.png

What’s happening here is that you are using a CMYK colour profile file you would have previously installed on your computer (more about this in a minute) and you’re swapping out the original profile from your.pdf and giving it a new one. ImageMagick converts your.pdf to this new CMYK profile. Hopefully the new CMYK profile will be similar enough to the original file’s profile to avoid any major colour shifting. Once done, your.pdf has a colour profile ImageMagick can work with.

Next you tell convert you want to change the image’s colour profile again, this time more radically to RGB, another profile contained in a file you will have installed on your computer. As ImageMagick can now look up the colour on the CMYK profile and map better the values in it to those in the RGB profile, when the time comes to actually convert the PDF into a PNG image, the colours come out much nearer to the original. See the examples below.

This is necessary, by the way, because PNG files can only do RGB. If you export to other formats, say, JPEG or TIFF, both of which can support CMYK colour spaces, you may or may not have the same problems.

The first caveat to the method described above is that you will probably have to track down and install colour profiles yourself (they usually have the extension *.icc). Linux distros usually supply some along with applications such as ghostscript, Tex, and other packages. There may also be packages in your distro’s repositories especially dedicated to installing standalone ICC files, often engineered to be compatible with profiles commonly used within the printing industry. Finally, you can go online and search for more colour profile files around the web. In this latter case, you will have to download and copy the files to the right place yourself.

The second caveat is that you might end up finding you have several CMYK and RGB ICC files to choose from. If you can’t find an exact match for whatever identify spits out, the only way to know which one works best is to experiment with different combinations.

My own personal command line actually looks like this:

convert -density 300 your.pdf -profile /usr/share/ghostscript/9.15/iccprofiles/default_cmyk.icc -profile /usr/share/ghostscript/9.15/iccprofiles/default_rgb.icc yourimage.png

That’s because the best match I have found for the PDFs my publisher sends me is the default_cmyk.icc file provided by ghostscript. The default_rgb gives me a nice converted colour for the PNG.

Check out how different the converted PDF looks now. As always, the original PDF is on the left and the converted PNG is on the right. If anything, the new image’s colours are too understated, but they are good enough for me. At least now it doesn’t look like it was colourised by a raver on acid.

Using profiles you get a much better colour match. The PDF is on the left and the converted image is on the right.
Using profiles you get a much better colour match. The PDF is on the left and the converted image is on the right.

If you now go back and convert your original PDF, you will notice the greens look much better:

Although not a perfect, the greens in the PNG (on the right) won't give you eye cancer anymore.
Although not a perfect, the greens in the PNG (on the right) don’t look radioactive anymore.

50 Shades of Shadows

Now you have the PDF split into separate images and correctly colourised, let’s get on to prettifying each image and putting them back together to make your preview. You’ll do this by taking one of the images you generated above, let’s say yourimage-0.png, and gradually building up the command line piling on the options until you get the result you need.

First step: adding a shadow to the image.

ImageMagick actually comes with a -shadow option:

convert yourimage-0.png -shadow 75x50+10+20 shadow.png

Let’s break down the parameters -shadow takes:

  • The 75 is the shadow’s opacity expressed as a percentage. 0% is transparent and 100% is totally opaque.
  • The 50 is what ImageMagick calls the shadow’s sigma. In other places, namely in bitmap editing software such as Photoshop or GIMP, this is known as the shadow’s radius or spread. The larger the number here, the bigger and blurrier the shadow.
  • The +10 is the shadow’s displacement on the X axis. In this case, you displace the shadow 10 pixels to the right with regards to the original image.
  • Conversely, the +20 is the shadow’s displacement on the Y axis. In this case, you displace the shadow 20 pixels down from the original image’s position.

The last two parameters can also take a percentage (add %) as a value. Then, instead of pixels, the displacement is calculated as a percentage of the total size of the image:

convert yourimage-0.png -shadow 75x50+10+20% shadow.png

Will push the shadow 10% to the right and 20% down from the image’s original position. That is, if your image is a perfect 100×100 pixel square, the shadow will be displaced 10 pixels on the X axis and 20 pixels on the Y axis. If your picture is a 200×200 square the shadow will be displaced 20 (10% of 200) pixels on the X axis and 40 (20% of 200) pixels on the Y axis.

You can also use negative numbers in the displacement. In the line above, the shadow is displaced to the right and down, as if the light source were shining from the upper left. However, if you did this:

convert yourimage-0.png -shadow 75x50-10-20% shadow.png

the shadow would be projected upwards and to the left, as if the image were being illuminated from below and the right.

So, getting back on track, if you run

convert yourimage-0.png -shadow 75x50+10+20% shadow.png

the resulting image would look something like what is shown below.

Your first experiment with -shadow will probably yield something you did not expect.
Your first experiment with -shadow will probably yield something you did not expect.

Interestingly, ImageMagick colours shadows white by default!

This is easily solved. Try this:

convert yourimage-0.png -background black -shadow 75x50+10+20% shadow.png

and you’ll get this:

A black shadow. Now, that's proper.
A black shadow. Now, that’s proper.

But, of course, you don’t really want an independent picture of the shadow of your page. What you really want is to have a stack of layers, with a (white) background at the bottom, then the shadow on top of that, and finally the image of the page extracted from the original PDF on top of everything else.

Well, lucky you: while you don’t push the final image out to a file, ImageMagick stores transitory intermediate images as layers… on a stack.

Try this:

convert yourimage-0.png \( +clone -background black -shadow 75x50+10+20% \) shadow.png

This actually generates two output images, shadow-0.png and shadow-1.png, one per layer. One of the layers comes from yourimage-0.png. The other comes from the +clone option.

What’s going on here is that you give convert the input image (yourimage-0.png) and that goes onto the stack as layer 0. Then you tell convert to +clone layer 0 (+clone creates a copy of the last image that you put on the stack). Now you have two layers. Then you transform the cloned image into a shadow of itself as explained above. When you try to output, convert sees there are two images/layers in the stack and, as you haven’t given any instructions as to what to do with them, it outputs two images, one per layer.

But what you really want is to merge them both together, right? You can do that with

convert yourimage-0.png \( +clone -background black -shadow 75x50+10+20% \) -layers merge shadow.png

Unfortunately, the resulting image is still not what you want:

Layers get piled from first (at the bottom) to last (on top), which is not what you want in this case.
Layers get piled from first (at the bottom) to last (on top), which is not what you want in this case.

The shadow is on top because ImageMagick piles the layers from first (the original PNG image in this case) on the bottom, to last (the shadow) on top. What you need is to swap both layers around:

convert yourimage-0.png \( +clone -background black -shadow 75x50+10+20% \) +swap -layers merge shadow.png

That will put the layers in the correct order (+swap  unsurprisingly swaps the order of the last two layers), but renders the result with a black background. As your shadow is also black, you won’t be able to see it. You need to add the white background we mentioned above:

convert yourimage-0.png \( +clone -background black -shadow 75x50+10+20% \) +swap -background white -layers merge shadow.png

And we’re done at a last. Check it out:

A page rendered with a shadow.
A page rendered with a shadow.

You can now put this command line into a for ... do ... done loop to add a shadow to all the images you generated from the PDF.

Something like this:

for i in yourimage-*.png
 do
  convert $i \( +clone -background black -shadow 75x50+10+20% \) +swap -background white -layers merge ${i%.png}.shadow.png
 done

would do the job.

The Full Montage

To sew all the images together you’ll use ImageMagick’s montage command. As this tool is designed to do just that, i.e., join images together, the command line is quite straightforward:

montage -tile x1 -geometry +1+1 yourimage*.shadow.png yourimage.jpg

montage will take all the images that fit the pattern yourimage*.shadow.png and tile them along the Y axis (-tile x1) to make a horizontal row which it then dumps it into yourimage.jpg. If you want a column, you could do:

montage -tile 1x -geometry +1+1 yourimage*.shadow.png yourimage.jpg

If you want a grid with 2 columns and 3 rows, you could do:

montage -tile 2x3 -geometry +1+1 yourimage*.shadow.png yourimage.jpg

… and so on.

The -geometry +1+1 part of the command line works differently to how it does in convert. Here it tells montage to create an image as big as is needed to fit all the individual PNGs into it (otherwise montage assumes you need tiny thumbnails). For a more detailed explanation on how this works, check out montage’s online documentation.

If you find rows boring, you can always organise your pages into a grid with montage's -tile option.
If you find rows boring, you can always organise your pages into a grid with montage’s -tile option.

Altogether now!

The thing to do now is to bring everything together into one script. My version looks like this

#!/bin/bash

# I am the convertPDFs.sh script

# ARGUMENTS
# $1 - PDF to convert
# $2 - Pixel Density
# $3 - Scale (in %)
# $4 - Shadow opacity
# $5 - Shadow sigma
# $6 - Relative distance of shadow (%)

# Convert PDF to PNGs (one image per page)
convert -density $2 $1 -profile /usr/share/ghostscript/9.15/iccprofiles/default_cmyk.icc -profile /usr/share/color/icc/OpenICC/sRGB.icc -scale $3% ${1%.pdf}_%03d.png

# Add shadow
for i in ${1%.pdf}*.png
 do
  convert $i \( +clone -background black -shadow $4x$5+$6x$6% \) +swap -background white -layers merge ${i%png}shadow.png
 done

# Mount into one image one after the other
montage -tile x1 -geometry +1+1 ${1%.pdf}*.shadow.png ${1%pdf}jpg

# Remove temporary unnecessary PNG files
rm -f ${1/.pdf/}*.png

And it allows you to input all the values you need as parameters. To do something similar to what you have done step by step in this article, you would run the script like so:

convertPDFs.sh your.pdf 150 50 75 10 10

This outputs the image you can see at the top of this article.

Conclusion

I’ve said it before and I’ll say it again: ImageMagick is an amazingly powerful toolbox. By combining options and chaining several tools together, it is hard to think of an image transformation process that cannot be carried out using a convert, montage, identify and so on. Being able to apply unsupervised changes massively to whole directories full of files makes ImageMagick a time- and life-saver.

Before you even think of firing up a heavy graphical image editor, ask yourself if what you’re going to do is something you’ll do more than once. If the answer is “yes”, it’s a task you should probably leave to ImageMagick.

Cover image: img-1109 by dhester for MorgueFile.com.