On a Linux system, a program called 'pdfimages' is useful for extracting images from a .pdf file.
On Debian, it is part of the poppler-utils package, which provides the following tools:
- pdfdetach -- lists or extracts embedded files (attachments)
- pdffonts -- font analyzer
- pdfimages -- image extractor
- pdfinfo -- document information
- pdfseparate -- page extraction tool
- pdfsig -- verifies digital signatures
- pdftocairo -- PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
- pdftohtml -- PDF to HTML converter
- pdftoppm -- PDF to PPM/PNG/JPEG image converter
- pdftops -- PDF to PostScript (PS) converter
- pdftotext -- text extraction
- pdfunite -- document merging tool
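Before going further, it can help to confirm that the tool is actually installed. The following is a minimal sketch; the helper name have_cmd is made up for illustration, and the install hint assumes a Debian-style system with apt-get.

```shell
# Report whether a command is available on PATH.
# (have_cmd is a hypothetical helper name, not part of poppler-utils.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfimages; then
    echo "pdfimages found: $(command -v pdfimages)"
else
    # poppler-utils is the Debian package that ships pdfimages.
    echo "pdfimages not found; try: sudo apt-get install poppler-utils"
fi
```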
To see what images a .pdf contains, run pdfimages with the -list option:
$ pdfimages -list /home/rpb/data/resources/books/The.Stock.Market.Course.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
  45     2 stencil   571   439 -       1   1  ccitt  no       133  0   151   151 1718B 5.5%
  85     3 image     398   267 rgb     3   8  jpeg   no       253  0    86    86 22.6K 7.3%
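For a PDF with many images, the listing can be condensed into a count per encoding. This is only a sketch: the function name summarize_encodings is invented here, and it assumes the column layout shown above, where the ninth whitespace-separated field is "enc" and the first two lines are the header and the dashed rule.

```shell
# Count images per encoding from `pdfimages -list` output read on stdin.
# (summarize_encodings is a hypothetical helper name.)
summarize_encodings() {
    # Skip the two header lines; field 9 is assumed to be the "enc" column.
    awk 'NR > 2 && NF >= 9 { count[$9]++ }
         END { for (e in count) print e, count[e] }' | sort
}

# Usage (book.pdf is a placeholder file name):
#   pdfimages -list book.pdf | summarize_encodings
```

Images that come out as jpeg, jp2, jbig2 or ccitt here are the ones that the matching options in the usage summary can write out without conversion.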
Several options are available for extracting images and optionally converting them; '-all' is the easiest. Running pdfimages with no arguments prints the usage summary:
$ pdfimages
pdfimages version 0.48.0
Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -png           : change the default output format to PNG
  -tiff          : change the default output format to TIFF
  -j             : write JPEG images as JPEG files
  -jp2           : write JPEG2000 images as JP2 files
  -jbig2         : write JBIG2 images as JBIG2 files
  -ccitt         : write CCITT images as CCITT files
  -all           : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
  -list          : print list of images instead of saving
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
  -p             : include page numbers in output file names
  -q             : don't print any messages or errors
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information
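Putting the options together, an extraction run might look like the sketch below. The wrapper name extract_images, the output prefix img, and the file and directory names in the usage comment are all placeholders chosen for illustration; only the pdfimages flags themselves come from the usage summary above.

```shell
# Extract images from a PDF into a directory.
# Arguments: PDF path, output directory, optional first and last page.
# (extract_images is a hypothetical wrapper, not part of poppler-utils.)
extract_images() {
    pdf=$1
    outdir=$2
    first=${3:-}
    last=${4:-}
    mkdir -p "$outdir" || return 1
    # -all selects the -png -tiff -j -jp2 -jbig2 -ccitt behaviour;
    # -p includes the page number in each output file name.
    set -- -all -p
    if [ -n "$first" ]; then set -- "$@" -f "$first"; fi
    if [ -n "$last" ]; then set -- "$@" -l "$last"; fi
    # The final argument is the image root: a path prefix for output files.
    pdfimages "$@" "$pdf" "$outdir/img"
}

# Usage (placeholder names), limited to pages 45 through 85:
#   extract_images book.pdf ./extracted 45 85
```

Restricting the page range with -f and -l is worthwhile when the -list output shows that only a few pages carry images of interest.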