
Rearranging Pages of a PDF

Jorj X. McKie edited this page Mar 28, 2017 · 6 revisions

Using Document.select()

Since version 1.9.0, the Document class has a new method, select([...]). Its only parameter is a list of the pages to be selected, given as zero-based integers.

A successful execution of the method (non-negative return code) alters the document's representation in memory. For example, after select([0]) only the first page is left, everything else is gone, pageCount is 1, and so on. If you now save the document with save(...), you get a new 1-page PDF reflecting this change.

Note that all links, bookmarks and annotations are preserved as long as they do not point to a deleted page.

How can this method be used?

If you know how to manipulate Python lists, and especially the list of all pages of a PDF document, lst = list(range(doc.pageCount)), you are limited only by your imagination. For example, you can:

  • Delete pages containing no text or a specific text
  • Include only odd or even pages, e.g. to support double-sided printing on some printer hardware
  • Re-arrange pages, e.g. the whole document from back to front: take lst = list(range(doc.pageCount-1, -1, -1)) as the list to be selected.
  • "Concatenate" a document with itself by specifying lst + lst as the list of pages to be taken
  • doc.select([1,1,1,5,5,5,9,9,9]) will do what it looks like: create a 9-page document consisting of three copies each of the pages numbered 1, 5 and 9 (zero-based)
  • Take the first / last 10 pages: lst = list(range(10)), lst = list(range(doc.pageCount - 10, doc.pageCount)), respectively.
  • etc.
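The list recipes above can be sketched in plain Python. The sketch only builds the page lists: page_count is a stand-in for doc.pageCount, the helper names are invented for illustration, and page_texts is a hypothetical list holding the already extracted text of each page. Each resulting list would then be handed to doc.select(...).

```python
def reversed_pages(page_count):
    """All pages, from back to front."""
    return list(range(page_count - 1, -1, -1))


def every_other_page(page_count, start=0):
    """Every second page. start=0 yields the 1st, 3rd, 5th, ... page,
    start=1 yields the 2nd, 4th, 6th, ... page (indexes are zero-based)."""
    return list(range(start, page_count, 2))


def first_pages(page_count, n=10):
    """The first n pages (or all pages if there are fewer than n)."""
    return list(range(min(n, page_count)))


def last_pages(page_count, n=10):
    """The last n pages (or all pages if there are fewer than n)."""
    return list(range(max(page_count - n, 0), page_count))


def pages_with_text(page_texts):
    """Indexes of pages whose text is not blank.
    page_texts is a hypothetical input: one string per page."""
    return [i for i, text in enumerate(page_texts) if text.strip()]


page_count = 12                   # stand-in for doc.pageCount
lst = list(range(page_count))     # all pages in their original order
doubled = lst + lst               # the document "concatenated" with itself

print(reversed_pages(4))          # [3, 2, 1, 0]
print(every_other_page(5))        # [0, 2, 4]
print(last_pages(page_count))     # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Any of these lists can be passed on, e.g. doc.select(reversed_pages(doc.pageCount))
```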

You can apply several such selects in a row. After each one, the document structure is updated (pageCount, doc.loadPage, etc. will always reflect the current state).
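Because the document is updated after each select, the page numbers given to a later select refer to the document as it is at that moment, not to the original file. A small pure-Python simulation may make this clearer; apply_select is a hypothetical helper that only mimics the renumbering on a list and does not touch any PDF.

```python
def apply_select(current, selection):
    """Simulate one select() call.

    current:   original page numbers in their present order
    selection: zero-based indexes into the present document
    """
    return [current[i] for i in selection]


original = list(range(10))  # a 10-page document, pages 0..9

# First select: keep every second page, starting with the first one.
step1 = apply_select(original, list(range(0, len(original), 2)))

# Second select: reverse what is left. The indexes 4..0 refer to the
# 5-page document produced by the first select.
step2 = apply_select(step1, list(range(len(step1) - 1, -1, -1)))

print(step1)  # [0, 2, 4, 6, 8]
print(step2)  # [8, 6, 4, 2, 0]
```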

After a select, the original PDF content is no longer accessible through this document object. To get back to it, you first have to do a doc.close() followed by a fitz.open(...) of the original file.

When you are finished and want to save your work, issue doc.save(...). Be sure to include the garbage=4 option if you have deleted many pages (to reduce the PDF file size).
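Put together, the select-and-save workflow might look like the following minimal sketch. The function name select_and_save and the file names in the usage comment are made up for illustration; the function expects an already opened Document and uses only the select() and save(..., garbage=4) calls described above.

```python
def select_and_save(doc, pages, out_path):
    """Keep only the given zero-based pages in doc, then write the result.

    doc is assumed to be an open PyMuPDF Document; pages is any list of
    page numbers as described above.
    """
    doc.select(pages)              # alters the in-memory document
    doc.save(out_path, garbage=4)  # garbage=4 to reduce the output file size


# Possible usage (file names are placeholders):
#
#   import fitz
#   doc = fitz.open("input.pdf")
#   select_and_save(doc, [0], "first-page.pdf")
#   doc.close()
```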

Using Other Methods

There are situations in which select() is too heavy a tool for a fairly small task.

In such situations, the Document methods deletePage(), deletePageRange(), copyPage() or movePage() may be more appropriate. They all do what their names imply, and they all accept 0-based page numbers as arguments.

By combining them, you can always achieve what select() achieves in one statement.

As a general rule, use select() when many pages have to be handled and/or when some algorithm is needed to generate the list of required pages. Otherwise, these single-page methods may be more appropriate.
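To illustrate how the two approaches relate, deleting a range of pages corresponds to a select() with the list of all remaining pages. The sketch below is pure Python; pages_without_range is a hypothetical helper, and treating both ends of the range as inclusive is an assumption made for this illustration only.

```python
def pages_without_range(page_count, first, last):
    """Page list that omits pages first..last (zero-based, both inclusive)."""
    return [p for p in range(page_count) if not first <= p <= last]


# For an 8-page document, dropping pages 2 through 4 leaves this list,
# which could be passed to doc.select(...):
remaining = pages_without_range(8, 2, 4)
print(remaining)  # [0, 1, 5, 6, 7]
```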
