Rearranging Pages of a PDF
Since version 1.9.0, the Document class has a new method, select([...]). Its only parameter is a list of pages (given as zero-based integers) that should be selected.
A successful execution of the method (return code not negative) alters the document's representation in memory. For example, after select([0]) only the first page is left, all other pages are gone, pageCount will be 1, and so on. If you now save the document with save(...), you get a new 1-page PDF reflecting what has happened.
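To make these semantics concrete, here is a minimal pure-Python model of what select() does to the page sequence. It is an illustration only, not PyMuPDF's implementation: pages are represented by hypothetical string labels, and the helper name model_select is made up for this sketch. With a real document the equivalent calls would be doc.select(lst) followed by doc.save(...).

```python
def model_select(pages, selection):
    """Return the page sequence that remains after select(selection).

    `pages` stands in for the document's current pages (hypothetical labels);
    `selection` is a list of zero-based page numbers in the desired order.
    Duplicates are allowed and produce repeated pages.
    """
    count = len(pages)
    for pno in selection:
        # This model simply rejects out-of-range numbers (including negatives).
        if not 0 <= pno < count:
            raise ValueError(f"page number {pno} out of range 0..{count - 1}")
    # The new document consists of exactly the selected pages, in list order.
    return [pages[pno] for pno in selection]


pages = ["p0", "p1", "p2", "p3"]
kept = model_select(pages, [0])
print(kept, len(kept))  # ['p0'] 1: only the first page survives
```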
Interestingly, all links, bookmarks and annotations are preserved, as long as they do not point to a deleted page.
How can this method be used?
If you know how to manipulate Python lists, in particular the list of all pages of a PDF document, lst = list(range(doc.pageCount)), you are only limited by your imagination. You can, for example:
- Delete pages containing no text, or containing a specific text
- Include only odd / even pages, e.g. to support double-sided printing on some printer hardware
- Re-arrange pages, e.g. the whole document from back to front: take lst = list(range(doc.pageCount - 1, -1, -1)) as the list to be selected
- "Concatenate" a document with itself by specifying lst + lst as the list of pages to be taken
- doc.select([1,1,1,5,5,5,9,9,9]) will do what it looks like: create a 9-page document consisting of 3 times 3 equal pages
- Take the first / last 10 pages: lst = list(range(10)) and lst = list(range(doc.pageCount - 10, doc.pageCount)), respectively
- etc.
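The list recipes above can be collected into small helper functions. This is a sketch: the function names are hypothetical, and each helper only computes a list of zero-based page numbers from a page count (the value of doc.pageCount), which you would then pass to doc.select(). The two text-based filters assume you have already extracted each page's text into a list of strings; that extraction step is not shown.

```python
def reversed_pages(page_count):
    # Back to front: page_count-1, ..., 1, 0
    return list(range(page_count - 1, -1, -1))


def odd_pages(page_count):
    # Printed pages 1, 3, 5, ... are zero-based 0, 2, 4, ...
    return list(range(0, page_count, 2))


def even_pages(page_count):
    # Printed pages 2, 4, 6, ... are zero-based 1, 3, 5, ...
    return list(range(1, page_count, 2))


def doubled(page_count):
    # "Concatenate" the document with itself: lst + lst
    lst = list(range(page_count))
    return lst + lst


def first_pages(page_count, n=10):
    # First n pages (all pages if the document is shorter)
    return list(range(min(n, page_count)))


def last_pages(page_count, n=10):
    # Last n pages; clamped at 0 so short documents give no negative numbers
    return list(range(max(page_count - n, 0), page_count))


def non_blank_pages(page_texts):
    # Keep pages whose extracted text is not empty or whitespace-only
    return [pno for pno, text in enumerate(page_texts) if text.strip()]


def pages_without(page_texts, needle):
    # Keep pages that do NOT contain `needle`, i.e. delete pages mentioning it
    return [pno for pno, text in enumerate(page_texts) if needle not in text]


print(reversed_pages(5))  # [4, 3, 2, 1, 0]
print(odd_pages(7))       # [0, 2, 4, 6]
print(last_pages(4))      # [0, 1, 2, 3]
```

With an open document you would write, e.g., doc.select(reversed_pages(doc.pageCount)). The clamping in last_pages is a deliberate choice: the one-liner list(range(doc.pageCount - 10, doc.pageCount)) yields negative numbers for documents with fewer than 10 pages.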
You can apply several such selects in a row. After each one, the document structure is updated: pageCount always reflects the current number of pages, doc.loadPage() uses the current page numbering, etc.
The original PDF content is then no longer accessible through this document. To get it back, you first have to do a doc.close(), followed by a fitz.open(...) of the original file.
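Because each select() works on the document as it is after the previous one, page numbers in a later select() refer to the current, already rearranged pages, not to the original file. The sketch below (pure Python, hypothetical helper name) shows how two consecutive selections combine into one equivalent selection on the original numbering; it models the behavior described above with plain lists and is not taken from PyMuPDF.

```python
def compose_selections(first, second):
    """Return one selection on the ORIGINAL numbering equivalent to
    select(first) followed by select(second).

    Index i in `second` refers to position i of the document produced by
    `first`, which holds original page first[i].
    """
    return [first[i] for i in second]


# A 6-page document, each page labelled by its original number.
original = list(range(6))

step1 = [5, 4, 3, 2, 1, 0]  # reverse the document
step2 = [0, 1]              # then keep the first two pages of the REVERSED doc

combined = compose_selections(step1, step2)
print(combined)  # [5, 4]: the last two original pages, in reverse order

# Applying the two steps one after another on a plain list agrees.
after1 = [original[i] for i in step1]
after2 = [after1[i] for i in step2]
assert after2 == [original[i] for i in combined]
```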
When you are finished and want to save your work, issue doc.save(...). Be sure to include the garbage=4 option if you have deleted many pages, to reduce the PDF file size.
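Putting the steps together, an end-to-end routine could look like the sketch below. It assumes PyMuPDF is installed and importable as fitz; the helper names and any file names are hypothetical. It uses len(doc) for the page count, which gives the same value as doc.pageCount; newer PyMuPDF releases use snake_case names such as page_count instead of the camelCase names on this page. As recommended above, garbage=4 is passed to save().

```python
def check_selection(selection, page_count):
    """Raise ValueError if any page number is outside 0..page_count-1."""
    bad = [pno for pno in selection if not 0 <= pno < page_count]
    if bad:
        raise ValueError(
            f"invalid page numbers {bad} for a {page_count}-page document"
        )


def save_selection(src_path, dst_path, make_selection):
    """Write a new PDF built from the pages chosen by make_selection.

    make_selection is a function that receives the page count and returns
    a list of zero-based page numbers (hypothetical interface).
    """
    import fitz  # PyMuPDF; imported here so this module loads without it

    doc = fitz.open(src_path)
    try:
        page_count = len(doc)
        selection = make_selection(page_count)
        check_selection(selection, page_count)
        doc.select(selection)          # rearranges the in-memory document
        doc.save(dst_path, garbage=4)  # thorough garbage collection shrinks the file
    finally:
        doc.close()
```

For example, save_selection("input.pdf", "reversed.pdf", lambda n: list(range(n - 1, -1, -1))) would write the document from back to front. The input file itself is not modified, because the result is saved under a new name.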
There are situations where the select() method may be too big a caliber for achieving something fairly small.
In such situations, the Document methods deletePage(), deletePageRange(), copyPage() or movePage() may be more appropriate. They all do what their names imply, and they all accept 0-based page numbers as arguments.
By combining them, you can always achieve what select() achieves in one statement.
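As an illustration of combining the single-page methods, the sketch below computes which pages to delete when a selection only keeps a subset of pages in their original order. It is a pure-Python model with hypothetical helper names; each number it yields would be passed to deletePage(). Deleting from the highest page number downwards matters: removing a page shifts the numbers of all later pages, but never those of earlier ones. Reordering or duplicating pages would additionally need movePage() or copyPage(), which this sketch does not cover.

```python
def pages_to_delete(page_count, keep):
    """Page numbers to delete, highest first, so that only `keep` remains.

    Assumes `keep` has no duplicates. The result is equivalent to
    select(sorted(keep)), because deletion cannot reorder pages.
    """
    keep_set = set(keep)
    return [pno for pno in range(page_count - 1, -1, -1) if pno not in keep_set]


def model_delete_page(pages, pno):
    # Plain-list stand-in for doc.deletePage(pno): later pages move up by one.
    del pages[pno]


pages = list(range(8))  # an 8-page document, labelled by original number
for pno in pages_to_delete(len(pages), [1, 4, 6]):
    model_delete_page(pages, pno)
print(pages)  # [1, 4, 6], the same result as select([1, 4, 6])
```

Runs of consecutive numbers in the deletion list could instead be removed with a single deletePageRange() call.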
As a general rule, use select() when many pages have to be handled and/or when some algorithm is needed to generate the list of required pages. Otherwise, the single-page methods may be more appropriate.