
Inserting Pages from other PDFs

Jorj X. McKie edited this page Mar 6, 2018 · 10 revisions

Method insertPDF()

Method fitz.Document.insertPDF() allows you to insert page ranges from another PDF document. Usage looks like this:

doc1 = fitz.open("file1.pdf") # must be a PDF
doc2 = fitz.open("file2.pdf") # must be a PDF
doc1.insertPDF(doc2,          # cannot be the same object as doc1
               from_page=n,   # first page to copy, default: 0
               to_page=m,     # last page to copy, default: last page
               start_at=k,    # target location in doc1, default: at end
               rotate=deg,    # rotate copied pages
               links=True)    # also copy links & annotations

All parameters except doc2 are optional.

Remarks

This makes the functionality of the MuPDF CLI tool mutool merge available in Python. In technical PDF terms, the following keys of every page object are copied: /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit, /Annots.

Bookmarks (outlines) of doc2 are not copied, but the TOC structure of doc1 remains intact after the copy operation.

In PyMuPDF we have extended the copy scope in the following way:

  1. Annotations and links are copied if they point to pages within the copied range or to some outside resource.
  2. Copied pages can optionally be rotated.
  3. doc1 and doc2 must not be the same object, but they may refer to the same file (opened twice as separate objects).
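
Rule 1 can be modelled as a simple predicate. The sketch below is only an illustration of the rule as stated here: the helper name and the dict layout (`internal`, `page`) are invented for this example and are not PyMuPDF's link format.

```python
def link_survives_copy(link, from_page, to_page):
    """Decide whether a link would be kept, per rule 1 above.

    `link` is a plain dict invented for this sketch:
    {"internal": bool, "page": int}. It is NOT PyMuPDF's link format.
    """
    if not link["internal"]:
        # Links to outside resources do not depend on the copied range.
        return True
    # The range may be given back to front, so normalise it first.
    lo, hi = sorted((from_page, to_page))
    return lo <= link["page"] <= hi


print(link_survives_copy({"internal": True, "page": 12}, 10, 20))
print(link_survives_copy({"internal": True, "page": 3}, 10, 20))
print(link_survives_copy({"internal": False, "page": -1}, 10, 20))
```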

Obviously, from_page may equal to_page, in which case only one page is copied.

Less obvious: if you specify from_page > to_page (!), the same range is copied, but in reverse order (back to front).
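
The range rules can be sketched in plain Python. This is a model of the behaviour described above, not PyMuPDF code: `copied_pages` is a hypothetical helper, `None` stands in for "use the default", and rejecting out-of-range page numbers is a simplification chosen for this sketch.

```python
def copied_pages(page_count, from_page=None, to_page=None):
    """Return source page numbers (0-based) in the order they would be copied.

    Hypothetical helper modelling the range rules described above;
    it is not part of PyMuPDF.
    """
    first = 0 if from_page is None else from_page            # default: first page
    last = page_count - 1 if to_page is None else to_page    # default: last page
    if not (0 <= first < page_count and 0 <= last < page_count):
        # Assumption of this sketch: out-of-range values are simply rejected.
        raise ValueError("page numbers must lie within the source document")
    step = 1 if first <= last else -1                        # back to front if reversed
    return list(range(first, last + step, step))


print(copied_pages(30))            # whole document, front to back
print(copied_pages(30, 5, 5))      # a single page
print(copied_pages(30, 20, 10))    # pages 20 down to 10
```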

It is quite easy to create a joined table of contents (TOC) when concatenating complete files - see below. For a more sophisticated solution, look at this example, which can join arbitrary page ranges of PDF files together with their respective TOC pieces.

Examples

This will concatenate two PDFs, including their tables of contents:

len1 = len(doc1)                      # number of doc1 pages
toc1 = doc1.getToC(simple = False)    # TOC of doc1
toc2 = doc2.getToC(simple = False)    # TOC of doc2
for bm in toc2:                       # bookmarks of doc2 ...
    bm[2] += len1                     # need increased page numbers
toc = toc1 + toc2                     # concatenate TOC's
doc1.insertPDF(doc2)                  # concatenate PDFs
doc1.setToC(toc)                      # new TOC
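
If the inputs should stay unmodified, the TOC arithmetic above can be factored into a small pure-Python helper. This is a minimal sketch: `merge_tocs` is a hypothetical name, and it only assumes that every TOC entry is a list starting with level, title and page number, as in the example above.

```python
import copy


def merge_tocs(toc1, toc2, len1):
    """Join two TOCs, shifting the page numbers of toc2 by len1.

    Each entry is assumed to be a list starting with [level, title, page, ...].
    Entries are deep-copied, so toc1 and toc2 themselves stay unchanged.
    """
    shifted = copy.deepcopy(toc2)
    for bm in shifted:
        bm[2] += len1          # pages of the second document move back by len1
    return copy.deepcopy(toc1) + shifted


toc1 = [[1, "Intro", 1], [2, "Scope", 2]]
toc2 = [[1, "Appendix", 1]]
print(merge_tocs(toc1, toc2, len1=10))
```

The result would then be passed to doc1.setToC() after the insertPDF() call, exactly as in the example above.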

Copy pages 10 to 20 from some PDF, rotated and in reversed order, and place them in front of the doc1 pages:

doc1.insertPDF(doc2, from_page = 20, to_page = 10,
               start_at = 0, rotate = -90)
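
To see where the copied pages end up, the effect of start_at can be modelled with list slicing. This sketch assumes that start_at is the page number the first copied page receives in doc1, and that the default means "append at the end"; `pages_after_insert` is a hypothetical helper, not a PyMuPDF function.

```python
def pages_after_insert(target_count, copied, start_at=None):
    """Model the page order of the target document after an insert.

    Target pages are labelled ("doc1", n), copied pages ("doc2", n).
    start_at=None stands for the default (append at the end).
    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    target = [("doc1", n) for n in range(target_count)]
    inserted = [("doc2", n) for n in copied]
    k = target_count if start_at is None else start_at
    return target[:k] + inserted + target[k:]


# Pages 20 down to 10 of doc2, placed in front of a 3-page doc1.
order = pages_after_insert(3, range(20, 9, -1), start_at=0)
print(order[:2], order[-3:])
```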