
A More Detailed Look at PyMuPDF's Performance

Jorj X. McKie edited this page Dec 2, 2015 · 5 revisions

We have stated on several occasions that MuPDF, and therefore also our Python binding PyMuPDF, ranks among the top performers when it comes to speed.

I have been wondering how these bold statements could be proved, or at least underpinned with some quantitative data. A full comparison of all the many PDF tools on the market is practically impossible: the differences in functionality, scope, intended use, platform (in)dependence, pricing, openness and so forth are just too large.

So I decided to start with a minimal approach. I just want to illustrate how fast MuPDF can read in and interpret PDF files and thus make them available for the actual processing desired.

Most of what I am writing here is also contained in the PyMuPDF documentation, which already has a chapter on performance and resource requirements for the text extraction methods.

I felt that simply opening a PDF and immediately saving it again as a new PDF should exercise the complete code responsible for interpreting a PDF's data and re-arranging it to form a new PDF. At the same time, every tool should at least be able to do this basic task.
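The open-and-save round trip described above can be sketched as follows. This is only a minimal illustration, not the measurement script behind any published figures: the helper names `time_roundtrip` and `pymupdf_roundtrip`, the file names `input.pdf` and `output.pdf`, and the choice of taking the best of three runs are all assumptions made for this example. The PyMuPDF part uses only `fitz.open()`, `Document.save()` and `Document.close()`.

```python
import time


def time_roundtrip(roundtrip, src, dst, repeats=3):
    """Return the best wall-clock time (seconds) of roundtrip(src, dst).

    `roundtrip` is any callable that reads the file `src` and writes `dst`,
    so the same harness can wrap whichever tool is being compared.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        roundtrip(src, dst)
        best = min(best, time.perf_counter() - start)
    return best


def pymupdf_roundtrip(src, dst):
    """Open a PDF with PyMuPDF and immediately save it as a new PDF."""
    # Imported here so the harness above works without PyMuPDF installed.
    # The one-time import cost lands in the first run only, which the
    # best-of-N timing discards when repeats >= 2.
    import fitz  # PyMuPDF

    doc = fitz.open(src)
    doc.save(dst)
    doc.close()


if __name__ == "__main__":
    # Hypothetical file names; replace with a real PDF on your machine.
    seconds = time_roundtrip(pymupdf_roundtrip, "input.pdf", "output.pdf")
    print(f"PyMuPDF open + save: {seconds:.3f} s")
```

Passing the round trip in as a callable keeps the timing logic identical for every tool, so only the open-and-save step itself differs between measurements.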

We have chosen the following tools for the comparison.

tbc
