PyMuPDF Layout
10× faster PDF parsing with layout analysis.
Trained on structure, not images. CPU-only.

Trusted by teams from startups to enterprises worldwide
8.3K Stars on GitHub
Open Source with Flexible Licensing
PyMuPDF is built on open collaboration and always will be. Our code is freely available on GitHub under the AGPL license, welcoming contributions from developers worldwide. For projects requiring different terms, we also offer commercial licensing through Artifex.

Get PyMuPDF Pro: Office Support + RAG/LLM + Layout
Everything in Open Source, Plus Three Powerful Extensions
Keep the speed and accuracy you love. Get the full package: Office document support, PyMuPDF4LLM for RAG pipelines, and PyMuPDF Layout for advanced analysis, all with commercial licensing for production use.
PyMuPDF Pro for Office Documents
PyMuPDF Pro supports a wide range of Office file formats, including DOC/DOCX, PPT/PPTX, XLS/XLSX, as well as HWP and HWPX, the widely used formats for Korean word processing.

PyMuPDF4LLM for RAG Integrations
PyMuPDF integrates seamlessly with LangChain, Llamaparse and more! Prepare your data for RAG solutions and give your LLM the data that your users can trust.
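
As a rough illustration of preparing PDF content for a RAG pipeline, the sketch below converts a document to Markdown with pymupdf4llm.to_markdown and groups the result into size-limited chunks. The helper names chunk_markdown and pdf_to_chunks, and the 1000-character default, are assumptions made for this example and are not part of PyMuPDF; in practice the splitter from your RAG framework would likely take the place of chunk_markdown.

```python
def chunk_markdown(md_text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper: a paragraph longer than max_chars is kept whole
    and becomes its own chunk rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for para in md_text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty pieces caused by repeated blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate  # paragraph fits, or it starts a new chunk
        else:
            chunks.append(current)  # close the full chunk, start the next one
            current = para
    if current:
        chunks.append(current)
    return chunks


def pdf_to_chunks(path: str, max_chars: int = 1000) -> list[str]:
    """Convert a PDF to Markdown and chunk it (requires pymupdf4llm)."""
    import pymupdf4llm  # imported lazily so chunk_markdown works without it

    return chunk_markdown(pymupdf4llm.to_markdown(path), max_chars)
```

Calling pdf_to_chunks("a.pdf") would return a list of Markdown strings ready to embed or index. Splitting on paragraph boundaries is only one simple choice; heading-aware splitting may suit structured documents better.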

Advanced Layout Analysis Included
PyMuPDF Layout delivers enterprise-grade document structure extraction without the enterprise overhead. Built into PyMuPDF Pro, it analyzes PDF internals directly: no GPUs, no cloud dependencies, just CPU performance that's 10× faster than comparable tools.

Get Started with PyMuPDF
import pymupdf
doc = pymupdf.open("a.pdf")  # open a document
out = open("output.txt", "wb")  # create a text output
for page in doc:  # iterate the document pages
    text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
    out.write(text)  # write text of page
    out.write(bytes((12,)))  # write page delimiter (form feed 0x0C)
out.close()
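
The script above ends each page with a form feed byte (0x0C). A minimal sketch of reading that file back as one string per page follows. The names split_pages and read_pages are hypothetical helpers for this example, not PyMuPDF functions, and the sketch assumes the extracted page text contains no form feed characters of its own.

```python
from pathlib import Path

FORM_FEED = b"\x0c"  # same delimiter the script above writes after each page


def split_pages(data: bytes) -> list[str]:
    """Split form-feed-delimited bytes into one decoded string per page."""
    parts = data.split(FORM_FEED)
    # Every page is followed by a delimiter, so the last piece is an empty tail.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("utf8") for part in parts]


def read_pages(path: str = "output.txt") -> list[str]:
    """Read a file written by the extraction loop and return its pages."""
    return split_pages(Path(path).read_bytes())
```

With this, read_pages("output.txt")[0] would hold the text of the first page, and the length of the returned list should match the document's page count.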


