PyMuPDF4LLM now ships with Layout.

New Release / v0.2+

Introducing the New PyMuPDF4LLM: Now Including Layout

The first document intelligence library that reads PDF structure natively — no image rendering, no GPU, no reconstruction loss.

Faster than vision models
Lower cost at scale
Accurate tables in financial documents
Millions of parameters versus billions for VLMs

GNN trained on PDF internals — not pixels

A Graph Neural Network reads the PDF's vector structure directly, with no image rendering pipeline and no OCR uncertainty. By parsing primitives instead of pixels, it preserves 100% of the table semantics and document hierarchy.


One Foundation. Multiple Extensions.

From low-level PDF manipulation to LLM-ready extraction.
Choose what your workflow needs.

01 PYMUPDF

The Fastest PDF Processing Library in Python

Lightning-fast PDF processing with minimal dependencies, powered by the MuPDF engine.

02 PYMUPDF4LLM

Seamless PDF Integration for LLMs

Connect your PDF documents directly to Large Language Models with optimized text extraction.

03 PYMUPDF PRO

Advanced PDF Capabilities for Enterprise

Enhanced features for complex document workflows and enterprise-grade performance.


Compare Features

Not sure which product is right for you? Here's how they stack up.

Product

LICENSE
AGPL: Free to use under the GNU Affero General Public License. Requires open-sourcing your application if distributed.
Commercial: Paid license for proprietary or commercial use without AGPL obligations.

SOURCE CODE
Open Source: Source code is publicly available and can be inspected, modified, and contributed to.

INPUT FILES
PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT, Image: These formats are natively supported.
DOC/DOCX, XLS/XLSX, PPT/PPTX, HWP/HWPX: These formats are not supported in this product.

OUTPUT FILES
PDF, SVG, Image: Generates PDF, SVG, and Image formats directly.
Markdown, JSON, TXT: Markdown, JSON, and TXT are supported; ideal for structured or AI-ready output.

PAGE ANALYSIS
Advanced Page Analysis: Uses trained data for enhanced structural recognition and superior layout results.

TEXT EXTRACTION
Advanced Text Extraction: Extracts text with structure tags (headings, lists, tables), page layout analysis, and semantic understanding. Includes superior table extraction with full cell structure and data type recognition.

IMAGE EXTRACTION
Advanced Image Extraction: Advanced detection and rendering of image areas on the page. Saves to disk or embeds in Markdown output.

VECTOR EXTRACTION
Advanced Vector Extraction: Superior detection of picture areas with precise vector element identification.

OCR
Automatic OCR: Automatically applies OCR based on page content analysis, with no manual trigger needed.

Everything you need for PDF workflows

01 EXTRACTION

Extract text, images, tables, and metadata. Pull structured data from any PDF with precision. Get raw text, formatted tables, embedded images, fonts, annotations, and document metadata, all with simple Python commands.

02 ANALYSIS

Understand document structure and layout. Analyze reading order, detect document elements, identify tables and columns, and preserve visual hierarchy. Perfect for building RAG pipelines or processing complex documents.

03 CONVERSION

Convert PDFs to any format you need. Transform PDFs into Markdown, HTML, images, or text while maintaining formatting.

04 MANIPULATION

Create, edit, and transform PDFs programmatically. Merge, split, rotate, crop, and watermark PDFs. Add annotations, modify pages, insert images, and generate new PDFs from scratch. Full programmatic control over every element.

Join Our Forum

No marketing speak, no sales pitches. Just developers helping developers ship document processing features faster.


Your Next Document Pipeline
Starts Here

Install PyMuPDF, extract your first document, and see why thousands of developers trust us for production document processing.

© 2026 Artifex Software Inc. All rights reserved.