
Overview

mineru-runpod is a generic, reusable MinerU PDF-parsing service running on RunPod Serverless. It runs the MinerU 3.2.x runtime with the MinerU2.5-Pro-2605-1.2B VLM by default. The repo deliberately knows nothing about any specific downstream system: it ships a worker image, a small Python client, and the deploy / destroy glue. Anything that needs PDF → structured Markdown + JSON (a RAG pipeline, a document indexer, an Office-doc archive) calls it the same way.

  • handler.py — the serverless worker. Accepts a PDF via URL, base64, or mounted-volume path; calls MinerU’s async parse; returns Markdown + content_list + middle.json + images.
  • mineru_client/ — the Python package consumers import. One class (MineruClient), two methods (parse_document, parse_document_from_file). Pure-Python; imports nothing GPU- or MinerU-related, so it’s safe to depend on from any caller.
  • deploy.py / destroy.py — stand up and tear down the RunPod endpoint from CLI flags. Every dashboard setting is exposed as a flag.
  • .runpod/hub.json — disabled RunPod Hub listing metadata (title, description, GPU pool list, CUDA versions).
  • examples/parse_url_example.py, parse_b64_example.py, and parser_adapter_example.py, the last of which shows how to wrap MinerU output in your own typed domain model.
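To make the adapter idea concrete, here is a minimal sketch of wrapping content_list entries in a typed model. It is not the code from parser_adapter_example.py: the Block class and to_blocks function are hypothetical names, and the entry keys ("type", "text", "text_level", "page_idx") are assumptions about the content_list shape that should be checked against real output.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Block:
    """One typed element of a parsed document (hypothetical domain model)."""

    kind: str                            # e.g. "text", "table", "image"
    page: int                            # zero-based page index, kept for provenance
    text: str                            # text payload; empty when the entry has none
    heading_level: Optional[int] = None  # set only when the entry is a heading


def to_blocks(content_list: list[dict[str, Any]]) -> list[Block]:
    """Convert raw content_list entries into typed Block objects.

    ASSUMPTION: entries carry "type", "page_idx", "text" and optionally
    "text_level"; adjust the key names to match the real output.
    """
    blocks = []
    for entry in content_list:
        blocks.append(
            Block(
                kind=str(entry.get("type", "unknown")),
                page=int(entry.get("page_idx", 0)),
                text=str(entry.get("text", "")),
                heading_level=entry.get("text_level"),
            )
        )
    return blocks


# Hand-written sample data, not real parser output.
sample = [
    {"type": "text", "text": "1 Introduction", "text_level": 1, "page_idx": 0},
    {"type": "text", "text": "Body paragraph.", "page_idx": 0},
    {"type": "table", "page_idx": 1},
]
blocks = to_blocks(sample)
headings = [b.text for b in blocks if b.heading_level is not None]
```

Keeping the conversion in one function means the rest of a pipeline depends only on Block, so a change in the parser's output keys touches a single place.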
If you’re building… | What you get
Office document indexing (Word / PowerPoint / Excel exported to PDF) | Spiky ingest, pay only during bursts; preserves tables + figures
Document RAG pipelines | Section-aware chunks with page provenance out of the box
Contract / spec / standards parsing | Handles long attribute tables and cross-page constructs
Invoice / receipt extraction | Table fidelity + image extraction in one pass
Multi-language documents | MinerU’s pipeline backend supports 109 languages, including handwriting
  • Accuracy. The MinerU2.5-Pro-2605-1.2B VLM leads the OmniDocBench leaderboard on text, formula, table, and reading-order metrics — see the HuggingFace model card and the technical report.
  • Economics. Per-second billing on RunPod with FlashBoot means an idle worker costs nothing. ~$0.0003 per page on a 24 GB RTX 4090 (default) at current rates (see RunPod pricing; rates change).
  • Licensing. MinerU is Apache 2.0 with explicit commercial thresholds (free below 100M MAU and $20M monthly revenue, with attribution). Among open-source GPU-class PDF parsers, that’s the cleanest license for production SaaS use.
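As a rough illustration of the economics, the quoted per-page figure can be turned into a back-of-the-envelope estimate. This is a sketch only: it takes the approximate $0.0003 per page at face value and ignores cold starts, retries, storage, and rate changes. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Approximate per-page figure quoted above for the default GPU; rates change.
PER_PAGE_USD = Decimal("0.0003")


def estimate_cost(pages: int, per_page: Decimal = PER_PAGE_USD) -> Decimal:
    """Return an estimated parsing cost in USD for a number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Decimal avoids binary floating-point rounding noise in money math.
    return per_page * pages


cost_10k = estimate_cost(10_000)     # a 10,000-page backlog
cost_1m = estimate_cost(1_000_000)   # a million-page archive
```

Under these assumptions a 10,000-page backlog comes to about $3 and a million pages to about $300.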

MinerU2.5-Pro-2605 vs other PDF parsers — OmniDocBench leaderboard

Source: MinerU2.5-Pro-2605-1.2B model card and the MinerU 2.5 technical report.
 | mineru-runpod | Marker | GROBID | Nougat
Scale-to-zero | ✅ ready to use | ⚠️ possible, needs extra setup | ❌ always-on
GPU | required | CPU or GPU | CPU | required
Equations | ✅ LaTeX | ✅ LaTeX | ✅ LaTeX
Multi-lang | ✅ 109 (pipeline backend) | per upstream README | EN only | per upstream README
License | Apache 2.0 + thresholds | GPL-3.0 code + modified RAIL-M weights | Apache 2.0 | MIT code, CC-BY-NC 4.0 weights
Commercial SaaS | | ⚠️ depends on RAIL-M competitor clause | | ⚠️ subject to CC-BY-NC non-commercial clause

Marker uses Surya as its in-process OCR/layout engine; Surya’s weights ship under a modified RAIL-M license. The license’s §2(c) competitor clause does not include the $2M revenue carveout that §2(a) and §2(b) carry, while Marker’s own README markets the model weights as free for “startups under $2M funding/revenue.” The two read differently — get counsel before depending on Marker for a service that could be characterized as competitive. Datalab’s Chandra model (what their hosted API runs) carries the same modified RAIL-M license.

See the project README for the fully source-cited version of this comparison.

  • Inputs: PDF, image (PNG/JPEG/GIF/BMP/TIFF/WebP), DOCX, PPTX, XLSX — auto-detected from bytes. Three transports: URL, base64, or a path on a mounted volume. See Input formats.
  • Outputs: Markdown + content_list + middle.json + extracted images. Three transport modes: base64 tarball, inline fields, or presigned URL to an S3-compatible bucket. See Output modes.
  • Backends: five MinerU backends — the VLM (model card tagged English + Chinese; empirically handles Cyrillic correctly on the Pro model), the pipeline OCR (109 languages via PaddleOCR, documented-safe for any non-Latin script), the hybrid auto-router, and split-tier deploys. See Picking a backend.
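Auto-detection from bytes usually means checking well-known file signatures. The sketch below shows that general technique for the listed input types; it is not the worker's actual detection code, and the function names are hypothetical. DOCX, PPTX, and XLSX are all ZIP containers, so they are told apart here by the folder names conventionally used inside the archive.

```python
import io
import zipfile


def sniff_format(data: bytes) -> str:
    """Guess a document format from its leading bytes (illustrative only)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith(b"BM"):
        return "bmp"
    if data.startswith(b"PK\x03\x04"):
        return _sniff_ooxml(data)
    return "unknown"


def _sniff_ooxml(data: bytes) -> str:
    """Distinguish Office Open XML files by their conventional top-level folders."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return "unknown"
    for prefix, fmt in (("word/", "docx"), ("ppt/", "pptx"), ("xl/", "xlsx")):
        if any(name.startswith(prefix) for name in names):
            return fmt
    return "zip"
```

Signature checks like these are cheap enough to run before any heavy processing, which lets a service reject unsupported uploads early.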

There are two paths from here, depending on whether you want to run the official image as-is or run your own customised build:

Easiest — deploy from the RunPod Hub. Sign up via this referral link, open The Hub → Serverless repos, find mineru-runpod, click Deploy. Grab the endpoint id and you’re parsing PDFs.

Customise — fork and auto-build. Fork the repo, then in the RunPod dashboard do The Hub → Serverless repos → Import Git Repository pointed at your fork. RunPod builds the image on every push to main. Use this path if you need to pin different MinerU / vLLM versions or modify handler.py.

Either way, total wall time is roughly 10 minutes assuming RunPod has spare capacity. See the project README’s Deploy section for the full step-by-step.
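Once an endpoint exists, the bundled MineruClient is the intended way to call it. For orientation, here is a sketch of what a raw request to a RunPod Serverless endpoint could look like, assuming RunPod's standard synchronous route and its {"input": ...} job envelope. The input field names ("url", "file_b64", "path") and the build_request helper are hypothetical; take the real field names from the project README.

```python
import base64
import json
import urllib.request
from typing import Optional

RUNPOD_API = "https://api.runpod.ai/v2"


def build_request(
    endpoint_id: str,
    api_key: str,
    *,
    url: Optional[str] = None,
    data: Optional[bytes] = None,
    volume_path: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a synchronous job request for one document."""
    given = [v for v in (url, data, volume_path) if v is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of url, data, volume_path")

    # ASSUMPTION: these input keys are placeholders, not the worker's schema.
    if url is not None:
        job_input = {"url": url}
    elif data is not None:
        job_input = {"file_b64": base64.b64encode(data).decode("ascii")}
    else:
        job_input = {"path": volume_path}

    body = json.dumps({"input": job_input}).encode("utf-8")
    return urllib.request.Request(
        f"{RUNPOD_API}/{endpoint_id}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_ENDPOINT_ID", "YOUR_API_KEY", url="https://example.com/sample.pdf"
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=600).
```

Building the request separately from sending it keeps the transport choice (URL, base64, or volume path) easy to test without network access.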