Skip to content

API reference

This page documents the JSON payload contract the worker accepts. It mirrors the docstring in handler.py, which is the source of truth; this is a friendlier rendering.

Send a POST to /v2/{endpoint_id}/runsync (or /run for async) with an input object:

{
"input": {
"file_url": "https://example.com/report.pdf",
"start_page": 0,
"end_page": 99,
"lang": "en",
"backend": "vlm-auto-engine",
"formula_enable": true,
"table_enable": true,
"return": "tarball_b64",
"basename": "my-doc"
}
}

The worker accepts any of these formats — PDF, image (PNG/JPEG/GIF/BMP/TIFF/WebP), DOCX, PPTX, XLSX — auto-detected from the input bytes. Provide exactly one transport; providing zero or two raises a validation error.

FieldTypeNotes
file_urlstringPublic or presigned HTTP/HTTPS URL the worker can GET
file_b64stringBase64-encoded file bytes. RunPod’s gateway caps payloads at 10 MB on /run and 20 MB on /runsync; for bigger files use file_url or volume_path
volume_pathstringAbsolute path to a file inside the container. Useful for files mounted via a RunPod network volume or baked into the image
FieldTypeDefaultNotes
start_pageint00-based, inclusive
end_pageint-10-based, inclusive. -1 or omitted means “to end of document”
langstring"en"Language hint, pipeline backend only (VLM backends ignore it). Use script-family codes — east_slavic, cyrillic, latin, arabic, devanagari, japan, korean, chinese_cht, el, th, etc. 109 languages supported. NOT ISO codes. See Input formats for the full list.
backendstring"vlm-auto-engine"One of pipeline | vlm-auto-engine | vlm-http-client | hybrid-auto-engine | hybrid-http-client. See Choosing a GPU → Picking a backend
server_urlstringnullRequired for *-http-client backends. URL of an external vLLM OpenAI-compatible server (e.g. https://your-host/v1)
formula_enablebooltrueExtract LaTeX equations
table_enablebooltrueExtract structured HTML tables
returnstring"tarball_b64""tarball_b64" (default, base64-encoded .tar.gz), "inline" (markdown + content_list + middle + images embedded in the response), or "s3" (upload to a configured S3-compatible bucket and return a presigned URL — requires BUCKET_* env vars). See Output modes for the trade-offs.
basenamestring"doc"Filename stem for output files. Must be alphanumeric with - or _

I just want the Markdown — how do I get it?

Section titled “I just want the Markdown — how do I get it?”

Set "return": "inline" in the job input. The response then contains a top-level "markdown" field with the full document rendered as Markdown:

{
"input": {
"file_url": "https://example.com/report.pdf",
"return": "inline"
}
}

Response (truncated):

{
"ok": true,
"markdown": "# Document title\n\nFirst paragraph...\n\n## Section\n\nA table:\n\n<table>...</table>\n",
"content_list": [...],
"middle": {...},
"images": {"img-0.png": "<base64>"}
}

tarball_b64 (the default) also includes a .md file inside the gzipped tarball at {basename}.md — extract the tarball and the markdown is there too. Use inline when you want to read the markdown directly without unpacking; use tarball_b64 (or s3) when you also want the structured JSON / image files together.

{
"ok": true,
"elapsed_seconds": 18.4,
"pages_processed": 100,
"mineru_version": "3.2.x",
"source": "url:https://example.com/report.pdf",
"tarball_b64": "<base64-encoded gzipped tarball>"
}
FieldTypeNotes
okboolAlways true on success
elapsed_secondsfloatWall time inside the handler. Does not include cold-start time or transport
pages_processedintNumber of pages parsed. -1 if end_page was open-ended
mineru_versionstringThe MinerU version that produced the parse (e.g. 3.1.15)
sourcestringEcho of the input transport: url:..., b64, or volume:/path/...
tarball_b64stringPresent when return: "tarball_b64" (default). Base64-encoded .tar.gz of the output directory
markdown, content_list, middle, imagesvariousPresent when return: "inline". See below
debugobjectObservability data: backend used, model_dir (which snapshot loaded), gpu info, phase_ms timings. See below

Instead of tarball_b64, the success response contains four inline fields:

FieldTypeNotes
markdownstringThe full Markdown rendering of the document
content_listarrayFlat list of typed entries: {"type": "text"|"equation"|"table"|"image"|"code", "page_idx": int, ...}. Suitable for RAG chunking
middleobjectMinerU’s intermediate representation with layout, bounding boxes, reading order
imagesobject{filename: base64-encoded-png-bytes} for every extracted image

Every response includes a debug block with information that lets you correlate a parse to its environment without having to read worker logs:

{
"debug": {
"backend": "vlm-auto-engine",
"model_dir": "/root/.cache/huggingface/hub/models--opendatalab--MinerU2.5-Pro-2605-1.2B/snapshots/<hash>",
"gpu": {
"available": true,
"name": "NVIDIA RTX 4090",
"compute_capability": "8.9",
"total_memory_gb": 23.99
},
"phase_ms": {
"fetch_pdf": 12,
"mineru_parse": 18420,
"package": 95
}
}
}
FieldNotes
backendThe backend that ran. Echoes the input or the default
model_dirFilesystem path of the model snapshot that loaded. Both VLM and pipeline models are baked into the image at /root/.cache/huggingface/; the path you see here proves which snapshot the worker resolved (null if HF cache is empty, which should not happen on the published image)
gpuCard name, compute_capability (8.6 = Ampere, 8.9 = Ada, 9.0 = Hopper, 12.0 = Blackwell), VRAM. Helps debug “why did my job land on a different card than my pool config?”
phase_msPer-phase timings: fetch_pdf (download/decode), mineru_parse (MinerU’s aio_do_parse), package (tarball or inline assembly)

On failure, debug still contains gpu, model_dir, and whatever phase_ms was collected before the error.

When the handler raises or returns an error, the response sets ok: false and includes a top-level error key. RunPod marks the job FAILED in the dashboard based on the presence of that key.

{
"error": "ValueError: must provide exactly one of file_url / file_b64 / volume_path",
"ok": false,
"elapsed_seconds": 0.1,
"mineru_version": "3.2.x",
"traceback": "Traceback (most recent call last):\n File ..."
}
FieldTypeNotes
errorstringType name + message, e.g. ValueError: ...
okboolAlways false on failure
elapsed_secondsfloatTime before the error
mineru_versionstringVersion that was running
tracebackstringLast 5 frames of the Python traceback; useful for debugging

The handler emits runpod.serverless.progress_update events during a parse. Phases:

{"phase": "fetching_pdf"}
{"phase": "parsing", "pdf_bytes": 1234567, "start_page": 0, "end_page": 99}
{"phase": "packaging"}

Consume them via the RunPod SDK’s endpoint.stream(job_id) or the HTTP GET /v2/{endpoint_id}/stream/{job_id} endpoint. The MineruClient Python wrapper does not surface these (it only supports run_sync); use the RunPod SDK directly if you need progress.

The handler validates input before doing any work:

  1. Field types and bounds via runpod.serverless.utils.rp_validator. Wrong type, missing required field, or out-of-range value raises ValueError.
  2. XOR transport rule. Exactly one of file_url, file_b64, volume_path must be set. Zero or two raises ValueError.
  3. Basename safety. basename must match [a-zA-Z0-9_-]+; otherwise ValueError.
  4. Inline file size. file_b64 decoded length must be ≤ 20 MB (matching RunPod’s /runsync gateway cap).
  5. Format detection. The first few bytes must match a known signature (PDF / image / OOXML); otherwise ValueError.

All validation errors produce a failure response with error: "ValueError: ..." and the job is marked FAILED.

This page is regenerated by hand when the contract changes. The authoritative source is the docstring at the top of handler.py plus the INPUT_SCHEMA dict and _validate_input function in that file. If you see a discrepancy, the code wins; please open an issue so the docs page can be updated.