API reference
This page documents the JSON payload contract the worker accepts. It mirrors the docstring in handler.py, which is the source of truth; this is a friendlier rendering.
Job input
Send a POST to /v2/{endpoint_id}/runsync (or /run for async) with an input object:

    {
      "input": {
        "file_url": "https://example.com/report.pdf",
        "start_page": 0,
        "end_page": 99,
        "lang": "en",
        "backend": "vlm-auto-engine",
        "formula_enable": true,
        "table_enable": true,
        "return": "tarball_b64",
        "basename": "my-doc"
      }
    }

Required (exactly one of)
The worker accepts PDF, image (PNG/JPEG/GIF/BMP/TIFF/WebP), DOCX, PPTX, and XLSX files, and auto-detects the format from the input bytes. Provide exactly one of the three transports below; providing none or more than one raises a validation error.
| Field | Type | Notes |
|---|---|---|
| file_url | string | Public or presigned HTTP/HTTPS URL the worker can GET |
| file_b64 | string | Base64-encoded file bytes. RunPod’s gateway caps payloads at 10 MB on /run and 20 MB on /runsync; for bigger files use file_url or volume_path |
| volume_path | string | Absolute path to a file inside the container. Useful for files mounted via a RunPod network volume or baked into the image |
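
As a rough client-side sketch of the request described above, the helper below builds an input object with exactly one transport and wraps it in a POST request without sending it. The host `https://api.runpod.ai`, the `Authorization: Bearer` header, the helper names, and reading "MB" as 10**6 bytes are assumptions made for illustration; they are not taken from handler.py.

```python
import base64
import json
import urllib.request

# Assumption: RunPod's public API host; adjust if your deployment differs.
BASE_URL = "https://api.runpod.ai"

# Gateway caps quoted in the table above. Assumption: "MB" is read as
# 10**6 bytes, which is the conservative interpretation for a client.
RUN_CAP = 10 * 1000 * 1000
RUNSYNC_CAP = 20 * 1000 * 1000


def build_job_input(file_url=None, file_bytes=None, volume_path=None, **options):
    """Return an input dict with exactly one transport field set."""
    given = [v is not None for v in (file_url, file_bytes, volume_path)]
    if sum(given) != 1:
        raise ValueError("provide exactly one of file_url / file_bytes / volume_path")
    job_input = dict(options)
    if file_url is not None:
        job_input["file_url"] = file_url
    elif file_bytes is not None:
        job_input["file_b64"] = base64.b64encode(file_bytes).decode("ascii")
    else:
        job_input["volume_path"] = volume_path
    return job_input


def build_request(endpoint_id, api_key, job_input, sync=True):
    """Build (but do not send) the POST request for /runsync or /run."""
    route = "runsync" if sync else "run"
    body = json.dumps({"input": job_input}).encode("utf-8")
    cap = RUNSYNC_CAP if sync else RUN_CAP
    if len(body) > cap:
        raise ValueError(f"payload is {len(body)} bytes, over the /{route} cap")
    return urllib.request.Request(
        f"{BASE_URL}/v2/{endpoint_id}/{route}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the size check easy to exercise offline.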
Optional
| Field | Type | Default | Notes |
|---|---|---|---|
| start_page | int | 0 | 0-based, inclusive |
| end_page | int | -1 | 0-based, inclusive. -1 or omitted means “to end of document” |
| lang | string | "en" | Language hint, pipeline backend only (VLM backends ignore it). Use script-family codes such as east_slavic, cyrillic, latin, arabic, devanagari, japan, korean, chinese_cht, el, th. 109 languages supported. NOT ISO codes. See Input formats for the full list. |
| backend | string | "vlm-auto-engine" | One of pipeline, vlm-auto-engine, vlm-http-client, hybrid-auto-engine, hybrid-http-client. See Choosing a GPU → Picking a backend |
| server_url | string | null | Required for *-http-client backends. URL of an external vLLM OpenAI-compatible server (e.g. https://your-host/v1) |
| formula_enable | bool | true | Extract LaTeX equations |
| table_enable | bool | true | Extract structured HTML tables |
| return | string | "tarball_b64" | "tarball_b64" (default, base64-encoded .tar.gz), "inline" (markdown + content_list + middle + images embedded in the response), or "s3" (upload to a configured S3-compatible bucket and return a presigned URL; requires BUCKET_* env vars). See Output modes for the trade-offs. |
| basename | string | "doc" | Filename stem for output files. Must be alphanumeric with - or _ |
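
The interplay between a few of these options can be sketched as a small client-side check. The function name `build_options` is hypothetical, and the rule that `end_page` must not be smaller than `start_page` is an assumption inferred from both bounds being inclusive; the worker's own validation may differ.

```python
BACKENDS = {
    "pipeline",
    "vlm-auto-engine",
    "vlm-http-client",
    "hybrid-auto-engine",
    "hybrid-http-client",
}
RETURN_MODES = {"tarball_b64", "inline", "s3"}


def build_options(backend="vlm-auto-engine", server_url=None,
                  start_page=0, end_page=-1, return_mode="tarball_b64"):
    """Assemble optional fields, rejecting combinations the table rules out."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    # The *-http-client backends need an external server to talk to.
    if backend.endswith("-http-client") and not server_url:
        raise ValueError(f"{backend} requires server_url")
    if return_mode not in RETURN_MODES:
        raise ValueError(f"unknown return mode: {return_mode!r}")
    if start_page < 0:
        raise ValueError("start_page is 0-based and cannot be negative")
    # Assumption: an explicit end_page must not precede start_page.
    if end_page != -1 and end_page < start_page:
        raise ValueError("end_page must be -1 or >= start_page")
    options = {
        "backend": backend,
        "start_page": start_page,
        "end_page": end_page,
        "return": return_mode,
    }
    if server_url:
        options["server_url"] = server_url
    return options
```

For example, `build_options(start_page=0, end_page=9)` selects the first ten pages, since both bounds are inclusive.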
I just want the Markdown — how do I get it?
Set "return": "inline" in the job input. The response then contains a top-level "markdown" field with the full document rendered as Markdown:

    {
      "input": {
        "file_url": "https://example.com/report.pdf",
        "return": "inline"
      }
    }

Response (truncated):

    {
      "ok": true,
      "markdown": "# Document title\n\nFirst paragraph...\n\n## Section\n\nA table:\n\n<table>...</table>\n",
      "content_list": [...],
      "middle": {...},
      "images": {"img-0.png": "<base64>"}
    }

tarball_b64 (the default) also includes a .md file inside the gzipped tarball at {basename}.md; extract the tarball and the markdown is there too. Use inline when you want to read the markdown directly without unpacking; use tarball_b64 (or s3) when you also want the structured JSON / image files together.
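
If you stay on the default tarball_b64 mode, pulling the Markdown out of the archive might look like the sketch below. The function name is hypothetical, and because the exact directory layout inside the tarball is not specified here, the code matches on the file name `{basename}.md` at any depth rather than assuming a fixed path.

```python
import base64
import io
import tarfile


def markdown_from_tarball(tarball_b64, basename="doc"):
    """Decode a base64 .tar.gz and return the text of {basename}.md."""
    raw = base64.b64decode(tarball_b64)
    wanted = f"{basename}.md"
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Compare only the final path component so nesting does not matter.
            if member.isfile() and member.name.split("/")[-1] == wanted:
                return tar.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"{wanted} not found in tarball")
```

Reading the member directly from memory avoids writing the archive to disk, which is convenient when only the Markdown is needed.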
Success response

    {
      "ok": true,
      "elapsed_seconds": 18.4,
      "pages_processed": 100,
      "mineru_version": "3.2.x",
      "source": "url:https://example.com/report.pdf",
      "tarball_b64": "<base64-encoded gzipped tarball>"
    }

| Field | Type | Notes |
|---|---|---|
| ok | bool | Always true on success |
| elapsed_seconds | float | Wall time inside the handler. Does not include cold-start time or transport |
| pages_processed | int | Number of pages parsed. -1 if end_page was open-ended |
| mineru_version | string | The MinerU version that produced the parse (e.g. 3.1.15) |
| source | string | Echo of the input transport: url:..., b64, or volume:/path/... |
| tarball_b64 | string | Present when return: "tarball_b64" (default). Base64-encoded .tar.gz of the output directory |
| markdown, content_list, middle, images | various | Present when return: "inline". See below |
| debug | object | Observability data: backend used, model_dir (which snapshot loaded), gpu info, phase_ms timings. See below |
When return: "inline"
Instead of tarball_b64, the success response contains four inline fields:
| Field | Type | Notes |
|---|---|---|
| markdown | string | The full Markdown rendering of the document |
| content_list | array | Flat list of typed entries: {"type": ..., "page_idx": int, ...}, where type is one of "text", "equation", "table", "image", "code". Suitable for RAG chunking |
| middle | object | MinerU’s intermediate representation with layout, bounding boxes, reading order |
| images | object | {filename: base64-encoded-png-bytes} for every extracted image |
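
A minimal sketch of consuming the inline fields follows. It relies only on the `type` and `page_idx` keys listed above, since the remaining keys of each content_list entry are not documented on this page; the function name `summarize_inline` is hypothetical.

```python
import base64
from collections import Counter


def summarize_inline(response):
    """Count entry types, list the pages seen, and decode images to bytes."""
    entries = response["content_list"]
    type_counts = Counter(entry["type"] for entry in entries)
    pages = sorted({entry["page_idx"] for entry in entries})
    # Each image value is base64 text; decode it into raw bytes.
    images = {
        name: base64.b64decode(data)
        for name, data in response["images"].items()
    }
    return type_counts, pages, images
```

The decoded bytes can be written straight to disk under the given filenames, and the per-type counts give a quick sanity check before chunking the entries for retrieval.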
Debug observability
Every response includes a debug block with information that lets you correlate a parse to its environment without having to read worker logs:

    {
      "debug": {
        "backend": "vlm-auto-engine",
        "model_dir": "/root/.cache/huggingface/hub/models--opendatalab--MinerU2.5-Pro-2605-1.2B/snapshots/<hash>",
        "gpu": {
          "available": true,
          "name": "NVIDIA RTX 4090",
          "compute_capability": "8.9",
          "total_memory_gb": 23.99
        },
        "phase_ms": {
          "fetch_pdf": 12,
          "mineru_parse": 18420,
          "package": 95
        }
      }
    }

| Field | Notes |
|---|---|
| backend | The backend that ran. Echoes the input or the default |
| model_dir | Filesystem path of the model snapshot that loaded. Both VLM and pipeline models are baked into the image at /root/.cache/huggingface/; the path you see here proves which snapshot the worker resolved (null if HF cache is empty, which should not happen on the published image) |
| gpu | Card name, compute_capability (8.6 = Ampere, 8.9 = Ada, 9.0 = Hopper, 12.0 = Blackwell), VRAM. Helps debug “why did my job land on a different card than my pool config?” |
| phase_ms | Per-phase timings: fetch_pdf (download/decode), mineru_parse (MinerU’s aio_do_parse), package (tarball or inline assembly) |
On failure, debug still contains gpu, model_dir, and whatever phase_ms was collected before the error.
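
One way to use phase_ms is to turn it into percentage shares to see where the time went. This is a small sketch with a hypothetical helper name; it tolerates a partial or missing phase_ms, which can occur on failure.

```python
def phase_shares(debug):
    """Return each phase's share of the total recorded time, in percent."""
    phase_ms = debug.get("phase_ms") or {}
    total = sum(phase_ms.values())
    if total == 0:
        return {}
    return {name: round(100 * ms / total, 1) for name, ms in phase_ms.items()}


shares = phase_shares(
    {"phase_ms": {"fetch_pdf": 12, "mineru_parse": 18420, "package": 95}}
)
print(shares)  # {'fetch_pdf': 0.1, 'mineru_parse': 99.4, 'package': 0.5}
```

With the sample values from the debug block above, nearly all of the time is spent in mineru_parse, so fetch and packaging are not the place to look for savings in that run.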
Failure response
When the handler raises or returns an error, the response sets ok: false and includes a top-level error key. RunPod marks the job FAILED in the dashboard based on the presence of that key.

    {
      "error": "ValueError: must provide exactly one of file_url / file_b64 / volume_path",
      "ok": false,
      "elapsed_seconds": 0.1,
      "mineru_version": "3.2.x",
      "traceback": "Traceback (most recent call last):\n  File ..."
    }

| Field | Type | Notes |
|---|---|---|
| error | string | Type name + message, e.g. ValueError: ... |
| ok | bool | Always false on failure |
| elapsed_seconds | float | Time before the error |
| mineru_version | string | Version that was running |
| traceback | string | Last 5 frames of the Python traceback; useful for debugging |
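
On the client side, a failure response can be converted into an exception so callers do not have to check ok by hand. The sketch below assumes `response` is the handler's output dict, already unwrapped from whatever envelope RunPod places around it; `ParseJobError` and `unwrap_response` are hypothetical names.

```python
class ParseJobError(RuntimeError):
    """Raised when the worker reports ok: false or an error key."""

    def __init__(self, message, remote_traceback=None, elapsed_seconds=None):
        super().__init__(message)
        self.remote_traceback = remote_traceback
        self.elapsed_seconds = elapsed_seconds


def unwrap_response(response):
    """Return the response on success; raise ParseJobError on failure."""
    if response.get("ok") is True and "error" not in response:
        return response
    raise ParseJobError(
        response.get("error", "unknown error"),
        remote_traceback=response.get("traceback"),
        elapsed_seconds=response.get("elapsed_seconds"),
    )
```

Keeping the remote traceback on the exception lets a caller log it alongside its own stack trace when reporting a problem.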
Progress updates (streaming)
The handler emits runpod.serverless.progress_update events during a parse. Phases:

    {"phase": "fetching_pdf"}
    {"phase": "parsing", "pdf_bytes": 1234567, "start_page": 0, "end_page": 99}
    {"phase": "packaging"}

Consume them via the RunPod SDK’s endpoint.stream(job_id) or the HTTP GET /v2/{endpoint_id}/stream/{job_id} endpoint. The MineruClient Python wrapper does not surface these (it only supports run_sync); use the RunPod SDK directly if you need progress.
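
However the events are retrieved, each one is a small dict with a phase key, so turning them into log lines is straightforward. The formatter below is an illustrative sketch with a hypothetical name; it handles only the three phases listed above and passes anything else through unchanged.

```python
def describe_progress(event):
    """Render one progress event as a short human-readable string."""
    phase = event.get("phase")
    if phase == "fetching_pdf":
        return "fetching input"
    if phase == "parsing":
        size_mb = event["pdf_bytes"] / 1_000_000
        # end_page == -1 means "to end of document".
        last = "end" if event["end_page"] == -1 else event["end_page"]
        return f"parsing pages {event['start_page']}-{last} ({size_mb:.1f} MB)"
    if phase == "packaging":
        return "packaging output"
    return f"unknown phase: {phase!r}"


print(describe_progress(
    {"phase": "parsing", "pdf_bytes": 1234567, "start_page": 0, "end_page": 99}
))  # parsing pages 0-99 (1.2 MB)
```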
Validation behaviour
The handler validates input before doing any work:
- Field types and bounds via runpod.serverless.utils.rp_validator. Wrong type, missing required field, or out-of-range value raises ValueError.
- XOR transport rule. Exactly one of file_url, file_b64, volume_path must be set. None or more than one raises ValueError.
- Basename safety. basename must match [a-zA-Z0-9_-]+; otherwise ValueError.
- Inline file size. file_b64 decoded length must be ≤ 20 MB (matching RunPod’s /runsync gateway cap).
- Format detection. The first few bytes must match a known signature (PDF / image / OOXML); otherwise ValueError.
All validation errors produce a failure response with error: "ValueError: ..." and the job is marked FAILED.
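
To catch these errors before spending a request, the same rules can be approximated on the client. The sketch below is not the handler's `_validate_input`; the function names, the reading of 20 MB as 20 * 1024 * 1024 bytes, and the exact set of signatures checked are assumptions. The magic numbers themselves are the standard ones for each format, with OOXML files recognized by their ZIP header.

```python
import base64
import re

MAX_DECODED_BYTES = 20 * 1024 * 1024  # assumption: MB read as 2**20 bytes
BASENAME_RE = re.compile(r"[a-zA-Z0-9_-]+")
TRANSPORTS = ("file_url", "file_b64", "volume_path")


def sniff_format(data):
    """Classify bytes as 'pdf', 'image', 'ooxml', or None from magic numbers."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):  # ZIP container used by DOCX/PPTX/XLSX
        return "ooxml"
    is_image = (
        data.startswith(b"\x89PNG\r\n\x1a\n")
        or data.startswith(b"\xff\xd8\xff")              # JPEG
        or data[:6] in (b"GIF87a", b"GIF89a")
        or data.startswith(b"BM")                        # BMP
        or data[:4] in (b"II*\x00", b"MM\x00*")          # TIFF
        or (data[:4] == b"RIFF" and data[8:12] == b"WEBP")
    )
    return "image" if is_image else None


def precheck(job_input):
    """Raise ValueError on obvious problems; return the sniffed format or None."""
    present = [key for key in TRANSPORTS if job_input.get(key)]
    if len(present) != 1:
        raise ValueError("must provide exactly one of file_url / file_b64 / volume_path")
    if not BASENAME_RE.fullmatch(job_input.get("basename", "doc")):
        raise ValueError("basename must match [a-zA-Z0-9_-]+")
    if present[0] != "file_b64":
        return None  # bytes are not available locally, so nothing to sniff
    data = base64.b64decode(job_input["file_b64"], validate=True)
    if len(data) > MAX_DECODED_BYTES:
        raise ValueError("file_b64 decodes to more than 20 MB")
    detected = sniff_format(data)
    if detected is None:
        raise ValueError("unrecognized file signature")
    return detected
```

A local precheck like this only saves a round trip; the worker's own validation remains authoritative.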
Source of truth
This page is regenerated by hand when the contract changes. The authoritative source is the docstring at the top of handler.py plus the INPUT_SCHEMA dict and _validate_input function in that file. If you see a discrepancy, the code wins; please open an issue so the docs page can be updated.