API reference

This page documents the JSON payload contract the worker accepts. It mirrors the docstring in handler.py, which is the source of truth; this is a friendlier rendering.

Job input

Send a POST to /v2/{endpoint_id}/runsync (or /run for async) with an input object:

{
  "input": {
    "file_url":       "https://example.com/report.pdf",
    "start_page":     0,
    "end_page":       99,
    "lang":           "en",
    "backend":        "vlm-auto-engine",
    "formula_enable": true,
    "table_enable":   true,
    "transport":      "tarball_b64",
    "formats":        ["markdown", "content_list", "middle", "images"],
    "basename":       "my-doc"
  }
}

Required (exactly one of)

The worker accepts any of these formats — PDF, image (PNG/JPEG/GIF/BMP/TIFF/WebP), DOCX, PPTX, XLSX — auto-detected from the input bytes. Provide exactly one transport; providing zero or two raises a validation error.

Field	Type	Notes
`file_url`	string	Public or presigned HTTP/HTTPS URL the worker can `GET`. Downloaded server-side with a 200 MB cap and a 120 s timeout
`file_b64`	string	Base64-encoded file bytes. RunPod’s gateway caps payloads at 10 MB on `/run` and 20 MB on `/runsync`; for bigger files use `file_url` or `volume_path`
`volume_path`	string	Absolute path to a file inside the container. Useful for files mounted via a RunPod network volume or baked into the image

Optional

Field	Type	Default	Notes
`start_page`	int	`0`	0-based, inclusive
`end_page`	int	`-1`	0-based, inclusive. `-1` or omitted means “to end of document”
`lang`	string	`"en"`	Language hint, pipeline backend only (VLM backends ignore it). Use script-family codes — `east_slavic`, `cyrillic`, `latin`, `arabic`, `devanagari`, `japan`, `korean`, `chinese_cht`, `el`, `th`, etc. 109 languages supported. NOT ISO codes. See Input formats for the full list.
`backend`	string	`"vlm-auto-engine"`	One of `pipeline` \| `vlm-auto-engine` \| `vlm-http-client` \| `hybrid-auto-engine` \| `hybrid-http-client`. The `pipeline` and `hybrid-*` backends use MinerU’s PP-OCRv6 OCR models. See Choosing a GPU → Picking a backend
`effort`	string	`null`	Hybrid backends only (`hybrid-auto-engine` / `hybrid-http-client`). `"medium"` (MinerU’s default when omitted) or `"high"`: `"high"` enables image/chart analysis at a speed cost, `"medium"` skips it for throughput. Rejected on non-hybrid backends.
`server_url`	string	`null`	Required for `*-http-client` backends. URL of an external vLLM OpenAI-compatible server (e.g. `https://your-host/v1`)
`formula_enable`	bool	`true`	Extract LaTeX equations
`table_enable`	bool	`true`	Extract structured HTML tables
`transport`	string	`"tarball_b64"`	How the worker ships output. `"tarball_b64"` (default, base64-encoded .tar.gz inside the entry), `"inline"` (per-format keys inside the entry, filterable via `formats`), or `"s3"` (presigned URL — requires `BUCKET_*` env vars on the endpoint). See Output modes.
`formats`	array of string	all four	Subset of `["markdown", "content_list", "middle", "images"]`. Selects which artifacts the inline payload contains. Omit (or pass all four) to get everything. No-op for `tarball_b64` and `s3` — those transports always ship a self-contained archive with all four artifacts. Empty list is rejected.
`basename`	string	`"doc"`	Filename stem for output files. Must be alphanumeric with `-` or `_`
`archive_format`	string	`"tar.gz"`	Archive container for the `tarball_b64` and `s3` transports: `"tar.gz"` (default) or `"zip"`. No-op for `inline`. See Output modes → archive format.
`probe`	bool	`false`	Diagnostic mode. `{"probe": true}` skips parsing and every field above, then returns the worker’s filesystem layout plus the running MinerU version instead of a parse result — for checking baked model paths and volume mounts.

I just want the Markdown — how do I get it?

Set "transport": "inline" and (optionally) filter to just markdown:

{
  "input": {
    "file_url": "https://example.com/report.pdf",
    "transport": "inline",
    "formats": ["markdown"]
  }
}

Response (truncated):

{
  "ok": true,
  "elapsed_seconds": 4.2,
  "mineru_version": "3.4.x",
  "results": [
    {
      "basename": "doc",
      "source": "url:https://example.com/report.pdf",
      "pages_requested": -1,
      "markdown": "# Document title\n\nFirst paragraph...\n\n## Section\n\nA table:\n\n<table>...</table>\n"
    }
  ],
  "debug": {"...": "..."}
}

For single-file jobs the markdown lives at result.results[0].markdown. Higher-level helpers: the Python MineruClient exposes MineruClient.first(result) to skip the indexing.

tarball_b64 (the default) also includes a .md file inside the gzipped tarball at {basename}.md — extract the tarball and the markdown is there too. Use inline when you want to read the markdown directly without unpacking; use tarball_b64 (or s3) when you also want the structured JSON / image files together.

Success response

{
  "ok": true,
  "elapsed_seconds": 18.4,
  "mineru_version": "3.4.x",
  "results": [
    {
      "basename": "doc",
      "source": "url:https://example.com/report.pdf",
      "pages_requested": 100,
      "tarball_b64": "<base64-encoded gzipped tarball>"
    }
  ],
  "debug": {"...": "..."}
}

Top-level keys (job-scoped)

Field	Type	Notes
`ok`	bool	Always `true` on success
`elapsed_seconds`	float	Wall time inside the handler. Does not include cold-start time or transport
`mineru_version`	string	The MinerU version that produced the parse (e.g. `3.4.4`)
`results`	array	One entry per parsed file. Single-file jobs have a one-element list. See below for the entry shape.
`debug`	object	Observability data: `backend` used, `model_dir` (which snapshot loaded), `gpu` info, `phase_ms` timings. See below
`refresh_worker`	bool	Present only when a `REFRESH_WORKER_AFTER_*` threshold has tripped. Tells RunPod to recycle the worker after returning.

Per-entry keys (file-scoped, inside `results[]`)

Field	Type	Notes
`basename`	string	Echo of the input `basename`, handy for correlating many concurrent jobs
`source`	string	Echo of the input transport: `url:...`, `b64`, or `volume:/path/...`
`pages_requested`	int	The slice the caller asked for. `-1` if `end_page` was open-ended
`tarball_b64`	string	Present when `transport: "tarball_b64"`. Base64-encoded archive of the output directory — `.tar.gz` by default, `.zip` when `archive_format: "zip"`
`markdown`, `content_list`, `middle`, `images`	various	Present when `transport: "inline"`. The set of keys reflects the `formats` filter — see below
`tarball_url`, `tarball_url_expires_in`, `bucket_key`, `bucket_bytes`	various	Present when `transport: "s3"`. The archive is `.tar.gz` by default, `.zip` when `archive_format: "zip"`. See Output modes → s3

Each job parses one file. To process many documents, submit them as individual jobs and raise workers_max — RunPod’s queue runs them in parallel across workers (see Scaling).

When `transport: "inline"`

Each entry contains the requested format keys directly. With the default formats=["markdown", "content_list", "middle", "images"]:

Field	Type	Notes
`markdown`	string	The full Markdown rendering of the document
`content_list`	array	Flat list of typed entries: `{"type": "text"\|"equation"\|"table"\|"image"\|"code", "page_idx": int, ...}`. Suitable for RAG chunking
`middle`	object	MinerU’s intermediate representation with layout, bounding boxes, reading order
`images`	object	`{filename: base64-encoded-png-bytes}` for every extracted image

Filtering with `formats`

The formats field whitelists which artifacts the inline payload contains. Omitted keys are absent from the entry — not present-as-empty. Useful when only the markdown matters and the per-page images would otherwise bloat the response.

{
  "input": {
    "file_url": "https://example.com/report.pdf",
    "transport": "inline",
    "formats": ["markdown", "content_list"]
  }
}

The resulting entry has markdown and content_list only; middle and images are not present in the response. Empty list is rejected (formats: [] is a validation error).

For transport: "tarball_b64" and transport: "s3", formats is a no-op: the tarball always carries the full set of four artifacts (filtering inside an archive would be confusing for downstream callers).

Debug observability

Every response includes a top-level debug block with information that lets you correlate a parse to its environment without having to read worker logs:

{
  "debug": {
    "backend": "vlm-auto-engine",
    "model_dir": "/root/.cache/huggingface/hub/models--opendatalab--MinerU2.5-Pro-2605-1.2B/snapshots/<hash>",
    "gpu": {
      "available": true,
      "name": "NVIDIA RTX 4090",
      "compute_capability": "8.9",
      "total_memory_gb": 23.99
    },
    "phase_ms": {
      "fetch_input": 12,
      "mineru_parse": 18420,
      "package": 95
    }
  }
}

Field	Notes
`backend`	The backend that ran. Echoes the input or the default
`model_dir`	Filesystem path of the model snapshot that loaded. Both VLM and pipeline models are baked into the image at `/root/.cache/huggingface/`; the path you see here proves which snapshot the worker resolved (`null` if HF cache is empty, which should not happen on the published image)
`gpu`	Card name, `compute_capability` (8.6 = Ampere, 8.9 = Ada, 9.0 = Hopper, 12.0 = Blackwell), VRAM. Helps debug “why did my job land on a different card than my pool config?”
`phase_ms`	Per-phase timings: `fetch_input` (download/decode), `mineru_parse` (MinerU’s `aio_do_parse`), `package` (tarball or inline assembly)

On failure, debug still contains gpu, model_dir, and whatever phase_ms was collected before the error.

Failure response

When the handler raises or returns an error, the response sets ok: false and includes a top-level error key. RunPod marks the job FAILED in the dashboard based on the presence of that key. There is no results list on a failure response.

{
  "error": "ValueError: must provide exactly one of file_url / file_b64 / volume_path",
  "ok": false,
  "elapsed_seconds": 0.1,
  "mineru_version": "3.4.x",
  "traceback": "Traceback (most recent call last):\n  File ...",
  "debug": {"gpu": {"...": "..."}, "model_dir": "...", "phase_ms": {"...": "..."}}
}

Field	Type	Notes
`error`	string	Type name + message, e.g. `ValueError: ...`
`ok`	bool	Always `false` on failure
`elapsed_seconds`	float	Time before the error
`mineru_version`	string	Version that was running
`traceback`	string	Last 5 frames of the Python traceback; useful for debugging
`debug`	object	Same shape as the success response — `gpu`, `model_dir`, and whatever `phase_ms` was collected before the failure

Progress updates (streaming)

The handler emits runpod.serverless.progress_update events during a parse. Phases:

{"phase": "fetching_input"}
{"phase": "parsing", "input_bytes": 1234567, "input_format": "pdf", "start_page": 0, "end_page": 99}

There is deliberately no packaging phase event: packaging completes milliseconds before the final result is posted, and a progress update that close to completion can arrive after the COMPLETED status and overwrite it, leaving the job stuck IN_PROGRESS.

Consume them via the RunPod SDK’s endpoint.stream(job_id) or the HTTP GET /v2/{endpoint_id}/stream/{job_id} endpoint. The MineruClient Python wrapper does not surface these (it only supports run_sync); use the RunPod SDK directly if you need progress.

Validation behaviour

The handler validates input before doing any work:

Field types and bounds via runpod.serverless.utils.rp_validator. Wrong type, missing required field, out-of-range value, or unknown field name raises ValueError.
XOR transport rule. Exactly one of file_url, file_b64, volume_path must be set. Zero or two raises ValueError.
Basename safety. basename must match [a-zA-Z0-9_-]+; otherwise ValueError.
Inline file size. file_b64 decoded length must be ≤ 20 MB (matching RunPod’s /runsync gateway cap).
Format detection. The first few bytes must match a known signature (PDF / image / OOXML); otherwise ValueError.
formats membership. Each entry must be one of ["markdown", "content_list", "middle", "images"]; empty list is rejected; duplicates are silently collapsed.

All validation errors produce a failure response with error: "ValueError: ..." and the job is marked FAILED.

Source of truth

This page is regenerated by hand when the contract changes. The authoritative source is the docstring at the top of handler.py plus the INPUT_SCHEMA dict and validate_input function in worker/schema.py. If you see a discrepancy, the code wins; please open an issue so the docs page can be updated.