Troubleshooting
When something doesn’t work, the worker tries hard to fail loudly. Every response (success and failure) includes a debug block with what backend ran, which model loaded, what GPU the worker landed on, and per-phase timings. Start there.
How to read the debug block

```json
{
  "debug": {
    "backend": "vlm-auto-engine",
    "input_format": "pdf",
    "model_dir": "/root/.cache/huggingface/hub/models--opendatalab--MinerU2.5-Pro-2605-1.2B/snapshots/<hash>",
    "gpu": {
      "available": true,
      "name": "NVIDIA RTX 4090",
      "compute_capability": "8.9",
      "total_memory_gb": 23.99
    },
    "phase_ms": {
      "fetch_input": 12,
      "mineru_parse": 18420,
      "package": 95
    }
  }
}
```

What to look at:

| Field | What’s wrong if it’s surprising |
|---|---|
| `backend` | The string passed to MinerU. If you set `pipeline` but see `vlm-auto-engine`, your caller isn’t sending what you think it is |
| `input_format` | Auto-detected from bytes. If you uploaded a PDF and see `unknown`, your transport returned an error page (HTML), not the file body |
| `model_dir` | Filesystem path of the snapshot that actually loaded. Both VLM and pipeline models are baked into the image at `/root/.cache/huggingface/`; if this is null on a successful job, the image build skipped the model bake step (unusual; file a bug) |
| `gpu.compute_capability` | 8.6 = Ampere (3090, A5000, A6000), 8.9 = Ada (4090, RTX 6000 Ada), 9.0 = Hopper (H100), 12.0 = Blackwell (VLM will crash) |
| `phase_ms.fetch_input` | If hundreds of seconds on a `file_url` job, the source URL is slow / failing |
| `phase_ms.mineru_parse` | Per-page guidance for warm workers, highly GPU- and content-dependent: MinerU upstream cites ~0.5 s/page (2.12 fps) on an A100 for the VLM backend; we measured a range of ~1 s/page on uniform multi-page reports up to ~10 s/page on dense financial forms on an A5000 24 GB (≈3.5 s/page is a reasonable single-number estimate) under the default `gpu_memory_utilization=0.5`. Pipeline backend is ~3–5 s/page across GPUs (CPU-bound for layout, GPU-bound only for OCR). First call on a fresh worker is much higher (model load + vLLM warmup adds ~90–130 s for the VLM backend). If a warm-worker call is 5× the expected per-page number for your GPU and content type, you’re memory-bound and vLLM is swapping |

The worker also emits structured log lines visible in RunPod’s worker log viewer — see reading worker logs below.
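
The checks in the table above can be automated on the caller side. The following is a minimal sketch, not part of the template: the function name `triage_debug` and the `pages` / `expected_s_per_page` parameters are hypothetical, and the 5× threshold simply restates the rule of thumb from the `phase_ms.mineru_parse` row.

```python
from typing import Any


def triage_debug(debug: dict[str, Any], pages: int = 1,
                 expected_s_per_page: float = 3.5) -> list[str]:
    """Return human-readable warnings for a response's debug block."""
    warnings: list[str] = []

    if debug.get("input_format") == "unknown":
        warnings.append("input_format is unknown: transport may have returned an error page")

    if debug.get("model_dir") is None:
        warnings.append("model_dir is null: the image may have skipped the model bake step")

    gpu = debug.get("gpu") or {}
    cc = gpu.get("compute_capability")
    backend = str(debug.get("backend", ""))
    # Compute capability 12.0 is Blackwell; per the table only non-pipeline backends crash.
    if cc is not None and float(cc) >= 12.0 and backend != "pipeline":
        warnings.append(f"compute capability {cc} is Blackwell: VLM backends crash on it")

    phases = debug.get("phase_ms") or {}
    per_page_s = phases.get("mineru_parse", 0) / 1000 / max(pages, 1)
    # Rule of thumb from the table: 5x the expected per-page time suggests memory pressure.
    if per_page_s > 5 * expected_s_per_page:
        warnings.append(f"mineru_parse is {per_page_s:.1f} s/page: possibly memory-bound")

    return warnings
```

Call it with `response["debug"]` and the number of pages you submitted; an empty list means none of the table's warning signs fired.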
Hub build fails on the validator test pod
After every push, RunPod’s Hub builds the image and then spins up a real GPU pod to execute `.runpod/tests.json`. In the cases below the image itself is fine; it is the test pod that fails. Three failure modes account for almost everything we’ve seen here:
“Pod could not be created”

```
Pod could not be created: This machine does not have the resources to deploy your pod. Please try a different machine.
```

Cause: RunPod can’t allocate the `gpuTypeId` declared in `.runpod/tests.json` during the build window. The Docker image is fine; RunPod just couldn’t find a free host of that type.

Fix: switch gpuTypeId to a higher-availability pool. The template currently uses "NVIDIA GeForce RTX 4090" because it has the best pool availability across RunPod’s regions; "NVIDIA RTX A5000" works too but tends to be scarcer. Re-trigger the build after editing.
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.9

```
Error response from daemon: failed to create task for container: ...
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.9, please update your driver to a newer version, or use an earlier cuda container
```

Cause: the container’s CUDA floor (12.9, inherited from `vllm/vllm-openai:v0.11.2`) is higher than the CUDA version the host driver exposes. RunPod scheduled the test pod on a host that satisfied `allowedCudaVersions` on paper but doesn’t actually meet the container’s prestart-hook requirement.

The trap: allowedCudaVersions tells RunPod “the worker accepts these driver CUDA versions.” If older versions are listed there, RunPod is free to schedule on older-driver hosts, and the container’s own requirement labels then reject the host at prestart. Result: intermittent failures (depends which host got picked).
Fix: keep allowedCudaVersions in both .runpod/tests.json and .runpod/hub.json aligned with the actual minimum the container needs. For the current vLLM v0.11.x base, that’s ["13.0", "12.9"]. Don’t pad the list with older versions just because they look harmless — every entry that the container can’t actually run on is a future flake.
If you bump vLLM, re-check the CUDA floor from upstream’s release notes (vLLM v0.11.0 was the bump to CUDA 13).
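
A small pre-push check can catch padded version lists before the validator does. This is a sketch under assumptions: it searches for an `allowedCudaVersions` key anywhere in the parsed JSON rather than relying on a particular file layout, and the helper names (`find_key`, `versions_below_floor`, `check_file`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def versions_below_floor(versions: list[str], floor: str) -> list[str]:
    """Return the entries that are older than `floor`, compared numerically."""
    def as_tuple(v: str) -> tuple[int, ...]:
        return tuple(int(part) for part in v.split("."))
    return [v for v in versions if as_tuple(v) < as_tuple(floor)]


def check_file(path: Path, floor: str = "12.9") -> list[str]:
    """List allowedCudaVersions entries in one config file below the container's floor."""
    data = json.loads(path.read_text())
    bad: list[str] = []
    for versions in find_key(data, "allowedCudaVersions"):
        bad.extend(versions_below_floor(list(versions), floor))
    return bad
```

Running `check_file(Path(".runpod/tests.json"))` and `check_file(Path(".runpod/hub.json"))` should both return an empty list when the files are aligned with the container’s floor.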
Build timeout (30 minutes)

```
Build exceeded maximum time limit of 1800 seconds (30.0 minutes). Build terminated.
```

Cause: RunPod’s build pipeline has a hard 30-minute ceiling. The image bakes ~4 GB of model weights (MinerU VLM + PDF-Extract-Kit) and installs vLLM + Torch on top; on a slow build-region day, those steps alone can blow past the cap.

Fix: the Dockerfile already uses hf-xet with HF_XET_HIGH_PERFORMANCE=1 for fast model bakes, and the two model downloads are split into separate RUN layers so a partial cache survives retries. If you still time out:
- Re-trigger the build (often a transient HF-egress slowdown)
- Pin a smaller VLM model via `MINERU_VL_MODEL_NAME` for http-client backends (this doesn’t help here, since the bake is unconditional)
- As a last resort, drop one of the baked models and rely on RunPod’s per-endpoint model cache (one model only; see the model caching docs)
Escape hatch: skip the validator entirely
If a release is urgent and the Hub validator is the only thing blocking it, rename `.runpod/tests.json` to `.runpod/tests_.json` in the repo (underscore suffix). The Hub validator looks for the exact filename `tests.json`; the renamed file is invisible to it and no test pod is scheduled. Rename it back when the underlying issue is resolved.
This loses all CI signal — only use it as a temporary unblock, not a default.
VLM backend crashes on Blackwell GPUs (cc=12.0)
Symptom: worker logs show a successful model load followed by a CUDA error:

```
compute_capability: 12.0 >= 8.0
INFO Starting to load model .../MinerU2.5-Pro-2605-1.2B/...
INFO Model loading took 2.16 GiB and 0.36 seconds
CUDA error (...flash-attention/hopper/flash_fwd_launch_template.h:188): invalid argument
```

`debug.gpu.compute_capability` in the response is `12.0` (e.g. NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 1g.24gb).

Cause: xformers / flash-attention in vllm 0.11.2 (our base image) ships kernels for Ampere (8.x), Ada (8.9), and Hopper (9.0) — no Blackwell-SM120 code path. On Blackwell, xformers misroutes to the Hopper kernel and crashes during VLM model init.
Why we can’t just bump vllm: MinerU 3.2.x’s pyproject pins vllm>=0.10.1.1,<0.12. The first vllm release with any Blackwell mention in its notes is v0.13.0 (2025-12-19, SM103 / GB300 “Blackwell Ultra”); broader Blackwell coverage follows in v0.14+. All Blackwell-aware vllm versions sit above MinerU’s <0.12 ceiling, so until MinerU loosens that pin we’re stuck on v0.11.x, which has no SM120 kernel path. The v0.6.6 → v0.11.2 bump did not change this because v0.11.2 only adds SM100 (data-center Blackwell B200/GB200) MoE-prep code, not the SM120 (consumer 5090 / PRO 6000 Blackwell) flash-attn paths the VLM uses.
Fix: keep the default gpuIds: "ADA_24,AMPERE_24,AMPERE_48" — all unambiguously pre-Blackwell. If you’ve manually opted into ADA_48_PRO, that pool can mix in Blackwell SKUs since RunPod groups them under the same pool name — remove it from your endpoint’s GPU pool list. For workloads that need a 48 GB Ada-or-newer card, the pipeline backend doesn’t use xformers/flash-attn and is unaffected, so it runs on Blackwell fine; only the VLM/hybrid backends crash. See Choosing a GPU for the GPU-pool background.
“Pod scaled to zero” but the next job has a noticeable cold start
Symptom: spiky workload. First job after a quiet period takes a long time; subsequent jobs in the same window are fast.
Cause: RunPod tears down the worker after idle_timeout seconds of inactivity. The next request spins a fresh worker — the cost is unpacking the image, loading the model into VRAM, and (for the VLM backend) JIT-compiling vLLM kernels.
Expected magnitudes (measured on RTX A5000 unless noted):
| Scenario | Latency |
|---|---|
| Warm worker, VLM, on A100 (per MinerU upstream) | ~0.5 s/page (2.12 fps) |
| Warm worker, VLM, on A5000 24 GB (our measurement) | ~1 s/page (uniform reports) – ~10 s/page (dense forms) |
| Warm worker, pipeline (any GPU ≥ 4 GB) | ~3–5 s/page |
| FlashBoot happy path — host reuse, snapshot restored | ~7–8 s wall-clock — model + engine restored from snapshot |
| FlashBoot cold path — new host, image cached | ~110 s — fresh boot, warmup runs (1× per host) |
| Cold worker, no warmup (`MINERU_SKIP_WARMUP=1`) | ~110–130 s per request after every scale-from-zero (no per-host amortization) |
| Cold worker, pipeline backend, no warmup | ~10–15 s per request (lighter; no vLLM warmup) |
| Brand-new worker host (no image cached) | +3–5 min for the initial image pull, on top of whichever path above applies |
Per-phase cold-start breakdown (VLM, A5000 24 GB)
If you’re tracking why a cold start takes ~110 s, here’s the live measurement we captured against the deployed template. Times are wall-clock between consecutive log entries from RunPod’s worker log viewer; totals are approximate (±2 s):
| Phase | Time | Source log line |
|---|---|---|
| Worker boot + 7 fitness checks (CUDA, GPU, network, disk, memory) | ~3 s | --- Starting Serverless Worker --- → All fitness checks passed. |
| Queue dispatch + RunPod SDK ready | ~5 s | All fitness checks passed. → Started. |
| MinerU lazy import → `Using vllm-async-engine` selection | <1 s | Started. → mineru.utils.engine_utils:get_vlm_engine — Using vllm-async-engine |
| vLLM engine config + model path resolve (`HF_HUB_OFFLINE` lookup, arch detection) | ~19 s | → arg_utils.py:592 HF_HUB_OFFLINE is True |
| Model weight load (1 safetensors shard, 2.16 GiB → VRAM) | ~21 s | gpu_model_runner.py:3338 Model loading took 2.1601 GiB memory and 21.4 seconds |
| `torch.compile` (Dynamo + Inductor, dynamic shape) | ~25 s | monitor.py:34 torch.compile takes 25.54 s in total |
| KV cache profile + budget allocation | ~2 s | gpu_worker.py:359 Available KV cache memory: 8.17 GiB |
| CUDA graph capture (35 mixed prefill-decode + 19 decode-FULL) | ~3 s | gpu_model_runner.py:4244 Graph capturing finished in 2 secs, took 0.27 GiB |
| vLLM engine init total (profiling including `torch.compile`, KV cache creation, CUDA graph capture) | 34.22 s | core.py:250 init engine (profile, create kv cache, warmup model) took 34.22 seconds |
| MinerU’s wrapper-level total (includes vLLM init plus its own setup) | 100.63 s | mineru.backend.vlm.vlm_analyze:get_model — get vllm-async-engine predictor cost: 100.63s |
| Actual page parse (single page) | ~6 s | VLM processing window 1/1 → response delivered |
| End-to-end cold start (queue → response) | ~108 s | Jobs in queue: 1 → response |
| Subsequent warm-worker parse, same page count | ~6 s | mineru_parse phase_ms |
Headline observations from this run:
- vLLM engine init is the largest single block (`34 s` of the `100 s` MinerU wrapper time). Of that, `torch.compile` is `25 s`, the single biggest cost.
- Model weight load is only `21 s` despite being 2.16 GiB. The image bakes the model into `/root/.cache/huggingface/`, so this is a local FS read, not a network download.
- `Available KV cache memory: 8.17 GiB` on a 24 GB A5000. That’s vLLM’s KV budget after model + activations + reserve. It is the constraining factor for `MINERU_MAX_CONCURRENCY` if you try to raise it above `1`.
- `Maximum concurrency for 8,192 tokens per request: 87.13x` is vLLM’s in-engine batch ceiling on this hardware. It differs from our per-worker concurrency knob: it counts sequences per single vLLM forward pass.
Fix: not a bug, but levers if it’s a problem:
- Bump `idle_timeout` (template default is 10 s): workers stay warm longer, and you pay for that time.
- Set `workers_min=1`: at least one worker is always warm, and you pay 24/7 for it.
- Enable RunPod’s FlashBoot explicitly in the endpoint config (it’s on by default for templates from the Hub).
- Eager warmup is now active by default. The worker runs one throwaway parse against the baked test fixture during boot, before `runpod.serverless.start()` claims the event loop. This loads the MinerU model into VRAM and JIT-compiles vLLM kernels, so the first real request lands on a warm engine. Look for `[mineru-warmup] starting (backend=... lang=... fixture=/worker/test-fixture.pdf)` then `[mineru-warmup] done in Ns` in the worker logs. To disable (e.g., for debugging cold-start ordering), set `MINERU_SKIP_WARMUP=1` on the endpoint.

  Tune via env vars on the endpoint:

  - `MINERU_WARMUP_BACKEND` (default `vlm-auto-engine`): which backend to warm. Must match the backend most callers will use; warming `vlm-auto-engine` but serving `pipeline` requests means the first pipeline call still pays cold-start.
  - `MINERU_WARMUP_LANG` (default `en`): only meaningful for the pipeline backend; VLM ignores it.
  - `MINERU_SKIP_WARMUP=1`: bypass entirely (worker falls back to lazy load on first request, ~100 s tax).

Expected post-warmup cold-start latency for the first request: ~7–8 s wall-clock on A5000 (measured 2026-05-26 against the deployed template). Breakdown: ~3 s of FlashBoot snapshot restore + ~5 s of parse on a fully-warm engine. FlashBoot empirically captures Python process memory + CUDA VRAM + the vLLM engine subprocess — see FlashBoot mechanism below for the full analysis.
FlashBoot mechanism (confirmed)
We confirmed on 2026-05-26 that FlashBoot is process-snapshot based (CRIU or functional equivalent), and that snapshots are scoped per (host, image-SHA), not per endpoint. Each worker host maintains its own snapshot store. When RunPod’s scheduler picks a host that has run this image before, you get a fast restore. When it picks a new host, the worker re-runs warmup once.
The four-request investigation that pinned this down. Same short single-page PDF, same parameters every time, worker scaled to zero between every request:
| # | Wall-clock | Host | Snapshot? | What the worker log showed |
|---|---|---|---|---|
| 1 | 456 s | A (post-rebuild, fresh image pull) | none | Full cold path: image pull → fitness checks → [mineru-warmup] done in 101.0s → parse 5.6 s |
| 2 | 7.6 s | A (same as R1) | yes (post-R1) | Zero boot logs. Went straight from Jobs in queue: 1 to "starting job". No [mineru-warmup] line. |
| 3 | 122 s | B (different host) | none | Image cached on B, but fresh process: fitness checks + [mineru-warmup] done in 101.5s + parse 5.6 s |
| 4 | 7.4 s | B (same as R3) | yes (post-R3) | Same pattern as R2 — snapshot restore, no boot logs |
Worker identity is visible in the logs three ways: the `EngineCore_DP0 pid=NNN` line (different per container), the `distributed_init_method=tcp://192.168.X.X` pod-internal IP, and the request-id `-u1` / `-u2` suffix (RunPod’s region/partition identifier). All three agreed: R1 and R2 ran on the same pod; R3 and R4 ran on a different pod, shared between the two of them.
The per-host model:

```
FlashBoot lookup = (worker host, image SHA)
- match    → restore snapshot in ~3 s, parse in ~5 s → ~7-8 s wall-clock
- no match → fresh boot, run fitness checks + warmup → ~110 s wall-clock
```

What gets preserved on a successful restore: Python interpreter state, MinerU’s in-memory engine handle, vLLM’s `AsyncLLMEngine` subprocess (PID persists), CUDA VRAM (model weights + KV cache + CUDA graphs), `torch.compile` cache, and the boot-time signal handlers.

Practical implications:
- The boot-time warmup pays off per host that the worker visits, not once per endpoint or once forever. Each new host pays the warmup tax once; every subsequent restore on that same host is fast.
- Snapshot invalidation: the obvious triggers are image rebuild (new SHA), `MINERU_SKIP_WARMUP=1`, and presumably eventual eviction after long idle. RunPod doesn’t document the eviction policy.
- Either way, the per-request cold tax is gone: even the slow case (~110 s) is the worker boot paying it once, not every request paying it.
What controls which path you’ll see:
| Scenario | Likely outcome |
|---|---|
| `workers_min ≥ 1` | Worker stays on its host, so every request is on a fully warm worker (~5 s parse, no cold start at all) |
| High-frequency endpoint, workers scale up and down fast | Same hosts get re-selected — most cold starts are happy-path restores (~7 s) |
| Quiet endpoint, infrequent requests, long idle gaps | RunPod’s scheduler may pick a different host — some cold starts will be on new hosts (~110 s) |
| First request after a rebuild | Always cold path — every endpoint’s first request after a fresh image pays ~5-7 min (image pull) + ~110 s (warmup). One-time cost per worker host. |
| `MINERU_SKIP_WARMUP=1` | Every cold start is ~110-130 s; no per-host amortization. Don’t do this in production. |
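
To reason about the mix of outcomes in the table above, it can help to treat host reuse as a probability. The sketch below is a back-of-envelope model only: `p_host_reuse` is a hypothetical input you would estimate from your own logs (RunPod does not expose it), and the two latency defaults are the ~7-8 s and ~110 s figures measured above.

```python
def expected_cold_start_s(p_host_reuse: float,
                          restore_s: float = 7.5,
                          fresh_boot_s: float = 110.0) -> float:
    """Average scale-from-zero latency given the chance of landing on a host with a snapshot."""
    if not 0.0 <= p_host_reuse <= 1.0:
        raise ValueError("p_host_reuse must be between 0 and 1")
    # Weighted average of the snapshot-restore path and the fresh-boot path.
    return p_host_reuse * restore_s + (1.0 - p_host_reuse) * fresh_boot_s
```

For example, if half of your cold starts land on a previously visited host, the average works out to `expected_cold_start_s(0.5)`, roughly a minute, even though no single request ever sees that number.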
CUDA out of memory
Symptom: handler errors with `CUDA out of memory` mid-parse. `debug.gpu.total_memory_gb` shows your card has fewer GB than the workload needs.
Cause: vLLM allocates KV cache upfront based on gpu_memory_utilization (default 0.5). On a 24 GB card this targets ~12 GB. If concurrency or document size pushes KV usage above the budget plus model weights (~2.2 GB) plus activations (~0.75 GB), you OOM.
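
The arithmetic in that paragraph can be written out as a rough estimate. This is a sketch, not vLLM’s actual accounting: the function name is hypothetical, it mixes GB and GiB loosely, and it ignores the reserve vLLM keeps, which is why it lands somewhat above the 8.17 GiB the engine reported on an A5000.

```python
def kv_budget_gb(total_gb: float,
                 gpu_memory_utilization: float = 0.5,
                 weights_gb: float = 2.2,
                 activations_gb: float = 0.75) -> float:
    """Approximate memory left for KV cache inside vLLM's utilization target."""
    target = total_gb * gpu_memory_utilization  # e.g. 24 GB * 0.5 = 12 GB
    return target - weights_gb - activations_gb


budget_24 = kv_budget_gb(24)  # about 9 GB on a 24 GB card
budget_48 = kv_budget_gb(48)  # about 21 GB on a 48 GB card
```

The jump from roughly 9 GB to roughly 21 GB is why moving to a 48 GB pool more than doubles the KV headroom at the same utilization setting.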
Fix:
- Bump to a 48 GB pool for the affected workload (`AMPERE_48`)
- Switch to the pipeline backend, which doesn’t use vLLM and is documented at 4 GB minimum VRAM (per MinerU’s hardware compatibility table), regardless of doc length
- Reduce concurrency (`concurrencyModifier` on the endpoint) so fewer pages are in flight at once
- For one-off huge docs, set `backend: "pipeline"` per-job; the same worker can handle small docs with VLM and giant docs with pipeline
See Choosing a GPU for the VRAM math.
unexpected handler return type: <class 'NoneType'> after a successful parse
Symptom: the client raises `MineruClientError: unexpected handler return type: <class 'NoneType'>` (or `run_sync` returns `None` directly). Worker logs show the handler completed successfully, with `[mineru-worker] done: elapsed=Xs phase_ms={...}` immediately followed by:

```
"Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/<endpoint>/job-done/<worker>/<request>?gpu=...&isStream=false'"
```

Cause: RunPod’s `/runsync` gateway caps the response payload at ~20 MB. The worker built a valid result; when it tried to POST it back via `/job-done`, the gateway returned HTTP 400 and discarded it. The SDK then sees no output → `None` → our client raises this error.

Triggers (measured on a real 82-page PDF):
- `return: "inline"`: markdown + content_list + middle.json + base64 images add up fast; ~80 pages with embedded images was enough to exceed the cap.
- `return: "tarball_b64"`: gzip compresses the JSON, but the images inside the tarball are already raster bytes, so it often doesn’t fit either. Confirmed same failure on the same doc.
Fix: use return: "s3" for large outputs. The worker uploads the .tar.gz to an S3-compatible bucket and returns only a small presigned URL (~1 h TTL) — no gateway cap in the path.

```json
{
  "input": {
    "file_url": "https://example.com/big.pdf",
    "return": "s3"
  }
}
```

Configure the bucket via these env vars on the endpoint (not the template):

| Env var | Cloudflare R2 example |
|---|---|
| `BUCKET_ENDPOINT_URL` | `https://<account-id>.r2.cloudflarestorage.com` |
| `BUCKET_NAME` | your bucket name |
| `BUCKET_ACCESS_KEY_ID` | R2 API token access key |
| `BUCKET_SECRET_ACCESS_KEY` | R2 API token secret |
| `BUCKET_REGION` (optional) | `auto` for R2 |

The Python client handles the rest via client.save_s3_tarball(result, dest_dir) — it follows result["tarball_url"], downloads the .tar.gz, and extracts. From curl, the response includes tarball_url, tarball_url_expires_in, and bucket_key; download within the TTL.
If you can’t wire S3, the fallback is page chunking: split with start_page / end_page into segments small enough that each tarball fits under the cap, then concatenate the .md files client-side. Slower (two cold starts if workers go cold between calls) but no infra changes.
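
The page-chunking fallback can be sketched as a payload generator. Several things here are assumptions rather than documented behavior: that `start_page` / `end_page` are 0-based and inclusive, that they sit next to `file_url` under `input`, and the helper names themselves. Check the template’s input schema before relying on it.

```python
def page_chunks(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into (start_page, end_page) pairs, end inclusive (assumed)."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]


def chunk_payloads(file_url: str, total_pages: int, chunk_size: int,
                   return_mode: str = "tarball_b64") -> list[dict]:
    """Build one job payload per page segment, in document order."""
    return [
        {"input": {"file_url": file_url, "start_page": start,
                   "end_page": end, "return": return_mode}}
        for start, end in page_chunks(total_pages, chunk_size)
    ]
```

Submit the payloads in order and concatenate the resulting `.md` files in the same order. Pick `chunk_size` conservatively: image-heavy pages are what pushed the 82-page document over the cap.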
Worker returns ValueError: input bytes do not match any supported format
Symptom: the response has `ok: false` and the above error message.
Cause: the worker’s _detect_format checked the first ~16 bytes against known magic numbers (%PDF, \x89PNG, PK\x03\x04 for OOXML, etc.) and didn’t match anything.
Most common reasons:
- `file_url` returned an HTML error page instead of the file. The URL is wrong, expired, or behind auth. The response body starts with `<!DOCT` or `<html`.
- `file_b64` was double-encoded or not base64. The decoded bytes are random.
- `volume_path` points at a file that exists but isn’t a supported format (e.g. a `.csv` or `.txt`). MinerU doesn’t accept plain text; convert to PDF first.
Fix: verify the bytes. Download from your file_url directly with curl, run file on the result, or check that base64 -d < input.b64 | xxd | head shows the right magic bytes.
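
The same check can be done from Python before submitting a job. This sketch is not the worker’s `_detect_format`; it only covers the three magic numbers named above plus the HTML error-page case, and the function names and return labels are hypothetical.

```python
import base64
import binascii

# Magic numbers mentioned in this section; the worker's real list is longer.
MAGIC = [(b"%PDF", "pdf"), (b"\x89PNG", "png"), (b"PK\x03\x04", "zip/ooxml")]


def sniff(data: bytes) -> str:
    """Classify raw bytes by their leading magic number."""
    head = data[:16]
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    # An HTML body usually means the URL returned an error page.
    lowered = data[:64].lstrip().lower()
    if lowered.startswith((b"<!doct", b"<html")):
        return "html-error-page"
    return "unknown"


def sniff_b64(encoded: str) -> str:
    """Decode a base64 string once, then classify the result."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return "not-base64"
    return sniff(raw)
```

A double-encoded payload decodes cleanly but comes back as `unknown`, because the first decode yields another base64 string instead of file bytes.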
Job times out before the parse finishes
Symptom: `MineruClientError: endpoint transport failed: timeout` after some number of seconds (default 900 s in the client, configurable via `timeout=`).
Cause: large documents take longer than your client-side timeout, or longer than the endpoint’s executionTimeoutMs.
Fix:
- Client-side: pass a larger `timeout` to `parse_document(timeout=3600, ...)`
- Endpoint-side: bump `--execution-timeout` in `deploy.py` (defaults to 900 s). Use `--execution-timeout 3600` for full books.
- Per-page math (warm worker, GPU- and content-dependent): MinerU upstream cites ~0.5 s/page for the VLM backend on an A100. We measured ~1–10 s/page on an A5000 24 GB depending on content density (uniform multi-page reports run fast; dense financial forms run slow). Pipeline ≈ 3–5 s/page across GPUs. A 1000-page book on pipeline = ~3000–5000 s; on VLM-on-A5000 ≈ ~1000–10000 s depending on content; on VLM-on-A100 ≈ ~500 s. Add 90–130 s for the first call on a cold worker if VLM (model load + vLLM warmup); the cold-start tax is paid once per worker, not per page.
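
The per-page math above can be turned into a rough timeout estimate. This is a sketch: the function name and the `safety` multiplier are hypothetical, and the per-page rate is something you pick from the ranges above for your GPU and content type.

```python
import math


def estimate_timeout_s(pages: int, s_per_page: float,
                       cold_start_s: float = 130.0, safety: float = 1.5) -> int:
    """Worst-case parse time plus one cold start, padded by a safety factor."""
    if pages < 0 or s_per_page <= 0 or safety < 1.0:
        raise ValueError("pages >= 0, s_per_page > 0 and safety >= 1 are required")
    return math.ceil((pages * s_per_page + cold_start_s) * safety)


# 1000-page book on the pipeline backend at the slow end of 3-5 s/page.
book_timeout = estimate_timeout_s(1000, 5.0)
```

Use the result for both the client-side `timeout` and the endpoint’s execution timeout, since whichever is lower ends the job first.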
Reading worker logs
The worker emits one JSON object per line by default (set `LOG_FORMAT=text` for human-readable output during local development). RunPod’s log viewer shows them as-is. To filter in CloudWatch, Loki, Axiom, or any other JSON log sink, key off the `level`, `msg`, and any of the structured fields.
Typical lines on a successful job:

```json
{"ts":"2026-05-25T18:30:42.103Z","level":"info","logger":"mineru-worker","msg":"starting job","job_id":"queued-uuid-abc","backend":"vlm-auto-engine","lang":"en","start_page":0,"end_page":4,"gpu_name":"NVIDIA RTX 4090","compute_capability":"8.9"}
{"ts":"2026-05-25T18:30:48.612Z","level":"info","logger":"mineru-worker","msg":"done","job_id":"queued-uuid-abc","elapsed_seconds":6.51,"phase_ms":{"fetch_input":12,"mineru_parse":6420,"package":79},"model_dir":"/root/.cache/huggingface/hub/.../snapshots/<hash>","refresh_worker":false}
```

On failure:

```json
{"ts":"2026-05-25T18:30:42.789Z","level":"error","logger":"mineru-worker","msg":"job failed","job_id":"queued-uuid-abc","error_type":"ValueError","error_message":"input bytes do not match any supported format","phase_ms":{"fetch_input":8}}
```

Key fields:

| Field | Meaning |
|---|---|
| `level` | `debug`, `info`, `warning`, `error` |
| `logger` | Always `mineru-worker` for handler emissions |
| `msg` | Stable identifier for the event; safe to alert on |
| `job_id` | RunPod’s job UUID. Use this to correlate all lines from one request, especially when a worker handles multiple jobs in sequence. `<unknown>` when a sync caller submitted without a queued ID |
| `phase_ms` | Per-phase timings (`fetch_input`, `mineru_parse`, `package`) |
| `backend`, `lang`, `start_page`, `end_page` | Echoed from the job input; handy for correlating with the request |
| `refresh_worker` | `true` if the worker is recycling after this job (see the scaling guide) |

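
If you export the worker logs to a file, the fields above are enough to correlate lines per job. The sketch below assumes one log entry per line, with non-JSON lines (tracebacks, SDK output) mixed in; the function names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable


def group_by_job(lines: Iterable[str]) -> dict[str, list[dict]]:
    """Collect mineru-worker records per job_id, skipping lines that are not JSON objects."""
    jobs: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # traceback or plain-text line
        if isinstance(record, dict) and record.get("logger") == "mineru-worker":
            jobs[record.get("job_id", "<unknown>")].append(record)
    return dict(jobs)


def failed_job_ids(jobs: dict[str, list[dict]]) -> list[str]:
    """Return job ids that logged a 'job failed' event, keyed off the stable msg field."""
    return [
        job_id for job_id, records in jobs.items()
        if any(r.get("msg") == "job failed" for r in records)
    ]
```

Keying off `msg` rather than `error_message` keeps the filter stable even if the error wording changes.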
Cancellation and worker recycling
Symptom: a job appears in RunPod’s queue, then disappears from the worker logs mid-parse with no `done` line.

Cause: RunPod sent SIGTERM to the worker, either because the endpoint scaled down due to idle timeout, you triggered a recycle from the dashboard, or the worker requested a refresh via the `refresh_worker` response flag.

When SIGTERM arrives, the worker logs:

```json
{"ts":"...","level":"warning","logger":"mineru-worker","msg":"sigterm received, draining current job"}
```

What gets honored:

- Cancellations between request phases (`fetch_input` → `parse` → `package`). The next phase check raises `RuntimeError: worker shutting down` and the job returns `ok: false` with that error.
- The graceful drain handled by RunPod’s SDK on currently-in-flight jobs.

What does NOT get honored:
- Cancellation mid-`aio_do_parse`. The vLLM forward pass is a blocking GPU call from asyncio’s point of view; interrupting it would corrupt the engine state. The worker finishes the current document even after SIGTERM, then exits cleanly.
If you need hard cancellation guarantees mid-parse, that’s an upstream MinerU feature request, not a template-level fix.
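
The between-phases behavior described above follows a common pattern: a signal handler sets a flag, and the flag is checked before each phase starts. The sketch below illustrates that pattern only; it is not the worker’s actual code, and every name in it is hypothetical.

```python
import signal
import threading
from typing import Callable, Sequence

shutting_down = threading.Event()


def install_sigterm_handler() -> None:
    """Record SIGTERM instead of exiting immediately (call from the main thread)."""
    signal.signal(signal.SIGTERM, lambda signum, frame: shutting_down.set())


def check_not_shutting_down() -> None:
    """Phase boundary check: refuse to start new work once SIGTERM was seen."""
    if shutting_down.is_set():
        raise RuntimeError("worker shutting down")


def run_phases(phases: Sequence[Callable[[], object]]) -> list[object]:
    """Run phases in order; a phase that has already started is never interrupted."""
    results = []
    for phase in phases:
        check_not_shutting_down()
        results.append(phase())
    return results
```

Because the flag is only consulted at phase boundaries, a long-running parse phase always runs to completion, which matches the "finishes the current document" behavior.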
Getting more help
If your symptom isn’t here:
- Pull the worker logs from RunPod’s dashboard and look for `[mineru-worker]` lines and any tracebacks
- Check the `debug` block in the response
- Open a bug report. The issue template asks for the response and the GPU pool, both of which let you diagnose 90% of issues at a glance
- Parsing accuracy issues (output is structurally fine but wrong content) belong upstream at opendatalab/MinerU; they’re MinerU’s responsibility, not this template’s