Ship MinerU on RunPod logs to Axiom via OpenTelemetry
Last Updated: 2026-05-26
If you’re running a serverless worker on RunPod, you can’t ssh in to read logs and you can’t run a sidecar agent — the worker scales to zero between jobs. You need to ship logs and metrics off the box during the request lifetime, on the worker’s own time. The mineru-runpod template includes an OpenTelemetry exporter for exactly this, and Axiom is the sink I picked for my own deployment.
This post is the exact env-var layout. If you use a different OTLP backend (Honeycomb, Grafana, Datadog, Jaeger, your own OpenTelemetry Collector), the observability guide covers the vendor-neutral setup; come back here only for the Axiom-specific values.
Setup time from a fresh Axiom account to logs flowing: ~10 minutes, dominated by waiting for the worker’s next cold start.
Why use Axiom for serverless worker observability?
Axiom ingests OTLP/HTTP directly, charges per event rather than per host, and has no agent to install, which matters when your workers scale to zero between jobs. For a mineru-runpod deployment processing a few hundred PDFs a day, the events fit inside Axiom’s free tier, and the metrics dataset is queryable via APL.
Three concrete reasons it fits this workload:
- OTLP-native ingest. No collector to run inside the container, no daemonset, no Fluent Bit config. The mineru-runpod worker calls Axiom’s regional edge endpoint (https://eu-central-1.aws.edge.axiom.co/v1/logs or the US variant) directly via the OpenTelemetry Python SDK. The serverless model rules out anything that needs a long-running sidecar, so “the exporter IS the integration” is the only model that works.
- Per-event pricing instead of per-host. A worker serving 500 PDFs a day emits roughly 5,000 log records and 2,500 spans. That fits comfortably inside Axiom’s free 0.5 GB/month tier. Per-host pricing models (the Datadog and New Relic shape) penalize the ephemeral-worker pattern.
- APL reads like SPL. Axiom Processing Language sits between Splunk SPL and KQL ergonomically. Filter by attribute, group by backend, drill into a span: the queries you actually run during an incident are easy in APL. No tutorial needed if you’ve used either.
I have no affiliation with Axiom. They’re the backend I picked for my own deployment after looking at Honeycomb, Grafana Cloud, and self-hosted Jaeger.
How do I create Axiom datasets for OpenTelemetry?
Create two datasets in the Axiom UI: one Events dataset (holds both traces and logs) and one Metrics dataset. Those are the only two dataset types Axiom exposes in the UI; metrics are kept separate because they use a different storage format under the hood.
- Sign in to Axiom and go to Datasets → New dataset.
- Create mineru-events with Events type. This holds traces and logs together.
- Create mineru-metrics with Metrics type.
Names are arbitrary — substitute whatever fits your naming convention. I prefix everything with the service name so multiple endpoints (staging, prod, experiments) don’t collide in queries.
How do I generate an Axiom API token for OTLP ingest?
In the Axiom UI, go to Settings → API tokens → Generate new token. Use an Advanced API token (prefix xaat-) and explicitly grant the Ingest scope on both datasets you just created. Forgetting the scope on the metrics dataset is one of the common failure modes: it surfaces as 403 Forbidden in the worker logs while events ingest works fine. Copy the resulting xaat- prefixed string and paste it into your RunPod endpoint config in the next section.
Treat the token like any production secret: paste it only into RunPod’s Environment Variables UI (which encrypts at rest), never check it into git, and rotate when employees with access leave.
If your Axiom workspace is in the EU region, the management API lives at https://api.eu.axiom.co (US is https://api.axiom.co). This is the host you query for token CRUD and REST queries, not the OTLP ingest URL — that’s a separate edge-deployment hostname documented in the env-var section below.
What environment variables ship RunPod worker telemetry to Axiom?
Set four environment variables on your RunPod endpoint. OTEL_EXPORTER_OTLP_HEADERS covers both traces and logs (they share the Events dataset); OTEL_EXPORTER_OTLP_METRICS_HEADERS overrides for metrics only because Axiom uses a different header for its metrics ingest.
Paste this into your endpoint’s Environment Variables section in the RunPod dashboard, substituting your token and dataset names:
OTEL_EXPORTER_OTLP_ENDPOINT=https://eu-central-1.aws.edge.axiom.co
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer xaat-YOUR-TOKEN,x-axiom-dataset=mineru-events
OTEL_EXPORTER_OTLP_METRICS_HEADERS=Authorization=Bearer xaat-YOUR-TOKEN,x-axiom-metrics-dataset=mineru-metrics
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

The endpoint URL is your Axiom edge deployment, NOT api.axiom.co / api.eu.axiom.co. This is the single biggest gotcha and the one that cost me hours when I first set this up. The api.* hosts are for the management API (token creation, queries via REST). OTLP ingest goes to your workspace’s edge deployment hostname. As of writing, the two are:
- US East 1: https://us-east-1.aws.edge.axiom.co
- EU Central 1: https://eu-central-1.aws.edge.axiom.co

Use the one that matches the region you picked when creating the Axiom workspace. The current full list lives at Axiom’s edge deployments doc. If you send OTLP to api.{eu.}axiom.co, Axiom returns 400 mismatched region or 403 forbidden depending on path. The OTel SDK logs only the HTTP status code, so you’ll see Failed to export ... code: 400 (or 403) with no clue why. Axiom support flagged this for me after I’d been chasing 403s for an hour against the wrong host.
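To make the hostname split concrete, here is a minimal sketch that builds the per-signal ingest URLs from the two edge deployments listed above and refuses the management hosts. The names signal_url, EDGE_ENDPOINTS, and MANAGEMENT_HOSTS are illustrative only; they are not part of the mineru-runpod template or any Axiom SDK, and the region list is just what this post knows about as of writing.

```python
from urllib.parse import urlparse

# Edge deployments named in this post; Axiom's docs hold the current list.
EDGE_ENDPOINTS = {
    "us-east-1": "https://us-east-1.aws.edge.axiom.co",
    "eu-central-1": "https://eu-central-1.aws.edge.axiom.co",
}

# Management API hosts: fine for token CRUD and REST queries, wrong for OTLP ingest.
MANAGEMENT_HOSTS = {"api.axiom.co", "api.eu.axiom.co"}

SIGNALS = ("traces", "logs", "metrics")


def signal_url(base_url: str, signal: str) -> str:
    """Return the OTLP/HTTP URL for one signal, given an edge base URL."""
    if signal not in SIGNALS:
        raise ValueError(f"unknown signal: {signal!r}")
    host = urlparse(base_url).hostname
    if host in MANAGEMENT_HOSTS:
        raise ValueError(f"{host} is a management API host, not an OTLP ingest host")
    # Same suffix convention the post describes: /v1/<signal> on the base URL.
    return f"{base_url.rstrip('/')}/v1/{signal}"
```

Calling signal_url(EDGE_ENDPOINTS["eu-central-1"], "logs") yields the logs URL quoted earlier in this post, while passing https://api.eu.axiom.co raises immediately instead of failing later with an opaque 403.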
How the SDK routes each signal:
- Traces + logs → use the generic OTEL_EXPORTER_OTLP_HEADERS, so spans and log records both land in the mineru-events dataset.
- Metrics → OTEL_EXPORTER_OTLP_METRICS_HEADERS overrides for metrics only, with the distinct x-axiom-metrics-dataset header that Axiom’s metrics ingest requires.
The dataset names in the two *-dataset headers must exactly match the names you created in step 1, and your API token must have ingest scope on both. This is the #1 source of “everything is configured but nothing shows up” — mineru-events and mineru-metrics above are example names, not magic strings. If you named your datasets differently, update the headers to match. Mismatches surface as 404 Not Found (wrong dataset name) or 403 Forbidden (token lacks ingest scope on that dataset). Both fail silently from the caller’s side; the only signal is in the worker’s stdout, where the OTel SDK logs each retry with the HTTP status code.
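The routing above can be modeled in a few lines. This is a simplified sketch of the precedence as I understand it from the OTLP exporter conventions (a signal-specific headers variable replaces the generic one rather than merging with it, which is why both values above repeat the Authorization header). It is not the SDK’s actual parser, and parse_headers, headers_for, and EXAMPLE_ENV are names made up for illustration.

```python
# Example values copied from the env-var block in this post (placeholder token).
EXAMPLE_ENV = {
    "OTEL_EXPORTER_OTLP_HEADERS": (
        "Authorization=Bearer xaat-YOUR-TOKEN,x-axiom-dataset=mineru-events"
    ),
    "OTEL_EXPORTER_OTLP_METRICS_HEADERS": (
        "Authorization=Bearer xaat-YOUR-TOKEN,x-axiom-metrics-dataset=mineru-metrics"
    ),
}


def parse_headers(raw: str) -> dict:
    """Parse a comma-separated key=value list into a dict (simplified)."""
    headers = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip().lower()] = value.strip()
    return headers


def headers_for(signal: str, env: dict) -> dict:
    """Resolve headers for a signal: specific variable wins over the generic one."""
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_HEADERS")
    raw = specific if specific else env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    return parse_headers(raw)
```

With EXAMPLE_ENV, headers_for("traces", ...) and headers_for("logs", ...) both resolve to the events dataset header, and headers_for("metrics", ...) carries only x-axiom-metrics-dataset.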
Three details that trip people up:
Base URL, not the full path. Set the endpoint to the edge-deployment root, e.g. https://eu-central-1.aws.edge.axiom.co, NOT https://eu-central-1.aws.edge.axiom.co/v1/traces. The OpenTelemetry Python SDK appends /v1/traces, /v1/logs, /v1/metrics per signal automatically. If you set the full path, the SDK double-appends and Axiom returns 404.
Header name differs by signal. Events ingest uses x-axiom-dataset. Metrics ingest uses x-axiom-metrics-dataset (different header name, with -metrics- in it). Copying the events headers into OTEL_EXPORTER_OTLP_METRICS_HEADERS as-is sends metrics to the events dataset and Axiom’s metrics view stays empty.
Protocol must be protobuf for metrics. Axiom’s metrics ingest accepts only protobuf, not JSON. The http/protobuf value above is the default the SDK ships with; don’t override it to http/json thinking it’s the safer choice. JSON works for logs and traces but quietly drops metrics.
Save the variables, redeploy any active workers (or wait for the next cold start), and Axiom should see records within ~60 seconds of the first request that hits a warm worker. The metric reader flushes every 10 s; traces and logs flush every 500 ms.
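If you want to catch the three details above before a cold start burns a minute, a small preflight check can run against the values you are about to paste. The sketch below is a hypothetical helper, not something shipped in the mineru-runpod template. It only inspects strings, so it cannot tell you whether the dataset exists or the token has the right scope.

```python
from urllib.parse import urlparse

MANAGEMENT_HOSTS = {"api.axiom.co", "api.eu.axiom.co"}


def preflight(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not endpoint:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is empty")
    else:
        parsed = urlparse(endpoint)
        if parsed.hostname in MANAGEMENT_HOSTS:
            problems.append("endpoint is a management API host, not an edge deployment")
        if parsed.path.strip("/"):
            problems.append("endpoint includes a path; set the base URL only")

    # Events (traces + logs) need the x-axiom-dataset header.
    events = env.get("OTEL_EXPORTER_OTLP_HEADERS", "").lower()
    if "x-axiom-dataset=" not in events:
        problems.append("OTEL_EXPORTER_OTLP_HEADERS lacks x-axiom-dataset")

    # Metrics need the distinct x-axiom-metrics-dataset header.
    metrics = env.get("OTEL_EXPORTER_OTLP_METRICS_HEADERS", "").lower()
    if "x-axiom-metrics-dataset=" not in metrics:
        problems.append("OTEL_EXPORTER_OTLP_METRICS_HEADERS lacks x-axiom-metrics-dataset")

    # Unset is treated as http/protobuf, matching the default described above.
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
    if protocol != "http/protobuf":
        problems.append(f"protocol is {protocol}; metrics ingest needs http/protobuf")

    return problems
```

Run it locally with a dict of the four values; anything it returns is worth fixing before you redeploy.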
How do I verify OpenTelemetry data is reaching Axiom?
Send one request to the worker, then check three views in the Axiom UI: Stream and Traces on the events dataset, and Metrics on the metrics dataset.
In the Axiom UI:
- Stream view → set dataset = mineru-events → you should see JSON log records flowing as soon as the worker handles a request. Each carries service.name=mineru-runpod, runpod.endpoint_id=<your-endpoint>, and job_id for correlation.
- Traces view → on the same mineru-events dataset → one trace per RunPod job. The root span is mineru.job. Its children are mineru.fetch_input, mineru.parse, and mineru.package. The mineru.warmup span shows up once per worker boot.
- Metrics view → set dataset = mineru-metrics → after the first 10-second flush you should see mineru.jobs.total, mineru.job.duration, the GPU memory gauges, and the rest of the metric catalog.
If nothing arrives in any view, open the worker’s stdout (RunPod dashboard → Logs) and grep for Failed to export. The OTel SDK logs each retry with the HTTP status code:
- Failed to export ... code: 404, reason: Not Found means the dataset name in one of the *-dataset headers doesn’t exist in your Axiom workspace. Rename the dataset or update the env var so they match.
- Failed to export ... code: 403, reason: Forbidden can mean the dataset exists but your API token doesn’t have ingest scope on it. Open the token in Settings → API tokens, add Ingest scope on the dataset, and update the secret in RunPod.
- Failed to export ... code: 403, reason: Forbidden more often means the wrong host: the #1 cause is using api.axiom.co or api.eu.axiom.co as the endpoint instead of the edge-deployment URL. Confirm by running curl -v -X POST https://api.eu.axiom.co/v1/traces -H "Authorization: Bearer xaat-..." -H "x-axiom-dataset: <yours>" -H "Content-Type: application/x-protobuf" --data-binary "" against your endpoint. If you get 403 there but 422 against https://eu-central-1.aws.edge.axiom.co/v1/traces (or the US edge variant), the URL is the issue. Other 403 causes: token genuinely lacks Ingest scope on the dataset.
- Failed to export ... code: 400, reason: Bad Request means the dataset resolves and auth works, but the payload is being rejected. Causes: region mismatch (workspace is EU but you’re hitting us-east-1.aws.edge.axiom.co, or vice versa; Axiom returns mismatched region in the body), wrong header name on metrics (use x-axiom-metrics-dataset, not x-axiom-dataset), or the dataset was created with the wrong type for the signal.
- Failed to export ... code: 401, reason: Unauthorized means the API token is wrong or expired. Generate a fresh token in Settings → API tokens and update the env var.
- No Failed to export lines AND no [mineru-telemetry] init failed either: OTEL_EXPORTER_OTLP_ENDPOINT is empty or the worker hasn’t cold-started since you set it. RunPod env-var changes only take effect on the next cold start; warm workers keep the previous values.
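When the worker log is long, counting failures by status code is quicker than scrolling. The sketch below assumes the log lines look like the Failed to export ... code: NNN pattern quoted above; the exact wording can vary across SDK versions, so treat the regex as a starting point. The triage function and the HINTS table are illustrative, with the hints condensed from the list above.

```python
import re

# Matches the "Failed to export ... code: 404" shape quoted in this post.
EXPORT_FAILURE = re.compile(r"Failed to export.*?code:\s*(\d{3})")

# Condensed from the troubleshooting list above.
HINTS = {
    400: "region mismatch, wrong metrics header name, or wrong dataset type",
    401: "API token is wrong or expired",
    403: "management host used as endpoint, or token lacks Ingest scope",
    404: "dataset name in a *-dataset header does not exist",
}


def triage(lines) -> dict:
    """Count export failures per HTTP status code across an iterable of log lines."""
    counts = {}
    for line in lines:
        match = EXPORT_FAILURE.search(line)
        if match:
            code = int(match.group(1))
            counts[code] = counts.get(code, 0) + 1
    return counts


def report(counts: dict) -> list:
    """Turn the counts into one summary line per status code."""
    return [
        f"{code} x{n}: {HINTS.get(code, 'not covered in this post')}"
        for code, n in sorted(counts.items())
    ]
```

Paste the worker log into a file and feed it through, for example report(triage(open("worker.log"))), where worker.log is whatever filename you saved it under.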
Which APL queries help debug a mineru-runpod worker?
APL queries answer the three operational questions that come up most: which errors fired in the last hour, which parses are slowest, and how throughput breaks down by backend or endpoint. All three queries hit the events dataset, where both traces and logs land. The templates below are starting points; adjust attribute paths to match how your Axiom workspace unrolls OTLP resource and span attributes.
Recent errors, grouped by error type:

['mineru-events']
| where ['service.name'] == "mineru-runpod"
| where ['severity_text'] == "ERROR"
| where _time > ago(1h)
| summarize count() by tostring(['attributes.error_type'])
| sort by count_ desc

Slowest parses in the last hour:

['mineru-events']
| where ['name'] == "mineru.parse"
| where _time > ago(1h)
| project _time, duration = ['duration'], backend = ['attributes.mineru.backend'], input_format = ['attributes.mineru.input_format']
| sort by duration desc
| take 20

Throughput by endpoint over time:

['mineru-events']
| where ['name'] == "mineru.job"
| where _time > ago(24h)
| summarize jobs = count() by endpoint_id = tostring(['resource.runpod.endpoint_id']), bin(_time, 5m)
| render timechart

Attribute paths in APL depend on how Axiom unrolls OTLP records: ['resource.runpod.endpoint_id'] works in my workspace but yours may need ['runpod.endpoint_id'] directly. Run a quick | take 5 | project * against the dataset first to see the actual field names your workspace produces.
Does enabling OpenTelemetry slow down cold starts?
Enabling OTel adds roughly 200–500 ms to the first-ever cold start (one-time SDK init plus DNS resolution for the OTLP endpoint). Subsequent FlashBoot snapshot restores on the same host inherit the warm state, so the cost amortizes per host, not per request. For a parse that takes 5–30 seconds wall-clock, the overhead is invisible.
The numbers from my own deployment: on an RTX 4090 with vlm-auto-engine, the bare cold start is ~110 s (image pull + vLLM init + model load + warmup parse). OTel init adds 200–500 ms on top of that. On a FlashBoot-restored boot, the same overhead is 0 — the snapshot captures the initialized SDK along with the rest of the process state. See the FlashBoot mechanism investigation for how the snapshot path actually works.
If you’re cost-sensitive about cold starts and don’t need observability on every deployment, leave OTEL_EXPORTER_OTLP_ENDPOINT unset. The mineru-runpod worker skips the OTel SDK import entirely when that variable is empty — zero overhead, zero behavior change. Flip it on for the endpoints where you actually want the visibility (production, staging) and leave it off for experimentation runs.
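The skip-when-unset behavior is a common pattern and easy to reproduce in your own worker. The sketch below shows the general shape, a deferred import behind an env-var check. It is my illustration of the pattern, not the template’s actual code, and init_telemetry is a hypothetical name.

```python
import os


def init_telemetry(env=os.environ) -> bool:
    """Set up OTel only when an endpoint is configured.

    Returns True if the SDK was imported, False if telemetry was skipped.
    """
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip():
        # No endpoint configured: return before any OTel import happens.
        return False
    try:
        # Deferred import, so disabled deployments never pay the import cost.
        import opentelemetry.sdk  # noqa: F401
    except ImportError:
        # SDK not installed in this image: run without telemetry.
        return False
    # Provider, exporter, and processor setup would go here.
    return True
```

Because the import lives inside the function, an endpoint with the variable unset never loads the SDK at all, which is what keeps the disabled path at zero overhead.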
Where does this setup fall down?
Three real limitations: Axiom’s free tier caps ingest at 0.5 GB/month and 30-day retention; the GPU gauges emit one time series per device label per metric and cardinality adds up; and OTLP/HTTP export adds modest latency on cold starts (200–500 ms). None of these are blockers for a small or mid-volume serverless deployment, but they’re the things to watch as volume grows.
Free-tier ceilings. Axiom’s free plan is generous (0.5 GB ingest/month, 30-day retention) but you can blow through it with a chatty worker. A debug-logging worker processing 1,000 PDFs/day at ~30 KB of logs per parse hits ~900 MB/month — past the cap. Either keep the log level at info (the default) or move to Axiom’s paid tier (currently $25/month for 5 GB ingest).
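The arithmetic behind that 900 MB figure is worth keeping around so you can plug in your own numbers. A minimal sketch, assuming a 30-day month and decimal units (1 MB = 1,000 KB); the function name and the FREE_TIER_MB constant are mine, with the cap taken from the figure quoted above.

```python
FREE_TIER_MB = 500  # 0.5 GB/month, the free-tier cap quoted in this post


def monthly_ingest_mb(jobs_per_day: int, kb_per_job: float, days: int = 30) -> float:
    """Estimate monthly log ingest in MB (decimal units, 1 MB = 1000 KB)."""
    return jobs_per_day * kb_per_job * days / 1000


# The debug-logging example from the paragraph above: 1,000 PDFs/day at ~30 KB each.
debug_mb = monthly_ingest_mb(jobs_per_day=1000, kb_per_job=30)
print(f"debug logging: {debug_mb:.0f} MB/month, over cap: {debug_mb > FREE_TIER_MB}")
```

Measure your own per-parse log size at info level and substitute it for kb_per_job to see how much headroom you have.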
Metric cardinality. The GPU gauges (mineru.gpu.memory_used_bytes, mineru.gpu.utilization_percent) emit one time series per device label per gauge per worker. A multi-GPU worker times multiple worker instances times four GPU metrics multiplies fast. Axiom’s metrics pricing is per-event rather than per-series, so this is a “watch the bill” concern rather than a hard limit. If you scale to dozens of concurrent workers, drop the device label or sample less frequently.
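To see how quickly the multiplication grows, here is a rough sketch. The worker and GPU counts in the usage note are hypothetical, and the data-point estimate assumes every series exports one point per flush at the 10-second interval mentioned earlier, which is a simplification.

```python
def gpu_series_count(workers: int, gpus_per_worker: int, gpu_metrics: int) -> int:
    """Time series produced by the GPU gauges across all workers."""
    return workers * gpus_per_worker * gpu_metrics


def datapoints_per_hour(series: int, flush_interval_s: int = 10) -> int:
    """Data points per hour, assuming one point per series per flush."""
    return series * (3600 // flush_interval_s)
```

For example, 24 concurrent workers with 2 GPUs each and four GPU metrics gives gpu_series_count(24, 2, 4) == 192 series, and datapoints_per_hour(192) comes to 69,120 points for every hour those workers stay up.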
Cold-start latency. OTel init adds 200–500 ms on first-ever boot. For most workloads this is dominated by the existing ~110 s of vLLM init, so it doesn’t matter. If you’re optimizing the cold start specifically (chatbot-style low-latency workloads, for example), benchmark with and without OTel before committing.
Logs are mirrored, not exclusive. The worker still writes stdout JSON to RunPod’s dashboard regardless of OTel. That’s deliberate: RunPod’s UI remains a working fallback when the OTel pipeline misbehaves. The cost is paying twice for log storage if you care about long retention in both places. Most teams don’t, and the duplication is the price of the dashboard fallback.
Does Axiom support OpenTelemetry natively?
Yes. Axiom ingests OTLP/HTTP traces, logs, and metrics on /v1/traces, /v1/logs, /v1/metrics paths, but the base hostname is your region’s edge deployment, not api.axiom.co. The two as of writing are https://us-east-1.aws.edge.axiom.co and https://eu-central-1.aws.edge.axiom.co (full list at Axiom’s edge deployments doc). The OpenTelemetry Python SDK in the mineru-runpod worker speaks this directly with no Collector or agent in between. See Axiom’s OpenTelemetry docs for the full list of supported signals and headers.
Why aren’t my metrics showing up in Axiom?
Three common causes. First: OTEL_EXPORTER_OTLP_METRICS_HEADERS uses the wrong header key. Axiom needs x-axiom-metrics-dataset (lowercase, with -metrics-dataset), distinct from the x-axiom-dataset header used for logs and traces. Second: the API token doesn’t have ingest scope on the metrics dataset, which is common when you create the token with only the logs dataset selected, then later add the metrics one. Surfaces as 403 Forbidden in the worker stdout. Third: OTEL_EXPORTER_OTLP_PROTOCOL is set to http/json and Axiom’s metrics endpoint accepts only protobuf.
What’s the cheapest way to observe a RunPod serverless worker?
For ≤1,000 jobs/day with structured logs at info level, Axiom’s free tier (0.5 GB/month, 30-day retention) is the cheapest path with real query power. The OTel SDK is already in the mineru-runpod image; setup is four env vars on the endpoint. RunPod’s own log dashboard is free but lacks a query language, metrics, and traces: fine for triage, not for SLO tracking.
Can I send traces and logs to the same Axiom dataset?
Yes, that’s the standard setup. Axiom exposes only two dataset types in the UI (Events and Metrics), and the Events type holds both traces and logs. The configuration above routes traces + logs to one events dataset via OTEL_EXPORTER_OTLP_HEADERS, and metrics to a separate metrics dataset via OTEL_EXPORTER_OTLP_METRICS_HEADERS. No per-signal traces override is needed.
How much does OTLP/HTTP export add to cold start time?
200–500 ms on the first-ever cold start of a worker on a host. That’s one-time SDK init and DNS resolution for the OTLP endpoint. Subsequent FlashBoot snapshot restores on the same host pay zero overhead because the initialized SDK is captured in the process snapshot. On a baseline 110 s cold start (vLLM + model load + warmup), the OTel cost is invisible.
Why isn’t api.axiom.co the OTLP endpoint?
Because Axiom splits two concerns onto two hostnames: api.{eu.}axiom.co is the management API (token CRUD, REST queries, dashboards), while OTLP ingest goes to the edge deployment hostname for your workspace’s region. The split isn’t obvious from the OpenTelemetry side because most other backends expose ingest on the same hostname as the management API. Axiom’s own OpenTelemetry guide and edge deployments doc document the edge URLs, but it’s easy to miss if you start from a generic OTel tutorial.
Does enabling OpenTelemetry require an OpenTelemetry Collector?
No. The mineru-runpod worker uses the OpenTelemetry Python SDK with the OTLP/HTTP exporter, talking directly to Axiom’s ingest endpoint. A Collector is useful when you want to fan out to multiple backends, apply sampling rules, or buffer locally, none of which apply to a single-sink serverless worker shipping into Axiom.
Same template, different backend
The four env vars above are the only Axiom-specific configuration in this whole setup. Swap the URL and headers and the mineru-runpod worker ships the same logs, traces, and metrics to anything that speaks OTLP/HTTP: Honeycomb, Grafana Cloud, Datadog’s OTLP intake, your own OpenTelemetry Collector. Most other backends don’t even need the metrics-headers override that Axiom requires; a single OTEL_EXPORTER_OTLP_HEADERS value covers all three signals. The observability guide covers the vendor-neutral env-var layout and the metric catalog. Different backends, same template.
If this saved you time, the easiest way to say thanks is signing up for RunPod through this link. Star the repo on GitHub for updates.
Disclosure: RunPod links in this post use a referral code that credits me at no cost to you. The post would read the same without it.