JSON Output

--output json is the stable machine-readable output for automation.

Text output is optimized for terminal use and is not treated as a stable report format.

Examples

Default JSON output:

{
  "schema_version": "1",
  "metadata": {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {
      "model_name": "meta-llama/Llama-3.1-8B",
      "since": "5m",
      "client_mode": "prometheus"
    }
  },
  "health": "warning",
  "notices": [],
  "checks": [
    {
      "id": "replica_imbalance",
      "name": "Replica Imbalance",
      "finding": {
        "severity": "warning",
        "confidence": "high",
        "title": "Replica imbalance",
        "summary": "Load is unevenly distributed across replicas — one replica is doing more work than its peers.",
        "evidence": ["running vllm-1=10 vs vllm-0=2; cache 94% vs 41%; waiting vllm-1=7 vs vllm-0=0"],
        "likely_causes": ["Load balancer not distributing requests evenly (sticky sessions or connection reuse)"],
        "recommendations": ["Check the load balancer / service routing and session affinity settings"],
        "related_metrics": ["vllm:num_requests_running"]
      }
    },
    {
      "id": "queue_pressure",
      "name": "Queue Pressure",
      "finding": {
        "severity": "warning",
        "confidence": "low",
        "title": "Queue pressure",
        "summary": "Requests are queuing faster than the server can process them.",
        "evidence": ["Waiting requests: 7"],
        "likely_causes": ["Insufficient replica capacity for current traffic"],
        "recommendations": ["Add replicas or increase concurrency limits"],
        "related_metrics": ["vllm:num_requests_waiting"]
      }
    }
  ]
}

When several deployments share one Prometheus target, replica imbalance evidence is prefixed with the model, e.g. "llama-70b: running vllm-1=10 vs vllm-0=2", and a separate line is emitted per affected model.

Verbose JSON includes observed metrics:

{
  "schema_version": "1",
  "metadata": {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {
      "model_name": null,
      "since": "5m",
      "client_mode": "prometheus"
    }
  },
  "health": "warning",
  "notices": [],
  "checks": [],
  "metrics": {
    "num_requests_running": {
      "value": 12,
      "by": {
        "pod": {
          "vllm-0": 2,
          "vllm-1": 10
        }
      }
    },
    "kv_cache_usage_perc": {
      "value": 0.94,
      "by": {
        "pod": {
          "vllm-0": 0.41,
          "vllm-1": 0.94
        }
      }
    }
  }
}

Note

--output json (one-shot) produces pretty-printed JSON for readability. --output json --watch produces compact JSON (one object per line) for streaming and automation.
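
As a rough illustration of consuming the streaming form, the sketch below reads compact reports one line at a time. It assumes each line is a complete JSON object, as described in the note above; the helper name `iter_reports` and the two sample lines are made up for this example and are not part of the tool.

```python
import json
from typing import Iterable, Iterator


def iter_reports(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed report per non-empty line of compact JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between reports
        yield json.loads(line)


# Two hand-written compact reports standing in for --watch output.
sample_stream = [
    '{"schema_version": "1", "health": "ok", "notices": [], "checks": []}\n',
    '{"schema_version": "1", "health": "warning", "notices": [], "checks": []}\n',
]

healths = [report["health"] for report in iter_reports(sample_stream)]
print(healths)  # ['ok', 'warning']
```

When the CLI output is piped into a script, `sys.stdin` can be passed in place of `sample_stream`, since a text stream iterates line by line.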

Top-level fields

| Field | Description |
| --- | --- |
| schema_version | JSON schema version. Current value: "1" (emitted as a string). |
| metadata | Report metadata, including generation time and target |
| health | Overall health: ok, info, warning, critical |
| notices | Advisory caveats about reading the report (scrape-mode limits, multi-model blending); empty when none apply |
| checks | Rule results, sorted by severity and confidence |
| metrics | Observed metrics; included only with --verbose |
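
One way automation might use the top-level fields is to turn `health` into a pass/fail signal. The sketch below is only an illustration: the exit codes, the `fail_at` parameter, and the function name are hypothetical, and it assumes the health values are ordered from least to most severe in the order they are listed above.

```python
import json

# Assumed ordering, least to most severe, following the list in the table.
HEALTH_ORDER = ["ok", "info", "warning", "critical"]
SUPPORTED_SCHEMA = "1"  # schema_version is a string in the examples


def exit_code(report: dict, fail_at: str = "warning") -> int:
    """Return 2 on an unexpected schema, 1 if health >= fail_at, else 0."""
    if report.get("schema_version") != SUPPORTED_SCHEMA:
        return 2  # refuse to interpret a schema this script was not written for
    health_rank = HEALTH_ORDER.index(report["health"])
    return 1 if health_rank >= HEALTH_ORDER.index(fail_at) else 0


report = json.loads(
    '{"schema_version": "1", "health": "warning", "notices": [], "checks": []}'
)
print(exit_code(report))                      # 1
print(exit_code(report, fail_at="critical"))  # 0
```

Checking `schema_version` first keeps the script from silently misreading a future report layout.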

Metrics

value is the scalar diagnostic value for the metric. by is present only when vLLM Doctor detects multiple replicas. Its single key is the label that distinguishes them (e.g. pod, instance, host, server); the inner map's keys are the replica identifiers (the values of that label, such as vllm-0) and its values are the per-replica numbers. Per-replica numbers use the metric's own aggregation (max for KV cache usage and latency percentiles, sum for counters).
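
To make the shape of `by` concrete, the following sketch pulls out the distinguishing label and the per-replica map, then picks the replica with the largest number. The metric data is copied from the verbose example above; the helper names `per_replica` and `highest_replica` are hypothetical.

```python
from typing import Optional, Tuple

# Copied from the verbose JSON example.
metrics = {
    "num_requests_running": {
        "value": 12,
        "by": {"pod": {"vllm-0": 2, "vllm-1": 10}},
    },
    "kv_cache_usage_perc": {
        "value": 0.94,
        "by": {"pod": {"vllm-0": 0.41, "vllm-1": 0.94}},
    },
}


def per_replica(metric: dict) -> Tuple[Optional[str], dict]:
    """Return (label, {replica: number}); (None, {}) when `by` is absent."""
    by = metric.get("by")
    if not by:
        return None, {}
    # `by` is documented to hold a single key: the distinguishing label.
    label, values = next(iter(by.items()))
    return label, values


def highest_replica(metric: dict) -> Optional[str]:
    """Return the replica with the largest per-replica number, if any."""
    _, values = per_replica(metric)
    return max(values, key=values.get) if values else None


print(per_replica(metrics["num_requests_running"])[0])   # pod
print(highest_replica(metrics["num_requests_running"]))  # vllm-1
print(highest_replica({"value": 3}))                     # None
```

Because `by` is omitted for single-replica targets, callers should treat its absence as normal rather than as an error.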

Metadata

| Field | Description |
| --- | --- |
| generated_at | ISO 8601 timestamp for when the report was built |
| target.model_name | Model name filter, or null |
| target.since | Query window used for Prometheus rates |
| target.client_mode | prometheus or scrape |
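
A consumer may want to discard reports that are too old. The sketch below parses `generated_at` with the standard library and computes the report's age; it assumes the timestamp always carries a UTC offset, as in the examples. The function name and the fixed `now` value are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Metadata copied from the default JSON example.
metadata = {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {
        "model_name": "meta-llama/Llama-3.1-8B",
        "since": "5m",
        "client_mode": "prometheus",
    },
}


def report_age(meta: dict, now: Optional[datetime] = None) -> timedelta:
    """Return how long ago the report was generated."""
    generated = datetime.fromisoformat(meta["generated_at"])
    if now is None:
        now = datetime.now(timezone.utc)
    return now - generated


# A fixed "now" keeps the example deterministic.
fixed_now = datetime(2026, 6, 1, 10, 32, tzinfo=timezone.utc)
age = report_age(metadata, now=fixed_now)
print(age)                         # 0:02:00
print(age > timedelta(minutes=5))  # False
```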

Checks

Each check has a stable machine-readable id and a human-readable name.

| Field | Description |
| --- | --- |
| id | Stable rule ID, such as queue_pressure |
| name | Display name, such as Queue Pressure |
| finding | Finding details, or null when the rule is OK |

Finding fields:

| Field | Description |
| --- | --- |
| severity | info, warning, or critical |
| confidence | low, medium, or high |
| title | Human-readable finding title |
| summary | Short explanation of the diagnosis |
| evidence | Observed signals supporting the finding |
| likely_causes | Possible causes to investigate |
| recommendations | Suggested next actions |
| related_metrics | Metrics related to the finding |
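
The check and finding fields can be combined into a simple filter. The sketch below skips rules whose `finding` is null and keeps findings at or above a chosen severity. It assumes severities are ordered info < warning < critical; the function name is hypothetical, and the third check ID (`example_ok_rule`) is invented purely to show a null finding.

```python
# Assumed ordering, least to most severe.
SEVERITY_ORDER = ["info", "warning", "critical"]

# Trimmed from the default JSON example, plus one invented OK rule.
report = {
    "checks": [
        {"id": "replica_imbalance", "name": "Replica Imbalance",
         "finding": {"severity": "warning", "confidence": "high"}},
        {"id": "queue_pressure", "name": "Queue Pressure",
         "finding": {"severity": "warning", "confidence": "low"}},
        {"id": "example_ok_rule", "name": "Example OK Rule",  # hypothetical
         "finding": None},
    ]
}


def active_findings(rep: dict, min_severity: str = "info") -> list:
    """Return (id, severity, confidence) for findings at or above min_severity."""
    floor = SEVERITY_ORDER.index(min_severity)
    rows = []
    for check in rep["checks"]:
        finding = check["finding"]
        if finding is None:
            continue  # the rule evaluated as OK
        if SEVERITY_ORDER.index(finding["severity"]) >= floor:
            rows.append((check["id"], finding["severity"], finding["confidence"]))
    return rows


print(active_findings(report))
# [('replica_imbalance', 'warning', 'high'), ('queue_pressure', 'warning', 'low')]
print(active_findings(report, min_severity="critical"))  # []
```

Keying on `id` rather than `name` or `title` follows the statement above that `id` is the stable, machine-readable identifier.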

signals are intentionally omitted from JSON findings for now. They remain internal explanatory detail and may be exposed in a later schema version.