JSON Output

--output json is the stable machine-readable output for automation.

Text output is optimized for terminal use and is not treated as a stable report format.

Examples

Default JSON output:

{
  "schema_version": "1",
  "metadata": {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {
      "model_name": "meta-llama/Llama-3.1-8B",
      "since": "5m",
      "client_mode": "prometheus"
    }
  },
  "health": "warning",
  "notices": [],
  "checks": [
    {
      "id": "replica_imbalance",
      "name": "Replica Imbalance",
      "finding": {
        "severity": "warning",
        "confidence": "high",
        "title": "Replica imbalance",
        "summary": "Load is unevenly distributed across replicas — one replica is doing more work than its peers.",
        "evidence": ["running vllm-1=10 vs vllm-0=2; cache 94% vs 41%; waiting vllm-1=7 vs vllm-0=0"],
        "likely_causes": ["Load balancer not distributing requests evenly (sticky sessions or connection reuse)"],
        "recommendations": ["Check the load balancer / service routing and session affinity settings"],
        "related_metrics": ["vllm:num_requests_running"]
      }
    },
    {
      "id": "queue_pressure",
      "name": "Queue Pressure",
      "finding": {
        "severity": "warning",
        "confidence": "low",
        "title": "Queue pressure",
        "summary": "Requests are queuing faster than the server can process them.",
        "evidence": ["Waiting requests: 7"],
        "likely_causes": ["Insufficient replica capacity for current traffic"],
        "recommendations": ["Add replicas or increase concurrency limits"],
        "related_metrics": ["vllm:num_requests_waiting"]
      }
    }
  ]
}

When several deployments share one Prometheus target, each replica imbalance evidence entry is prefixed with the model name, e.g. "llama-70b: running vllm-1=10 vs vllm-0=2", and a separate entry is emitted for each affected model.
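A consumer that needs per-model evidence can split on that prefix. The sketch below assumes the model name is separated from the detail by the first ": ", which is inferred from the example above rather than a documented format guarantee. It only looks at `replica_imbalance`, because other checks (such as `queue_pressure`) can contain ": " in unprefixed evidence. The function name `imbalance_by_model` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumption: in multi-model output the model name is separated from the
# evidence detail by the first ": ". Inferred from the documented example,
# not a guaranteed format.


def imbalance_by_model(report: dict) -> Dict[Optional[str], List[str]]:
    """Group replica_imbalance evidence by model (None when unprefixed)."""
    grouped: Dict[Optional[str], List[str]] = defaultdict(list)
    for check in report.get("checks", []):
        finding = check.get("finding")
        # Only replica_imbalance evidence is documented to carry a model prefix.
        if check.get("id") != "replica_imbalance" or finding is None:
            continue
        for entry in finding.get("evidence", []):
            model, sep, detail = entry.partition(": ")
            if sep:
                grouped[model].append(detail)
            else:
                grouped[None].append(entry)  # single-deployment evidence
    return dict(grouped)


report = {
    "checks": [
        {
            "id": "replica_imbalance",
            "finding": {"evidence": ["llama-70b: running vllm-1=10 vs vllm-0=2"]},
        }
    ]
}
print(imbalance_by_model(report))
```

Unprefixed evidence, like the entry in the default example, is grouped under `None`.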

Verbose JSON (`--verbose`) adds a `metrics` object with the observed values. In this example `checks` is shown empty for brevity:

{
  "schema_version": "1",
  "metadata": {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {
      "model_name": null,
      "since": "5m",
      "client_mode": "prometheus"
    }
  },
  "health": "warning",
  "notices": [],
  "checks": [],
  "metrics": {
    "num_requests_running": {
      "value": 12,
      "by": {
        "pod": {
          "vllm-0": 2,
          "vllm-1": 10
        }
      }
    },
    "kv_cache_usage_perc": {
      "value": 0.94,
      "by": {
        "pod": {
          "vllm-0": 0.41,
          "vllm-1": 0.94
        }
      }
    }
  }
}

Note

--output json (one-shot) produces pretty-printed JSON for readability. --output json --watch produces compact JSON (one object per line) for streaming and automation.
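Because watch mode writes one compact object per line, a consumer can parse the stream line by line. This is a minimal sketch that assumes the output is piped into the script's standard input; the function name `read_reports` is hypothetical, and an in-memory stream stands in for the real output here.

```python
import io
import json
from typing import Iterable, Iterator


def read_reports(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed report per non-empty line of watch output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        yield json.loads(line)


# Demo: an in-memory stream standing in for piped `--output json --watch`.
stream = io.StringIO(
    '{"schema_version": "1", "health": "ok", "checks": []}\n'
    '{"schema_version": "1", "health": "warning", "checks": []}\n'
)
for report in read_reports(stream):
    print(report["health"])

# In a real pipeline, iterate over sys.stdin instead:
#   for report in read_reports(sys.stdin): ...
```

Pretty-printed one-shot output spans many lines, so parse it with a single `json.load` instead.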

Top-level fields

Field	Description
`schema_version`	Report schema version, serialized as a string. Current value: `"1"`.
`metadata`	Report metadata, including generation time and target
`health`	Overall health: `ok`, `info`, `warning`, `critical`
`notices`	Advisory caveats about reading the report (scrape-mode limits, multi-model blending); empty when none apply
`checks`	Rule results, sorted by severity and confidence
`metrics`	Observed metrics; included only with `--verbose`
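A common use of these fields is gating a CI job on overall health. The sketch below assumes the health values are ordered from least to most severe in the order listed above; that ordering, the function name `exit_code`, and the `fail_on` parameter are illustrative assumptions, not part of vLLM Doctor.

```python
import json

# Assumption: health values are ordered as listed, least to most severe.
HEALTH_ORDER = ["ok", "info", "warning", "critical"]


def exit_code(report: dict, fail_on: str = "warning") -> int:
    """Return 1 when overall health is at or above `fail_on`, else 0."""
    version = report.get("schema_version")
    if version != "1":  # schema_version is serialized as a string
        raise ValueError(f"unsupported schema_version: {version!r}")
    health = report["health"]
    return int(HEALTH_ORDER.index(health) >= HEALTH_ORDER.index(fail_on))


raw = '{"schema_version": "1", "health": "warning", "notices": [], "checks": []}'
report = json.loads(raw)
print(exit_code(report))                      # 1
print(exit_code(report, fail_on="critical"))  # 0
# `metrics` is absent unless --verbose was passed, so read it defensively.
print(report.get("metrics") is None)          # True
```

In a script, pass the result to `sys.exit` so the job fails when the threshold is reached.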

Metrics

`value` is the scalar diagnostic value for the metric. `by` is present only when vLLM Doctor detects multiple replicas. Its single key is the label that distinguishes them (e.g. `pod`, `instance`, `host`, `server`), and its inner map goes from each replica's label value (such as `vllm-0`) to that replica's number. Per-replica numbers use the metric's own aggregation (max for KV cache usage and latency percentiles, sum for counters).
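As an illustration of reading `by`, the sketch below finds the busiest and idlest replica for one metric and their max/min ratio. The function name `replica_spread` and the ratio itself are hypothetical conveniences; vLLM Doctor does not emit such a ratio in its JSON.

```python
from typing import Optional, Tuple


def replica_spread(metric: dict) -> Optional[Tuple[str, str, str, float]]:
    """Return (label, busiest, idlest, max/min ratio), or None without `by`."""
    by = metric.get("by")
    if not by:
        return None  # single replica: only the scalar `value` is reported
    (label, per_replica), = by.items()  # `by` has exactly one label key
    busiest = max(per_replica, key=per_replica.get)
    idlest = min(per_replica, key=per_replica.get)
    high, low = per_replica[busiest], per_replica[idlest]
    if low == 0:
        ratio = float("inf") if high > 0 else 1.0  # avoid dividing by zero
    else:
        ratio = high / low
    return label, busiest, idlest, ratio


# Values taken from the verbose example above.
metrics = {
    "num_requests_running": {"value": 12, "by": {"pod": {"vllm-0": 2, "vllm-1": 10}}},
    "kv_cache_usage_perc": {"value": 0.94, "by": {"pod": {"vllm-0": 0.41, "vllm-1": 0.94}}},
}
for name, metric in metrics.items():
    print(name, replica_spread(metric))
```

The tuple unpacking raises `ValueError` if `by` ever carries more than one key, which surfaces a schema change instead of silently picking a label.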

Metadata

Field	Description
`generated_at`	ISO 8601 timestamp for when the report was built
`target.model_name`	Model name filter, or `null`
`target.since`	Query window used for Prometheus rates
`target.client_mode`	`prometheus` or `scrape`
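The metadata can be turned into a short description for logs or alerts. This sketch assumes `since` uses simple duration strings such as `5m` or `1h30m` (only the units s, m, h, d are handled); that format, and the function names `parse_since` and `describe_target`, are assumptions for illustration. `generated_at` is parsed with the standard `datetime.fromisoformat`.

```python
import re
from datetime import datetime, timedelta

# Assumption: `since` uses simple duration strings such as "5m" or "1h30m".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_since(window: str) -> timedelta:
    """Convert a simple duration string into a timedelta."""
    parts = re.findall(r"(\d+)([smhd])", window)
    # Reject anything the pattern did not consume completely (e.g. "5ms").
    if not parts or "".join(n + u for n, u in parts) != window:
        raise ValueError(f"unrecognized window: {window!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def describe_target(metadata: dict) -> str:
    """Render the report target as a one-line description."""
    generated = datetime.fromisoformat(metadata["generated_at"])  # tz-aware
    target = metadata["target"]
    model = target["model_name"] or "no model filter"  # null means unfiltered
    seconds = int(parse_since(target["since"]).total_seconds())
    return (
        f"{model} via {target['client_mode']}, "
        f"{seconds}s window, generated {generated.isoformat()}"
    )


metadata = {
    "generated_at": "2026-06-01T10:30:00+00:00",
    "target": {"model_name": "meta-llama/Llama-3.1-8B", "since": "5m", "client_mode": "prometheus"},
}
print(describe_target(metadata))
```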

Checks

Each check has a stable machine-readable id and a human-readable name.

Field	Description
`id`	Stable rule ID, such as `queue_pressure`
`name`	Display name, such as `Queue Pressure`
`finding`	Finding details, or `null` when the rule is OK
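Since `finding` is `null` for rules that pass, a consumer can separate passing and firing rules in one pass, keyed by the stable `id` rather than the display `name`. A minimal sketch; the function name `split_checks` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_checks(report: dict) -> Tuple[List[str], Dict[str, dict]]:
    """Return (ids of passing rules, {id: finding} for rules that fired)."""
    passing: List[str] = []
    fired: Dict[str, dict] = {}
    for check in report["checks"]:
        # Key on the stable `id`, not the human-readable `name`.
        if check["finding"] is None:
            passing.append(check["id"])
        else:
            fired[check["id"]] = check["finding"]
    return passing, fired


report = {
    "checks": [
        {"id": "replica_imbalance", "name": "Replica Imbalance",
         "finding": {"severity": "warning", "title": "Replica imbalance"}},
        {"id": "queue_pressure", "name": "Queue Pressure", "finding": None},
    ]
}
passing, fired = split_checks(report)
print(passing)        # ['queue_pressure']
print(sorted(fired))  # ['replica_imbalance']
```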

Finding fields:

Field	Description
`severity`	`info`, `warning`, or `critical`
`confidence`	`low`, `medium`, or `high`
`title`	Human-readable finding title
`summary`	Short explanation of the diagnosis
`evidence`	Observed signals supporting the finding
`likely_causes`	Possible causes to investigate
`recommendations`	Suggested next actions
`related_metrics`	Metrics related to the finding
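Automation often acts only on findings that are both severe and well supported. The sketch below filters on both fields, assuming each scale is ordered as listed above (lowest first); the thresholds and the function name `actionable` are illustrative choices, not vLLM Doctor defaults.

```python
# Assumption: both scales are ordered as listed in the table, lowest first.
SEVERITY = ["info", "warning", "critical"]
CONFIDENCE = ["low", "medium", "high"]


def actionable(report: dict, min_severity: str = "warning",
               min_confidence: str = "medium") -> list:
    """Return (id, title, recommendations) for findings meeting both thresholds."""
    selected = []
    for check in report["checks"]:
        finding = check["finding"]
        if finding is None:
            continue  # rule passed
        severe = SEVERITY.index(finding["severity"]) >= SEVERITY.index(min_severity)
        sure = CONFIDENCE.index(finding["confidence"]) >= CONFIDENCE.index(min_confidence)
        if severe and sure:
            selected.append((check["id"], finding["title"], finding["recommendations"]))
    return selected


# Abridged findings from the default JSON example.
report = {
    "checks": [
        {"id": "replica_imbalance", "finding": {
            "severity": "warning", "confidence": "high", "title": "Replica imbalance",
            "recommendations": ["Check the load balancer / service routing and session affinity settings"]}},
        {"id": "queue_pressure", "finding": {
            "severity": "warning", "confidence": "low", "title": "Queue pressure",
            "recommendations": ["Add replicas or increase concurrency limits"]}},
    ]
}
for check_id, title, recs in actionable(report):
    print(check_id, title, recs)
```

With the default thresholds, `queue_pressure` is skipped because its confidence is `low`.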

Finding `signals` are intentionally omitted from the JSON output for now. They remain internal explanatory detail and may be exposed in a later schema version.