Error Rate
Detects elevated server-side errors or client aborts relative to total requests.
Background
vLLM tracks completed requests by finished_reason:
| Reason | Meaning |
|---|---|
| stop | Completed normally |
| error | Server-side failure (OOM, internal error) |
| abort | Client disconnected or request cancelled |
| length | Hit max_tokens limit |
| repetition | Stopped by repetition penalty |
This rule monitors the error and abort rates. A high error rate indicates that the server is failing requests internally. A high abort rate often means clients are giving up, typically because responses are too slow.
Signals
| Signal | Condition |
|---|---|
| Error rate high | errors / total >= 0.05 (default) |
| Abort rate high | aborts / total >= 0.10 (default) |
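The two conditions in the table can be sketched as follows. This is a minimal illustration, not the rule's actual implementation: the function name `evaluate_signals` and the `counts` dict layout are hypothetical, and it assumes that "total" means the sum over every finished_reason observed in the evaluation window.

```python
from typing import Dict, Tuple


def evaluate_signals(
    counts: Dict[str, float],
    high_error_rate: float = 0.05,
    high_abort_rate: float = 0.10,
) -> Tuple[bool, bool]:
    """Return (error_high, abort_high) for one evaluation window.

    `counts` maps finished_reason -> number of requests in the window.
    Hypothetical helper for illustration only.
    """
    # Assumption: total is the sum over every finished_reason present.
    total = sum(counts.values())
    if total <= 0:
        # No completed requests: neither ratio is defined, so nothing fires.
        return (False, False)

    error_rate = counts.get("error", 0) / total
    abort_rate = counts.get("abort", 0) / total

    # Both comparisons are inclusive (>=), matching the Signals table.
    return (error_rate >= high_error_rate, abort_rate >= high_abort_rate)
```

With 90 stop, 6 error, and 4 abort requests, the error rate is 6/100 and the abort rate is 4/100, so only the error signal fires under the default thresholds.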
Confidence
| Signals matched | Confidence |
|---|---|
| Error high only | Low |
| Abort high only | Low |
| Both error + abort high | High |
Severity
- Critical when the error rate is high: the server is actively failing requests
- Warning when only the abort rate is high: clients are disconnecting
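Taken together, the Confidence table and the Severity rules map the two signals to a single finding. The sketch below shows one way to express that mapping. The function name `classify` and the lowercase string labels are hypothetical, chosen for illustration rather than taken from vllm_doctor.

```python
from typing import Optional, Tuple


def classify(error_high: bool, abort_high: bool) -> Optional[Tuple[str, str]]:
    """Return (confidence, severity), or None when no signal matched.

    Hypothetical helper for illustration only.
    """
    if not (error_high or abort_high):
        return None

    # Confidence is high only when both signals match.
    confidence = "high" if (error_high and abort_high) else "low"

    # Severity follows the error signal: any high error rate is critical,
    # while a high abort rate on its own is a warning.
    severity = "critical" if error_high else "warning"

    return (confidence, severity)
```

Note that confidence and severity are independent: a high error rate on its own is reported as critical but with low confidence.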
Likely causes
- Server-side OOM or internal errors under high load
- Requests exceeding timeout limits causing client aborts
- High latency causing clients to disconnect before completion
- Resource exhaustion correlating with KV cache pressure
Recommendations
- Inspect vLLM server logs for error details
- Correlate with KV cache pressure and queue pressure findings
- Check client timeout settings relative to observed TTFT and TPOT
- Reduce load or add replicas if errors correlate with traffic spikes
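For the client timeout check in the list above, a rough end-to-end latency estimate is TTFT plus one TPOT for each output token after the first. The sketch below applies that approximation. The function names are hypothetical, and the estimate covers only time to first token and decoding, so any client-side or network overhead comes on top of it.

```python
def estimated_latency_s(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """Approximate request latency in seconds.

    ttft_s: observed time to first token
    tpot_s: observed time per output token
    output_tokens: expected number of generated tokens (at least 1)
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    # The first token is covered by TTFT; each later token costs one TPOT.
    return ttft_s + tpot_s * (output_tokens - 1)


def timeout_too_tight(
    client_timeout_s: float, ttft_s: float, tpot_s: float, output_tokens: int
) -> bool:
    """True when the estimated latency exceeds the client timeout."""
    return estimated_latency_s(ttft_s, tpot_s, output_tokens) > client_timeout_s
```

If the estimate for a typical response length exceeds the client timeout, aborts are expected even when the server is healthy, and raising the timeout or capping max_tokens may help more than adding capacity.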
Metrics used
- vllm:request_success_total{finished_reason="error"}
- vllm:request_success_total{finished_reason="abort"}
- vllm:request_success_total{finished_reason="stop"}
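These are cumulative Prometheus counters, so a rate over a time window comes from the difference between two scrapes. The sketch below shows one way to read the per-reason counts from a metrics scrape in Prometheus text format. The helper names `counts_by_reason` and `window_delta` are hypothetical, and the regex is a simplification that assumes label values contain no `}` character.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches one sample line of the counter and captures its label set and value.
_LINE = re.compile(
    r'^vllm:request_success_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_REASON = re.compile(r'finished_reason="(?P<reason>[^"]*)"')


def counts_by_reason(metrics_text: str) -> Dict[str, float]:
    """Sum vllm:request_success_total by finished_reason across other labels."""
    counts: Dict[str, float] = defaultdict(float)
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # comment lines (# HELP, # TYPE) and other metrics
        reason = _REASON.search(match.group("labels"))
        if reason is None:
            continue
        counts[reason.group("reason")] += float(match.group("value"))
    return dict(counts)


def window_delta(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-reason increase between two scrapes.

    A counter reset (for example a server restart) would make the raw
    difference negative, so it is clamped to zero here.
    """
    reasons = set(before) | set(after)
    return {
        r: max(after.get(r, 0.0) - before.get(r, 0.0), 0.0) for r in reasons
    }
```

The output of `window_delta` is the kind of per-window count that the error and abort ratios are computed from.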
Configuration
from vllm_doctor.rules.error_rate import ErrorRateRule

rule = ErrorRateRule(
    high_error_rate=0.02,  # default: 0.05
    high_abort_rate=0.05,  # default: 0.10
)