How Consequence Scoring Works
Traditional accuracy treats all errors equally. A misspelling and a compliance violation both count as "1 error." Consequence scoring fixes this by weighting errors by their real-world impact.
Formula, summed over every failure mode:
Consequence Score = Σ (Error Severity × Frequency × Detection Difficulty)
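The formula adds up one product per error type. Here is a minimal Python sketch that assumes each error type has already been rated 1 to 5 on all three factors. The `ScoredError` type and the example ratings for a typo and a compliance violation are hypothetical. They were chosen only to mirror the comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredError:
    """One error type, rated 1-5 on each factor (hypothetical structure)."""
    name: str
    severity: int
    frequency: int
    detection_difficulty: int


def consequence_score(errors: list[ScoredError]) -> int:
    """Sum of severity x frequency x detection difficulty over all error types."""
    return sum(e.severity * e.frequency * e.detection_difficulty for e in errors)


# Hypothetical ratings: typos are frequent but trivial and easy to spot;
# a compliance violation is rare but critical and hard to catch.
typo = ScoredError("typo", severity=1, frequency=5, detection_difficulty=1)
violation = ScoredError("compliance violation", severity=5, frequency=1, detection_difficulty=4)

print(consequence_score([typo]))             # 5
print(consequence_score([violation]))        # 20
print(consequence_score([typo, violation]))  # 25
```

Plain accuracy would count each of these as one error. Under these assumed ratings, the rarer violation contributes four times as much to the score as the typo.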
Step 1: Define Severity Scale
| Score | Severity | Description | Examples |
|---|---|---|---|
| 5 | Critical | Legal liability, safety risk, regulatory violation | Wrong medical advice, GDPR violation, financial misstatement |
| 4 | High | Revenue loss, customer churn, brand damage | Wrong pricing, offensive response, data leak via hallucination |
| 3 | Medium | Bad user experience, support ticket generated | Wrong feature described, outdated info, confusing answer |
| 2 | Low | Minor friction, user can self-correct | Slightly verbose, suboptimal formatting, minor imprecision |
| 1 | Cosmetic | No real impact, polish issue | Typo, awkward phrasing, slightly wrong emoji |
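Encoding the scale as an enum limits severity ratings to the five defined levels while still letting them take part in the score arithmetic. This is a Python sketch, and the `Severity` name is an assumption, not part of any library.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the scale above; the integer value is the score."""
    COSMETIC = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


print(Severity(4).name)                   # HIGH
print(Severity.CRITICAL > Severity.LOW)   # True
print(Severity.HIGH * 3)                  # 12 (IntEnum members behave as ints)

# Constructing from a raw score rejects anything outside 1-5.
try:
    Severity(6)
except ValueError as exc:
    print("rejected:", exc)
```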
Step 2: Inventory Failure Modes
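List every way your AI system can fail. Score each failure mode from 1 to 5 on three factors: severity (using the Step 1 scale), frequency, and detection difficulty. For frequency and detection, a higher number means the failure occurs more often or is harder to detect. The risk score is the product of the three ratings, so it ranges from 1 to 125.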
| Failure Mode | Category | Severity (1-5) | Frequency (1-5) | Detection (1-5) | Risk Score | Priority |
|---|---|---|---|---|---|---|
| Hallucinated policy details | Accuracy | 5 | 3 | 4 | 60 | P0 |
| Prompt injection compliance | Safety | 5 | 2 | 3 | 30 | P1 |
| Wrong product recommendation | Relevance | 3 | 4 | 3 | 36 | P1 |
| Outdated pricing info | Freshness | 4 | 2 | 2 | 16 | P2 |
| Verbose answers | Tone | 1 | 5 | 1 | 5 | P3 |
| [Add failure mode] | | | | | | |
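The Risk Score column is the product of the three ratings. The following minimal Python sketch recomputes that column for the five rows above and sorts the inventory from highest to lowest risk. The `FailureMode` class and its range check are illustrative assumptions. The rating direction (higher means more frequent or harder to detect) is inferred from the table's own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    category: str
    severity: int    # 1-5, from the severity scale
    frequency: int   # 1-5, higher = occurs more often (inferred direction)
    detection: int   # 1-5, higher = harder to detect (inferred direction)

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-5 scale used in the table.
        for field_name in ("severity", "frequency", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be 1-5, got {value}")

    @property
    def risk_score(self) -> int:
        return self.severity * self.frequency * self.detection


inventory = [
    FailureMode("Hallucinated policy details", "Accuracy", 5, 3, 4),
    FailureMode("Prompt injection compliance", "Safety", 5, 2, 3),
    FailureMode("Wrong product recommendation", "Relevance", 3, 4, 3),
    FailureMode("Outdated pricing info", "Freshness", 4, 2, 2),
    FailureMode("Verbose answers", "Tone", 1, 5, 1),
]

# Highest risk first, so the triage order falls out directly.
ranked = sorted(inventory, key=lambda fm: fm.risk_score, reverse=True)
for fm in ranked:
    print(f"{fm.risk_score:>3}  {fm.name}")

# The system-level consequence score is the sum over all failure modes.
print(sum(fm.risk_score for fm in inventory))  # 147
```

The sort places "Wrong product recommendation" (36) ahead of "Prompt injection compliance" (30), even though both fall in the same priority band.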
Step 3: Priority Matrix
| Priority | Risk Score | Action | Timeline |
|---|---|---|---|
| P0 — Block release | ≥ 50 | Must have eval coverage. Zero tolerance for failure. | Before any release |
| P1 — Fix this sprint | 25 – 49 | Eval coverage required. Monitor closely. | Within 1 sprint |
| P2 — Track & plan | 10 – 24 | Add to golden set. Include in quarterly review. | Within 1 quarter |
| P3 — Backlog | < 10 | Log for awareness. Evaluate if resources allow. | Best effort |
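The bands are contiguous, so you can assign a priority with a chain of lower-bound checks. This is a minimal Python sketch. The `priority` function name is hypothetical, and the thresholds are the ones in the matrix above.

```python
def priority(risk_score: int) -> str:
    """Map a risk score (severity x frequency x detection) to a priority band."""
    if risk_score >= 50:
        return "P0"  # block release
    if risk_score >= 25:
        return "P1"  # fix this sprint
    if risk_score >= 10:
        return "P2"  # track & plan
    return "P3"      # backlog


# Risk scores from the Step 2 inventory.
scores = {
    "Hallucinated policy details": 60,
    "Prompt injection compliance": 30,
    "Wrong product recommendation": 36,
    "Outdated pricing info": 16,
    "Verbose answers": 5,
}
for name, score in scores.items():
    print(f"{priority(score)}  {score:>3}  {name}")
```

Running it over the Step 2 scores reproduces that table's Priority column: P0, P1, P1, P2, P3.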