Consequence Scoring Worksheet

Not all AI errors are equal. Map failure modes to business consequences and prioritize what to test first.

Worksheet | For: PM, Risk, Eng | Est. time: 45 min

How Consequence Scoring Works

Traditional accuracy treats all errors equally. A misspelling and a compliance violation both count as "1 error." Consequence scoring fixes this by weighting errors by their real-world impact.

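Formula: Consequence Score = Σ (Error Severity × Frequency × Detection Difficulty)

Each term in the sum is the Risk Score of one failure mode; the sum runs over every failure mode in the inventory.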
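The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the two example errors (with their frequency and detection values) are hypothetical, chosen only to show how weighting separates errors that a flat count treats as identical.

```python
from typing import Iterable, Tuple

# Each failure mode is (severity, frequency, detection_difficulty), all on a 1-5 scale.
Scores = Tuple[int, int, int]


def risk_score(severity: int, frequency: int, detection: int) -> int:
    """One term of the sum: severity x frequency x detection difficulty."""
    return severity * frequency * detection


def consequence_score(failure_modes: Iterable[Scores]) -> int:
    """Sum the per-failure-mode risk scores."""
    return sum(risk_score(s, f, d) for s, f, d in failure_modes)


# Hypothetical scores for two errors that flat accuracy would count the same.
typo = (1, 5, 1)                  # cosmetic, frequent, easy to spot
compliance_violation = (5, 2, 4)  # critical, rare, hard to spot

flat_error_count = 2
weighted = consequence_score([typo, compliance_violation])
print(flat_error_count, weighted)  # 2 45
```

The flat count reports two errors; the weighted score shows that 40 of the 45 points come from the compliance violation.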

Step 1: Define Severity Scale

Score | Severity | Description | Examples
5 | Critical | Legal liability, safety risk, regulatory violation | Wrong medical advice, GDPR violation, financial misstatement
4 | High | Revenue loss, customer churn, brand damage | Wrong pricing, offensive response, data leak via hallucination
3 | Medium | Bad user experience, support ticket generated | Wrong feature described, outdated info, confusing answer
2 | Low | Minor friction, user can self-correct | Slightly verbose, suboptimal formatting, minor imprecision
1 | Cosmetic | No real impact, polish issue | Typo, awkward phrasing, slightly wrong emoji
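If the scale is kept in code, one possible encoding is an integer enum, so that severity levels stay named but can still be multiplied into a score. The class name `Severity` and the helper `parse_severity` are assumptions made for this sketch.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity scale from Step 1; comments paraphrase the worksheet descriptions."""
    COSMETIC = 1   # no real impact, polish issue
    LOW = 2        # minor friction, user can self-correct
    MEDIUM = 3     # bad user experience, support ticket generated
    HIGH = 4       # revenue loss, customer churn, brand damage
    CRITICAL = 5   # legal liability, safety risk, regulatory violation


def parse_severity(value: int) -> Severity:
    """Reject scores outside the 1-5 scale with a readable message."""
    try:
        return Severity(value)
    except ValueError:
        raise ValueError(f"severity must be 1-5, got {value!r}") from None


print(parse_severity(4).name)  # HIGH
```

Because `IntEnum` members behave like integers, `Severity.CRITICAL * 3 * 4` evaluates to 60 without any conversion.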

Step 2: Inventory Failure Modes

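List every way your AI system can fail. Score each from 1 to 5 on severity, frequency, and detection difficulty; a higher detection score means the failure is harder to catch. The Risk Score is the product of the three.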

Failure Mode | Category | Severity (1-5) | Frequency (1-5) | Detection Difficulty (1-5) | Risk Score | Priority
Hallucinated policy details | Accuracy | 5 | 3 | 4 | 60 | P0
Prompt injection compliance | Safety | 5 | 2 | 3 | 30 | P1
Wrong product recommendation | Relevance | 3 | 4 | 3 | 36 | P1
Outdated pricing info | Freshness | 4 | 2 | 2 | 16 | P2
Verbose answers | Tone | 1 | 5 | 1 | 5 | P3
[Add failure mode]
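The inventory above can also be kept as structured data, which makes it easy to validate scores and rank failure modes as rows are added. The sketch below uses the five example rows from the table; the `FailureMode` class and its field names are assumptions, not part of the worksheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    category: str
    severity: int    # 1-5
    frequency: int   # 1-5
    detection: int   # 1-5, higher means harder to detect

    def __post_init__(self) -> None:
        # Guard against scores outside the worksheet's 1-5 scale.
        for field_name in ("severity", "frequency", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be 1-5, got {value}")

    @property
    def risk_score(self) -> int:
        return self.severity * self.frequency * self.detection


inventory = [
    FailureMode("Hallucinated policy details", "Accuracy", 5, 3, 4),
    FailureMode("Prompt injection compliance", "Safety", 5, 2, 3),
    FailureMode("Wrong product recommendation", "Relevance", 3, 4, 3),
    FailureMode("Outdated pricing info", "Freshness", 4, 2, 2),
    FailureMode("Verbose answers", "Tone", 1, 5, 1),
]

# Highest risk first, so the top of the list is what to test first.
ranked = sorted(inventory, key=lambda fm: fm.risk_score, reverse=True)
for fm in ranked:
    print(f"{fm.risk_score:>3}  {fm.name}")
```

Sorting by risk score puts "Wrong product recommendation" (36) ahead of "Prompt injection compliance" (30) even though the latter has the higher severity, which is the kind of trade-off the three-factor product is meant to surface.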

Step 3: Priority Matrix

Priority | Risk Score | Action | Timeline
P0: Block release | ≥ 50 | Must have eval coverage. Zero tolerance for failure. | Before any release
P1: Fix this sprint | 25 – 49 | Eval coverage required. Monitor closely. | Within 1 sprint
P2: Track & plan | 10 – 24 | Add to golden set. Include in quarterly review. | Within 1 quarter
P3: Backlog | < 10 | Log for awareness. Evaluate if resources allow. | Best effort
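The thresholds in the matrix translate directly into a lookup. The function name `priority_for` is hypothetical; the cut-offs are the ones listed above, and the sketch assumes integer risk scores, as produced by multiplying three 1-5 values.

```python
def priority_for(risk_score: int) -> str:
    """Map a risk score to the priority bands of the Step 3 matrix."""
    if risk_score >= 50:
        return "P0"  # block release
    if risk_score >= 25:
        return "P1"  # fix this sprint
    if risk_score >= 10:
        return "P2"  # track and plan
    return "P3"      # backlog


# Risk scores of the five example failure modes from Step 2.
for score in (60, 30, 36, 16, 5):
    print(score, priority_for(score))
```

Run against the Step 2 examples, this reproduces the Priority column: P0, P1, P1, P2, P3.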