Quick Answer
Drift monitoring detects shifts in query distribution and embedding space before they cause quality regressions.
TL;DR
- Track cluster distribution shifts and similarity trends.
- Alert on drift thresholds tied to risk.
- Trigger retraining, guardrails, or human review.
FAQ
What is a drift score?
A drift score measures how much the current query distribution diverges from a baseline, often with Jensen-Shannon divergence.
How often should drift be checked?
Daily for high-volume systems; weekly for lower-volume systems or stable domains.
What actions follow an alert?
Investigate the slice, update the dataset, and decide between retrieval fixes, retraining, or escalation.
Types of Drift in AI Systems
Drift is the silent killer of production AI. Your model works great on day one, then slowly degrades as the world changes around it.
Query Distribution Drift
Users start asking different questions than your training data anticipated. Common in RAG systems.
Embedding / Semantic Drift
Your embedding model's understanding of terms doesn't match new content. Common in multi-tenant systems.
1. Detecting Query Distribution Drift
The key insight: you can't just compare today's queries to yesterday's. You need to compare the distribution of semantic clusters.
import numpy as np
from scipy.spatial.distance import jensenshannon

def compute_drift_score(baseline_model, baseline_dist, encoder, new_queries: list[str]) -> float:
    # 1. Encode new queries
    new_embeddings = encoder.encode(new_queries)
    # 2. Assign each query to its nearest baseline cluster
    new_clusters = baseline_model.predict(new_embeddings)
    # Pad with minlength so clusters unseen this window still appear in the distribution
    new_dist = np.bincount(new_clusters, minlength=len(baseline_dist)) / len(new_clusters)
    # 3. Jensen-Shannon divergence between baseline and current cluster distributions
    return jensenshannon(baseline_dist, new_dist)
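The baseline itself is built once from a reference window of queries, for example by clustering their embeddings with k-means. A minimal end-to-end sketch, using synthetic 2-D embeddings in place of a real encoder (the two "topic" centers, cluster count, and window sizes are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import KMeans

# Stand-in for encoder.encode(): synthetic embeddings around two topic centers.
rng = np.random.default_rng(0)
baseline_emb = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),  # topic A
    rng.normal(loc=5.0, scale=0.5, size=(200, 2)),  # topic B
])

# Fit the baseline clustering once on the reference window.
baseline_model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(baseline_emb)
labels = baseline_model.labels_
baseline_dist = np.bincount(labels, minlength=2) / len(labels)  # roughly [0.5, 0.5]

# A drifted window: topic B now dominates the incoming queries.
new_emb = rng.normal(loc=5.0, scale=0.5, size=(300, 2))
new_clusters = baseline_model.predict(new_emb)
new_dist = np.bincount(new_clusters, minlength=2) / len(new_clusters)

score = float(jensenshannon(baseline_dist, new_dist))
print(round(score, 3))  # well above a 0.15 alert threshold for this synthetic shift
```

With a balanced baseline and a window collapsed onto one cluster, the score lands well above 0.15, which is what makes a fixed divergence threshold workable as an alert trigger.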
2. Detecting Embedding Drift
Monitor retrieval confidence. A gradual decline in average Top-K cosine similarity indicates the embedding model is losing its grasp on the domain.
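One way to operationalize this is to log each query's mean Top-K cosine similarity and fit a trend over the daily averages; a persistently negative slope is the drift signal. A sketch (the window length and the linear-fit trend test are assumptions, not a prescribed method):

```python
import numpy as np

def topk_similarity(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 5) -> float:
    """Mean cosine similarity of a query to its k nearest corpus vectors."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = c @ q
    return float(np.sort(sims)[-k:].mean())

def confidence_trend(daily_scores: list[float], window: int = 7) -> float:
    """Slope of a linear fit over the last `window` daily averages.
    A persistently negative slope suggests embedding drift."""
    y = np.asarray(daily_scores[-window:])
    x = np.arange(len(y))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)
```

A slope threshold (e.g. alert when the 7-day slope drops below zero for several consecutive windows) avoids paging on single-day noise.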
Alerting & Action
- Escalate low-confidence queries to human support.
- Feed human corrections back into the knowledge base.
- Trigger retraining alerts when drift score > 0.15.
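The routing above can be sketched as a small decision function. The 0.15 drift threshold comes from this section; the 0.5 similarity floor for escalation is an illustrative assumption you would tune per system:

```python
def drift_action(drift_score: float, avg_topk_sim: float) -> str:
    """Map monitoring signals to an action.
    drift_score: Jensen-Shannon divergence vs. the baseline cluster distribution.
    avg_topk_sim: rolling average Top-K cosine similarity from retrieval.
    The 0.5 similarity floor is an assumed, tunable value."""
    if drift_score > 0.15:
        return "trigger_retraining_alert"
    if avg_topk_sim < 0.5:
        return "escalate_to_human_review"
    return "ok"
```

Keeping the retraining check first means a clear distribution shift is surfaced even when individual retrievals still look confident.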