There is a design crisis unfolding in slow motion across every AI product team right now, and almost nobody is talking about it. It is not the hallucination problem. It is not alignment. It is the interface problem — the gap between what a model can do and what a person can understand, trust, and act on.
We have spent thirty years building interfaces for deterministic software. You click a button; something happens. You fill out a form; the database updates. The rules are clear, the feedback is immediate, and the system behaves the same way every time. Designers learned to think in flows, states, and components.
The Shift from Pixels to Probability
Language models are fundamentally different. They are probabilistic. They reason. They fail in ways that look exactly like success — fluent, confident, plausible. A user submitting a document for analysis cannot tell, from the output alone, whether the model read it carefully or confabulated an answer. That uncertainty is the central design problem.
The naive solution is to show the chain of thought — to surface every step the model took on its way to an answer. This is what Anthropic does with Claude's extended thinking, what OpenAI does with o1. It is the right instinct. But raw reasoning traces are not interfaces. They are logs. Showing a user two thousand tokens of internal monologue is not transparency; it is noise.
The question is not how to show the model's work. It is how to make the right amount of uncertainty visible to the right person at the right moment.
Designing for Calibrated Trust
At Anterior, we spent months wrestling with this exact question. We were building tools for insurance case reviewers — clinicians who had to make consequential decisions about whether a treatment would be covered. The model was reading medical records, surfacing relevant policy criteria, and generating recommendations. The stakes were high. Getting the interface wrong meant either over-trusting the model (dangerous) or dismissing it entirely (pointless).
What we learned is that trust is not binary. People do not trust a system completely or distrust it completely — they calibrate. They build a mental model of when the tool is reliable and when it is not. Good interface design accelerates that calibration. It gives people the signals they need to know when to scrutinize and when to accept.
Practically, this means three things. First, link every claim to its source. If the model says the patient meets criterion B, show the passage from the clinical notes that supports that claim. Let the reviewer verify in one click. Second, express uncertainty explicitly — not with percentages (which imply false precision) but with language. "The model found no documentation of a conservative treatment trial" is a useful signal. "Confidence: 73%" is not.
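The first two principles can be sketched as a data shape. This is a minimal, hypothetical illustration, not any real system's schema: every name here (SourceSpan, Claim, uncertainty_label) is invented for the example. The point is that a claim without a source pointer is rendered as an explicit linguistic signal, never as a bare percentage.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceSpan:
    document_id: str   # e.g. the clinical note the passage came from
    start: int         # character offset where the supporting passage begins
    end: int           # character offset where it ends
    excerpt: str       # the quoted text shown beside the claim for one-click verification

@dataclass
class Claim:
    text: str                      # e.g. "Patient meets criterion B"
    source: Optional[SourceSpan]   # None when no supporting passage was found

def uncertainty_label(claim: Claim) -> str:
    """Express uncertainty in language, not percentages."""
    if claim.source is None:
        return "The model found no supporting documentation for this claim."
    return f'Supported by: "{claim.source.excerpt}"'
```

A reviewer scanning a list of claims then sees either a verifiable quotation or a plain-language gap, and can decide where to spend scrutiny.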
The Audit Trail as Interface
Third, and most importantly, treat the audit trail as a first-class interface element. Every decision a model makes should be reviewable, correctable, and legible. Not buried in a logs panel. Not collapsed behind a disclosure triangle. The reasoning should be navigable — structured the same way a thoughtful analyst would structure their notes.
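One way to make that concrete is to model the trail as addressable data rather than a log. The sketch below is hypothetical, with invented names (ReasoningStep, AuditTrail): each step carries its evidence, can be corrected in place by a reviewer, and can be rendered as a navigable outline, the way an analyst would structure notes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReasoningStep:
    summary: str                          # one-line summary shown in the interface
    evidence: list[str]                   # passages the step relied on
    reviewer_note: Optional[str] = None   # filled in when a reviewer corrects the step

@dataclass
class AuditTrail:
    steps: list[ReasoningStep] = field(default_factory=list)

    def correct(self, index: int, note: str) -> None:
        """Record a reviewer's correction against a specific step."""
        self.steps[index].reviewer_note = note

    def outline(self) -> list[str]:
        """A navigable outline of the reasoning, not a raw token dump."""
        return [f"{i + 1}. {step.summary}" for i, step in enumerate(self.steps)]
```

The design choice that matters is correctability: a reviewer's note lives on the step itself, so disagreement becomes data the team can learn from rather than a comment lost in a side channel.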
This is what distinguishes an AI copilot from an AI oracle. The oracle gives you an answer. The copilot shows you its work in a way that makes you a better decision-maker — not despite the assistance, but because of it.
The best AI interfaces make users more capable, not more dependent. The tool should extend judgment, not replace it.
What This Means for Teams
If you are designing an AI product right now, here is my practical advice. Stop treating the model as a black box your design sits on top of. Understand the failure modes. Learn what kinds of inputs produce unreliable outputs. Map the uncertainty landscape of your use case, and then design around it — not by hiding uncertainty, but by making it navigable.
The interface layer is where AI products win or lose. The model is a commodity. The design is the product. And right now, most teams are treating design as an afterthought.