
Designing for the ten percent of inputs that break.

Most AI features are designed against the median input and shipped against the long tail. The ten percent of inputs that break the median design are where the support tickets live. Here is the design discipline that addresses the tail.

By M. Okafor, Head of Product · Knyte
Published April 24, 2026
Read time: 10 min
Category: Product

Most AI features are designed against the median input. The product manager builds a wireframe against a representative example. The engineering team builds against an eval set drawn from the median. The launch demo features the median. The product ships, the median works, and somewhere around week three the support team starts seeing tickets from users whose inputs are not median. By week eight, the support volume from non-median inputs exceeds the support volume from median inputs by a factor that surprises the product team. The product was designed for the median; the production traffic is not median.

The discipline that addresses this is not better median design. It is explicit design for the tail. The ten percent of inputs that break the median design are where the support tickets live, and the design choices that handle the tail well determine whether the product is loved or merely used. What follows is the design discipline that produces tail-handling features, the patterns that consistently produce tail failures, and the operational properties that make the difference visible.

Why median-only design produces the cliff.

Three structural reasons.

Production traffic is not the design distribution. The eval set was drawn from a sample the team curated. Production traffic is whatever users actually bring. The two distributions diverge along axes the team did not anticipate — input length, format variation, multi-topic inputs, language, jargon density. Each axis adds a tail. The aggregate of the tails is the ten percent.

Features designed for the median fail silently on the tail. The model returns an output. The output is plausible. The output is wrong in ways that require domain knowledge to spot. The user, encountering the wrong output, either silently abandons the feature or escalates to support. Neither path produces the signal the product team needs to learn from the failure.

The tail is sticky. Users whose inputs sit in the tail tend to be the most engaged users — they are pushing the product harder, using it for the work that matters most, and least willing to settle for a generic output. Their dissatisfaction has more downstream consequences than the median user's. The tail is also the population most likely to influence procurement decisions inside their organization.

What tail-aware design looks like.

Three properties.

Input shape detection at the entry point. The product detects when an input is outside the design distribution and surfaces this to the user explicitly. "This document is longer than we typically optimize for; the output may be less accurate." "This document mixes multiple topics; consider splitting it." The detection is not a quality bar — it is a calibration affordance. Users who know the system is on the edge will adjust their expectations or their input.
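As a minimal sketch of what such entry-point detection might look like: the thresholds and field names below are hypothetical, stand-ins for values a team would derive from its own production telemetry, but the shape of the check matches the pattern described above.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from the team's
# own measured input distribution, not from this sketch.
MAX_TYPICAL_TOKENS = 4_000
MAX_TYPICAL_TOPICS = 1

@dataclass
class ShapeCheck:
    in_distribution: bool
    advisories: list[str]  # calibration messages surfaced to the user

def check_input_shape(token_count: int, topic_count: int) -> ShapeCheck:
    """Flag inputs outside the design distribution and collect the
    calibration messages to show the user. This is an affordance,
    not a quality bar: the input is still processed either way."""
    advisories = []
    if token_count > MAX_TYPICAL_TOKENS:
        advisories.append(
            "This document is longer than we typically optimize for; "
            "the output may be less accurate."
        )
    if topic_count > MAX_TYPICAL_TOPICS:
        advisories.append(
            "This document mixes multiple topics; consider splitting it."
        )
    return ShapeCheck(in_distribution=not advisories, advisories=advisories)
```

Note that the function never rejects the input; it only decides whether to surface a message, which is what keeps detection a calibration affordance rather than a gate.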

Graceful degradation paths. When the system is in the tail, it produces a less ambitious output rather than failing or producing a confidently-wrong output. A summarization that hits an unusually long input might produce a structured outline instead of a free-form summary. A draft that hits a multi-topic input might produce a per-topic draft instead of a single combined draft. The degradation preserves utility while reflecting the system's reduced confidence.

Editor escalation routes. When the input is sufficiently far in the tail, the system routes to an editor — even on workflows where editor-in-the-loop is normally bypassed — because the cost of getting it wrong exceeds the cost of the editor's review. The escalation is automatic, traceable, and visible to the user as a quality affordance, not a system limitation.
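The three properties compose into a single routing decision. The sketch below assumes a hypothetical `tail_score` (0.0 for a median input, 1.0 for a deep-tail one) and made-up threshold values; the point is the structure, where degradation and escalation are explicit branches rather than failure modes.

```python
from enum import Enum

class Route(Enum):
    FULL_OUTPUT = "full_output"      # in distribution: normal pipeline
    DEGRADED_OUTPUT = "degraded"     # tail: less ambitious output
                                     # (outline, per-topic draft)
    EDITOR_REVIEW = "editor_review"  # deep tail: automatic, traceable
                                     # escalation to a human editor

# Hypothetical thresholds; a real system would tune these against
# measured error rates at each tail depth.
DEGRADE_THRESHOLD = 0.5
ESCALATE_THRESHOLD = 0.85

def route_for(tail_score: float) -> Route:
    """Map a tail score to a handling route, escalating as the input
    moves further from the design distribution."""
    if tail_score >= ESCALATE_THRESHOLD:
        return Route.EDITOR_REVIEW
    if tail_score >= DEGRADE_THRESHOLD:
        return Route.DEGRADED_OUTPUT
    return Route.FULL_OUTPUT
```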

How the tail gets characterized.

Tail characterization requires production telemetry. We tag every input on three dimensions: shape (length, structure), topic distribution (single-topic vs. multi-topic), and confidence (the model's own internal signal about whether it is operating in distribution). Inputs sampled along each dimension feed the eval suite and the design conversation, so the team has a real distribution to design against, not the team's prior assumption about what production traffic looks like.
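A tagging record along those three dimensions might look like the sketch below. The field names and the crude structure heuristic are illustrative assumptions, not the actual Knyte schema; in particular, a real pipeline would use a proper tokenizer and a model-reported confidence, not the stand-ins here.

```python
from dataclasses import dataclass, asdict

@dataclass
class InputTag:
    # Dimension 1 -- shape: length and structure of the raw input.
    token_count: int
    has_structure: bool  # headings, lists, or tables detected
    # Dimension 2 -- topic distribution: single- vs. multi-topic.
    topic_count: int
    # Dimension 3 -- confidence: the model's own in-distribution
    # signal (hypothetical field; source and scale vary by model).
    model_confidence: float

def tag_input(text: str, topic_count: int, model_confidence: float) -> dict:
    """Attach the three-dimension tag to one production input."""
    tokens = len(text.split())  # crude proxy for a real token count
    has_structure = any(
        line.startswith(("#", "-", "|")) for line in text.splitlines()
    )
    return asdict(InputTag(tokens, has_structure, topic_count, model_confidence))
```

Every production input gets one of these records, which is what lets the eval suite be stratified by tail dimension instead of sampled uniformly.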

This is closely related to the eval-suite stratification pattern we wrote about elsewhere. The eval discipline produces the data the design discipline needs. Without the eval data, tail-aware design is guesswork.

Three product moments where tail design pays back most.

The first failure surface. When the system fails on a tail input, the failure surface is the most consequential UX moment in the product. A graceful failure that explains itself produces a continued user; a confidently-wrong output that the user has to debug produces a churned one.

The escalation moment. When the system routes to an editor instead of producing autonomously, the explanation matters. "Routing this to a reviewer because it requires regulatory context the model does not handle confidently" reads as quality assurance; "escalated to a human" reads as system limitation. Same operation, different message.

The retry path. When the user provides additional context to retry a tail input, the system has to demonstrate that it used the additional context. A retry that produces a similar-quality output despite better input trains the user that retry is futile. A retry that visibly improves trains the user that the system is responsive to better input. The latter is the design we want.

If your product's design conversations are still anchored on the median input, the tail is silently bleeding the product. The remediation is not a redesign. It is a discipline shift — design conversations include the tail by default, telemetry feeds the conversation, and the three tail-handling properties become design table-stakes rather than special cases.

The teams that have made this shift describe it as an attention reallocation rather than as additional work. The total design budget did not increase. The fraction allocated to the tail did. The fraction allocated to the median got smaller, because the median was already being designed for adequately and the marginal return on additional median polish was lower than the marginal return on tail-handling investment. The reallocation produced both a reduction in tail-driven support tickets and a noticeable lift in customer satisfaction among the tail cohorts, who are typically the most influential users in any enterprise procurement decision.

M. Okafor, Head of Product · Knyte

Shipped the first multi-tenant editor-in-the-loop runtime at Notion. Now designs the surfaces operators actually use. Believes most AI products are toggles in search of a workflow.
