Skip to main content
POSTMORTEMS

Postmortem: the day our queryable memory returned stale embeddings for forty minutes.

Two indexer instances of different versions, one production traffic split, and a forty-minute window where retrieval returned embeddings the corpus said no longer existed. Here is the timeline, the root cause, and the runtime invariant we added.

By J. ReichertPRINCIPAL ENGINEER · KNYTE
PUBLISHEDAPRIL 11, 2026
READ TIME9 MIN
CATEGORYPOSTMORTEMS

On March 22, 2026, our queryable memory returned stale embeddings — embeddings derived from a document version the corpus had already retired — for thirty-eight minutes between 14:18 UTC and 14:56 UTC. The fault did not produce a customer-visible incident in the literal sense; no customer reported anything, and the editor-in-the-loop layer caught the seven affected outputs before they shipped. It produced a near-miss in the sense that the runtime invariants we believed we were enforcing turned out to have a gap that another set of conditions could have widened. We are publishing the postmortem because the gap is not specific to our deployment; it is specific to the way retrieval pipelines that use multiple indexer instances handle version promotion.

Timeline.

  • 13:42 UTC — Scheduled corpus version 4.2.1 promotion. The promotion completes successfully on the corpus-write path. Document `brief.q1-2026.lifecycle.v6` is retired in favor of `v7`.
  • 13:45 UTC — Indexer instance A picks up the corpus-version change and begins re-embedding the affected chunks. Estimated completion: 8 minutes.
  • 13:47 UTC — Indexer instance B (deployed during the prior week as a canary at five percent traffic) does not pick up the change. The promotion-event subscription on instance B was silently broken by an unrelated config rotation that morning.
  • 13:53 UTC — Indexer instance A finishes re-embedding. New chunks for `v7` are written to the index. Old chunks for `v6` are tombstoned in instance A's view.
  • 14:18 UTC — Production traffic begins arriving with corpus-version pin 4.2.1. Instance A returns the new embeddings. Instance B returns the stale `v6` embeddings, with no awareness that they are stale, because B is unaware the corpus version changed.
  • 14:24 UTC — First editor-in-the-loop catch. The reviewer notices that a draft cites text not present in the current brief. Flag is raised in the operations channel.
  • 14:38 UTC — On-call engineer correlates the editor flag with the trace data. The trace shows two retrievals against the same query returning different results. Hypothesis: indexer drift.
  • 14:52 UTC — Instance B is removed from the rotation. Production traffic concentrates on instance A. Stale results stop.
  • 14:56 UTC — Stale-embedding detector (deployed earlier this year for unrelated reasons) confirms zero residual stale results in production over the trailing 60 seconds.
  • Total stale-result window: 38 minutes. Affected production retrievals: 247. Affected outputs that reached the editor queue: 7. Affected outputs that shipped to a downstream system: 0.

Root cause.

Two indexer instances were running, of two different versions, against the same corpus. The newer instance's promotion-event subscription was broken by a config rotation that morning. The older instance handled the promotion correctly. Production traffic was split across both instances, so a non-trivial fraction of retrievals returned stale results without any single instance being individually wrong.

The runtime invariant we believed we were enforcing was "every indexer instance reflects the latest corpus version within ninety seconds of promotion." This invariant was true for instance A. It was false for instance B because the subscription was silently broken, and the invariant had no health-check that would have caught the failure.

What worked.

Three things worked, and they bounded the impact.

Editor-in-the-loop as the default. Every output that reached the editor queue was reviewed before shipping. The seven affected outputs were caught at the editor layer, not in production. The editor-in-the-loop runtime is what kept this from being a customer-visible incident.

Trace data with corpus-version pinning. The on-call engineer correlated the editor flag to indexer drift in fourteen minutes because the trace data showed the corpus version pin in scope at retrieval time. Without corpus-version pinning, the diagnosis would have taken hours.

Stale-embedding detector. A background job that compares random samples of indexer outputs against the canonical corpus state confirmed the resolution within 60 seconds of the rotation change. The detector had been deployed earlier in the year for a different reason and proved its value here.

What did not work.

Two things did not work, and they are the additions we are making to the runtime.

No health-check on the promotion-event subscription. The newer indexer instance was happily serving requests with no awareness that its subscription was broken. The instance's health endpoint reported healthy because it was healthy by every metric we were checking — request rate, latency, memory, error rate. None of those metrics could have detected that the instance was working from an out-of-date corpus view.

Config rotation did not validate downstream subscription state. The morning's config rotation, which was unrelated to the indexer, rotated a credential the subscription used. The credential rotation succeeded. The subscription's reconnect with the new credential failed silently. The config-rotation runbook had no validation step that would have caught the silent reconnect failure.

Runtime changes.

Three changes have been deployed.

  1. Indexer health-check now includes a corpus-version freshness probe. Each instance reports the last corpus version it has fully reflected. The freshness probe fails the instance from the rotation if its reflected version is more than ninety seconds behind the canonical corpus version.
  2. Config-rotation runbook now includes a downstream-subscription validation step. After every credential rotation, the rotator must observe a successful subscription health-check from each consumer of the rotated credential before declaring the rotation complete.
  3. Promotion runbook now requires a stale-embedding detector pass at corpus-version-plus-two-minutes. The detector was running on a fifteen-minute schedule; the new requirement is to run on-demand within two minutes of every corpus-version promotion. The window in which this incident's drift was undetected is now closed by the detector cadence.

What we are not changing.

We considered, and rejected, three changes that would have been overcorrections.

Single indexer instance. The two-instance setup exists for a reason — canary deployment of indexer changes. Going to a single instance would eliminate this incident class but would reintroduce the deployment-risk class the canary was designed to mitigate. Net negative.

Block all production traffic on indexer drift. The drift was real. The blast radius was bounded. Hard-blocking would have produced a different and possibly larger customer impact than letting the editor-in-the-loop layer absorb the affected outputs. The right level for the blocking decision is at the editor layer, not at the retrieval layer.

Manual approval gate on every promotion. Promotions happen multiple times a day. Adding a manual gate would have slowed the pipeline meaningfully and shifted incident risk from drift to delayed correction. The freshness probe and the post-promotion detector pass produce the same safety with no human-cycle cost.

What this incident demonstrated about the architecture.

Two architectural properties paid for themselves in this incident, and one we will be adding.

Editor-in-the-loop as the default workflow primitive is what bounded the impact. None of the affected outputs reached a downstream system. We have written about why we run this as the default; this incident is a concrete example of the runtime decision earning its keep.

Corpus-version pinning in the trace data is what made the diagnosis tractable. The on-call engineer was able to identify the drift in fourteen minutes because the trace data answered the question "what corpus version was this retrieval working from" without further instrumentation.

The runtime invariant we will be adding is corpus-version freshness as a first-class indexer health metric. The invariant was previously implicit. It is now explicit, monitored, and enforced.

We publish postmortems because the lessons are not specific to our deployment. Any retrieval pipeline running multi-instance indexers against a versioned corpus has the same invariant to enforce. If you are running a similar topology and the freshness check is not in your health probes, the invariant is implicit in your system too.

J. ReichertPRINCIPAL ENGINEER · KNYTE

Twelve years on production retrieval and inference systems. Previously at Stripe (risk infra) and Anthropic (eval tooling). Writes about the boring parts of agentic infra.

SUBSCRIBE

Get the dispatch in your inbox.

Twice a month. We send the essay, the postmortem, and nothing else. No roundups. No tracking pixels pretending to be personalization.

NO SPAM · UNSUBSCRIBE ANYTIME · 4,200 READERS