"We rebuilt our retrieval stack in seven days. Here's what broke." — Frontier-lab engineering lead.

The interview subject runs engineering for a frontier AI lab whose internal tooling is the subject of this conversation. The lab uses retrieval-augmented generation for a specific class of internal workflow — research synthesis, internal-knowledge access, model-behavior debugging — and had been running on a single-layer vector-store architecture for fourteen months. The retrieval stack was rebuilt over seven days in early February and ships to internal users today on a versioned-corpus architecture. We sat down with the engineering lead for thirty-six minutes the week after cutover. The interview subject asked to remain unnamed for compliance reasons.

On what triggered the rebuild.

Knyte: What was the moment.

Engineering lead: It was not one moment. It was three things compounding. First, we had a production incident where the model produced an output citing a corpus document that no longer existed. The document had been deleted by a researcher three weeks earlier, but the embeddings were still in the vector store. We had no version of the document to inspect, no record of why it was deleted, no way to know what the model had been told. Second, we were planning a model migration from one base to another, and the engineering team realized the embedding rebuild was going to require us to re-derive the corpus from scattered source systems because the vector store was the only place the corpus existed in any complete form. Third, we had an internal SOC 2 audit coming up, and the auditor's preliminary checklist included three questions about lineage that we could not answer.

Engineering lead: The three things together meant we were going to have to do this work eventually. The question was whether we did it in a controlled rebuild or in a series of incidents. We chose the rebuild.

On the seven days.

Knyte: Walk me through it.

Engineering lead: Day one was scoping. We sat down with the pattern from the dispatch on versioned corpus, which was, to be honest, the document we had been waiting for. We mapped the four-layer split — corpus, schema, embedding, access policy — against our existing single-layer setup. The corpus did not exist anywhere; the schema was implicit in the vector-store metadata fields; the embedding was the entire system; the access policy was a query-time filter on the vector results. We knew what we were building.
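
The four-layer split named above can be pictured as four separate record types. What follows is a minimal sketch, not the lab's implementation: every class and field name is hypothetical, and the schema fields are borrowed from the ones the engineering lead lists later in the interview (document type, owner, classification, lineage to source).

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple
import hashlib


@dataclass(frozen=True)
class CorpusDocument:
    """Corpus layer: the durable, versioned record of a document."""
    doc_id: str
    version: int
    body: str

    @property
    def content_hash(self) -> str:
        # Hash of the body, useful for detecting silent content changes.
        return hashlib.sha256(self.body.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SchemaRecord:
    """Schema layer: explicit metadata instead of implicit index fields."""
    doc_id: str
    document_type: str
    owner: str
    classification: str
    source_uri: str  # lineage back to the source system


@dataclass(frozen=True)
class EmbeddingEntry:
    """Embedding layer: derived data, pinned to one corpus version."""
    doc_id: str
    corpus_version: int
    vector: Tuple[float, ...]


@dataclass
class AccessPolicy:
    """Access-policy layer: checked at the corpus, not as a query filter."""
    allowed_groups: Dict[str, Set[str]] = field(default_factory=dict)

    def can_read(self, doc_id: str, user_groups: Set[str]) -> bool:
        # Unknown documents are denied by default.
        return bool(self.allowed_groups.get(doc_id, set()) & user_groups)
```

The point of the separation is that the embedding entry only refers to a corpus version. Deleting or rebuilding the index never touches the corpus record itself.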

Engineering lead: Days two and three were the corpus stand-up. We provisioned object storage, wrote the corpus-write API, and started backfilling from source systems. The backfill is where we discovered our first surprise: the vector store had documents the source systems did not. They had been added to the vector store and then deleted from the source. Our vector store was the most complete version of our corpus, which is exactly the failure mode the four-layer split is designed to prevent. We had to reconcile carefully.
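
A reconciliation of this kind can be sketched as a set comparison between the document IDs known to the source systems and those present in the vector store. The function name and the sample IDs below are hypothetical, and the sketch assumes both systems share a stable document ID, which the interview does not confirm.

```python
from typing import Dict, List, Set


def reconcile(source_ids: Set[str], index_ids: Set[str]) -> Dict[str, List[str]]:
    """Classify document IDs by where they currently exist."""
    return {
        "in_both": sorted(source_ids & index_ids),
        # Present only in the index: no source record to backfill from.
        "index_only": sorted(index_ids - source_ids),
        # Present only in source: never made it into the index.
        "source_only": sorted(source_ids - index_ids),
    }


report = reconcile(
    source_ids={"doc-a", "doc-b", "doc-d"},
    index_ids={"doc-a", "doc-b", "doc-c"},
)
print(report["index_only"])   # ['doc-c']
print(report["source_only"])  # ['doc-d']
```

The `index_only` bucket is the uncomfortable one: those documents exist nowhere else, so each needs a decision about whether it is recovered into the corpus or retired.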

Engineering lead: Day four was the schema layer and the access policy. The schema was straightforward — document type, owner, classification, lineage to source. The access policy was less straightforward because we had not previously enforced it at the corpus layer, and there was real work to do in defining who could see what. We decided to mirror the source-system access patterns initially and tighten over time. That was the right call; it would have taken weeks to get the policy right from scratch and we did not need that level of refinement on day four.

Engineering lead: Days five and six were the new embedding layer and the shadow-mode comparison. We built the new index from the corpus, ran every production query against both the old vector store and the new index, and diffed the results. Forty percent of queries returned different result sets. Most of the differences were the new index returning results the old store had been missing. Some of them — about three percent — were the old store returning stale results from the deleted-but-not-removed documents. The shadow comparison is what told us the migration was correct.
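
The shadow-mode comparison described here can be sketched as a loop that sends each query to both systems and counts how often the result sets differ and how often the old store returns a deleted document. This is an illustration under assumptions: the `Retriever` type, the function name, and the toy data are all hypothetical, and results are compared as unordered sets, so ranking changes are ignored.

```python
from typing import Callable, Dict, Iterable, List, Set

# A retriever maps a query string to a list of document IDs.
Retriever = Callable[[str], List[str]]


def shadow_diff(
    queries: Iterable[str],
    old: Retriever,
    new: Retriever,
    deleted_ids: Set[str],
) -> Dict[str, float]:
    """Run each query against both systems and summarize the differences."""
    total = differing = stale = 0
    for query in queries:
        total += 1
        old_ids, new_ids = set(old(query)), set(new(query))
        if old_ids != new_ids:
            differing += 1
        # The old store surfaced a document that no longer exists.
        if old_ids & deleted_ids:
            stale += 1
    if total == 0:
        return {"differing": 0.0, "stale": 0.0}
    return {"differing": differing / total, "stale": stale / total}


# Toy stand-ins for the two retrieval systems (hypothetical data).
old_results = {"q1": ["a", "b"], "q2": ["c"], "q3": ["a"], "q4": ["x"]}
new_results = {"q1": ["b", "a"], "q2": ["c", "d"], "q3": ["a"], "q4": ["e"]}

summary = shadow_diff(
    queries=["q1", "q2", "q3", "q4"],
    old=lambda q: old_results[q],
    new=lambda q: new_results[q],
    deleted_ids={"x"},
)
print(summary)  # {'differing': 0.5, 'stale': 0.25}
```

Here `q1` counts as identical because only the order changed. A stricter comparison would diff the ranked lists instead of the sets.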

Engineering lead: Day seven was the cutover. We switched production reads to the new index, kept the old store mirroring for one more week as a fallback, and decommissioned the old store the following Friday. The seven-day clock was the build. The mirror week was an additional safety. Total elapsed time was two weeks; total engineering effort was about thirty person-days across a team of six.

On what surprised the team.

Knyte: What did you not expect.

Engineering lead: Three things. First, the corpus reconciliation. I mentioned this. We had assumed the vector store was a derived artifact from source systems, which meant the source systems were the truth and the vector store was downstream. We were wrong. The vector store had become the primary corpus by accident, because researchers had been adding documents directly to the indexer and never to source. The reconciliation was uncomfortable.

Engineering lead: Second, the shadow-mode forty percent diff. I was expecting maybe ten percent of queries to differ. The reality was that two in five of our production queries were returning subtly different results between the two systems. The model had been producing outputs based on retrieval results that were demonstrably worse than what the new system would have produced, and we had no way to know because we had no comparison baseline. The shadow comparison was the most valuable part of the rebuild.

Engineering lead: Third, the audit lineage. After the cutover I sat with the auditor's checklist again. Every question that had been unanswerable a week earlier was now answerable. "Show me which corpus version was in scope for this retrieval." Easy. "Show me which user accessed which subset of the corpus." Easy. "Show me how an editorial decision was applied across related documents." Easy. The compliance posture changed completely, not because we had added compliance features but because we had stopped conflating the index with the corpus.
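
The first two auditor questions become simple lookups once every retrieval is logged alongside the corpus version it ran against. The sketch below assumes an append-only in-memory log. The class names, fields, and sample records are hypothetical rather than the lab's schema, and the third question (editorial decisions across related documents) is not covered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RetrievalRecord:
    request_id: str
    user: str
    corpus_version: int
    doc_ids: Tuple[str, ...]
    retrieved_at: datetime


class LineageLog:
    """Append-only record of which corpus version served which retrieval."""

    def __init__(self) -> None:
        self._records: List[RetrievalRecord] = []

    def record(self, request_id: str, user: str, corpus_version: int,
               doc_ids: Iterable[str]) -> None:
        self._records.append(RetrievalRecord(
            request_id, user, corpus_version, tuple(doc_ids),
            datetime.now(timezone.utc),
        ))

    def corpus_version_for(self, request_id: str) -> int:
        """'Which corpus version was in scope for this retrieval?'"""
        for rec in self._records:
            if rec.request_id == request_id:
                return rec.corpus_version
        raise KeyError(request_id)

    def docs_accessed_by(self, user: str) -> Set[str]:
        """'Which subset of the corpus did this user access?'"""
        return {d for rec in self._records if rec.user == user
                for d in rec.doc_ids}


log = LineageLog()
log.record("req-1", "alice", corpus_version=41, doc_ids=["doc-a", "doc-b"])
log.record("req-2", "bob", corpus_version=42, doc_ids=["doc-b"])
log.record("req-3", "alice", corpus_version=42, doc_ids=["doc-c"])

print(log.corpus_version_for("req-2"))        # 42
print(sorted(log.docs_accessed_by("alice")))  # ['doc-a', 'doc-b', 'doc-c']
```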

On what broke.

Knyte: You said you would tell us what broke.

Engineering lead: Two things broke. The first was the policy migration. We had assumed the source-system access patterns would translate cleanly to the new corpus's access policy. They did not. We had a class of internal documents that were technically accessible to a broad audience in the source system but had been gated narrowly in the vector store by a custom query filter. When we mirrored the source pattern, we accidentally widened the access. We caught it in shadow mode, before cutover, and tightened the policy. But it was a real near-miss, and it taught us that mirroring is not always the right starting point for the access layer.
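
One way to catch this kind of widening before cutover is to compare, per document, the audience the old query filter effectively allowed with the audience the mirrored policy would allow. The sketch below assumes both can be expressed as sets of group names per document. The function name, group names, and document IDs are hypothetical.

```python
from typing import Dict, Set


def find_widened(
    old_audience: Dict[str, Set[str]],
    new_audience: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per document, the groups that gain access under the new policy."""
    widened: Dict[str, Set[str]] = {}
    for doc_id, new_groups in new_audience.items():
        # A document absent from the old mapping is treated as having had no
        # audience, so any access to it is flagged (a conservative choice).
        extra = new_groups - old_audience.get(doc_id, set())
        if extra:
            widened[doc_id] = extra
    return widened


# Old: effective audience after the custom query filter (hypothetical data).
old_audience = {"memo-1": {"safety-team"}, "wiki-1": {"all-staff"}}
# New: audience mirrored from the source system's permissions.
new_audience = {"memo-1": {"safety-team", "all-staff"}, "wiki-1": {"all-staff"}}

print(find_widened(old_audience, new_audience))  # {'memo-1': {'all-staff'}}
```

An empty result does not prove the policy is right, only that the migration did not grant access nobody had before.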

Engineering lead: The second thing that broke was a downstream system that had been parsing the vector store's metadata in a way the new corpus did not produce. The metadata fields were renamed in the new schema for clarity. The downstream system was hardcoded to the old field names. We caught this on day six, the day before cutover, and patched it before the switchover. If the shadow-mode comparison had not been in place, we would have shipped the regression and discovered it in production a few hours later.
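
A rename like this can be checked mechanically if the fields each downstream consumer reads are written down. The sketch below shows two hypothetical helpers: one lists the fields a consumer expects that the new schema no longer produces, and one adds the old names back as aliases during a transition. The field names are invented for illustration; the interview does not say which fields were renamed or how the patch was made.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical mapping from old metadata field names to new ones.
RENAMED_FIELDS = {"doc_kind": "document_type", "cls": "classification"}


def missing_fields(consumer_fields: Iterable[str],
                   schema_fields: Iterable[str]) -> List[str]:
    """Fields a consumer reads that the new schema does not produce."""
    return sorted(set(consumer_fields) - set(schema_fields))


def with_legacy_aliases(record: Dict[str, Any],
                        renames: Dict[str, str]) -> Dict[str, Any]:
    """Copy a new-schema record and add old field names as aliases."""
    out = dict(record)
    for old_name, new_name in renames.items():
        if new_name in record and old_name not in out:
            out[old_name] = record[new_name]
    return out


new_record = {"document_type": "memo", "classification": "internal", "owner": "alice"}

print(missing_fields(["doc_kind", "owner"], new_record.keys()))  # ['doc_kind']

patched = with_legacy_aliases(new_record, RENAMED_FIELDS)
print(missing_fields(["doc_kind", "owner"], patched.keys()))     # []
```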

On what he would do differently.

Knyte: If you were starting over.

Engineering lead: I would have done the rebuild a year earlier. The single-layer architecture was already the wrong shape a year ago, but the consequences were not yet visible. The cost of the rebuild grew steadily during that period: every additional document indexed, every additional query pattern, every additional researcher onboarded raised the total work to migrate. By the time we did the rebuild, the work was thirty person-days. If we had done it a year earlier, it might have been ten. The lesson is that retrieval-architecture debt compounds, and the right time to rebuild is earlier than feels urgent.

Engineering lead: I would also have run the shadow-mode comparison continuously, not just during the cutover week. The forty-percent diff number is not a one-time observation. It would have been informative every week of the prior fourteen months. We just had no way to see it.

On what he tells peer engineering leads.

Knyte: Final question. What is the takeaway.

Engineering lead: If you are running RAG and your durable storage is a vector index, you are running the same architecture we were. The rebuild is a two-week project, not a quarter. The compounding payback — operational, audit, retrieval-quality — is significant and starts immediately. The hardest part is not the engineering. It is admitting that the system you have been running is the wrong shape, which feels worse than it is, because the fix is bounded.

Engineering lead: I will say one other thing, which is that I do not think this lesson is unique to us. The Knyte team has been writing about this architecture pattern for a year. We knew the pattern. We did not act on it because the existing system was working well enough. The cost of waiting was real, even though it did not show up on a single line item. If you are reading this and recognize the architecture I described, do the rebuild. It will not get cheaper.

R. Morales · FIELD CORRESPONDENT · KNYTE

Interviews the operators running Knyte in production. Twelve years at the Wall Street Journal covering enterprise software before joining Knyte.
