The single biggest piece of technical debt we find inside production RAG deployments has the same shape every time: the team treated the vector store as the corpus. Documents were indexed directly into Pinecone or Weaviate or Qdrant, the index became the source of truth, and there is no straightforward way to know which version of which document produced any given retrieval result. The deployment runs. It cannot be debugged. And every time the embedding model needs to change, the team has to choose between rebuilding the index from a corpus that does not exist and keeping the existing index, which makes the embedding model permanent.
A versioned corpus solves this by separating two concerns the vector-store-as-corpus pattern conflates: the durable record of what the deployment knows, and the ephemeral index that makes that record retrievable. The corpus is the durable thing. It is versioned, addressable, and replayable. The vector store is one of possibly several indexes built on top of it. Replacing the vector store does not affect the corpus. Updating the corpus does not require rebuilding the index from scratch.
What follows is the architecture we run in production across twelve installs, the migration pattern that gets a single-layer deployment to it without downtime, and the operational properties that emerge once the split is in place.
What "versioned" means at the corpus layer.
Versioning, at the corpus layer, has three components: document versioning, corpus versioning, and decision versioning.
Document versioning means every document in the corpus has a version history. The Q3 campaign brief that exists today is version seven. Versions one through six exist, are addressable by ID, and can be retrieved if a workflow needs to know what the brief looked like in March. Documents are not overwritten. New versions are written, the previous version is preserved, and the corpus exposes both.
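The append-only discipline described above can be sketched in a few lines. This is an illustrative in-memory model, not the production API: `DocumentStore`, `append`, `at`, and `latest` are names chosen for the example.

```typescript
// Minimal append-only document version store (illustrative sketch).
// Versions are never overwritten; each append creates a new addressable
// version and preserves every previous one.
type DocVersion = { version: number; payload: string; createdAt: Date };

class DocumentStore {
  private docs = new Map<string, DocVersion[]>();

  // Write a new version; the previous version is preserved, not replaced.
  append(docId: string, payload: string): DocVersion {
    const history = this.docs.get(docId) ?? [];
    const v: DocVersion = {
      version: history.length + 1,
      payload,
      createdAt: new Date(),
    };
    this.docs.set(docId, [...history, v]);
    return v;
  }

  // Retrieve a specific historical version by number.
  at(docId: string, version: number): DocVersion | undefined {
    return this.docs.get(docId)?.find((v) => v.version === version);
  }

  // The latest version is simply the end of the history.
  latest(docId: string): DocVersion | undefined {
    const history = this.docs.get(docId);
    return history?.[history.length - 1];
  }
}
```

The point of the sketch is the invariant, not the storage engine: a write never destroys information, so "what did the brief look like in March" is a lookup, not an archaeology project.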
Corpus versioning means the corpus itself has a version. At any point in time, the corpus is at version 4.2.1 (or whatever versioning scheme is in use). A retrieval call can pin to a specific corpus version, which is useful for replay, audit, and eval. The corpus version is incremented atomically when documents are added, removed, or modified.
Decision versioning means every editor decision — every accept, reject, edit, or escalate — is recorded against the corpus version it was made under. This is what makes the corpus a queryable record of institutional judgment, not just a stack of documents. An editor's choice to mark a particular brief as the canonical example of a tone is preserved with the corpus context that produced it.
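A decision record of this shape might look like the following. The field names and `DecisionLog` class are assumptions for illustration; the essential part is that `corpusVersion` is captured alongside the decision itself.

```typescript
// Illustrative decision-log entry: every editorial decision is recorded
// against the corpus version it was made under, so it can later be
// interpreted in the corpus context that produced it.
type Decision = {
  actor: string;
  action: "accept" | "reject" | "edit" | "escalate";
  docId: string;
  docVersion: number;
  corpusVersion: string; // the corpus state the editor was looking at
  note?: string;
  at: Date;
};

class DecisionLog {
  private entries: Decision[] = [];

  record(d: Omit<Decision, "at">): Decision {
    const entry = { ...d, at: new Date() };
    this.entries.push(entry);
    return entry;
  }

  // Query institutional judgment: all decisions made under one corpus version.
  under(corpusVersion: string): Decision[] {
    return this.entries.filter((d) => d.corpusVersion === corpusVersion);
  }
}
```

Because each entry pins both the document version and the corpus version, "why was this brief marked canonical" resolves to a concrete, replayable context rather than a guess about what the editor saw.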
Why versioning matters in production.
Five operational properties become available once the corpus is versioned, and none of them is reliably available without it.
Replay. A retrieval call from three weeks ago can be exactly reproduced. Same corpus version, same query, same policy, same result. This is what makes incident investigation tractable. Without it, every "why did the agent return this" question becomes a guessing game.
Eval against frozen corpus. Eval suites need a stable target. If the corpus is changing between eval runs, every regression looks like a model regression, and you cannot tell whether the model got worse or the corpus changed underneath it. Pinning the eval to a specific corpus version separates the two.
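The pinning idea can be shown as a small eval harness. This is a sketch under stated assumptions: `retrieve` is a stand-in for the real retrieval pipeline, and the signature is hypothetical.

```typescript
// Sketch of corpus pinning for eval runs. Because baseline and candidate
// runs both target the same frozen corpus version, any score delta is
// attributable to the model, not to corpus drift.
type EvalCase = { query: string; expectedDocId: string };

function runEval(
  cases: EvalCase[],
  corpusVersion: string, // frozen target for the whole suite
  retrieve: (query: string, corpusVersion: string) => string,
): number {
  let hits = 0;
  for (const c of cases) {
    // Every retrieval in the suite is pinned to the same corpus version.
    if (retrieve(c.query, corpusVersion) === c.expectedDocId) hits++;
  }
  return hits / cases.length; // recall@1 against the frozen corpus
}
```

Running the suite twice with the same `corpusVersion` but different models is the controlled experiment the unversioned setup cannot offer.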
Index rebuild without recall loss. When the embedding model changes, the index has to be rebuilt. With a versioned corpus, the rebuild reads from the corpus, generates new embeddings, and atomically swaps in. The deployment does not lose anything. Without a corpus, the rebuild is a content-recovery exercise.
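The rebuild-and-swap can be reduced to a pointer flip. A minimal sketch, assuming an in-memory index and an `embed` stand-in; `IndexHandle` is an illustrative name, not part of any real vector-store client.

```typescript
// Sketch of rebuild-and-swap: the replacement index is built entirely
// from the corpus, off to the side, then production reads flip to it
// in a single assignment.
type Index = Map<string, number[]>;

class IndexHandle {
  private live: Index;

  constructor(initial: Index) {
    this.live = initial;
  }

  // Reads always go through the handle, so a swap is invisible to callers.
  get(): Index {
    return this.live;
  }

  // Re-embed every document from the corpus, then swap.
  rebuild(
    corpusDocs: Map<string, string>,
    embed: (text: string) => number[],
  ): void {
    const fresh: Index = new Map();
    for (const [docId, text] of corpusDocs) {
      fresh.set(docId, embed(text));
    }
    this.live = fresh; // atomic from the reader's point of view
  }
}
```

The design choice worth noting: readers hold the handle, never the index, so the embedding model can change without a retrieval outage and without touching the corpus.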
Editorial undo. An editor who realizes a previous decision was wrong can revert it cleanly because the system knows what the previous state was. Without versioning, undo is reconstruction.
Auditor-grade lineage. A regulator asking "why did this decision get made" can be answered with the corpus version, the document versions cited, the editor decisions in scope, and the policy applied. All of that is queryable. Without versioning, the answer is "we don't know."
What the architecture looks like.
We run the corpus on object storage with a thin metadata layer in Postgres. Each document version is an immutable object. The metadata layer holds the document graph, the version history, the editorial decisions, and the access policy. The vector store sits downstream as one of several indexes — we typically run a hybrid lexical-semantic index per workflow domain, plus a coarse global semantic index for cross-domain retrieval.
The metadata layer is the choke point for every read and write. Every retrieval starts with a metadata query (which documents are policy-visible, what is the schema constraint, what is the corpus version), then dispatches to the index for the semantic step. Every write goes through the metadata layer first, then triggers the embedding update asynchronously.
// Read path with corpus version pin
const corpus = await knyte.corpus.at("4.2.1");
const result = await corpus.query({
  schema: { docType: "campaign-brief", status: "approved" },
  semantic: { text: queryText, k: 12 },
  policy: { actor, intent: "draft.generate" },
});

// Write path: corpus first, index reconciles async
const v = await corpus.documents.append({
  docId: "brief.q4-2026.lifecycle",
  payload: docPayload,
  metadata: { author, status: "draft" },
});
// Returns { documentVersion, corpusVersion } immediately;
// embedding update is non-blocking.

The migration pattern that works without downtime.
If you are reading this because your current deployment has the vector-store-as-corpus problem, the migration is achievable in a two-week sprint without taking the deployment down. The pattern we use:
- Stand up the corpus and metadata layer in parallel to the existing vector store. Do not touch production traffic.
- Backfill the corpus from the source systems that originally produced the indexed documents. Do not backfill from the vector store; the original sources have content the vector store has lost.
- Run the production retrieval pipeline against both the old vector store and a new index built from the corpus, in shadow mode. Compare results. Tune until parity.
- Cut production reads to the new index. Keep the old vector store running as a fallback for one week, with traffic mirrored.
- Decommission the old vector store. Backfill any editorial decisions that happened during the cutover into the corpus's decision log.
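The shadow-mode step above amounts to running every query twice and diffing. A sketch, with `Retriever` and `shadowCompare` as illustrative names and an order-sensitive comparison as an assumed policy:

```typescript
// Sketch of the shadow-mode comparison: run the same query against the
// old index and the new corpus-backed index, and collect divergences
// instead of serving them.
type Retriever = (query: string) => string[];

function shadowCompare(
  queries: string[],
  oldIndex: Retriever,
  newIndex: Retriever,
): { query: string; old: string[]; fresh: string[] }[] {
  const diffs: { query: string; old: string[]; fresh: string[] }[] = [];
  for (const q of queries) {
    const a = oldIndex(q);
    const b = newIndex(q);
    // Order-sensitive: ranking changes count as divergence too.
    if (a.join("|") !== b.join("|")) {
      diffs.push({ query: q, old: a, fresh: b });
    }
  }
  return diffs; // tune until this is empty, then cut over
}
```

In practice the query set would be mirrored production traffic, and each divergence is triaged: backfill gap, missing metadata field, or genuine ranking difference.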
The shadow-mode period is the part teams skip and regret. Almost every backfill misses something — a document type that was never in the source system, a metadata field that was added inside the vector store and never persisted upstream, a stale embedding that was the actual signal a retrieval was returning. The shadow comparison surfaces these discrepancies before they become production incidents.
What this connects to.
Corpus versioning is one piece of the broader queryable-memory architecture we run on every install. It is also the piece that, when missing, makes the other pieces impossible to build well. You cannot run a real eval suite against a corpus that is not versioned. You cannot honor an editor undo against a corpus that is not versioned. You cannot defend a regulator's lineage question against a corpus that is not versioned. The other layers depend on this one.
If you are building RAG and your durable storage is a vector index, the deployment is structurally one model swap away from a content-recovery project. The remediation is not a rewrite. It is an extraction. The two-week sprint is a small price for the eighteen months of operational stability that follow.