Migrating from Shared DB to Schema-Per-Tenant

Moving a live SaaS off a single shared table set onto one PostgreSQL schema per tenant is a data-migration problem, not a config change — every row has to land in the right schema while production keeps writing. This guide is part of the Schema-Per-Tenant Architecture section of the broader Multi-Tenant Database Isolation Models reference.

Problem Framing

A shared database leans on a tenant_id column for separation. Isolation is enforced in application code or row-level security policies, so a single missed WHERE clause leaks another customer's data. Schema-per-tenant moves the boundary down into the database: each tenant gets its own namespace (acme.users, globex.users), and search_path decides which one a connection sees. The hard part is getting there without downtime and without dropping writes that arrive mid-migration.

The risk is that the source table is hot. While you copy ten million existing rows into per-tenant schemas, new INSERTs, UPDATEs, and DELETEs keep hitting the legacy table. A naive INSERT ... SELECT snapshot is stale the moment it finishes. The standard answer is a dual-write window: start mirroring every change to the new schemas first, then backfill history underneath the live stream, reconcile, flip reads, and only then stop writing to the legacy table.

Ordering matters because it determines what you can lose. If you backfill before dual-write is live, every write that lands during the copy is dropped from the new schema. If you flip reads before reconciliation passes, the first query against a half-populated schema returns wrong results to a customer. The phases below are sequenced so the legacy table stays the authoritative copy until the very end — which means rollback at any point is a read re-pointing, not a data-recovery exercise. That property is the whole reason the migration is safe to run against production.

Two decisions shape the rest of the work. First, where does isolation live: a database trigger keeps dual-write inside the engine so an application deploy can never desync it, whereas application-level dual-write is easier to reason about but couples correctness to every code path that writes. Second, how do you scope a connection to a tenant: search_path is the lightweight choice that works with PgBouncer transaction pooling, while a dedicated role per schema adds a stronger GRANT-based wall at the cost of more roles to manage. This guide uses triggers and search_path because they migrate the most workloads with the fewest moving parts.

Step-by-Step Guide

1. Provision per-tenant schemas idempotently

Create one schema per tenant and clone the table structure from the shared tables. Make it re-runnable so a partial failure can be retried.

CREATE OR REPLACE FUNCTION provision_tenant_schema(p_schema text)
RETURNS void AS $$
BEGIN
  EXECUTE format('CREATE SCHEMA IF NOT EXISTS %I', p_schema);
  EXECUTE format('GRANT USAGE ON SCHEMA %I TO app_user', p_schema);
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I.users (LIKE public.users INCLUDING ALL)',
    p_schema);
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I.orders (LIKE public.orders INCLUDING ALL)',
    p_schema);
END;
$$ LANGUAGE plpgsql;

SELECT provision_tenant_schema(schema_name)
FROM tenant_registry
WHERE migrated_at IS NULL;

INCLUDING ALL carries indexes, defaults, and NOT NULL/CHECK constraints, but it deliberately does not copy foreign keys — clone them and you would point at public. Recreate foreign keys explicitly afterward so they reference the tenant-local table. The function is idempotent on purpose: CREATE SCHEMA IF NOT EXISTS and CREATE TABLE IF NOT EXISTS let a crashed provisioning run be re-executed across the full registry without touching tenants that already completed. Drive it from a registry column so you can provision in batches and resume exactly where a failure stopped.

2. Turn on dual-write

Mirror every change on the shared table into the matching tenant schema. A trigger keeps this in the database so application deploys cannot get out of sync.

CREATE OR REPLACE FUNCTION sync_to_tenant_schema()
RETURNS trigger AS $$
DECLARE
  target_schema text;
BEGIN
  SELECT schema_name INTO target_schema
  FROM tenant_registry WHERE tenant_id = NEW.tenant_id;

  IF TG_OP = 'INSERT' THEN
    EXECUTE format('INSERT INTO %I.%I SELECT ($1).*',
                   target_schema, TG_TABLE_NAME)
    USING NEW;
  ELSIF TG_OP = 'UPDATE' THEN
    EXECUTE format('UPDATE %I.%I t SET (t.*) = (($1).*) WHERE t.id = $2',
                   target_schema, TG_TABLE_NAME)
    USING NEW, NEW.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_dual_write
AFTER INSERT OR UPDATE ON public.orders
FOR EACH ROW EXECUTE FUNCTION sync_to_tenant_schema();

Handle DELETE the same way if your workload issues hard deletes. From this point on, both copies stay current for new traffic. The trigger fires AFTER the legacy write commits its row-level effect, so the legacy table remains the source of truth and a failure inside the mirror does not abort the customer's transaction unless you want it to. If you do want strict coupling, raise inside the function and let the parent transaction roll back; if you prefer the legacy path to always succeed, wrap the mirror in an exception handler and log the failed key for the reconciliation pass to repair. Measure replication lag here — the gap between a legacy write and its mirror should stay near zero, and a growing lag means the trigger is contending for locks on the new schema.

3. Backfill history in chunks

With dual-write live, copy the existing rows underneath the stream. Page by primary key to keep transactions short and avoid long locks.

INSERT INTO acme.orders
SELECT * FROM public.orders
WHERE tenant_id = 'acme'
  AND id > :last_id
ORDER BY id
LIMIT 10000
ON CONFLICT (id) DO NOTHING;

ON CONFLICT (id) DO NOTHING makes the backfill safe to interleave with dual-write: rows the trigger already copied are skipped instead of erroring. Loop, advancing :last_id to the highest id seen, until no rows return. Keep each chunk small — 10k rows commits in well under a second on most hardware — so you never hold a long transaction that blocks autovacuum or bloats the WAL. Run one backfill worker per tenant rather than one giant job, and cap the number of concurrent workers to protect the connection pool; provisioning thousands of schemas in parallel is the classic way to exhaust max_connections mid-migration. Because the keyset pagination is deterministic and the upsert is idempotent, a worker that dies can simply restart from its last committed :last_id.

4. Route reads through search_path

Resolve the tenant per request and scope the connection to its schema. Use SET LOCAL inside a transaction so the setting cannot leak to the next checkout from the connection pool.

app.use(async (req, res, next) => {
  const tenantId = req.headers["x-tenant-id"] ?? req.user?.tenantId;
  const schema = await tenantRegistry.getSchema(tenantId); // validated, server-side
  if (!schema) return res.status(404).json({ error: "tenant not provisioned" });

  await req.db.query("BEGIN");
  // schema comes from the registry, never from raw user input
  await req.db.query(`SET LOCAL search_path TO ${schema}, public`);
  res.on("finish", () => req.db.query("COMMIT"));
  next();
});

Roll this out behind a per-tenant flag so reads switch one tenant at a time. The legacy table is still being written via dual-write, so a misbehaving tenant can be flipped back instantly. Start with a low-volume internal tenant, watch p95 latency and error rate against the pre-migration baseline, then widen the rollout in waves. Resolve the schema from a centralized, highly available tenant registry rather than computing it from the request; a renamed or newly provisioned tenant only needs a registry update, and you avoid baking naming rules into application code. Treat a registry miss as a hard 404, never a silent fallback to public, or you reopen the cross-tenant hole the migration is meant to close.

5. Stop dual-write and drop legacy

Once every tenant reads from its schema and reconciliation is clean, drop the triggers, then archive and drop the shared tables.

DROP TRIGGER orders_dual_write ON public.orders;
-- after a retention window with backups verified:
DROP TABLE public.orders;

Do not drop the legacy tables the moment reads flip. Keep dual-write running and the shared tables intact for a retention window — long enough that any latent bug surfaces and short enough that the duplicate storage cost is bearable. During that window the legacy copy is your fastest rollback: re-point reads, and the still-current shared table serves traffic again. Only once you have a verified backup and a clean run of the verification checks below should you stop dual-write, VACUUM, and drop the shared tables to reclaim storage.

Verification

Before flipping any tenant, prove parity. Compare row counts and a content hash between the legacy slice and the new schema.

WITH legacy AS (
  SELECT count(*) AS n, md5(string_agg(id::text, ',' ORDER BY id)) AS h
  FROM public.orders WHERE tenant_id = 'acme'
),
migrated AS (
  SELECT count(*) AS n, md5(string_agg(id::text, ',' ORDER BY id)) AS h
  FROM acme.orders
)
SELECT legacy.n = migrated.n AS counts_match,
       legacy.h = migrated.h AS ids_match
FROM legacy, migrated;

Both columns must be t. The count check catches missing or extra rows; the ordered-id hash catches the subtler case where counts match but a row was overwritten with stale data during the dual-write window. Run the comparison per tenant immediately before flipping that tenant, and re-run it on a sample after the cutover to confirm dual-write is still keeping pace.

Then confirm hard isolation — querying one tenant's schema must never surface another's rows:

SET search_path TO acme, public;
SELECT count(*) FROM orders
WHERE tenant_id <> 'acme';  -- expect 0

A non-zero result means a backfill or trigger wrote the wrong tenant's rows into this schema, and you must stop the cutover and reconcile before any customer reads from it. For ongoing assurance, fold both queries into a scheduled job that samples a handful of tenants and alerts on any mismatch, so a regression in the routing layer is caught by monitoring rather than by a support ticket.

Failure Modes & Gotchas

FAQ

Do I need downtime for this migration? No. Dual-write keeps the legacy table and the new schemas in sync while you backfill and flip reads per tenant, so traffic never stops. The only brief lock-sensitive step is recreating constraints, which you schedule per tenant during low traffic.

How is this different from row-level security? Schema-per-tenant enforces isolation at the namespace boundary, while RLS keeps everything in one table behind policies. The trade-offs — blast radius, per-tenant migration cost, and connection overhead — are compared in schema-per-tenant vs row-level security for HIPAA.

What is the practical schema ceiling in PostgreSQL? Tens of thousands of schemas work, but catalog bloat and pg_dump time degrade past roughly 10k–50k. Beyond that, shard tenants across multiple database clusters or adopt a hybrid model that groups small tenants into shared schemas.