Managing Per-Tenant Encryption Keys with KMS
Per-tenant encryption only delivers isolation if each tenant's data is sealed under a key that other tenants, and a single leaked credential, cannot use to decrypt across boundaries. This page sits under Per-Tenant Encryption & Key Management for Multi-Tenant SaaS and shows how to operate per-tenant data keys with AWS KMS (or an equivalent like GCP Cloud KMS or HashiCorp Vault): envelope encryption with GenerateDataKey, caching plaintext data keys without leaking them, choosing between a customer master key per tenant and a shared master key bound to tenant context, and rotating both layers, where only data-key rotation requires rewriting ciphertext.
Problem Framing
The naive design calls KMS Encrypt on every record. That works until you hit throughput limits: KMS request quotas are shared per account and Region, so a busy tenant can throttle every other tenant. It is also expensive, because each Encrypt/Decrypt is a billed API call with a network round trip. The correct shape is envelope encryption. KMS holds a long-lived key encryption key (the customer master key, or CMK; current AWS documentation calls it a KMS key). Your application asks KMS for a short-lived data encryption key (DEK) via GenerateDataKey, uses the DEK locally to encrypt the payload with AES-256-GCM, and stores the KMS-wrapped DEK next to the ciphertext. KMS never sees your plaintext, and bulk crypto happens in your process at memory speed.
The decision that defines isolation is whether each tenant gets its own CMK or whether one shared CMK wraps every tenant's DEKs with the tenant id carried in the encryption context. A per-tenant CMK gives you a per-tenant kill switch and a clean cryptographic boundary; a shared CMK with encryption context scales to far more tenants but relies on every call passing — and verifying — the right tenant id. Get the encryption context wrong on a shared CMK and a DEK wrapped for tenant A can be unwrapped while serving tenant B, which is silent cross-tenant decryption that looks like a normal read. The same kill-switch instinct that drives per-tenant key separation also underpins rotating tenant-specific JWT signing keys and feeds directly into per-tenant data deletion workflows, where destroying a tenant's key crypto-shreds its data.
Step-by-Step Guide
1. Choose per-tenant CMK or shared CMK with encryption context
Decide the key topology before writing code, because it dictates your IAM policy, your cost curve, and your blast radius. Use a per-tenant CMK when you have a few hundred high-value tenants and need an auditable, revocable boundary per customer. Use a shared CMK plus encryption context when you have thousands of tenants and cannot afford a CMK each (KMS keys are billed monthly and have account quotas).
| Dimension | Per-tenant CMK | Shared CMK + encryption context |
|---|---|---|
| Boundary | Cryptographic, per key | Logical, enforced by context match |
| Kill switch | Disable one CMK | Deny via key policy / delete DEKs |
| Scale ceiling | Hundreds of tenants | Tens of thousands |
| Monthly cost | One CMK fee per tenant | One CMK fee total |
| Audit clarity | Per-tenant CloudTrail | Filter CloudTrail by context |
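One way to keep the topology choice out of call sites is a small resolver that returns the key id and encryption context for a tenant. The sketch below is illustrative only: the `KeyTopology` type, the `resolveKeyBinding` function, and the ARN strings are hypothetical names, not part of any AWS SDK.

```typescript
// Hypothetical config shape: either one CMK per tenant or one shared CMK.
type KeyTopology =
  | { kind: "per-tenant"; keyArnByTenant: Map<string, string> }
  | { kind: "shared"; sharedKeyArn: string };

interface KeyBinding {
  keyId: string;
  encryptionContext: { tenant_id: string };
}

// Resolve which CMK and which encryption context a tenant's KMS calls must use.
export function resolveKeyBinding(topology: KeyTopology, tenantId: string): KeyBinding {
  if (!tenantId) {
    throw new Error("tenantId is required"); // never fall back to a default tenant
  }
  if (topology.kind === "shared") {
    return { keyId: topology.sharedKeyArn, encryptionContext: { tenant_id: tenantId } };
  }
  const keyId = topology.keyArnByTenant.get(tenantId);
  if (!keyId) {
    throw new Error(`no CMK provisioned for tenant ${tenantId}`);
  }
  // The context is set in both topologies so CloudTrail filtering works the same way.
  return { keyId, encryptionContext: { tenant_id: tenantId } };
}
```

Passing the returned `keyId` and `encryptionContext` straight into GenerateDataKey and Decrypt means a later switch from a shared CMK to per-tenant CMKs changes only the configuration.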
2. Generate a data key scoped to the tenant
Call GenerateDataKey with the tenant id in the encryption context. KMS returns both the plaintext DEK (use it now) and the wrapped DEK (persist it). The encryption context is additional authenticated data (AAD): it is not secret and appears in plaintext in CloudTrail, but KMS will refuse to unwrap unless the exact same context is supplied on decrypt.
import { KMSClient, GenerateDataKeyCommand } from "@aws-sdk/client-kms";

const kms = new KMSClient({});

export async function newTenantDataKey(keyId: string, tenantId: string) {
  const res = await kms.send(new GenerateDataKeyCommand({
    KeyId: keyId, // per-tenant CMK arn, or the shared CMK arn
    KeySpec: "AES_256",
    EncryptionContext: { tenant_id: tenantId },
  }));
  return {
    plaintext: res.Plaintext as Uint8Array, // use, then zero
    wrapped: Buffer.from(res.CiphertextBlob!).toString("base64"),
  };
}
3. Encrypt the payload locally and zero the plaintext key
Run AES-256-GCM in your process with a fresh 96-bit IV per record. Persist the IV, the auth tag, the ciphertext, and the base64 wrapped DEK. Overwrite the plaintext DEK buffer the moment you are done so it cannot linger in a heap dump or a crash core.
import { createCipheriv, randomBytes } from "node:crypto";

// Encrypts one record and wipes the caller's DEK buffer, so use one DEK per call.
export function sealRecord(plaintextDek: Uint8Array, payload: Buffer) {
  try {
    const iv = randomBytes(12); // fresh 96-bit IV per record
    const cipher = createCipheriv("aes-256-gcm", plaintextDek, iv);
    const ciphertext = Buffer.concat([cipher.update(payload), cipher.final()]);
    const tag = cipher.getAuthTag();
    return {
      iv: iv.toString("base64"),
      tag: tag.toString("base64"),
      ciphertext: ciphertext.toString("base64"),
    };
  } finally {
    plaintextDek.fill(0); // wipe the DEK from memory even if encryption throws
  }
}
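The read path needs the inverse operation. The following is a minimal sketch of a decrypt counterpart, assuming the record was stored with the same base64 `iv`, `tag`, and `ciphertext` fields that sealRecord returns; the `openSealed` name and the `SealedRecord` interface are hypothetical. It uses a throwaway random key for the round trip so it runs without KMS.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Field names mirror what sealRecord returns; all values are base64 strings.
interface SealedRecord {
  iv: string;
  tag: string;
  ciphertext: string;
}

// Hypothetical counterpart to sealRecord: decrypt one record with an unwrapped DEK.
export function openSealed(plaintextDek: Uint8Array, sealed: SealedRecord): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", plaintextDek, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  // final() throws if the tag does not verify (wrong key, wrong IV, or tampering).
  return Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(),
  ]);
}

// Round trip with a throwaway key; in production the key comes from KMS Decrypt.
const demoKey = randomBytes(32);
const demoIv = randomBytes(12);
const demoCipher = createCipheriv("aes-256-gcm", demoKey, demoIv);
const demoCiphertext = Buffer.concat([
  demoCipher.update(Buffer.from("hello tenant", "utf8")),
  demoCipher.final(),
]);
const demoSealed: SealedRecord = {
  iv: demoIv.toString("base64"),
  tag: demoCipher.getAuthTag().toString("base64"),
  ciphertext: demoCiphertext.toString("base64"),
};
console.log(openSealed(demoKey, demoSealed).toString("utf8")); // prints "hello tenant"
```

Because GCM authenticates the ciphertext, a record opened with the wrong DEK fails loudly in `final()` instead of returning garbage, which is the behavior you want when a wrapped DEK and a record get mismatched.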
4. Cache plaintext DEKs safely under a strict TTL
Calling KMS on every read reintroduces the throttling you avoided. Cache the unwrapped DEK in memory, keyed by tenant id plus wrapped blob so a cache hit cannot skip the context check KMS would have made, but only briefly (60–300 seconds) and never on disk or in Redis. The AWS Encryption SDK ships a local cryptographic materials cache that enforces a max age and a max-bytes/max-messages limit per DEK; reuse it instead of hand-rolling.
from functools import lru_cache

from aws_encryption_sdk import EncryptionSDKClient, StrictAwsKmsMasterKeyProvider
from aws_encryption_sdk.caches.local import LocalCryptoMaterialsCache
from aws_encryption_sdk.materials_managers.caching import CachingCryptoMaterialsManager

client = EncryptionSDKClient()
cache = LocalCryptoMaterialsCache(capacity=1000)

@lru_cache(maxsize=None)
def cmm_for(cmk_arn: str) -> CachingCryptoMaterialsManager:
    # Reuse one caching CMM per CMK. Each new CMM gets its own cache partition,
    # so building a fresh one on every call would never produce a cache hit.
    provider = StrictAwsKmsMasterKeyProvider(key_ids=[cmk_arn])
    return CachingCryptoMaterialsManager(
        master_key_provider=provider,
        cache=cache,
        max_age=120.0,  # seconds a DEK may be reused
        max_messages_encrypted=4096,
    )

def encrypt(cmk_arn: str, tenant_id: str, data: bytes) -> bytes:
    # The encryption context is part of the cache key, so tenants never share a DEK.
    ciphertext, _ = client.encrypt(
        source=data,
        materials_manager=cmm_for(cmk_arn),
        encryption_context={"tenant_id": tenant_id},
    )
    return ciphertext
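To make the cache invariants concrete for the raw-KMS path, here is a small in-memory sketch in TypeScript. It is not a replacement for the Encryption SDK cache; it only illustrates a max age, a max-uses cap, and zeroing on eviction. The `TenantDekCache` class and its method names are hypothetical. Entries are keyed by tenant id plus wrapped blob, on the assumption that a lookup keyed by the blob alone would let a cache hit bypass the encryption-context check KMS performs on a miss.

```typescript
interface CachedDek {
  dek: Uint8Array;
  expiresAt: number; // epoch milliseconds
  usesLeft: number;
}

// Hypothetical in-memory DEK cache: short TTL, use cap, zero-on-evict.
export class TenantDekCache {
  private readonly entries = new Map<string, CachedDek>();
  private readonly maxAgeMs: number;
  private readonly maxUses: number;
  private readonly now: () => number;

  constructor(maxAgeMs: number, maxUses: number, now: () => number = Date.now) {
    this.maxAgeMs = maxAgeMs;
    this.maxUses = maxUses;
    this.now = now;
  }

  // Unambiguous composite key: a DEK cached for one tenant is invisible to another.
  private static keyFor(tenantId: string, wrappedB64: string): string {
    return JSON.stringify([tenantId, wrappedB64]);
  }

  put(tenantId: string, wrappedB64: string, dek: Uint8Array): void {
    this.evict(tenantId, wrappedB64);
    this.entries.set(TenantDekCache.keyFor(tenantId, wrappedB64), {
      dek: Uint8Array.from(dek), // private copy, so the caller can zero its own buffer
      expiresAt: this.now() + this.maxAgeMs,
      usesLeft: this.maxUses,
    });
  }

  get(tenantId: string, wrappedB64: string): Uint8Array | undefined {
    const entry = this.entries.get(TenantDekCache.keyFor(tenantId, wrappedB64));
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt || entry.usesLeft <= 0) {
      this.evict(tenantId, wrappedB64);
      return undefined;
    }
    entry.usesLeft -= 1;
    return Uint8Array.from(entry.dek); // a copy the caller may zero after use
  }

  evict(tenantId: string, wrappedB64: string): void {
    const key = TenantDekCache.keyFor(tenantId, wrappedB64);
    const entry = this.entries.get(key);
    if (entry) {
      entry.dek.fill(0); // wipe the cached key material before dropping it
      this.entries.delete(key);
    }
  }
}
```

On a miss the caller goes to KMS Decrypt with the tenant's encryption context and then calls `put`. Expired entries are only wiped when they are next looked up, so a production version would also need a periodic sweep.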
5. Decrypt with the same encryption context and verify it
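On read, unwrap the DEK by passing the identical EncryptionContext. KMS enforces the match itself and fails the call with InvalidCiphertextException when the context differs, but the Decrypt response does not echo the context back, so there is nothing to compare after the fact. The tenant id you pass must therefore come from the authenticated request, never from the stored record or from client input, so a wrapped DEK from one tenant cannot be replayed under another. Also pass the expected KeyId so KMS refuses a blob wrapped under any other key.
import { DecryptCommand } from "@aws-sdk/client-kms";

// tenantId must come from the authenticated session, not from the stored record.
export async function openRecord(wrappedB64: string, tenantId: string, keyId: string) {
  if (!tenantId) {
    throw new Error("MISSING_TENANT"); // never decrypt without a tenant context
  }
  const res = await kms.send(new DecryptCommand({
    KeyId: keyId, // pin the expected CMK; KMS rejects blobs wrapped under another key
    CiphertextBlob: Buffer.from(wrappedB64, "base64"),
    EncryptionContext: { tenant_id: tenantId }, // must match GenerateDataKey
  }));
  // A context mismatch surfaces as InvalidCiphertextException from kms.send();
  // the response has no EncryptionContext field to re-check.
  if (!res.Plaintext) {
    throw new Error("DECRYPT_FAILED");
  }
  return res.Plaintext; // the plaintext DEK; use then zero
}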
6. Rotate the key encryption key and the data keys independently
Enable automatic annual rotation on each CMK so the backing key material rolls without changing the key id; existing wrapped DEKs stay decryptable because KMS keeps prior versions. That rotates the outer layer only. To rotate the inner layer, the DEK that actually encrypts a record, you must generate a new DEK and re-encrypt the payload under it, which is the lever you pull after a suspected data-key compromise. Re-wrapping an existing DEK with ReEncrypt only refreshes the outer wrapping (for example, to move it onto the newest CMK material or onto a different CMK); the DEK and the ciphertext stay the same. To honor a crypto-erasure request, destroy the tenant's CMK or every copy of its wrapped DEKs.
# Outer rotation: roll CMK backing material yearly, key id unchanged
aws kms enable-key-rotation --key-id "$TENANT_CMK_ARN"
# Re-wrap: move an existing wrapped DEK onto the current CMK material.
# The DEK itself is unchanged, so this is not data-key rotation.
aws kms re-encrypt \
  --ciphertext-blob fileb://wrapped_dek.bin \
  --source-encryption-context tenant_id=acme \
  --destination-key-id "$TENANT_CMK_ARN" \
  --destination-encryption-context tenant_id=acme \
  --query CiphertextBlob --output text | base64 -d > wrapped_dek.new
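The KMS commands above operate on keys and wrapped keys only. Replacing the data key itself means decrypting each record with the old DEK and sealing it again under a new one. A minimal local sketch follows, assuming records are stored as base64 `iv`, `tag`, and `ciphertext` fields produced with AES-256-GCM; the `reencryptRecord` name and `SealedRecord` interface are hypothetical, and obtaining the two plaintext DEKs from KMS is left to the caller.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedRecord {
  iv: string;
  tag: string;
  ciphertext: string;
}

// Hypothetical helper: move one record from oldDek to newDek.
// The caller owns both DEK buffers and zeroes them after the whole batch.
export function reencryptRecord(
  oldDek: Uint8Array,
  newDek: Uint8Array,
  sealed: SealedRecord,
): SealedRecord {
  const decipher = createDecipheriv("aes-256-gcm", oldDek, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  // final() throws if the record was not sealed under oldDek or was tampered with.
  const payload = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(),
  ]);
  try {
    const iv = randomBytes(12); // never reuse the old IV with the new key
    const cipher = createCipheriv("aes-256-gcm", newDek, iv);
    const ciphertext = Buffer.concat([cipher.update(payload), cipher.final()]);
    return {
      iv: iv.toString("base64"),
      tag: cipher.getAuthTag().toString("base64"),
      ciphertext: ciphertext.toString("base64"),
    };
  } finally {
    payload.fill(0); // the decrypted payload only needs to live for this call
  }
}
```

After a batch completes, persist the new wrapped DEK alongside the rewritten records and delete the old wrapped DEK, otherwise the retired key can still be unwrapped.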
Verification
Prove the two properties that make per-tenant KMS safe: a DEK unwraps only under its own tenant context, and a mismatched context is rejected by KMS itself rather than by your code.
import pytest
import aws_encryption_sdk
from aws_encryption_sdk import StrictAwsKmsMasterKeyProvider
from botocore.exceptions import ClientError

# encrypt() is the helper from step 4. CMK_ARN, generate_wrapped_dek and
# raw_kms_decrypt are assumed test fixtures/helpers that are not shown here.

def test_dek_is_bound_to_tenant_context():
    ct = encrypt(CMK_ARN, "acme", b"patient record")
    # the SDK reads the context from the message header and passes it to KMS
    pt, header = aws_encryption_sdk.EncryptionSDKClient().decrypt(
        source=ct, key_provider=StrictAwsKmsMasterKeyProvider(key_ids=[CMK_ARN])
    )
    assert pt == b"patient record"
    assert header.encryption_context["tenant_id"] == "acme"

def test_wrong_tenant_context_is_refused():
    blob = generate_wrapped_dek(CMK_ARN, "acme")
    with pytest.raises(ClientError) as e:
        raw_kms_decrypt(blob, encryption_context={"tenant_id": "globex"})
    assert e.value.response["Error"]["Code"] == "InvalidCiphertextException"
Then confirm at the audit layer that every decrypt carried a tenant context. CloudTrail records the encryptionContext on each KMS call, so a decrypt without one is a bug worth alerting on:
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=Decrypt \
  --query 'Events[].CloudTrailEvent' --output json \
  | jq '.[] | fromjson | .requestParameters.encryptionContext // "MISSING_CONTEXT"'
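The same check can run inside an alerting job. The sketch below assumes you already have the raw CloudTrailEvent JSON strings (for example from the lookup above) and flags Decrypt events whose request carried no `tenant_id`. The `decryptsMissingTenant` function and the trimmed `KmsTrailEvent` shape are hypothetical; only the `eventName` and `requestParameters.encryptionContext` fields are taken from the CloudTrail record format.

```typescript
// Only the fields this check needs; real CloudTrail records carry many more.
interface KmsTrailEvent {
  eventName?: string;
  eventID?: string;
  requestParameters?: { encryptionContext?: Record<string, string> } | null;
}

// Return every Decrypt event that was made without a tenant_id in its context.
export function decryptsMissingTenant(rawEvents: string[]): KmsTrailEvent[] {
  return rawEvents
    .map((raw) => JSON.parse(raw) as KmsTrailEvent)
    .filter(
      (event) =>
        event.eventName === "Decrypt" &&
        !event.requestParameters?.encryptionContext?.tenant_id,
    );
}

// Usage with two hand-written sample events.
const sampleEvents = [
  JSON.stringify({
    eventName: "Decrypt",
    eventID: "ok-1",
    requestParameters: { encryptionContext: { tenant_id: "acme" } },
  }),
  JSON.stringify({ eventName: "Decrypt", eventID: "bad-1", requestParameters: {} }),
];
console.log(decryptsMissingTenant(sampleEvents).map((e) => e.eventID)); // prints [ 'bad-1' ]
```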
Failure Modes & Gotchas
- Plaintext DEK left in memory or logs. Symptom: a heap dump or debug log exposes a usable data key. Cause: the plaintext DEK was never zeroed after use. Fix: overwrite the buffer (buf.fill(0)) immediately after encrypt/decrypt and never serialize it.
- Empty or wrong encryption context on a shared CMK. Symptom: tenant A's DEK unwraps while serving tenant B. Cause: the tenant id was omitted, defaulted, or taken from untrusted input on decrypt. Fix: make EncryptionContext mandatory on both GenerateDataKey and Decrypt, and derive the tenant id from the authenticated request. The Decrypt response does not return the context, so KMS's own match is the only check.
- KMS throttling under load. Symptom: ThrottlingException spikes for all tenants when one tenant is busy. Cause: per-record KMS calls with no DEK caching. Fix: adopt envelope encryption with a short-TTL local materials cache and a max-messages cap per DEK.
- Confusing CMK rotation with data rotation. Symptom: after "rotating keys," old ciphertext is still encrypted with the compromised DEK. Cause: enable-key-rotation only rolls the CMK, not the DEKs, and re-wrapping a DEK leaves the DEK itself unchanged. Fix: generate new DEKs and re-encrypt the affected records to retire the inner key.
FAQ
Should I use one CMK per tenant or a shared CMK with encryption context?
Use a per-tenant CMK when you have hundreds of high-value tenants needing a revocable, independently auditable boundary, and a shared CMK with a tenant_id encryption context when you have thousands of tenants where a CMK each is too costly. Both are valid; the shared model only stays safe if every call passes and verifies the tenant context.
How does this help with GDPR erasure?
If a tenant's data is encrypted under DEKs that only that tenant's CMK can unwrap, disabling the CMK makes the ciphertext unreadable immediately and deleting it makes that permanent: crypto-shredding. On a shared CMK you must instead destroy every copy of the tenant's wrapped DEKs. That makes deletion fast and provable even across backups, which is why key destruction is a step in tenant data deletion workflows.
Is it safe to cache the plaintext data key at all?
Yes, in memory only, under a strict max age (a couple of minutes) and a max-messages cap, using a vetted cache like the AWS Encryption SDK's. Never write a plaintext DEK to disk, Redis, or logs, and always zero it when the entry expires.