Tenant Billing & Usage Metering

Usage metering is the discipline of counting what every tenant consumes — API calls, compute seconds, seats, storage bytes, events — with enough accuracy that the number can appear on an invoice and survive a dispute. Billing sync is the contract that turns those counts into recurring revenue without double-charging, dropping usage, or letting a tenant exceed a plan it has not paid for.

These two systems share one non-negotiable property: every measurement is bound to exactly one tenant, and that binding is never lost between the moment an event is emitted and the moment a charge lands in Stripe. A metering pipeline that merges two tenants' counts produces wrong invoices, and a wrong invoice is a revenue-recognition incident, a support escalation, and in regulated industries an audit finding. This guide treats metering and billing as one event-driven system spanning emission, ingestion, aggregation, and billing sync.

The accuracy of this system depends on the boundary work done elsewhere. Usage events inherit their tenant tag from the resolved request context established by tenant-aware data routing, and the per-tenant aggregates are stored under the same isolation guarantees as the rest of your data, governed by your database isolation model. Get those wrong and no amount of pipeline correctness will save the invoice.

Key architectural imperatives:

Stamp every usage event with the tenant ID at the point of emission, from trusted context — never reconstruct it downstream.
Make ingestion idempotent so retries, redeliveries, and at-least-once transport never inflate a count.
Store usage as a tenant-partitioned time-series, never as a mutable counter, so any total is reproducible from raw events.
Enforce plan limits and quotas at request time against near-real-time usage, not at invoice time.
Reconcile metered totals against the billing provider continuously, so drift is caught before it becomes a refund.

Overview: Metering Models at a Glance

How you meter follows from what you sell. A seat-based plan needs an accurate count of active principals per tenant; a consumption plan needs a high-volume, append-only event stream. The model you pick determines the ingestion rate you must absorb, the latency at which quotas can be enforced, and how hard reconciliation is.

Metering Model	Boundary Enforcement	Tenant Density	Query Latency	Operational Overhead	Compliance Fit
Seat / subscription count	Tenant-scoped membership table	Very high (one row/seat)	Lowest (point lookup)	Low (no event stream)	Standard SaaS, SOC 2
Aggregated usage counters	Tenant-keyed counter rows	High (one row/tenant/meter)	Low (read counter)	Medium (lost auditability)	SaaS without dispute exposure
Event-sourced time-series	Tenant-partitioned event log	Moderate (millions of events)	Moderate (windowed rollup)	High (pipeline + storage)	Regulated, dispute-heavy, FinOps
Hybrid (events + materialized rollup)	Partitioned log + tenant rollup table	Moderate–high	Low (read rollup)	High (two stores to reconcile)	Most production metered SaaS

The event-sourced model is the only one that lets you answer "why was this tenant charged this amount" by replaying raw events, which is why dispute-heavy and regulated platforms adopt it despite the operational cost. Most mature platforms run the hybrid: an append-only event log as the source of truth, plus a materialized per-tenant rollup that serves quota checks and dashboards cheaply. The sections below trace a single usage event from emission through that pipeline to a Stripe invoice line, and show where the tenant boundary is enforced at each hop.

One event, one tenant: the tag is set at emission, deduplicated at ingestion, partitioned in the log, and the rollup feeds both real-time quota checks and the reconciled push to Stripe.

Core Architecture & Pattern Variants

The metering pipeline has four stages, and each enforces the tenant boundary differently. Emission stamps it, ingestion validates and deduplicates it, the log partitions on it, and aggregation refuses to count anything without it.

Emission happens inside the application, at the point where the billable action occurs. The event carries the tenant ID pulled from request context, a stable meter name (api.request, compute.seconds, seats.active), a quantity, a timestamp, and — critically — a client-generated idempotency key. Emitting from context, not from a parameter the caller could forge, is what makes the count trustworthy. The full set of emission and transport patterns is covered in usage metering event pipelines.

Ingestion is the chokepoint where correctness is won or lost. Transport between the application and the pipeline is at-least-once: events get retried, replayed after a crash, and occasionally duplicated by the broker. The ingestion gateway must therefore treat the idempotency key as a uniqueness constraint and drop anything it has already accepted. Without this, a single retry storm inflates a tenant's bill. The deduplication mechanics — storage choice, key design, and the trade-off between exactly-once and effectively-once — are detailed in idempotent usage event ingestion.
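The deduplication contract fits in a few lines. This is a minimal in-memory sketch, assuming the idempotency key is scoped per tenant; `DedupGateway` is a hypothetical name, and a real gateway would back the seen-key set with a durable unique index rather than process memory, since a restart would otherwise forget which keys it has already accepted.

```typescript
interface UsageEvent {
  eventId: string;   // client-generated idempotency key
  tenantId: string;  // stamped at emission
  meter: string;
  quantity: number;
}

// Hypothetical gateway: remembers accepted (tenant, eventId) pairs and drops
// repeats, so at-least-once redelivery never inflates a tenant's count.
// In-memory only for illustration; production needs a durable unique index.
class DedupGateway {
  private readonly seen = new Set<string>();
  private readonly accepted: UsageEvent[] = [];

  ingest(event: UsageEvent): 'accepted' | 'duplicate' {
    // JSON encoding avoids collisions if an ID ever contains a separator.
    const key = JSON.stringify([event.tenantId, event.eventId]);
    if (this.seen.has(key)) return 'duplicate';
    this.seen.add(key);
    this.accepted.push(event);
    return 'accepted';
  }

  // Sum of accepted quantities for one tenant and meter.
  total(tenantId: string, meter: string): number {
    return this.accepted
      .filter((e) => e.tenantId === tenantId && e.meter === meter)
      .reduce((sum, e) => sum + e.quantity, 0);
  }
}

// A broker redelivers the same event three times; it is counted once.
const gateway = new DedupGateway();
const evt = { eventId: 'e-1', tenantId: 't-1', meter: 'api.request', quantity: 5 };
for (let i = 0; i < 3; i++) gateway.ingest(evt);
console.log(gateway.total('t-1', 'api.request')); // 5
```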

The event log is the append-only source of truth, partitioned by tenant so that one tenant's volume never starves another's reads and so that compaction, retention, and replay operate per tenant. Whether the log is Kafka, Kinesis, or a partitioned table, the partition key is the tenant ID. This is also what makes a single tenant's usage replayable in isolation when an invoice is disputed.
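A sketch of deterministic tenant-to-partition assignment, assuming a fixed partition count. The hash is 32-bit FNV-1a, chosen only because it is short; it is not the hash any particular broker uses (Kafka's default partitioner, for example, uses murmur2), and `partitionFor` is a hypothetical helper.

```typescript
// 32-bit FNV-1a over the UTF-16 code units of the key (illustrative only;
// identical to byte-wise FNV-1a for ASCII tenant IDs).
function fnv1a32(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The partition depends only on the tenant ID, never on the meter or payload,
// so all of a tenant's events land in one ordered partition.
function partitionFor(tenantId: string, partitionCount: number): number {
  if (!tenantId) throw new Error('cannot partition an untagged event');
  return fnv1a32(tenantId) % partitionCount;
}
```

Because the mapping ignores everything except the tenant ID, replaying one tenant means reading one partition, filtered to that tenant.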

Aggregation rolls raw events into per-tenant, per-meter, per-window totals. It rejects any event missing a tenant tag — an untagged event is a bug, not a default-bucket candidate — and writes the result to a rollup store keyed by (tenant_id, meter, window). Storing aggregates as a tenant-partitioned time-series rather than a single mutable counter is what lets you recompute any historical window after a late-arriving event or a corrected event; the storage design is covered in tenant-partitioned time-series for metering.
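The aggregation rule can be sketched as a pure function over raw events, assuming ISO-8601 timestamps and hourly UTC windows (matching the hourly rollup in the PostgreSQL example later in this guide). `aggregate` and `hourWindow` are hypothetical names; the point is that untagged events are diverted to a dead-letter list rather than counted, and that rerunning the function after a late event simply recomputes the affected window.

```typescript
interface UsageEvent {
  eventId: string;
  tenantId?: string;  // may be missing if an emitter is buggy
  meter: string;
  quantity: number;
  occurredAt: string; // ISO-8601
}

interface RollupRow { tenantId: string; meter: string; windowStart: string; total: number; }

const HOUR_MS = 60 * 60 * 1000;

// Truncate a timestamp to the start of its UTC hour.
function hourWindow(iso: string): string {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`bad timestamp: ${iso}`);
  return new Date(Math.floor(ms / HOUR_MS) * HOUR_MS).toISOString();
}

// Roll events into per-(tenant, meter, hour) totals. Untagged events are
// dead-lettered for investigation, never assigned to a default tenant.
function aggregate(events: UsageEvent[]): { rows: RollupRow[]; deadLetter: UsageEvent[] } {
  const totals = new Map<string, RollupRow>();
  const deadLetter: UsageEvent[] = [];
  for (const e of events) {
    const tenantId = e.tenantId;
    if (!tenantId) {
      deadLetter.push(e);
      continue;
    }
    const windowStart = hourWindow(e.occurredAt);
    const key = JSON.stringify([tenantId, e.meter, windowStart]);
    const row = totals.get(key) ?? { tenantId, meter: e.meter, windowStart, total: 0 };
    row.total += e.quantity;
    totals.set(key, row);
  }
  return { rows: Array.from(totals.values()), deadLetter };
}
```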

Plan enforcement and Stripe sync are the two consumers of the rollup. Enforcement reads it to decide whether a tenant may proceed; sync reads it to assemble invoice line items. Both are covered in their own sections below.

Tenant Routing & Context Propagation

A usage event is only as trustworthy as the tenant tag it carries, and that tag must be established once, server-side, from the verified identity produced by your cross-tenant access control layer — exactly as request routing does it. The emitter pulls tenant_id from the same immutable request context that scopes the database query, so a metered action and the data it touched always agree on whose tenant it was.

The hard cases are the ones where the original request context is gone. A nightly batch job that meters storage bytes per tenant has no inbound HTTP request; a fan-out worker processing a queue handles many tenants in sequence. In both, the tenant ID must travel inside the work item and be rehydrated before any event is emitted, never read from a process global. These propagation rules are the same ones that govern query scoping, detailed in tenant context injection strategies.
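In Node.js, one way to carry tenant context through a queue worker is `AsyncLocalStorage` from `node:async_hooks`: the worker rehydrates the tenant from each message and runs the handler inside `run()`, so emission reads the tenant from a per-job store instead of a process global. The message shape, the in-memory `emitted` array, and the helper names are illustrative assumptions.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext { tenantId: string; }
// The tenant ID is serialized into the job itself, never read from a global.
interface JobMessage { tenantId: string; storageBytes: number; }

const tenantStore = new AsyncLocalStorage<TenantContext>();
const emitted: Array<{ tenantId: string; meter: string; quantity: number }> = [];

// Fails loudly when called outside a job's context instead of guessing.
function currentTenant(): string {
  const ctx = tenantStore.getStore();
  if (!ctx) throw new Error('no tenant context: refusing to emit usage');
  return ctx.tenantId;
}

// Stand-in emitter that records events in memory for illustration.
async function emitUsage(meter: string, quantity: number): Promise<void> {
  emitted.push({ tenantId: currentTenant(), meter, quantity });
}

// Rehydrate the tenant from the message, then run the handler inside that
// context; concurrent jobs for different tenants keep separate stores.
async function processJob(msg: JobMessage): Promise<void> {
  await tenantStore.run({ tenantId: msg.tenantId }, async () => {
    // Simulated I/O so that concurrent jobs interleave.
    await new Promise<void>((resolve) => setTimeout(resolve, msg.storageBytes % 5));
    await emitUsage('storage.bytes', msg.storageBytes);
  });
}
```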

Pipeline Layer	Responsibility	Tenant Signal	Failure Action
Emitter	Stamp event with tenant + idempotency key	Request/job context tenant ID	Refuse to emit untagged event
Ingestion gateway	Validate tag, deduplicate	Event `tenant_id` field	400 on missing tenant; drop on dup key
Event log	Durable, ordered, partitioned transport	Partition key = tenant ID	Quarantine unpartitionable events
Aggregator	Roll up per tenant/meter/window	Event tenant ID	Dead-letter untagged events
Billing sync	Map rollup to plan; push to provider	Rollup tenant ID	Halt sync on tenant mismatch

Validate the tenant tag at ingestion, not only at emission. A defense-in-depth gateway that rejects any event without a resolvable tenant catches both client bugs and replayed events whose tenant has since been deleted. Treat an untagged event as a hard failure routed to a dead-letter queue for investigation — never silently attribute it to a default tenant, because that is how one tenant's usage lands on another's invoice.
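A sketch of the gateway-side check, assuming the gateway can consult a set of active tenant IDs. The split follows the layer table above: a missing tag is rejected with a 400, while a tag that no longer resolves (for example, a replay for a deleted tenant) is dead-lettered. `validateTenant` and the `Verdict` type are hypothetical.

```typescript
type Verdict =
  | { kind: 'accept' }
  | { kind: 'reject'; status: 400; reason: string }
  | { kind: 'dead-letter'; reason: string };

interface IncomingEvent { eventId: string; tenantId?: string; meter: string; quantity: number; }

// Defense in depth at ingestion. Neither failure path ever falls back
// to a default tenant.
function validateTenant(e: IncomingEvent, activeTenants: ReadonlySet<string>): Verdict {
  if (!e.tenantId) {
    // Client bug: tell the emitter synchronously.
    return { kind: 'reject', status: 400, reason: 'missing tenant_id' };
  }
  if (!activeTenants.has(e.tenantId)) {
    // Tagged but unresolvable: park it for investigation.
    return { kind: 'dead-letter', reason: `unresolvable tenant ${e.tenantId}` };
  }
  return { kind: 'accept' };
}
```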

Compliance & Auditability Alignment

Billing data is financial data, and metering pipelines inherit financial-grade audit expectations on top of the multi-tenant isolation requirements. Regulators and auditors want to see that a charge traces back to discrete, immutable, tenant-attributed events, and that no tenant's usage can be observed by or attributed to another.

Framework	Metering/Billing Requirement	Enforcement	Validation
GDPR	Usage data is personal data; erasure must extend to it	Tenant-scoped event log with deletion workflow	Deletion receipts; retention policy audit
SOC 2	Accurate billing, change control, audit trail	Immutable event log + tenant-tagged sync logs	Reconciliation evidence per control
HIPAA	No PHI in usage events; segregated metering	Meter on opaque IDs; per-entity isolation	Event-schema review; access audit
FedRAMP	Auditable, isolated billing within boundary	Per-tenant pipeline in authorized boundary	Continuous monitoring; integrity checks

The immutable, append-only event log is the same artifact that satisfies SOC 2's "accurate billing" criterion and serves as audit evidence — which is why event sourcing pays for itself in regulated contexts. Keep usage events free of personal and protected data: meter on opaque tenant and resource identifiers, not on user emails or record contents, so the billing pipeline never becomes a second copy of regulated data. The broader control mapping — audit log architecture, residency, and erasure — lives under multi-tenant compliance and data governance, and a tenant's right to erasure must reach the metering store too, as covered in per-tenant data deletion workflows.
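One way to enforce opaque identifiers at the schema boundary is an allowlist of permitted event dimensions plus a cheap check for values that look like email addresses. The allowlist contents, the regex, and `assertNoPersonalData` are illustrative assumptions; a heuristic like this supplements the event-schema review rather than replacing it.

```typescript
// Hypothetical allowlist: only opaque identifiers may travel with a usage event.
const ALLOWED_DIMENSIONS = new Set(['tenant_id', 'resource_id', 'region', 'plan_tier']);
// Rough email shape; catches accidental leaks, not deliberate obfuscation.
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[^\s@]+/;

function assertNoPersonalData(dimensions: Record<string, string>): void {
  for (const [key, value] of Object.entries(dimensions)) {
    if (!ALLOWED_DIMENSIONS.has(key)) {
      throw new Error(`dimension "${key}" is not on the metering allowlist`);
    }
    if (EMAIL_LIKE.test(value)) {
      throw new Error(`dimension "${key}" looks like an email address`);
    }
  }
}
```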

Billing Sync & Metering Architecture

Billing sync is the last mile: it reads per-tenant rollups, maps them onto each tenant's subscription and plan, and pushes the result to the billing provider as metered usage records or invoice items. The risk concentrated here is double-billing and drift — the metered total in your system disagreeing with what the provider charged.

Component	Role	Tenant Boundary Control
Plan catalog	Define meters, tiers, prices, included quantities	Map subscription → tenant
Rollup reader	Pull finalized per-tenant/window totals	Read keyed by tenant ID
Usage reporter	Push metered records to provider	Idempotency key per tenant/period
Webhook handler	Ingest provider invoice/payment events	Resolve tenant from subscription ID
Reconciler	Compare internal totals vs provider charges	Per-tenant diff, alert on drift
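To make the plan-catalog row concrete, here is a sketch that prices one meter's finalized total: usage up to the included quantity is free, and the overage is billed through graduated tiers. The `PlanMeter` and `Tier` shapes are assumptions, tier bounds here count overage units rather than total usage, and unit prices are integer cents to avoid floating-point drift.

```typescript
// A tier covers overage units up to `upTo` (cumulative); null means unbounded.
interface Tier { upTo: number | null; unitAmountCents: number; }
interface PlanMeter { included: number; tiers: Tier[]; }

// Graduated pricing: each overage unit is billed at the rate of the tier it falls in.
function overageCents(plan: PlanMeter, used: number): number {
  let remaining = Math.max(0, used - plan.included);
  let tierFloor = 0;
  let cents = 0;
  for (const tier of plan.tiers) {
    if (remaining <= 0) break;
    const capacity = tier.upTo === null ? remaining : tier.upTo - tierFloor;
    const units = Math.min(remaining, capacity);
    cents += units * tier.unitAmountCents;
    remaining -= units;
    if (tier.upTo !== null) tierFloor = tier.upTo;
  }
  if (remaining > 0) throw new Error('usage exceeds the last bounded tier');
  return Math.round(cents); // whole cents, for fractional meters such as compute seconds
}
```

For a plan with 1,000 included units, 2 cents per unit for the first 10,000 overage units and 1 cent thereafter, a total of 21,000 prices 10,000 units at 2 cents and 10,000 at 1 cent, i.e. 30,000 cents.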

Push usage to Stripe with an idempotency key derived from (tenant_id, meter, period) so a retried report never creates a second usage record for the same window. Stripe's legacy metered billing accepts usage records against a subscription item (newer Stripe API versions replace usage records with Billing Meters and meter events); either way, the mapping from your internal tenant to the Stripe object that receives the usage is the tenant boundary at this hop, and it must never point one tenant's usage at another tenant's subscription. The full reporting flow, subscription-item mapping, and aggregation modes are covered in billing sync with Stripe.
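A sketch of building the report parameters without calling Stripe, assuming the legacy usage-record model with `action: 'set'`. It derives both the idempotency key and the record timestamp from the rollup window: Stripe may prune idempotency keys once they are at least 24 hours old, but a fixed timestamp still makes a re-sent `set` overwrite the quantity at that timestamp instead of adding a second record. `buildUsageReport` and its `revision` parameter are hypothetical.

```typescript
interface UsageReport {
  quantity: number;
  timestamp: number;      // Unix seconds, fixed per rollup window
  action: 'set';
  idempotencyKey: string;
}

// Build the parameters for one (tenant, meter, window) usage report.
// `revision` is bumped only when a finalized window's total is corrected.
function buildUsageReport(
  tenantId: string,
  meter: string,
  windowStart: Date,
  quantity: number,
  revision = 0,
): UsageReport {
  if (!tenantId) throw new Error('refusing to report usage without a tenant');
  const timestamp = Math.floor(windowStart.getTime() / 1000);
  return {
    quantity,
    timestamp,
    action: 'set',
    // Same tenant, meter, window, and revision always yield the same key.
    idempotencyKey: `${tenantId}:${meter}:${timestamp}:r${revision}`,
  };
}
```

The revision suffix exists because Stripe rejects a reused idempotency key whose request parameters differ, so a corrected total for an already-reported window (after a late event, say) has to be sent under a new key.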

The reverse direction — Stripe's webhooks notifying you of paid invoices, failed payments, and subscription changes — is where tenant attribution most often breaks, because the webhook carries a Stripe customer or subscription ID that you must resolve back to a tenant before acting. Resolve it through a stored mapping, never by parsing metadata you cannot trust, and make the handler idempotent against redelivered webhooks. These reconciliation patterns are detailed in reconciling Stripe webhooks per tenant.
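A sketch of the webhook side, assuming a simplified event shape (`id`, `type`, `customerId`) rather than Stripe's full event object. The customer-to-tenant map stands in for the stored, authoritative mapping; the handler dedupes on the provider's event ID and refuses to act on an unmapped customer instead of guessing. `WebhookHandler` is a hypothetical name, and its in-memory state would be durable in practice.

```typescript
// Simplified view of a provider webhook; Stripe's real events carry the
// customer ID inside `data.object`.
interface WebhookEvent { id: string; type: string; customerId: string; }

class WebhookHandler {
  private readonly customerToTenant: ReadonlyMap<string, string>;
  private readonly processed = new Set<string>();
  readonly applied: Array<{ tenantId: string; type: string }> = [];

  constructor(customerToTenant: ReadonlyMap<string, string>) {
    this.customerToTenant = customerToTenant;
  }

  handle(event: WebhookEvent): 'applied' | 'duplicate' {
    // Redeliveries reuse the provider's event ID; act on each event once.
    if (this.processed.has(event.id)) return 'duplicate';
    const tenantId = this.customerToTenant.get(event.customerId);
    // Never fall back to metadata or a default tenant: unmapped is an error.
    if (!tenantId) throw new Error(`no tenant mapped for customer ${event.customerId}`);
    this.applied.push({ tenantId, type: event.type });
    // Marked processed only after the side effect, so a failure can be retried.
    this.processed.add(event.id);
    return 'applied';
  }
}
```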

Usage flows out with idempotency keys, webhooks flow back resolved to a tenant, and the reconciler compares both sides so drift is caught before an invoice finalizes.

Run the reconciler on a schedule that closes each billing period before invoices finalize. It pulls the finalized internal rollup and the provider's recorded usage for the same tenant and window, diffs them, and alerts on any non-zero delta. A discrepancy means a dropped event, a duplicate, or a mapping error — all of which are cheaper to fix before the customer sees the charge than after.
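The per-tenant diff at the heart of the reconciler can be sketched as follows, assuming both sides have already been reduced to one total per tenant for the same meter and window. Diffing over the union of tenant IDs matters: a tenant present on only one side is exactly the dropped-event or mapping-error case the reconciler exists to catch. `reconcile` and `Drift` are hypothetical names.

```typescript
// Totals for one meter and billing window, keyed by tenant ID.
type Totals = ReadonlyMap<string, number>;

interface Drift { tenantId: string; internal: number; provider: number; delta: number; }

// Compare per tenant over the union of both sides, so a tenant missing from
// either side is reported as drift rather than silently skipped.
function reconcile(internal: Totals, provider: Totals): Drift[] {
  const tenantIds = new Set<string>([
    ...Array.from(internal.keys()),
    ...Array.from(provider.keys()),
  ]);
  const drift: Drift[] = [];
  for (const tenantId of Array.from(tenantIds)) {
    const i = internal.get(tenantId) ?? 0;
    const p = provider.get(tenantId) ?? 0;
    if (i !== p) drift.push({ tenantId, internal: i, provider: p, delta: i - p });
  }
  return drift;
}
```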

Migration & Hybrid Strategies

Most teams do not start with event sourcing. They begin with a counter incremented in the application database, discover it cannot answer disputes or survive concurrent writes, and migrate to an event-driven pipeline. Do this without a billing gap by dual-writing: keep the legacy counter authoritative while the new pipeline runs in shadow, compare the two totals per tenant per period, and cut over to the pipeline as the billing source only once the diff is consistently zero across a full cycle.
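The cutover rule can be written as a small gate, assuming the shadow run emits one comparison row per tenant per period and that a billing cycle is a known list of periods. `readyToCutOver` is a hypothetical helper; it refuses to cut over on any non-zero diff and also on any period with no comparisons, since a gap in the shadow run proves nothing.

```typescript
// One shadow comparison: legacy counter vs new pipeline for a tenant and period.
interface ShadowComparison { tenantId: string; period: string; legacy: number; pipeline: number; }

// Cut over only if every tenant matched in every period of a full cycle.
function readyToCutOver(comparisons: ShadowComparison[], cyclePeriods: string[]): boolean {
  const inCycle = comparisons.filter((c) => cyclePeriods.includes(c.period));
  // A period with no comparisons at all means the shadow run has a gap.
  const covered = new Set(inCycle.map((c) => c.period));
  if (cyclePeriods.some((p) => !covered.has(p))) return false;
  return inCycle.every((c) => c.legacy === c.pipeline);
}
```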

Backfill carefully. If you have historical raw logs, replay them through the idempotent ingestion path so the new time-series reconstructs past windows; the idempotency keys prevent the backfill from colliding with live traffic. Where no raw history exists, seed the rollup store with the legacy counter's closing values as an opening balance per tenant, and mark those windows as non-replayable so reconciliation does not flag them.

Hybrid metering is the steady state, not a transition artifact. The append-only log is the source of truth for audit and replay; the materialized rollup serves quota checks and dashboards at low latency. Reconciliation between the two — log totals versus rollup totals — runs alongside the Stripe reconciliation, because a divergence there is the earliest signal that the aggregator dropped or double-counted an event.

Implementation Reference

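The snippets below show the boundary enforced at each stage: emission from context, idempotent ingestion, time-series rollup, quota enforcement, and metered reporting to Stripe. Each is a minimal sketch in the named language; helpers such as publishToBus, get_plan, and read_rollup_total stand in for application code that is not shown.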

Emit a tenant-tagged, idempotent usage event from a TypeScript service. The tenant ID comes from trusted request context, never from caller input, and the idempotency key is generated once per billable action so transport retries reuse it:

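import { randomUUID } from 'node:crypto';

interface UsageEvent {
  eventId: string;      // idempotency key, generated once per billable action
  tenantId: string;     // from request context, never client input
  meter: string;        // 'api.request' | 'compute.seconds' | ...
  quantity: number;
  occurredAt: string;   // ISO-8601, set at emission
}

// At-least-once transport to the event bus, provided by the application.
declare function publishToBus(event: UsageEvent): Promise<void>;

export async function emitUsage(
  ctx: { tenantId: string },
  meter: string,
  quantity: number,
  // The default key survives transport retries of this event object; if the
  // caller may retry emitUsage itself, pass a key derived from the action.
  eventId: string = randomUUID(),
): Promise<void> {
  // Check first so an untagged event is never constructed.
  if (!ctx.tenantId) throw new Error('refusing to emit untagged usage event');
  const event: UsageEvent = {
    eventId,
    tenantId: ctx.tenantId,
    meter,
    quantity,
    occurredAt: new Date().toISOString(),
  };
  await publishToBus(event); // dedup happens downstream, at ingestion
}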

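Idempotent ingestion in Go: insert the event keyed by its tenant and idempotency key, treating a conflict on that key as an already-counted no-op so retries never inflate the count:

package metering

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

type UsageEvent struct {
	EventID    string
	TenantID   string
	Meter      string
	Quantity   float64
	OccurredAt time.Time
}

// IngestUsage reports duplicate=true when (tenant_id, event_id) was already ingested; safe to ack.
func IngestUsage(ctx context.Context, db *sql.DB, e UsageEvent) (duplicate bool, err error) {
	if e.TenantID == "" {
		return false, fmt.Errorf("usage event missing tenant_id: %s", e.EventID)
	}
	const q = `
		INSERT INTO usage_events (event_id, tenant_id, meter, quantity, occurred_at)
		VALUES ($1, $2, $3, $4, $5)
		ON CONFLICT (tenant_id, event_id) DO NOTHING`
	res, err := db.ExecContext(ctx, q, e.EventID, e.TenantID, e.Meter, e.Quantity, e.OccurredAt)
	if err != nil {
		return false, fmt.Errorf("insert usage event %s: %w", e.EventID, err)
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n == 0, nil // zero rows: ON CONFLICT fired, event already counted
}

PostgreSQL time-series schema and a windowed rollup. The events table is partitioned by tenant and the rollup is keyed per tenant, meter, and hour so any window is recomputable:

-- PostgreSQL requires a partitioned table's primary key to include the
-- partition key, so the idempotency key is unique per (tenant_id, event_id).
CREATE TABLE usage_events (
  tenant_id   uuid NOT NULL,
  event_id    uuid NOT NULL,
  meter       text NOT NULL,
  quantity    numeric NOT NULL,
  occurred_at timestamptz NOT NULL,
  PRIMARY KEY (tenant_id, event_id)
) PARTITION BY LIST (tenant_id);

-- Catch-all for tenants that do not have a dedicated partition.
CREATE TABLE usage_events_default PARTITION OF usage_events DEFAULT;

CREATE TABLE usage_rollup (
  tenant_id    uuid NOT NULL,
  meter        text NOT NULL,
  window_start timestamptz NOT NULL,
  total        numeric NOT NULL,
  PRIMARY KEY (tenant_id, meter, window_start)  -- arbiter for ON CONFLICT below
);

-- Materialized per-tenant hourly rollup, idempotently recomputable.
-- $1 and $2 should be hour-aligned so each window is recomputed in full.
INSERT INTO usage_rollup (tenant_id, meter, window_start, total)
SELECT tenant_id, meter, date_trunc('hour', occurred_at), SUM(quantity)
FROM usage_events
WHERE occurred_at >= $1 AND occurred_at < $2
GROUP BY tenant_id, meter, date_trunc('hour', occurred_at)
ON CONFLICT (tenant_id, meter, window_start)
DO UPDATE SET total = EXCLUDED.total;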

Enforce a plan limit at request time in Python by reading the near-real-time rollup before allowing the metered action:

from datetime import datetime


def check_quota(tenant_id: str, meter: str, requested: int, period_start: datetime) -> None:
    # get_plan, read_rollup_total, and QuotaExceeded are application code (not shown).
    plan = get_plan(tenant_id)                       # includes per-meter limits
    used = read_rollup_total(tenant_id, meter, period_start)
    limit = plan.limits.get(meter)                   # None: no cap on this meter
    if limit is not None and used + requested > limit:
        raise QuotaExceeded(
            tenant_id=tenant_id, meter=meter, limit=limit, used=used
        )

Report metered usage to Stripe with a deterministic idempotency key so a retried sync never double-charges the tenant for the same window:

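import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Legacy usage-records API: needs a stripe-node and Stripe API version that
// still support subscriptionItems.createUsageRecord.
export async function reportUsage(
  subscriptionItemId: string,
  tenantId: string,
  meter: string,
  windowStart: Date,   // start of the rollup window; must fall in the current billing period
  quantity: number,
) {
  // A fixed, window-derived timestamp makes action 'set' overwrite on re-send;
  // timestamp 'now' would differ on every retry and could add a second record.
  const timestamp = Math.floor(windowStart.getTime() / 1000);
  return stripe.subscriptionItems.createUsageRecord(
    subscriptionItemId,
    { quantity, timestamp, action: 'set' },
    { idempotencyKey: `${tenantId}:${meter}:${timestamp}` },
  );
}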

Pitfalls & Anti-Patterns

Reconstructing the tenant ID downstream. Any pipeline stage that infers which tenant an event belongs to — from the topic, the source IP, or a lookup that "should" be unique — will eventually attribute usage to the wrong tenant. Stamp the tenant ID at emission from trusted context and carry it as an immutable field; reject, never guess, an event that arrives without one.

Non-idempotent ingestion. At-least-once transport guarantees duplicates: brokers redeliver, clients retry on timeout, and crash recovery replays. An ingestion path without an idempotency key turns every duplicate into extra billable usage. Key every event and treat a repeat key as an already-counted no-op.

Mutable counters as the source of truth. A single row updated in place with UPDATE ... SET count = count + 1 cannot answer "why was this the total," loses history on correction, and serializes concurrent writes into a hotspot. Store raw, append-only events and materialize counters from them, so any total is reproducible and any window recomputable after a late event.

Enforcing quotas at invoice time. Discovering at month-end that a tenant exceeded its plan is too late — the resource was already consumed for free or, worse, the overage is disputed. Check usage against the near-real-time rollup at request time and block or meter overage before the action completes.

Losing tenant context in batch and async jobs. A nightly storage-metering job or a queue worker has no inbound request to read the tenant from. If it falls back to a process global, it meters whichever tenant ran last. Serialize the tenant ID into the job or message and rehydrate it before emitting any event.

Webhook handlers that trust untrusted attribution. Resolving a Stripe webhook to a tenant by parsing free-form metadata, or assuming one customer maps to one tenant without checking, mis-routes payment and subscription events. Resolve through a stored, authoritative customer-to-tenant mapping and make the handler idempotent against redelivery.

FAQ

Should I store usage as counters or as events? Store raw events as the source of truth and materialize counters from them. Counters alone cannot answer a billing dispute, lose history when a value is corrected, and become a write hotspot under concurrency; an append-only event log plus a materialized per-tenant rollup gives you both auditability and cheap reads.

How do I stop retries from double-billing a tenant? Make ingestion idempotent. Give every event a client-generated idempotency key, enforce it as a uniqueness constraint at the ingestion gateway, and treat a repeated key as an already-counted no-op. Apply the same principle when reporting to Stripe by deriving the idempotency key from the tenant, meter, and billing period.

When should I enforce plan limits — at request time or at billing time? At request time. Read the tenant's near-real-time usage rollup before allowing a metered action and block or meter the overage immediately. Enforcing at invoice time means the resource is already consumed, which either gives it away free or creates a disputed overage charge.

How do I keep metering accurate in background jobs? Carry the tenant ID inside the job payload or message and rehydrate it in the worker before emitting any event, exactly as request middleware sets it. Never read the tenant from a process global or thread-local, because a single worker processes many tenants in sequence and will mis-attribute usage.

How do I detect when my metering and Stripe disagree? Run a per-tenant reconciler each period that diffs your finalized internal rollup against the provider's recorded usage for the same tenant and window, and alert on any non-zero delta. A discrepancy signals a dropped event, a duplicate, or a mapping error, and it is far cheaper to fix before the invoice finalizes than to refund afterward.