Caching & State Management
TL;DR
The gateway uses a multi-layer cache hierarchy to keep subscription, routing, and billing decisions off the database hot path. Redis acts as the authoritative enforcement layer for write-sensitive state, while PostgreSQL is treated as a delayed, derived ledger. Per-request database writes are avoided by batching persistence and relying on usage logs for recovery: immediate durability is explicitly traded for predictable latency, with correctness reconstructable from logs after failure.
Caching Invariant
Cached state must never cause the gateway to admit a request that would be rejected by authoritative enforcement logic. When cached data is stale, missing, or inconsistent, the system falls back to conservative evaluation rather than risking incorrect access or billing decisions.
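A minimal sketch of that fallback, assuming a Go gateway; the `Verdict` shape, the staleness window, and the `authoritative` callback are illustrative names, not the actual implementation:

```go
package admission

import "time"

// Verdict is a cached admission decision together with when it was fetched.
type Verdict struct {
	Allow     bool
	FetchedAt time.Time
}

// maxStaleness bounds how old a cached verdict may be before it is distrusted.
const maxStaleness = 30 * time.Second

// Admit honors a cached verdict only when it is present and fresh; anything
// stale, missing, or inconsistent falls back to authoritative evaluation, so
// the cache can never admit a request that enforcement logic would reject.
func Admit(cached *Verdict, authoritative func() bool) bool {
	if cached != nil && time.Since(cached.FetchedAt) <= maxStaleness {
		return cached.Allow
	}
	return authoritative() // conservative fallback
}
```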
Problem
The gateway must make subscription, routing, and billing decisions on every request without querying the primary database.
These decisions depend on control-plane data (applications, endpoints, subscriptions) and limited mutable state (wallet balance, usage), both of which must be fast to access and consistent across distributed gateway instances.
What Is Cached (and What Isn't)
The system does not cache vendor API responses or third-party payloads. Caching is limited to gateway-owned metadata and enforcement state.
- Application configuration
- Endpoint and endpoint-group definitions
- Subscription state and limits
- Wallet balance and usage counters (write-sensitive)
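As a rough illustration, the entries above could map onto Redis key namespaces along these lines; the key layout and builder names are assumptions for the sketch, not the real schema:

```go
package cachekeys

import "fmt"

// Key builders for the gateway-owned state listed above. Only metadata and
// enforcement state live here; vendor API responses are never cached.

func AppConfig(appID string) string {
	return fmt.Sprintf("app:%s:config", appID)
}

func EndpointGroup(groupID string) string {
	return fmt.Sprintf("endpoint-group:%s", groupID)
}

func Subscription(subID string) string {
	return fmt.Sprintf("subscription:%s:state", subID)
}

// Write-sensitive keys: updated on the hot path via Lua scripts (see Write Path Design).
func WalletBalance(walletID string) string {
	return fmt.Sprintf("wallet:%s:balance", walletID)
}

func UsageCounter(subID, period string) string {
	return fmt.Sprintf("usage:%s:%s", subID, period)
}
```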
Cache Hierarchy
A three-layer cache hierarchy is used to separate latency-sensitive request handling from durable persistence.
- L1: In-memory per-instance cache for ultra-hot reads
- L2: Redis as a shared, authoritative cache
- L3: PostgreSQL as the source of truth
Requests fall through the hierarchy on misses. L3 fetches are protected by single-flight to avoid thundering herd effects.
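A sketch of that fall-through read path in Go, assuming go-redis and golang.org/x/sync/singleflight; the `Hierarchy` type, TTLs, and the PostgreSQL loader hook are illustrative, not the gateway's actual code:

```go
package cache

import (
	"context"
	"sync"
	"time"

	"github.com/redis/go-redis/v9"
	"golang.org/x/sync/singleflight"
)

type entry struct {
	val    string
	expiry time.Time
}

// Hierarchy wires the three layers together: an in-process map (L1), a shared
// Redis client (L2), and a loader that queries PostgreSQL (L3).
type Hierarchy struct {
	mu  sync.RWMutex
	l1  map[string]entry
	rdb *redis.Client
	sf  singleflight.Group
	db  func(ctx context.Context, key string) (string, error)
}

func (h *Hierarchy) Get(ctx context.Context, key string) (string, error) {
	// L1: per-instance, ultra-hot reads.
	h.mu.RLock()
	if e, ok := h.l1[key]; ok && time.Now().Before(e.expiry) {
		h.mu.RUnlock()
		return e.val, nil
	}
	h.mu.RUnlock()

	// L2: shared, authoritative cache.
	if v, err := h.rdb.Get(ctx, key).Result(); err == nil {
		h.setL1(key, v)
		return v, nil
	}

	// L3: PostgreSQL behind single-flight, so concurrent misses on the same key
	// collapse into one database fetch instead of a thundering herd.
	v, err, _ := h.sf.Do(key, func() (interface{}, error) {
		val, err := h.db(ctx, key)
		if err != nil {
			return nil, err
		}
		// Repopulate L2 with a TTL safety net; explicit invalidation remains primary.
		h.rdb.Set(ctx, key, val, 5*time.Minute)
		return val, nil
	})
	if err != nil {
		return "", err
	}
	s := v.(string)
	h.setL1(key, s)
	return s, nil
}

func (h *Hierarchy) setL1(key, val string) {
	h.mu.Lock()
	h.l1[key] = entry{val: val, expiry: time.Now().Add(30 * time.Second)}
	h.mu.Unlock()
}
```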
Write Path Design
Write-sensitive state (wallet balance and usage) is updated on the hot path using Redis with custom Lua scripts to ensure atomicity.
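The Redis-side atomicity might look roughly like this, using go-redis to evaluate an embedded Lua script; the key layout, script body, and `Charge` helper illustrate the pattern rather than reproduce the gateway's actual scripts:

```go
package wallet

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// debitAndCount checks the balance, debits the wallet, and bumps the usage
// counter in a single atomic step on the Redis server.
// KEYS[1] = wallet balance key, KEYS[2] = usage counter key, ARGV[1] = request cost.
var debitAndCount = redis.NewScript(`
local balance = tonumber(redis.call("GET", KEYS[1]) or "0")
local cost = tonumber(ARGV[1])
if balance < cost then
  return 0 -- reject without mutating any state
end
redis.call("INCRBYFLOAT", KEYS[1], -cost)
redis.call("INCRBYFLOAT", KEYS[2], cost)
return 1
`)

// Charge returns true when the request was admitted and charged; both keys are
// updated atomically or not at all.
func Charge(ctx context.Context, rdb *redis.Client, walletKey, usageKey string, cost float64) (bool, error) {
	n, err := debitAndCount.Run(ctx, rdb, []string{walletKey, usageKey}, cost).Int()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}
```

Because the balance check and both writes run inside one script evaluation, two concurrent requests can never both pass the check against the same funds.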
Durable persistence to PostgreSQL is intentionally delayed and batched. This avoids per-request database writes while preserving correctness for billing and reconciliation.
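One possible shape for the deferred write path, sketched with Go's database/sql; the `Flusher` type, flush interval, and `subscription_usage` table are assumptions used to illustrate batching, not the actual persistence code:

```go
package persist

import (
	"context"
	"database/sql"
	"sync"
	"time"
)

// UsageDelta is one request's worth of billable usage, recorded off the hot path.
type UsageDelta struct {
	SubscriptionID string
	Units          float64
}

// Flusher accumulates deltas in memory and writes them to PostgreSQL in batches.
type Flusher struct {
	mu      sync.Mutex
	pending []UsageDelta
	db      *sql.DB
}

// Record is the only call made on the request path; it never touches the database.
func (f *Flusher) Record(d UsageDelta) {
	f.mu.Lock()
	f.pending = append(f.pending, d)
	f.mu.Unlock()
}

// Run flushes the accumulated batch on an interval until the context is cancelled.
func (f *Flusher) Run(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			_ = f.flush(ctx) // in practice: log and retry; usage logs cover losses
		}
	}
}

func (f *Flusher) flush(ctx context.Context) error {
	f.mu.Lock()
	batch := f.pending
	f.pending = nil
	f.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}

	tx, err := f.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()
	for _, d := range batch {
		// PostgreSQL is a derived ledger: it is brought up to date here, not per request.
		if _, err := tx.ExecContext(ctx,
			`UPDATE subscription_usage SET units = units + $1 WHERE subscription_id = $2`,
			d.Units, d.SubscriptionID); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```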
Durability & Recovery Model
Wallet and usage updates are not persisted synchronously to PostgreSQL on every request. Instead, PostgreSQL is treated as a derived ledger rather than the real-time enforcement layer.
Enforcement correctness is maintained using Redis-backed state on the hot path, while API usage logs are recorded independently for durable reconstruction and reconciliation.
In the event of process or host failure before batched persistence completes, authoritative state can be reconstructed from usage logs without reintroducing per-request database writes.
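A sketch of what reconstruction could look like, assuming an append-only `api_usage_log` table and a persisted checkpoint timestamp; both names are illustrative:

```go
package recovery

import (
	"context"
	"database/sql"
	"time"
)

// ReconstructUsage re-derives a subscription's usage from the append-only API
// usage log, starting from the last durably persisted checkpoint. It is only
// needed when a gateway instance dies before its batched flush completes.
func ReconstructUsage(ctx context.Context, db *sql.DB, subscriptionID string, since time.Time) (float64, error) {
	var units float64
	err := db.QueryRowContext(ctx,
		`SELECT COALESCE(SUM(units), 0)
		   FROM api_usage_log
		  WHERE subscription_id = $1
		    AND recorded_at > $2`,
		subscriptionID, since).Scan(&units)
	return units, err
}
```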
Deferred Complexity
Deferred: Size-based eviction for the in-memory (L1) cache.
L1 cache entries are currently invalidated via explicit events and time-based expiry. Size-based eviction (e.g., LRU or ARC) has been deferred because current access patterns do not justify the added complexity or memory accounting overhead.
Cache Invalidation
Control-plane updates emit invalidation events through a message broker. Both L1 and L2 entries are evicted explicitly on change.
TTL-based expiry acts as a safety net for missed invalidation events. L1 currently lacks size-based eviction, which is an acknowledged limitation.
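A sketch of the consuming side, assuming a decoded event shape and go-redis; the broker wiring is omitted and the names are illustrative:

```go
package invalidate

import (
	"context"
	"sync"

	"github.com/redis/go-redis/v9"
)

// Event is the decoded payload of a control-plane invalidation message.
type Event struct {
	Key string // e.g. "app:123:config"
}

// Invalidator evicts affected entries from both cache layers when an event arrives.
type Invalidator struct {
	mu  sync.Mutex
	l1  map[string]struct{} // stand-in for the real per-instance L1 entries
	rdb *redis.Client
}

// Handle is invoked once per decoded event from the message broker.
func (i *Invalidator) Handle(ctx context.Context, ev Event) error {
	// Evict L1 first so this instance stops serving the stale entry immediately.
	i.mu.Lock()
	delete(i.l1, ev.Key)
	i.mu.Unlock()

	// Evict the shared L2 entry; TTLs remain as a safety net if this delete is lost.
	return i.rdb.Del(ctx, ev.Key).Err()
}
```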
Tradeoffs Accepted
- Delayed durability: Postgres writes are batched to protect the hot path, accepting short windows of reconstruction risk.
- Manual eviction over LRU: event-driven invalidation keeps correctness easy to reason about, but risks memory growth under skewed access patterns.
What I'd Change at Scale
Introduce bounded L1 eviction (LRU or ARC) and versioned cache entries to reduce reliance on global invalidation events as configuration and traffic scale.
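One possible shape for versioned entries, sketched under the assumption that each application's configuration carries a monotonically increasing version; the key format is illustrative:

```go
package versioned

import "fmt"

// ConfigKey builds a version-qualified cache key. Bumping the version on every
// control-plane write makes old entries unreachable, so readers pick up fresh
// configuration without a global invalidation fan-out, and bounded eviction
// (LRU/ARC) can reclaim the orphaned entries.
func ConfigKey(appID string, version int64) string {
	return fmt.Sprintf("app:%s:config:v%d", appID, version)
}
```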