API Gateway Architecture

TL;DR

The gateway acts as a centralized policy and economic enforcement layer on the request hot path. It authorizes requests using client identity and subscription metadata, applying access rules, routing decisions, and usage limits before traffic reaches upstream vendors. Requests and responses are treated as opaque, keeping vendor integrations simple while guaranteeing atomicity between admission, quota enforcement, and billing decisions. The design favors predictable behavior, operational simplicity, and low cost over heavyweight orchestration or managed gateways.

Gateway Invariant

Request admission, routing eligibility, and usage allowance are evaluated atomically within the gateway. A request is either fully admitted and forwarded with all enforcement decisions applied, or it is rejected without producing partial side effects.
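
A minimal sketch of what this invariant implies (all type and function names below are hypothetical, and the in-memory store stands in for whatever shared state real gateway instances would need): the checks and the usage reservation happen inside one critical section, so a rejection mutates nothing.

    // Illustrative sketch of the admission invariant; not the production code.
    package gateway

    import (
        "errors"
        "sync"
    )

    type ClientState struct {
        Authorized     bool
        AllowedGroups  map[string]bool // endpoint groups this plan may call
        RemainingQuota int64           // shared budget for the current period
    }

    type Decision struct {
        UpstreamGroup string
        UsageCharged  int64
    }

    type Admitter struct {
        mu      sync.Mutex
        clients map[string]*ClientState
    }

    var ErrRejected = errors.New("request rejected")

    // Admit evaluates authorization, endpoint eligibility, and quota in one
    // critical section: either every check passes and usage is reserved, or
    // nothing is mutated and the request is rejected.
    func (a *Admitter) Admit(clientID, endpointGroup string, cost int64) (Decision, error) {
        a.mu.Lock()
        defer a.mu.Unlock()

        c, ok := a.clients[clientID]
        if !ok || !c.Authorized {
            return Decision{}, ErrRejected // unknown or unauthorized client
        }
        if !c.AllowedGroups[endpointGroup] {
            return Decision{}, ErrRejected // plan does not cover this endpoint group
        }
        if c.RemainingQuota < cost {
            return Decision{}, ErrRejected // quota exhausted; no partial side effects
        }

        c.RemainingQuota -= cost // usage is reserved only after every check passes
        return Decision{UpstreamGroup: endpointGroup, UsageCharged: cost}, nil
    }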

Problem

The gateway must act as a single enforcement point for access control, routing, and usage limits while remaining largely transparent to request and response payloads.

This creates a tension: the gateway cannot own business workflows or response composition, yet it must still make high-impact decisions on every request - authorization, endpoint eligibility, and usage allowance - before forwarding traffic upstream.

  • Requests must be validated and authorized without understanding domain logic
  • Routing decisions depend on client state, not just request paths
  • Rate limiting and usage allowance must be enforced consistently across distributed gateway instances
  • Failures must be surfaced without exposing internal gateway state

Solution

Non-goal: The gateway does not attempt to validate or interpret business payloads beyond structural correctness.

The gateway is designed as a policy execution layer rather than a business-aware service. It owns request admission decisions - who can call what, and whether the call is allowed at that moment - but treats request and response payloads as opaque.

Each incoming request is evaluated against authentication state, subscription rules, and usage limits before routing is resolved. Requests that fail policy checks are rejected early; successful requests are forwarded without embedding workflow or domain logic in the gateway.

This separation allows the gateway to enforce consistency and safety across clients while keeping vendor integrations and business behavior outside its responsibility.
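
A hedged sketch of how that ordering might be wired as Gin middleware, reusing the hypothetical Admitter from the invariant sketch above (authenticate, resolveEndpointGroup, and proxyUpstream are placeholder stubs, not real APIs): policy checks run and abort before any proxying, and the proxy handler never inspects the payload.

    // Hypothetical wiring of the admission pipeline in Gin; handler names are placeholders.
    package gateway

    import (
        "net/http"

        "github.com/gin-gonic/gin"
    )

    // NewRouter runs admission ahead of the opaque proxy handler.
    func NewRouter(a *Admitter) *gin.Engine {
        r := gin.New()

        r.Use(func(c *gin.Context) {
            clientID, ok := authenticate(c.Request)
            if !ok {
                c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "unauthorized"})
                return
            }
            group := resolveEndpointGroup(c.Request.URL.Path)
            if _, err := a.Admit(clientID, group, 1); err != nil {
                // Rejected before any upstream call, so no partial side effects.
                c.AbortWithStatusJSON(http.StatusTooManyRequests, gin.H{"error": "not allowed"})
                return
            }
            c.Next()
        })

        // No business routes are registered: every admitted request falls through
        // to the opaque proxy handler.
        r.NoRoute(proxyUpstream)
        return r
    }

    func authenticate(req *http.Request) (string, bool) {
        key := req.Header.Get("X-API-Key") // placeholder credential check
        return key, key != ""
    }

    func resolveEndpointGroup(path string) string {
        return path // placeholder: real code maps a path to its endpoint group
    }

    func proxyUpstream(c *gin.Context) {
        c.Status(http.StatusBadGateway) // placeholder for the opaque reverse-proxy step
    }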

Routing and enforcement decisions are derived from identity and subscription metadata rather than payload inspection.

This allows client-specific behavior while keeping the request boundary stateless and performant.
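
For illustration, routing then reduces to a pure lookup on identity and plan metadata; the plan table and vendor URLs below are invented.

    // Illustrative metadata-driven route resolution; plan names and URLs are made up.
    package gateway

    import "errors"

    type Plan struct {
        Name      string
        Upstreams map[string]string // endpoint group -> upstream base URL
    }

    var plans = map[string]Plan{ // normally loaded from subscription config
        "free": {Name: "free", Upstreams: map[string]string{
            "search": "https://vendor-a.example",
        }},
        "pro": {Name: "pro", Upstreams: map[string]string{
            "search": "https://vendor-b.example",
            "export": "https://vendor-b.example",
        }},
    }

    var ErrNotEligible = errors.New("endpoint group not in plan")

    // ResolveUpstream decides where to forward based only on the client's plan
    // and the endpoint group, never on the request payload.
    func ResolveUpstream(planName, endpointGroup string) (string, error) {
        plan, ok := plans[planName]
        if !ok {
            return "", ErrNotEligible
        }
        target, ok := plan.Upstreams[endpointGroup]
        if !ok {
            return "", ErrNotEligible
        }
        return target, nil
    }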

Constraints

Cost sensitivity: Infrastructure spend must remain low and predictable, limiting the use of heavyweight orchestration, managed gateways, or per-request compute.

Operational bandwidth: The team is small, with limited capacity for on-call load and system babysitting, so simpler deployment and failure modes are favored.

Performance envelope: The gateway sits on the hot path for all client traffic, requiring low memory overhead and predictable latency under concurrent load.

Why Not a Prebuilt Gateway

Existing gateways and reverse proxies (e.g., NGINX, Kong) are optimized for traffic management and operational concerns, not for subscription-aware policy enforcement or usage-based economics.

While they support basic rate limiting and authentication, these mechanisms are typically route- or consumer-scoped and static. Modeling plan-based access, endpoint groups with shared quotas, or pay-as-you-go usage requires significant external state and custom plugins.
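
To make the mismatch concrete, a plan-based quota is keyed by client and endpoint group rather than by route, so many routes must debit one shared counter; the key scheme below is an assumption, not a feature of NGINX or Kong.

    // Sketch of a shared quota key; route-scoped limiters create one counter per
    // route instead, which is exactly what plan-based quotas cannot tolerate.
    package gateway

    import "fmt"

    // QuotaKey identifies one shared budget. For example, /v1/search and
    // /v1/suggest both belong to the "search" group, so they debit the same key.
    func QuotaKey(clientID, endpointGroup, period string) string {
        return fmt.Sprintf("quota:%s:%s:%s", clientID, endpointGroup, period)
    }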

This externalization breaks atomicity between request admission, quota enforcement, and billing decisions - exactly the boundary the gateway is expected to guarantee.

Additionally, prebuilt gateways treat request and response logs as operational telemetry rather than first-class data. Persisting per-request usage and responses for downstream analytics, auditing, and vendor visibility would require parallel infrastructure regardless.

Deferred Complexity

A DFA-based matcher becomes justified once endpoint count and request diversity grow enough that routing cost is no longer negligible relative to upstream latency.

Problem: Request path matching and endpoint resolution currently rely on regex-based matching against registered endpoint templates.

While functional, this approach introduces measurable overhead on the hot path and scales linearly with the number of registered endpoints, making worst-case matching cost harder to bound as configuration grows.

Why it's messy: Regex matching happens per request and ties routing correctness to runtime regex evaluation rather than to a structure validated ahead of time.

Why we accepted it: Regex-based matching was the fastest way to support flexible endpoint definitions using the existing routing stack. A DFA-based matcher was identified as a better long-term solution, but replacing the routing layer early would have slowed overall delivery without immediate user-facing benefit.
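
For reference, the current approach has roughly the following shape (simplified; the type names are invented): every request scans the registered patterns in order, so matching cost grows with endpoint count.

    // Simplified sketch of the current linear-scan matcher.
    package gateway

    import "regexp"

    type endpointTemplate struct {
        pattern *regexp.Regexp // compiled from an endpoint template at load time
        group   string
    }

    type regexMatcher struct {
        templates []endpointTemplate
    }

    // Match scans every registered template until one matches: one regex
    // evaluation per registered endpoint in the worst case, on the hot path.
    func (m *regexMatcher) Match(path string) (string, bool) {
        for _, t := range m.templates {
            if t.pattern.MatchString(path) {
                return t.group, true
            }
        }
        return "", false
    }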

This limitation directly motivated the DFA-based routing design described in the Routing deep dive.

Tradeoffs Accepted

Gin over fasthttp: fasthttp offers higher raw throughput, but Gin provides a more mature ecosystem, better middleware compatibility, and stable OpenTelemetry integration already used across the application.

Regex-based routing (temporary): Simpler to integrate with existing endpoint definitions, at the cost of higher per-request overhead and weaker guarantees around matching performance.

Gateway-centric complexity: Centralizing routing and enforcement logic simplifies client behavior but increases the burden on the gateway's correctness and performance.

What I'd Change at Scale

1. DFA-based path matching

Compile endpoint templates into a deterministic finite automaton at build or configuration load time. This removes regex matching from the hot path and makes per-request matching cost proportional to path length, independent of the number of registered endpoints.
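
One way to sketch the direction is a per-segment trie, a simple deterministic realization of the same idea (the ":param" template syntax and names are assumptions): templates are compiled once at load time, and matching visits at most one node per path segment.

    // Load-time compilation of endpoint templates into a segment trie.
    package gateway

    import "strings"

    type trieNode struct {
        children map[string]*trieNode // literal segment -> child
        wildcard *trieNode            // matches any single segment, e.g. ":id"
        group    string               // non-empty on terminal nodes
    }

    func newTrieNode() *trieNode {
        return &trieNode{children: map[string]*trieNode{}}
    }

    // Compile runs at config load time; the hot path only walks the result.
    func Compile(templates map[string]string) *trieNode {
        root := newTrieNode()
        for tmpl, group := range templates {
            node := root
            for _, seg := range strings.Split(strings.Trim(tmpl, "/"), "/") {
                if strings.HasPrefix(seg, ":") {
                    if node.wildcard == nil {
                        node.wildcard = newTrieNode()
                    }
                    node = node.wildcard
                    continue
                }
                child, ok := node.children[seg]
                if !ok {
                    child = newTrieNode()
                    node.children[seg] = child
                }
                node = child
            }
            node.group = group
        }
        return root
    }

    // Match visits at most one node per path segment, independent of endpoint count.
    func (n *trieNode) Match(path string) (string, bool) {
        node := n
        for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
            if child, ok := node.children[seg]; ok {
                node = child
            } else if node.wildcard != nil {
                node = node.wildcard
            } else {
                return "", false
            }
        }
        return node.group, node.group != ""
    }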

2. Precompiled validation plans

Precompute validation and routing decisions per endpoint to reduce conditional logic during request handling.
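
Sketched with illustrative field names, such a plan is a flat structure resolved at load time, so request handling reduces to a single lookup.

    // Illustrative precomputed per-endpoint plan; field names are invented.
    package gateway

    // EndpointPlan is built once at config load; request handling only reads it.
    type EndpointPlan struct {
        Group             string   // endpoint group used for quota and routing
        RequiredHeaders   []string // structural checks only, no payload semantics
        MaxBodyBytes      int64
        UpstreamTimeoutMs int64
    }

    // plansByEndpoint is resolved at load time so the hot path does a map lookup
    // instead of evaluating configuration conditionals per request.
    var plansByEndpoint = map[string]EndpointPlan{
        "search": {
            Group:             "search",
            RequiredHeaders:   []string{"X-API-Key"},
            MaxBodyBytes:      1 << 20, // 1 MiB
            UpstreamTimeoutMs: 5000,
        },
    }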

3. Stronger separation between config and execution

Move more logic into compile-time or load-time phases so runtime behavior becomes simpler and easier to reason about under load.