# Distributed Caching
HyperRoute includes a built-in distributed caching layer that accelerates query plan resolution, entity lookups, and persisted query retrieval. Three backends are available — use them individually or combined in a layered configuration.
## Cache Backends
### Memory Cache

An in-process cache with near-zero latency. Ideal for single-instance deployments or as the L1 layer in a layered setup.

```yaml
cache:
  backend: memory
  max_entries: 10000
```
### Redis Cache

A shared cache across multiple router instances. Ensures consistency in horizontally scaled deployments.

```yaml
cache:
  backend: redis
  url: redis://redis:6379
  pool_size: 20
  connection_timeout: 5s
```
### Layered Cache (L1 + L2)

The recommended production configuration. Combines memory speed with Redis consistency:

```yaml
cache:
  backend: layered
  l1:
    type: memory
    max_entries: 10000
  l2:
    type: redis
    url: ${REDIS_URL}
    pool_size: 20
    connection_timeout: 5s
```
How it works:

```text
Request → L1 (Memory) → HIT  → Return instantly (~0ms)
                      → MISS → L2 (Redis) → HIT  → Populate L1, return (~1ms)
                                          → MISS → Execute, populate L1+L2
```
| Layer | Speed | Shared | Survives Restart |
|---|---|---|---|
| L1 (Memory) | ~0ms | No (per-instance) | No |
| L2 (Redis) | ~1ms | Yes (all instances) | Yes |
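The lookup flow above can be sketched as follows. This is a minimal illustration of the L1/L2 pattern, not HyperRoute's internal API: the in-memory layer is a plain dict with naive eviction, and the Redis layer is stubbed with a dict where a real deployment would use a Redis client.

```python
class LayeredCache:
    """Illustrative sketch of the L1 (memory) + L2 (Redis) lookup flow."""

    def __init__(self, l1_max_entries=10_000):
        self.l1 = {}  # per-instance, in-process (lost on restart)
        self.l2 = {}  # stand-in for shared Redis (all instances)
        self.l1_max_entries = l1_max_entries

    def get(self, key, compute):
        # 1. L1 hit: return instantly.
        if key in self.l1:
            return self.l1[key]
        # 2. L2 hit: populate L1, then return.
        if key in self.l2:
            value = self.l2[key]
            self._put_l1(key, value)
            return value
        # 3. Miss everywhere: execute, populate L1 and L2.
        value = compute()
        self.l2[key] = value
        self._put_l1(key, value)
        return value

    def _put_l1(self, key, value):
        # Naive eviction: drop an arbitrary entry when the cache is full.
        if len(self.l1) >= self.l1_max_entries:
            self.l1.pop(next(iter(self.l1)))
        self.l1[key] = value
```

After a restart, L1 starts empty but is transparently repopulated from L2 on the first lookup for each key, which is why the layered setup survives restarts without a cold-start penalty on Redis-cached data.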
## What Gets Cached
| Cache Type | Key | Default TTL | Description |
|---|---|---|---|
| Query Plan | Query hash | 3600s (1h) | Parsed and optimized execution plans |
| Entity | Entity key | 300s (5m) | Resolved entity data from subgraphs |
| APQ | Hash | 86400s (24h) | Automatic Persisted Queries |
## TTL Configuration

```yaml
cache:
  ttl:
    query_plan: 3600   # 1 hour
    entity: 300        # 5 minutes
    apq: 86400         # 24 hours
```
## In-Flight Deduplication
Separate from caching, HyperRoute deduplicates identical in-flight requests. When 10,000 identical requests arrive simultaneously:
```text
10,000 identical requests → 1 upstream call → response shared to all 10,000 clients
```

This protects your subgraphs from thundering herd events during traffic spikes. The `hyperroute_inflight_dedup_hits_total` metric tracks how often deduplication fires.
## Cache Metrics
Monitor cache effectiveness with built-in Prometheus metrics:
| Metric | Type | Description |
|---|---|---|
| `hyperroute_cache_hits_total` | Counter | Cache hits (plan + entity) |
| `hyperroute_cache_misses_total` | Counter | Cache misses |
| `hyperroute_inflight_dedup_hits_total` | Counter | In-flight dedup savings |
Cache hit ratio formula:

```text
hit_ratio = hyperroute_cache_hits_total / (hyperroute_cache_hits_total + hyperroute_cache_misses_total)
```
A healthy production deployment typically sees >95% cache hit rates for query plans.
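As a quick worked instance of the formula (the counter values here are made up for illustration):

```python
def cache_hit_ratio(hits, misses):
    """Hit ratio from the two counters; 0.0 when no lookups occurred."""
    total = hits + misses
    return hits / total if total else 0.0

# Example: 9,800 hits and 200 misses → 0.98, above the 95% target.
ratio = cache_hit_ratio(9_800, 200)
```

In practice you would compute this over a recent window of the Prometheus counters rather than their all-time totals, so that old traffic doesn't mask a recent regression.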
## Complete Cache Config
```yaml
cache:
  backend: layered
  l1:
    type: memory
    max_entries: 10000
  l2:
    type: redis
    url: ${REDIS_URL}
    pool_size: 20
    connection_timeout: 5s
  ttl:
    query_plan: 3600
    entity: 300
    apq: 86400
```
## Best Practices
- Always use layered caching in multi-instance deployments — L1 handles hot data at memory speed, L2 provides cross-instance consistency
- Size L1 appropriately — `max_entries: 10000` covers most workloads; increase it for APIs with many unique queries
- Monitor cache hit ratios — if hit rates drop below 90%, consider increasing TTLs or L1 capacity
- Tune entity TTL carefully — shorter TTLs mean fresher data but more subgraph load; longer TTLs reduce load but increase staleness
## Next Steps
- Observability — Monitor cache metrics with Prometheus
- Configuration — Full cache configuration reference
- Deployment — Redis setup in production