eRPC Finalized Cache Postgres Volume Full
When an eRPC finalized cache uses ttl: 0, the Postgres volume can grow until writes fail. Do not fix it by blindly deleting active-chain cache; first separate retained value, retired-chain rows, alarm thresholds, and the true disk ceiling.
Keep active finalized cache, but make disk growth bounded and visible.
Use this for eRPC, RPC gateways, indexers, or Postgres-backed response caches where no-expiry finalized data is valuable but an unbounded volume can take the service offline.
measure volume -> classify chains -> prune retired rows -> alert before 100%
Measure volume use and cache ownership before deleting rows.
These checks stay public-safe if table names, chain names, and counts are enough. Do not paste credentials, RPC keys, private logs, or customer request payloads.
df -h; Postgres size by table; row counts by chain, method, and finality
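As a minimal sketch of the first two checks, the snippet below reads volume usage with Python's standard library and keeps a catalog query for per-table size as a string to run in psql or any client. The `VolumeUsage` type and `measure_volume` helper are illustrative names, not part of eRPC, and the path you pass should be whatever mount holds the Postgres data directory in your environment.

```python
import shutil
from dataclasses import dataclass

GIB = 1024 ** 3

# Standard Postgres catalog query: total on-disk size (heap + indexes + TOAST)
# for each user table, largest first. Nothing here is eRPC-specific.
TABLE_SIZE_SQL = """
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
"""


@dataclass
class VolumeUsage:
    total_gib: float
    free_gib: float
    used_pct: float


def measure_volume(path: str) -> VolumeUsage:
    """Report size, free space, and percent used for the volume holding `path`.

    The percentage is used/total, so it can differ slightly from the Use%
    column of `df -h` on filesystems that reserve blocks for root.
    """
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return VolumeUsage(usage.total / GIB, usage.free / GIB, used_pct)
```

Calling `measure_volume` on the data-directory mount gives the same two numbers the runbook asks for (percent used and absolute free space), and the query output contains only table names and sizes, so it stays public-safe.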
Runbook: No-Expiry Cache Needs A Disk Contract
- Confirm the outage boundary first: volume at 100%, Postgres offline, DNS or connection failures, and whether requests that missed the cache fell through to paid upstream RPC.
- Keep ttl: 0 only for active finalized data with clear value, such as cold-resync speedup or paid-RPC avoidance.
- Do not apply a blanket TTL if it evicts deep-history cache that is expensive to rebuild. Use chain and finality scope instead.
- Identify retired or decommissioned-chain rows. Delete those first, then run the correct vacuum strategy for the hosting environment.
- Add a warning at about 75% and a hard floor on absolute free GB. A percentage threshold alone can be misleading on small volumes, where 25% free may be only a few GB.
- Track growth by chain, method, and finality. The policy should estimate when the volume will fill again, including after any resize.
- Record the fallback cost. If cache misses trigger paid RPC, the disk policy is also a cost-control policy.
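The alarm and growth items above can be sketched as two small functions. This is an illustration under stated assumptions, not a monitoring integration: the 75% warning comes from the runbook, while the 20 GB floor and the constant daily growth rate are placeholder assumptions to replace with measured values.

```python
from typing import Optional


def volume_alert(used_pct: float, free_gb: float,
                 warn_pct: float = 75.0, min_free_gb: float = 20.0) -> str:
    """Return 'critical', 'warning', or 'ok' for one volume reading.

    The absolute floor is checked first so that a small volume cannot look
    healthy just because its percentage is still under the warning line.
    min_free_gb=20.0 is an assumed placeholder, not a recommendation.
    """
    if free_gb < min_free_gb:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


def days_until_full(free_gb: float, growth_gb_per_day: float) -> Optional[float]:
    """Linear estimate of days until the volume fills.

    Returns None when the cache is not growing. Assumes a constant growth
    rate, which a real cache will only approximate.
    """
    if growth_gb_per_day <= 0:
        return None
    return free_gb / growth_gb_per_day
```

For example, a 50 GB volume at 70% used has only 15 GB free, so `volume_alert(70.0, 15.0)` reports `critical` even though the percentage is still under 75%. Summing per-chain growth before calling `days_until_full` keeps the forecast tied to the chain, method, and finality breakdown.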
Use this when a no-expiry eRPC cache fills Postgres.
This frames the durable fix around scoped retention and alarms, not blind cache deletion.
I would keep the fix centered on a cache-retention contract rather than a blanket TTL.
Before pruning, I would capture:
- current volume used/free and Postgres table sizes
- row counts by chain, method, and finality
- which chains are active versus retired/decommissioned
- whether cache misses fall through to paid RPC or raw upstream
- the alarm threshold and the estimated days until the next full-volume date
For the durable fix, I would keep ttl: 0 on active-chain finalized data only where it has clear resync or cost value, prune retired-chain rows first, run VACUUM after the scoped delete, and add a volume alarm at 70-80% plus an absolute free-GB floor. That preserves the useful finalized cache while making the next disk cliff visible.
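One way to keep the delete scoped is to build the prune plan from the recorded row counts before touching the database. The sketch below assumes a hypothetical table `cache_rows` with a `chain_id` column; the real eRPC schema may key rows differently, so treat the SQL as a template. The chain IDs and counts in the usage note are made-up illustration data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema: table and column names are placeholders, not eRPC's.
# The %s placeholder follows the DB-API style used by common Postgres drivers.
DELETE_SQL = "DELETE FROM cache_rows WHERE chain_id = %s;"


def plan_prune(row_counts: Dict[str, int],
               active_chains: Set[str]) -> List[Tuple[str, int]]:
    """List (chain, row_count) pairs for chains that are not active.

    Largest first, so the deletes that free the most space are reviewed first.
    Active chains never appear in the plan, whatever their size.
    """
    retired = [(chain, count) for chain, count in row_counts.items()
               if chain not in active_chains]
    return sorted(retired, key=lambda item: item[1], reverse=True)
```

With counts `{"1": 900, "5": 400, "137": 300, "80001": 700}` and active chains `{"1", "137"}`, the plan is `[("80001", 700), ("5", 400)]`, and each entry maps to one parameterized `DELETE_SQL` execution. Note that a plain VACUUM marks the freed space as reusable inside Postgres but usually does not return it to the operating system, while VACUUM FULL rewrites the table and needs extra disk space and an exclusive lock, which is why the vacuum step has to match the hosting environment.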
Turn one cache outage into a reusable retention policy.
The $99 policy is for RPC gateways, indexers, Postgres-backed caches, and self-hosted app teams that need chain-aware retention, alarm thresholds, prune boundaries, and acceptance checks for one representative cache-volume incident.
Do Not Delete First
- Active-chain finalized cache that protects resync speed or avoids paid upstream RPC.
- Any rows before recording chain, method, finality, and row-count evidence.
- Postgres data files or volumes without a backup, rollback, and hosting-specific vacuum plan.
- The alarm and cost evidence that proves why the cache policy is worth keeping.