Sparse Volume Overcommit Host ENOSPC

Sparse VM or container volumes can each pass a free-space check and still promise more bytes than the host owns. When tenants later fill those promises, the failure is host-wide ENOSPC, not one isolated noisy volume.

$99 capacity policy

Turn the sparse-volume ENOSPC case into one reusable admission rule.

Use this when a platform creates sparse ext4 images, qcow2 files, local PVs, overlays, or VM data volumes and only checks current host free space at creation time.

sum(provisioned) + overlay reserve <= capacity * overcommit ratio
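As a minimal sketch, the rule above can be written as a single predicate. The function name `admits` and its parameter names are illustrative assumptions, not part of any existing platform API; the reserve is treated as one number counted once.

```python
GiB = 1024 ** 3


def admits(provisioned_bytes, request_bytes, reserve_bytes,
           capacity_bytes, overcommit_ratio=1.0):
    """Return True if granting request_bytes keeps promises within budget.

    Hypothetical helper: implements
    sum(provisioned) + reserve <= capacity * overcommit ratio,
    where the new request is added to the provisioned sum first.
    """
    budget = capacity_bytes * overcommit_ratio
    return provisioned_bytes + request_bytes + reserve_bytes <= budget


# Worked numbers: 1000 GiB host, 100 GiB reserve, 800 GiB already promised.
print(admits(800 * GiB, 100 * GiB, 100 * GiB, 1000 * GiB))  # True
print(admits(800 * GiB, 101 * GiB, 100 * GiB, 1000 * GiB))  # False
```

Note that neither call looks at current free space: the decision depends only on promised bytes, which is the point of the rule.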

Read-only evidence

Capture promised capacity, actual host usage, and shared-failure domains.

These checks are safe to share publicly. They do not need tenant data, disk images, database files, secrets, or private logs; the useful signal is capacity accounting and placement policy.

host capacity -> provisioned sum -> reserve -> actual used -> largest promised tenants
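One way to gather that chain without reading any tenant data is to compare each image file's apparent size with the blocks actually allocated to it. The sketch below assumes raw sparse images stored as regular files in a single directory on a Linux host; `scan_images` and `summarize` are hypothetical names. For qcow2 files the file size is not the virtual disk size, so the promised size would have to come from the platform's own records instead.

```python
import os
import shutil


def scan_images(image_dir):
    """Return (capacity, host_used, rows); rows are (name, apparent, allocated).

    Only file metadata is read, never file contents.
    """
    usage = shutil.disk_usage(image_dir)
    rows = []
    with os.scandir(image_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            # st_size is the promised (apparent) size of a raw sparse image;
            # st_blocks counts 512-byte blocks actually allocated on Linux.
            rows.append((entry.name, st.st_size, st.st_blocks * 512))
    rows.sort(key=lambda row: row[1], reverse=True)  # largest promises first
    return usage.total, usage.used, rows


def summarize(image_dir, reserve_bytes=0, top=5):
    """Assemble the evidence chain as a plain dictionary."""
    capacity, used, rows = scan_images(image_dir)
    return {
        "host_capacity_bytes": capacity,
        "provisioned_sum_bytes": sum(row[1] for row in rows),
        "reserve_bytes": reserve_bytes,
        "actual_used_bytes": used,
        "largest_promised": rows[:top],
    }
```

Calling `summarize` on the image directory gives the five values in the order listed above; a provisioned sum larger than host capacity is the overcommit signal.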


Runbook: Sparse Is A Placement Promise

Do not rely on a point-in-time free-space check. It proves the host can create the file today, not that the host can satisfy all promised bytes later.
Track provisioned bytes per host as a first-class placement metric: every volume size, resize, snapshot reserve, overlay reserve, image cache reserve, and emergency free-space band.
Reject create and resize requests that would push provisioned bytes beyond capacity multiplied by the configured overcommit ratio.
Default strict before launch. An overcommit ratio of 1.0 is easier to relax later than a silent host-wide corruption mode is to repair.
Keep actual free-space checks separate from admission. Admission protects promises; runtime guards protect the emergency reserve when real usage spikes.
Expose metrics for provisioned bytes, actual used bytes, reserve bytes, overcommit ratio, and largest promised tenants.
Offer thick allocation or stricter storage classes for database volumes, write-heavy queues, and other guests where ENOSPC can corrupt state.
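The runtime guard and the metrics from the runbook can be sketched as two small functions that stay separate from admission. Everything here is an assumption for illustration: the function names, the metric keys, and the choice to report the effective ratio (provisioned bytes divided by capacity) rather than the configured one.

```python
def runtime_guard_ok(actual_free_bytes, reserve_bytes):
    """True while real free space still covers the emergency reserve.

    This is deliberately independent of promised bytes: it reacts to
    actual usage spikes, not to what admission allowed.
    """
    return actual_free_bytes >= reserve_bytes


def capacity_metrics(promised_by_tenant, capacity_bytes,
                     actual_used_bytes, reserve_bytes, top=3):
    """Build a metrics snapshot from a tenant -> promised bytes mapping."""
    provisioned = sum(promised_by_tenant.values())
    largest = sorted(promised_by_tenant.items(),
                     key=lambda item: item[1], reverse=True)[:top]
    return {
        "provisioned_bytes": provisioned,
        "actual_used_bytes": actual_used_bytes,
        "reserve_bytes": reserve_bytes,
        # Effective ratio: how far promises currently exceed capacity.
        "effective_overcommit_ratio": provisioned / capacity_bytes,
        "largest_promised_tenants": largest,
    }
```

Keeping the guard as its own function makes it easy to alert on it even when admission is configured correctly, for example when logs or image caches consume the reserve.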

Copy-ready issue reply

Use this when sparse volumes can overcommit the host.

This keeps the fix scoped to admission control, telemetry, and an explicit thick-volume escape hatch for stateful workloads.

I would treat this as an admission-control bug rather than only a disk-free check.

The failure mode is that a point-in-time free-space check passes at create time, while the sum of sparse promises can still exceed what the host can deliver later. Acceptance checks I would add:

- Track provisioned bytes per host: sum(volume.size_mb) converted to bytes, plus overlay/snapshot reserve.
- Refuse create/resize when provisioned bytes, with the reserve counted once, would exceed capacity * overcommit_ratio.
- Keep a separate free-space guard for actual host bytes so existing volumes cannot consume the emergency band.
- Emit metrics for provisioned bytes, actual used bytes, reserve bytes, and largest tenants by promised size.
- Add a regression test with two sparse volumes that each fit individually but exceed host capacity together.
- Offer thick/fallocate mode or stricter class policy for database volumes where correlated ENOSPC is least tolerable.
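The regression test in the list could look roughly like the following. The `Host` class and `AdmissionError` are hypothetical stand-ins for the platform's real create path, written only so the test is self-contained; the reserve is counted once against capacity * overcommit_ratio.

```python
MIB = 1024 * 1024


class AdmissionError(Exception):
    """Raised when a create request would overcommit the host."""


class Host:
    """Hypothetical stand-in for a host's volume create path."""

    def __init__(self, capacity_bytes, reserve_bytes, overcommit_ratio=1.0):
        self.capacity_bytes = capacity_bytes
        self.reserve_bytes = reserve_bytes
        self.overcommit_ratio = overcommit_ratio
        self.volume_sizes_mb = []

    def create_volume(self, size_mb):
        promised = (sum(self.volume_sizes_mb) + size_mb) * MIB
        budget = self.capacity_bytes * self.overcommit_ratio
        if promised + self.reserve_bytes > budget:
            raise AdmissionError(f"{size_mb} MiB would overcommit the host")
        self.volume_sizes_mb.append(size_mb)


def test_two_sparse_volumes_exceed_capacity_together():
    size_mb = 60 * 1024  # 60 GiB per volume

    def new_host():
        # 100 GiB host with a 10 GiB reserve and strict ratio 1.0.
        return Host(capacity_bytes=100 * 1024 * MIB,
                    reserve_bytes=10 * 1024 * MIB)

    # Each volume fits on an empty host by itself.
    new_host().create_volume(size_mb)

    # Together they promise 120 GiB against a 90 GiB budget.
    host = new_host()
    host.create_volume(size_mb)
    try:
        host.create_volume(size_mb)
    except AdmissionError:
        pass
    else:
        raise AssertionError("second volume should have been rejected")
    assert host.volume_sizes_mb == [size_mb]
```

The test never writes data into the volumes, so it fails on any create path that consults only current free space.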


Paid scope

Turn one sparse-volume incident into a reusable capacity policy.

The $99 policy is for VM platforms, local PV schedulers, CI hosts, developer environments, and self-hosted platforms where sparse volume promises, overlays, snapshots, and database data can share the same host filesystem.

Do Not Treat As One Noisy Tenant

Volumes, overlays, snapshots, image caches, and logs that share the same host filesystem.
Database guests where guest-level ENOSPC can corrupt WAL, page files, or transaction state.
Platforms without a reserve band for host agent logs, cleanup jobs, and emergency repair commands.
Any create/resize path that checks actual free bytes but ignores total promised bytes.