Database Low Disk Alert Severity
A low-disk alert is not informational when the volume holds database files, WAL/binlogs, temp space, or backup staging. Severity should reflect remaining write runway, not just the metric name.
Route low-disk alerts by remaining runway: WARNING first, CRITICAL near the cliff.
Use this when a database dashboard, SQL Server monitor, webhook, Teams alert, or ops card labels low free space as INFO even though running out of that space would stop writes.
WARNING below policy threshold; CRITICAL below hard floor or short runway
Capture volume free space, write owners, and routing behavior.
These checks are safe to share publicly. They need no table contents, credentials, or private logs, only the volume, its threshold, the current free space, and the alert path.
volume free -> absolute GB -> percent -> write owners -> routing tier
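A minimal sketch of the first steps of that capture flow, assuming Python and a locally mounted volume. `shutil.disk_usage` reads the filesystem, and the snapshot carries both absolute GB and percent free alongside a write-owner label. `VolumeSnapshot`, `capture`, and the `owner` field are hypothetical names for illustration. GB here means binary gigabytes (GiB), so a monitor that reports decimal GB will show slightly different numbers.

```python
import shutil
from dataclasses import dataclass

BYTES_PER_GB = 1024 ** 3  # binary GB (GiB); assumption, match your monitor's unit


@dataclass(frozen=True)
class VolumeSnapshot:
    """Hypothetical record of one volume reading plus its write owner."""
    path: str
    owner: str        # what writes here: data, WAL/binlogs, temp, backups, ...
    total_gb: float
    free_gb: float    # free space as reported by shutil.disk_usage

    @property
    def free_pct(self) -> float:
        # Percent free; guard against a zero-size volume.
        return 100.0 * self.free_gb / self.total_gb if self.total_gb else 0.0


def capture(path: str, owner: str) -> VolumeSnapshot:
    """Read free space for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return VolumeSnapshot(path, owner, usage.total / BYTES_PER_GB,
                          usage.free / BYTES_PER_GB)


if __name__ == "__main__":
    snap = capture(".", "data")
    print(f"{snap.path} ({snap.owner}): {snap.free_gb:.1f} GB free, {snap.free_pct:.1f}%")
```

Keeping both units on the same record means later steps never have to choose between percent and GB.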
Runbook: Severity Is A Routing Contract
- Do not key severity only by metric name. Low-disk severity must come from the current level: threshold breach, hard-floor breach, and remaining write runway.
- Use both free percent and absolute GB. A 4% free value can be safe on a large archive volume but critical on a small log volume; 2 GB free can be critical even when percent looks acceptable.
- Define at least two thresholds: WARNING for policy breach, CRITICAL for near-write-failure or less than one maintenance window of runway.
- Do not suppress the first transition into CRITICAL, even if the low-disk gate only notifies on worsening breaches.
- Route severity consistently: INFO to trend dashboards, WARNING to tickets/team channel, CRITICAL to paging or urgent incident queue.
- Include owner context in the alert, meaning what writes to the volume: data files, logs/WAL/binlogs, temp, backups, snapshots, or monitor-generated logs.
- Test with example volumes so downstream webhooks, Teams/Slack cards, and filters see the right severity.
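One way to encode these runbook rules, as a sketch. Severity comes from the current level. Percent and absolute GB are checked separately, so either one can trip. A runway shorter than one maintenance window escalates to CRITICAL, and each tier maps to exactly one route. The `Policy` fields, the threshold values in the usage example, and the route names (`trend-dashboard`, `team-ticket`, `page-oncall`) are assumptions to replace with your own policy. `runway_h` is whatever write-runway estimate you already have, or `None` if you have none.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-volume-class thresholds."""
    warn_pct: float              # policy threshold, percent free
    warn_gb: float               # policy threshold, absolute GB free
    crit_pct: float              # hard floor, percent free
    crit_gb: float               # hard floor, absolute GB free
    maintenance_window_h: float  # runway shorter than this is CRITICAL


# Each tier goes to exactly one destination (route names are placeholders).
ROUTES = {
    Severity.INFO: "trend-dashboard",
    Severity.WARNING: "team-ticket",
    Severity.CRITICAL: "page-oncall",
}


def classify(free_pct: float, free_gb: float, policy: Policy,
             runway_h: Optional[float] = None) -> Severity:
    """Severity from the current level, never from the metric name alone."""
    # Hard floor on either axis: a small volume can be nearly full at a "safe" percent.
    if free_pct < policy.crit_pct or free_gb < policy.crit_gb:
        return Severity.CRITICAL
    # Less than one maintenance window of write runway is also CRITICAL.
    if runway_h is not None and runway_h < policy.maintenance_window_h:
        return Severity.CRITICAL
    # A policy breach on either axis is at least WARNING.
    if free_pct < policy.warn_pct or free_gb < policy.warn_gb:
        return Severity.WARNING
    return Severity.INFO


if __name__ == "__main__":
    db_volume = Policy(warn_pct=10.0, warn_gb=20.0, crit_pct=3.0, crit_gb=2.0,
                       maintenance_window_h=8.0)
    sev = classify(4.0, 66.0, db_volume)
    print(sev.name, "->", ROUTES[sev])  # WARNING -> team-ticket
```

A large archive volume and a small log volume would get different `Policy` instances. That is how the same 4% free can be safe on one volume and critical on the other.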
Use this when a low-disk alert falls through to INFO.
This keeps the fix scoped: map the metric, pass the severity level, and protect critical transitions from suppression.
I agree this should not fall through to INFO. For a database volume, low disk is a capacity and availability risk, and severity should reflect the current level rather than only the metric name.
Acceptance checks I would add:
- Volume Free Space below the configured threshold renders at least WARNING.
- A second hard floor, such as 2-3% or 1-2 GB free, renders CRITICAL.
- The alert includes both percent free and absolute GB free.
- The first transition into CRITICAL is emitted even if the low-disk gate suppresses repeated non-worsening breaches.
- Downstream webhook/card routing receives the same severity that the UI displays.
- Tests cover the current example: 4% free / 66 GB on a thresholded volume should not be INFO.
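A sketch of the gate behavior these checks describe. It assumes a gate that only re-notifies when free percent drops by at least `min_drop_pct` points since its last notification. `LowDiskGate`, `min_drop_pct`, and `webhook_payload` are hypothetical names. The point is that entering CRITICAL bypasses the worsening check, even when the percent barely moved, and that the webhook payload carries the same severity value plus both free-space units.

```python
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


class LowDiskGate:
    """Hypothetical gate: suppress repeated non-worsening breaches per volume,
    but never suppress the first transition into CRITICAL."""

    def __init__(self, min_drop_pct: float = 1.0) -> None:
        self.min_drop_pct = min_drop_pct
        self._last_sev: Dict[str, Severity] = {}
        self._last_notified_pct: Dict[str, float] = {}

    def should_emit(self, volume: str, free_pct: float, severity: Severity) -> bool:
        prev_sev = self._last_sev.get(volume, Severity.INFO)
        self._last_sev[volume] = severity
        if severity == Severity.INFO:
            # INFO goes to dashboards only; forget the baseline so the
            # next breach notifies again.
            self._last_notified_pct.pop(volume, None)
            return False
        entering_critical = (severity == Severity.CRITICAL
                             and prev_sev != Severity.CRITICAL)
        last_pct = self._last_notified_pct.get(volume)
        worsened = last_pct is None or last_pct - free_pct >= self.min_drop_pct
        if entering_critical or worsened:
            self._last_notified_pct[volume] = free_pct
            return True
        return False


def webhook_payload(volume: str, owner: str, free_pct: float, free_gb: float,
                    severity: Severity) -> dict:
    """Send downstream the same severity the UI shows, plus both units."""
    return {
        "volume": volume,
        "owner": owner,
        "severity": severity.name,
        "free_pct": round(free_pct, 1),
        "free_gb": round(free_gb, 1),
    }
```

For example, a volume at 3.5% that WARNINGs and then crosses a 3% floor at 2.9% has moved only 0.6 points. A pure worsening gate would stay silent, but this one emits the CRITICAL.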
Turn one low-disk alert into a reusable database capacity policy.
The $99 policy is for SQL Server, Postgres, MySQL, backup, monitoring, and self-hosted app teams that need severity thresholds, routing rules, and acceptance tests before disk-full turns into a write outage.
Do Not Treat As INFO
- Database log, WAL, binlog, or temp volumes that can stop writes.
- Backup staging volumes where the next job can publish a partial or failed artifact.
- Monitoring-generated logs that can fill the same host they are supposed to protect.
- Any volume whose free space is below one maintenance window of write runway.
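The last rule needs a runway estimate. Here is a minimal sketch, assuming a constant fill rate between two free-space samples and an optional `floor_gb` below which writes are expected to fail. `runway_hours`, `below_one_window`, and `floor_gb` are hypothetical names, and the sample numbers are illustrative. Real write rates are rarely linear, so treat the result as a rough estimate rather than a guarantee.

```python
from typing import Optional


def runway_hours(free_gb_before: float, free_gb_now: float, elapsed_h: float,
                 floor_gb: float = 0.0) -> Optional[float]:
    """Hours until free space reaches `floor_gb`, assuming the fill rate
    between the two samples stays constant. None means no finite runway."""
    if elapsed_h <= 0:
        raise ValueError("elapsed_h must be positive")
    consumed_gb_per_h = (free_gb_before - free_gb_now) / elapsed_h
    if consumed_gb_per_h <= 0:
        return None  # free space is flat or growing
    return max(free_gb_now - floor_gb, 0.0) / consumed_gb_per_h


def below_one_window(free_gb_before: float, free_gb_now: float, elapsed_h: float,
                     window_h: float, floor_gb: float = 0.0) -> bool:
    """True when the volume would hit its floor before the next maintenance window."""
    runway = runway_hours(free_gb_before, free_gb_now, elapsed_h, floor_gb)
    return runway is not None and runway < window_h


if __name__ == "__main__":
    # 66 GB -> 60 GB over 2 hours is 3 GB/h; with a 6 GB floor that is 18 hours.
    print(runway_hours(66.0, 60.0, 2.0, floor_gb=6.0))  # prints 18.0
```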