AI agent storage incident
Codex SQLite WAL Disk Full On Linux
If `~/.codex/logs_2.sqlite-wal` grows to gigabytes or even hundreds of gigabytes, deleting the visible WAL file may not free any space. Stale or suspended Codex TUI sessions can keep file descriptors open on the deleted WAL, so `du ~/.codex` looks small while `df -h` still reports the filesystem as full.
Prove whether space is held by a deleted WAL inode.
The first pass is read-only: compare `du` and `df`, list deleted open files, and identify stale Codex PIDs. Checkpoint only after the stale readers are gone.
du -xsh ~/.codex; df -h "$HOME"; lsof -nP +L1 | sort -k7,7nr
What is happening
SQLite WAL files are normal, but they only shrink when a checkpoint runs, and a checkpoint cannot reset the WAL past frames that a long-lived reader still needs. If a stale Codex process still holds a deleted `logs_2.sqlite-wal` open, `~/.codex` may look small again while the filesystem stays full, because the kernel does not release a deleted inode's blocks until the last descriptor on it is closed.
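The deleted-but-still-allocated state can be reproduced safely with a throwaway temp file. This is a minimal sketch, assuming Linux with `/proc` mounted; the function name `demo_deleted_open_file` and the choice of descriptor 9 are illustrative and not part of Codex or SQLite.

```shell
# Demonstration only: nothing here touches ~/.codex.
demo_deleted_open_file() {
    local tmp
    tmp=$(mktemp) || return 1
    exec 9<"$tmp"      # hold a read descriptor, as a stale reader would
    rm -f "$tmp"       # unlink the name; the inode stays allocated
    # Child processes inherit descriptor 9, so readlink can inspect it.
    # On Linux the link target of a deleted file ends in " (deleted)".
    readlink /proc/self/fd/9
    exec 9<&-          # closing the last descriptor releases the blocks
}

demo_deleted_open_file
```

The printed path ends in ` (deleted)`, which is the same marker `lsof +L1` reports for a WAL that a stale session still holds.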
Safe recovery order
- Stop starting new Codex sessions while the filesystem is near full.
- Use `lsof +L1` to prove whether deleted WAL or SHM files are still open.
- Identify stale or suspended Codex processes with `ps`, `tmux ls`, and `fuser`.
- Exit stale sessions cleanly if possible; otherwise terminate only the stale PIDs that hold the deleted WAL.
- Run a SQLite checkpoint only after readers release the database.
- Re-check `df -h`, `du -xsh ~/.codex`, and `lsof +L1` before deleting anything else.
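Where `lsof` is not installed, the same evidence can be read from `/proc`. The sketch below assumes Linux; `list_deleted_holders` is a hypothetical helper name, and it only sees other users' processes when run as root.

```shell
# Hypothetical helper: print "PID STATE FD TARGET" for every open descriptor
# whose target is a deleted file with the given substring in its path.
list_deleted_holders() {
    local pattern="$1" fd target pid state
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish or be unreadable while we scan; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *"$pattern"*" (deleted)")
                pid=${fd#/proc/}
                pid=${pid%%/*}
                # State T means stopped (suspended), S sleeping, R running.
                state=$(awk '/^State:/ {print $2; exit}' \
                    "/proc/$pid/status" 2>/dev/null) || state='?'
                printf '%s %s %s %s\n' "$pid" "${state:-?}" "${fd##*/}" "$target"
                ;;
        esac
    done
}
```

For example, `list_deleted_holders logs_2.sqlite-wal` lists the PIDs to examine; a state of `T` suggests a suspended session that will not release the file on its own.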
Use this when `du` and `df` disagree.
The runbook separates visible file size from deleted-open-inode allocation, then checkpoints the WAL after stale readers are gone.
du -xsh "$HOME/.codex" 2>/dev/null
df -h "$HOME"
lsof -nP +L1 2>/dev/null | awk 'NR>1 && $7 ~ /^[0-9]+$/ && $7 > 1000000000 {print $7, $2, $1, $4, $10}' | sort -nr | head -40 | numfmt --field=1 --to=iec --suffix=B
fuser "$HOME/.codex/logs_2.sqlite" 2>/dev/null
sqlite3 "$HOME/.codex/logs_2.sqlite" "PRAGMA wal_checkpoint(TRUNCATE);"
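The checkpoint step can be wrapped so that it refuses to run while the database files are still open, and so that its result row is interpreted rather than ignored. This is a sketch under assumptions: `interpret_checkpoint` and `safe_checkpoint` are hypothetical helper names, and `fuser` only inspects the files currently at those paths, so a process holding only an already deleted WAL still has to be found with `lsof +L1`. `PRAGMA wal_checkpoint` returns one row of three integers (busy, log frames, checkpointed frames), which the `sqlite3` shell prints as `busy|log|checkpointed` in its default list mode.

```shell
# Hypothetical helper: classify one "busy|log|checkpointed" result row.
interpret_checkpoint() {
    local busy rest log
    busy=${1%%|*}
    rest=${1#*|}
    log=${rest%%|*}
    if [ "$log" = "-1" ]; then
        echo "not-wal"     # database is not in WAL mode, nothing to checkpoint
    elif [ "$busy" != "0" ]; then
        echo "blocked"     # a reader or writer stopped the checkpoint finishing
    else
        echo "ok"          # TRUNCATE completed; the WAL is now zero bytes
    fi
}

# Hypothetical wrapper: checkpoint only when no process has the files open.
safe_checkpoint() {
    local db="$1" row
    if fuser "$db" "$db-wal" "$db-shm" >/dev/null 2>&1; then
        echo "database files are still open; not checkpointing" >&2
        return 1
    fi
    row=$(sqlite3 "$db" "PRAGMA wal_checkpoint(TRUNCATE);") || return 1
    interpret_checkpoint "$row"
}
```

A call such as `safe_checkpoint "$HOME/.codex/logs_2.sqlite"` that prints `blocked` means a reader is still attached, and the stale-process steps should be repeated before retrying.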
Do Not Delete First
- Do not keep deleting visible WAL files if `lsof +L1` shows deleted inodes are still open.
- Do not remove all of `~/.codex` before exporting or backing up session state you care about.
- Do not run broad cache cleaners while the issue is actually a live file descriptor problem.
- Do not kill every shell or tmux process; target the stale Codex PIDs that hold the deleted WAL.
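Targeted termination of a single stale PID might look like the sketch below. The helper names `pid_is_running` and `terminate_stale_pid` are hypothetical, and the sketch assumes Linux with `/proc`. It sends `SIGCONT` after `SIGTERM` because a stopped (suspended) process does not act on `SIGTERM` until it is continued.

```shell
# Hypothetical helper: succeed only while the PID exists and is not a zombie.
pid_is_running() {
    local state
    state=$(awk '/^State:/ {print $2; exit}' "/proc/$1/status" 2>/dev/null)
    # A zombie (Z) has already closed its descriptors, so count it as gone.
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Hypothetical helper: ask exactly one PID to exit and wait up to N seconds.
terminate_stale_pid() {
    local pid="$1" tries="${2:-10}"
    kill -TERM "$pid" 2>/dev/null || true
    kill -CONT "$pid" 2>/dev/null || true   # let a suspended process see TERM
    while [ "$tries" -gt 0 ]; do
        pid_is_running "$pid" || return 0
        sleep 1
        tries=$((tries - 1))
    done
    echo "PID $pid is still running; confirm it is stale before escalating" >&2
    return 1
}
```

Passing only PIDs that were confirmed to hold the deleted WAL keeps unrelated shells and tmux sessions untouched.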
Turn this recovery into a team-safe agent storage policy.
The $99 policy is for teams running Codex, agent CLIs, tmux sessions, or long-lived SQLite-backed tools on shared Linux workstations and build hosts. You get the deleted-inode recovery runbook, WAL checkpoint rules, stale-process guardrails, and monitoring thresholds for one representative environment.