WAL & Crash Resilience
MuroDB uses a write-ahead log (.wal) for crash recovery.
All commits are WAL-first: the durable intent is recorded in the WAL before the data file is flushed.
.wal Binary Layout
WAL constants are defined in src/wal/mod.rs:
- magic: "MUROWAL1" (8 bytes)
- version: u32 (current: 1)
- header size: 12 bytes
File layout:
- Header: [magic:8][version:4]
- Repeating frames: [frame_len: u32][encrypted_payload: frame_len bytes]
frame_len is bounded by MAX_WAL_FRAME_LEN (PAGE_SIZE + 1024).
Plaintext payload format, before encryption (src/wal/writer.rs):
record_bytes = WalRecord::serialize(...)
payload = record_bytes || crc32(record_bytes)
Encryption uses PageCipher; frame nonce context is (lsn, 0).
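The header and frame layout above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the code in src/wal/writer.rs: it assumes little-endian integers, a PAGE_SIZE of 4096, a little-endian CRC-32 (IEEE) trailer, and replaces PageCipher with a hypothetical identity function, encrypt_placeholder.

```rust
// Sketch of the .wal header and frame encoding described above.
// ASSUMPTIONS (not taken from MuroDB): little-endian integers, PAGE_SIZE = 4096,
// a little-endian CRC-32 (IEEE) trailer, and an identity function in place of PageCipher.

const WAL_MAGIC: &[u8; 8] = b"MUROWAL1";
const WAL_VERSION: u32 = 1;
const PAGE_SIZE: usize = 4096; // assumed value
const MAX_WAL_FRAME_LEN: usize = PAGE_SIZE + 1024;

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical stand-in for PageCipher: a real cipher would derive its nonce
/// from the (lsn, 0) context. Here the payload is returned unchanged.
fn encrypt_placeholder(_lsn: u64, plaintext: Vec<u8>) -> Vec<u8> {
    plaintext
}

/// 12-byte file header: [magic:8][version:4].
fn encode_header() -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(WAL_MAGIC);
    out.extend_from_slice(&WAL_VERSION.to_le_bytes());
    out
}

/// One frame: [frame_len: u32][encrypted_payload], where the plaintext
/// payload is record_bytes || crc32(record_bytes).
fn encode_frame(lsn: u64, record_bytes: &[u8]) -> Result<Vec<u8>, String> {
    let mut payload = record_bytes.to_vec();
    payload.extend_from_slice(&crc32(record_bytes).to_le_bytes());
    let encrypted = encrypt_placeholder(lsn, payload);
    // Enforce the MAX_WAL_FRAME_LEN bound on the bytes that follow frame_len.
    if encrypted.len() > MAX_WAL_FRAME_LEN {
        return Err(format!("frame too large: {} bytes", encrypted.len()));
    }
    let mut frame = Vec::with_capacity(4 + encrypted.len());
    frame.extend_from_slice(&(encrypted.len() as u32).to_le_bytes());
    frame.extend_from_slice(&encrypted);
    Ok(frame)
}

fn main() {
    let mut wal = encode_header();
    wal.extend(encode_frame(1, b"example record").unwrap());
    // 12-byte header + 4-byte length + 14-byte record + 4-byte CRC
    println!("WAL image is {} bytes", wal.len());
}
```

With a real cipher the encrypted payload may be longer than the plaintext (for example, if an authentication tag is appended), which is why the sketch checks the bound after encryption.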
WAL Record Types
WalRecord (src/wal/record.rs) variants:
| Record | Payload |
|---|---|
| Begin | txid |
| PagePut | txid, page_id, full page image bytes |
| MetaUpdate | txid, catalog_root, page_count, freelist_page_id, epoch |
| Commit | txid, lsn |
| Abort | txid |
Record tags on the wire: 1 = Begin, 2 = PagePut, 3 = Commit, 4 = Abort, 5 = MetaUpdate.
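A record codec using these wire tags might look like the sketch below. Only the variant names, fields, and tag values come from the table above; the field widths (u64), the little-endian byte order, the u32 length prefix on page images, and the helper names put_u64 and ByteReader are assumptions for illustration.

```rust
// Hypothetical WalRecord codec. Tags match the documented wire tags;
// field widths, byte order, and the page-image length prefix are ASSUMPTIONS.

#[derive(Debug, Clone, PartialEq)]
enum WalRecord {
    Begin { txid: u64 },
    PagePut { txid: u64, page_id: u64, data: Vec<u8> },
    Commit { txid: u64, lsn: u64 },
    Abort { txid: u64 },
    MetaUpdate { txid: u64, catalog_root: u64, page_count: u64, freelist_page_id: u64, epoch: u64 },
}

fn put_u64(out: &mut Vec<u8>, v: u64) {
    out.extend_from_slice(&v.to_le_bytes());
}

/// Minimal forward-only reader over a byte slice.
struct ByteReader<'a> {
    buf: &'a [u8],
}

impl<'a> ByteReader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        if self.buf.len() < n {
            return None;
        }
        let buf = self.buf;
        let (head, tail) = buf.split_at(n);
        self.buf = tail;
        Some(head)
    }
    fn read_u32(&mut self) -> Option<u32> {
        let mut b = [0u8; 4];
        b.copy_from_slice(self.take(4)?);
        Some(u32::from_le_bytes(b))
    }
    fn read_u64(&mut self) -> Option<u64> {
        let mut b = [0u8; 8];
        b.copy_from_slice(self.take(8)?);
        Some(u64::from_le_bytes(b))
    }
}

impl WalRecord {
    /// Wire tag: 1=Begin, 2=PagePut, 3=Commit, 4=Abort, 5=MetaUpdate.
    fn tag(&self) -> u8 {
        match self {
            WalRecord::Begin { .. } => 1,
            WalRecord::PagePut { .. } => 2,
            WalRecord::Commit { .. } => 3,
            WalRecord::Abort { .. } => 4,
            WalRecord::MetaUpdate { .. } => 5,
        }
    }

    fn serialize(&self) -> Vec<u8> {
        let mut out = vec![self.tag()];
        match self {
            WalRecord::Begin { txid } | WalRecord::Abort { txid } => put_u64(&mut out, *txid),
            WalRecord::PagePut { txid, page_id, data } => {
                put_u64(&mut out, *txid);
                put_u64(&mut out, *page_id);
                out.extend_from_slice(&(data.len() as u32).to_le_bytes());
                out.extend_from_slice(data);
            }
            WalRecord::Commit { txid, lsn } => {
                put_u64(&mut out, *txid);
                put_u64(&mut out, *lsn);
            }
            WalRecord::MetaUpdate { txid, catalog_root, page_count, freelist_page_id, epoch } => {
                for v in [*txid, *catalog_root, *page_count, *freelist_page_id, *epoch] {
                    put_u64(&mut out, v);
                }
            }
        }
        out
    }

    /// Returns None for unknown tags, short input, or trailing bytes.
    fn deserialize(bytes: &[u8]) -> Option<WalRecord> {
        let mut r = ByteReader { buf: bytes };
        let tag = r.take(1)?[0];
        let rec = match tag {
            1 => WalRecord::Begin { txid: r.read_u64()? },
            2 => {
                let txid = r.read_u64()?;
                let page_id = r.read_u64()?;
                let len = r.read_u32()? as usize;
                WalRecord::PagePut { txid, page_id, data: r.take(len)?.to_vec() }
            }
            3 => WalRecord::Commit { txid: r.read_u64()?, lsn: r.read_u64()? },
            4 => WalRecord::Abort { txid: r.read_u64()? },
            5 => WalRecord::MetaUpdate {
                txid: r.read_u64()?,
                catalog_root: r.read_u64()?,
                page_count: r.read_u64()?,
                freelist_page_id: r.read_u64()?,
                epoch: r.read_u64()?,
            },
            _ => return None,
        };
        if r.buf.is_empty() { Some(rec) } else { None }
    }
}

fn main() {
    let rec = WalRecord::Commit { txid: 1, lsn: 42 };
    let bytes = rec.serialize();
    assert_eq!(WalRecord::deserialize(&bytes), Some(rec));
    println!("Commit encodes to {} bytes with tag {}", bytes.len(), bytes[0]);
}
```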
Write Path
Read-Only Query Path (Database::query)
Database::query(sql):
- Acquire shared lock
- Parse/validate read-only statement
- Execute directly on pager/catalog (no implicit WAL transaction)
query takes &mut self because the session may refresh pager/catalog state from disk before executing the read.
For concurrent readers in one process, open additional read-only handles (Database::open_reader) and query from each handle.
If an explicit transaction is active, read statements are executed in the transaction context (execute_in_tx) so uncommitted writes remain visible to that session.
Auto-Commit Mode (no explicit BEGIN)
Session::execute_auto_commit(stmt):
- Create an implicit transaction and a dirty-page buffer.
- Execute the statement against the transactional page store.
- tx.commit(...) writes:
  - Begin
  - PagePut for every dirty page
  - PagePut for freelist pages (if needed)
  - MetaUpdate
  - Commit
- wal.sync() (fsync) establishes the durability boundary.
- Flush pages and metadata to the main DB file.
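The ordering of the commit sequence above can be modeled in a few lines. Rec, Wal, Tx, and commit are hypothetical names for this sketch, not MuroDB's Session or WalWriter API; records carry only ids (no page images or LSNs), the in-memory "sync" just marks appended records as durable, and the metadata flush is omitted.

```rust
// Hypothetical model of the auto-commit sequence: WAL records first,
// then wal.sync(), then the data-file flush. Names are illustrative only.
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Rec {
    Begin(u64),
    PagePut(u64, u64), // (txid, page_id); page image omitted
    MetaUpdate(u64),
    Commit(u64),
}

#[derive(Default)]
struct Wal {
    records: Vec<Rec>,
    synced_len: usize, // records[..synced_len] are durable
}

impl Wal {
    fn append(&mut self, r: Rec) {
        self.records.push(r);
    }
    // Stand-in for fsync: everything appended so far becomes durable.
    fn sync(&mut self) {
        self.synced_len = self.records.len();
    }
}

struct Tx {
    txid: u64,
    dirty: BTreeMap<u64, Vec<u8>>, // page_id -> new page image
    freelist_pages: Vec<u64>,
}

fn commit(tx: &Tx, wal: &mut Wal, data_file: &mut BTreeMap<u64, Vec<u8>>) {
    wal.append(Rec::Begin(tx.txid));
    for page_id in tx.dirty.keys() {
        wal.append(Rec::PagePut(tx.txid, *page_id));
    }
    for page_id in &tx.freelist_pages {
        wal.append(Rec::PagePut(tx.txid, *page_id));
    }
    wal.append(Rec::MetaUpdate(tx.txid));
    wal.append(Rec::Commit(tx.txid));
    wal.sync(); // durability boundary
    // Only after the WAL is durable are pages written to the main DB file.
    for (page_id, bytes) in &tx.dirty {
        data_file.insert(*page_id, bytes.clone());
    }
}

fn main() {
    let mut dirty = BTreeMap::new();
    dirty.insert(3, vec![0xAA]);
    dirty.insert(5, vec![0xBB]);
    let tx = Tx { txid: 7, dirty, freelist_pages: vec![9] };
    let mut wal = Wal::default();
    let mut data_file = BTreeMap::new();
    commit(&tx, &mut wal, &mut data_file);
    println!("{:?}", wal.records);
}
```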
Explicit Transaction (BEGIN … COMMIT)
An explicit transaction (BEGIN ... COMMIT) uses the same commit primitive.
ROLLBACK discards the dirty state without appending anything to the WAL (rollback_no_wal in the session path).
Commit Point
The durability commit point is the WAL fsync:
- before wal.sync(): the commit may be lost on crash
- after wal.sync(): the commit must be recoverable even if the DB flush fails
If the post-sync DB flush fails, the transaction returns CommitInDoubt, the session is poisoned, and the next open recovers the commit from the WAL.
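The commit-point rule can be expressed as a small decision function. Only CommitInDoubt comes from the text above; Session, WalSyncFailed, SessionPoisoned, and the boolean failure inputs are hypothetical simplifications for this sketch.

```rust
// Hypothetical sketch of the commit-point rule: failure before wal.sync()
// is an ordinary error; failure after it yields CommitInDoubt and poisons
// the session. Names other than CommitInDoubt are assumptions.

#[derive(Debug, PartialEq)]
enum CommitError {
    WalSyncFailed,   // before the commit point: the commit may be lost
    CommitInDoubt,   // after the commit point: durable in the WAL, data file stale
    SessionPoisoned, // the session refuses further work until reopened
}

struct Session {
    poisoned: bool,
}

impl Session {
    fn commit(&mut self, wal_sync_ok: bool, db_flush_ok: bool) -> Result<(), CommitError> {
        if self.poisoned {
            return Err(CommitError::SessionPoisoned);
        }
        if !wal_sync_ok {
            return Err(CommitError::WalSyncFailed);
        }
        // From here on the commit is recoverable from the WAL.
        if !db_flush_ok {
            // The next Database::open replays the WAL to repair the data file.
            self.poisoned = true;
            return Err(CommitError::CommitInDoubt);
        }
        Ok(())
    }
}

fn main() {
    let mut session = Session { poisoned: false };
    println!("{:?}", session.commit(true, false)); // Err(CommitInDoubt)
    println!("{:?}", session.commit(true, true)); // Err(SessionPoisoned)
}
```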
Recovery (Database::open)
Database::open(path, master_key)
1. If WAL file exists, run recovery::recover()
→ Scan WAL and validate per-tx state machine
(Begin -> PagePut/MetaUpdate* -> Commit/Abort)
→ Collect latest page images from committed transactions
→ Replay to data file
2. Truncate WAL file (empty it)
→ fsync WAL file
→ best-effort fsync parent directory
3. Build Session with Pager + Catalog + WalWriter
Validation is implemented in src/wal/recovery.rs with explicit skip/error codes.
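The per-transaction state machine (Begin -> PagePut/MetaUpdate* -> Commit/Abort) and the collection of committed page images can be sketched as below, including the strict and permissive behaviours listed under Recovery Modes. The Rec, Mode, TxState, step, and recover names and the string error messages are hypothetical; the real recovery.rs uses its own skip/error codes and also checks Commit.lsn and PagePut page ids, which this sketch omits.

```rust
// Hypothetical recovery validator: Begin -> (PagePut | MetaUpdate)* -> Commit | Abort.
// Committed page images are replayed in commit order (last write wins).
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};

enum Rec {
    Begin(u64),
    PagePut(u64, u64, Vec<u8>), // (txid, page_id, image)
    MetaUpdate(u64),
    Commit(u64),
    Abort(u64),
}

impl Rec {
    fn txid(&self) -> u64 {
        match self {
            Rec::Begin(t) | Rec::MetaUpdate(t) | Rec::Commit(t) | Rec::Abort(t) => *t,
            Rec::PagePut(t, _, _) => *t,
        }
    }
}

#[derive(Clone, Copy)]
enum Mode { Strict, Permissive }

#[derive(Default)]
struct TxState { pages: Vec<(u64, Vec<u8>)>, has_meta: bool }

/// Applies one record to the state machine, or reports a protocol violation.
fn step(rec: &Rec, open: &mut HashMap<u64, TxState>, done: &mut HashSet<u64>,
        committed: &mut Vec<(u64, TxState)>) -> Result<(), &'static str> {
    match rec {
        Rec::Begin(t) => {
            if open.contains_key(t) || done.contains(t) { return Err("duplicate Begin"); }
            open.insert(*t, TxState::default());
        }
        Rec::PagePut(t, page_id, image) => {
            let tx = open.get_mut(t).ok_or("PagePut outside an open transaction")?;
            tx.pages.push((*page_id, image.clone()));
        }
        Rec::MetaUpdate(t) => {
            open.get_mut(t).ok_or("MetaUpdate outside an open transaction")?.has_meta = true;
        }
        Rec::Commit(t) => {
            let tx = open.remove(t).ok_or("Commit without an open transaction")?;
            done.insert(*t);
            if !tx.has_meta { return Err("Commit without MetaUpdate"); }
            committed.push((*t, tx));
        }
        Rec::Abort(t) => {
            open.remove(t).ok_or("Abort without an open transaction")?;
            done.insert(*t);
        }
    }
    Ok(())
}

/// Returns (page_id -> image to replay, skipped txids).
fn recover(log: &[Rec], mode: Mode) -> Result<(BTreeMap<u64, Vec<u8>>, Vec<u64>), String> {
    let mut open: HashMap<u64, TxState> = HashMap::new();
    let mut done: HashSet<u64> = HashSet::new();
    let mut committed: Vec<(u64, TxState)> = Vec::new();
    let mut skipped: BTreeSet<u64> = BTreeSet::new();
    for rec in log {
        let t = rec.txid();
        if skipped.contains(&t) { continue; } // ignore the rest of an invalid tx
        if let Err(msg) = step(rec, &mut open, &mut done, &mut committed) {
            match mode {
                Mode::Strict => return Err(format!("tx {}: {}", t, msg)),
                Mode::Permissive => {
                    open.remove(&t);
                    done.insert(t);
                    skipped.insert(t);
                }
            }
        }
    }
    // Transactions still open at the end of the log never committed: ignored.
    let mut pages = BTreeMap::new();
    for (t, tx) in committed {
        if skipped.contains(&t) { continue; }
        for (page_id, image) in tx.pages { pages.insert(page_id, image); }
    }
    Ok((pages, skipped.into_iter().collect()))
}

fn main() {
    let log = vec![
        Rec::Begin(1), Rec::PagePut(1, 10, vec![1]), Rec::MetaUpdate(1), Rec::Commit(1),
        Rec::Begin(2), Rec::PagePut(2, 10, vec![2]), Rec::Commit(2), // no MetaUpdate
    ];
    println!("strict: {:?}", recover(&log, Mode::Strict).map(|_| ()));
    let (pages, skipped) = recover(&log, Mode::Permissive).unwrap();
    println!("permissive: pages={:?} skipped={:?}", pages, skipped);
}
```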
Recovery Modes
- strict (default): Fails on any WAL protocol violation
- permissive: Skips invalid transactions, recovers only valid committed ones
See Recovery for user-facing documentation.
In permissive mode, if invalid transactions were skipped, the WAL can be quarantined (*.quarantine.<ts>.<pid>) before a clean WAL stream is reopened.
Inspect-WAL JSON Contract
murodb-wal-inspect --format json returns machine-readable diagnostics with a stable schema contract:
- schema_version=1 for the current contract
- status: ok / warning / fatal
- exit_code: mirrors CLI exit code semantics (0, 10, 20)
- skipped[].code: stable machine-readable skip classification
- On fatal failures, fatal_error and fatal_error_code are included
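A report conforming to this contract could be produced as sketched below. The field names come from the list above; pairing ok/warning/fatal with exit codes 0/10/20, emitting fatal_error_code as a string, and the Report and Status types are assumptions. JSON is assembled by hand so the example needs no external crates, and string escaping is deliberately omitted.

```rust
// Hypothetical inspect-WAL JSON report. Field names follow the documented
// contract; the status-to-exit-code pairing and value types are ASSUMPTIONS.

#[derive(Clone, Copy)]
enum Status { Ok, Warning, Fatal }

impl Status {
    fn as_str(self) -> &'static str {
        match self { Status::Ok => "ok", Status::Warning => "warning", Status::Fatal => "fatal" }
    }
    // Assumed mapping onto the documented exit codes (0, 10, 20).
    fn exit_code(self) -> i32 {
        match self { Status::Ok => 0, Status::Warning => 10, Status::Fatal => 20 }
    }
}

struct Report {
    status: Status,
    skipped_codes: Vec<String>,
    fatal: Option<(String, String)>, // (fatal_error, fatal_error_code)
}

/// Serializes the report; no escaping, so inputs must be plain ASCII tokens.
fn to_json(r: &Report) -> String {
    let skipped: Vec<String> = r
        .skipped_codes
        .iter()
        .map(|c| format!("{{\"code\":\"{}\"}}", c))
        .collect();
    let mut json = format!(
        "{{\"schema_version\":1,\"status\":\"{}\",\"exit_code\":{},\"skipped\":[{}]",
        r.status.as_str(),
        r.status.exit_code(),
        skipped.join(",")
    );
    // fatal_error fields appear only on fatal failures.
    if let Some((err, code)) = &r.fatal {
        json.push_str(&format!(",\"fatal_error\":\"{}\",\"fatal_error_code\":\"{}\"", err, code));
    }
    json.push('}');
    json
}

fn main() {
    let report = Report {
        status: Status::Warning,
        skipped_codes: vec!["EXAMPLE_CODE".to_string()], // hypothetical code
        fatal: None,
    };
    println!("{}", to_json(&report));
}
```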
Secondary Index Consistency
All index updates happen within the same transaction as the data update:
INSERT
- Insert row into data B-tree
- Insert entry into each secondary index (column_value → PK)
- Check UNIQUE constraint before insertion
DELETE
- Scan for rows to delete (collect PK + all column values)
- Delete entries from each secondary index
- Delete row from data B-tree
UPDATE
- Scan for rows to update (collect PK + old column values)
- Compute new values
- Check UNIQUE constraints (for changed values)
- Update secondary indexes (delete old entry + insert new entry)
- Write new row data to data B-tree
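The INSERT, DELETE, and UPDATE orderings above can be illustrated with an in-memory model in which BTreeMaps stand in for the data B-tree and a single UNIQUE secondary index (column value -> PK). The Table type, the one-column rows, and the method names are assumptions for this sketch; in MuroDB these steps run inside one transaction, so a failed check leaves nothing committed, whereas here the checks simply run before any mutation.

```rust
// Hypothetical model of index maintenance: one data tree (PK -> email)
// and one UNIQUE secondary index (email -> PK).
use std::collections::BTreeMap;

#[derive(Default)]
struct Table {
    rows: BTreeMap<i64, String>,
    email_idx: BTreeMap<String, i64>,
}

impl Table {
    fn insert(&mut self, pk: i64, email: &str) -> Result<(), String> {
        // Check constraints before touching either tree.
        if self.rows.contains_key(&pk) {
            return Err(format!("duplicate primary key {}", pk));
        }
        if self.email_idx.contains_key(email) {
            return Err(format!("duplicate email {}", email));
        }
        self.rows.insert(pk, email.to_string());
        self.email_idx.insert(email.to_string(), pk);
        Ok(())
    }

    fn delete(&mut self, pk: i64) -> bool {
        // Collect the row's column values first so the index key is known.
        let email = match self.rows.get(&pk).cloned() {
            Some(e) => e,
            None => return false,
        };
        self.email_idx.remove(&email);
        self.rows.remove(&pk);
        true
    }

    fn update(&mut self, pk: i64, new_email: &str) -> Result<(), String> {
        let old = self.rows.get(&pk).cloned().ok_or("no such row")?;
        // UNIQUE check only matters when the value actually changes.
        if old != new_email && self.email_idx.contains_key(new_email) {
            return Err(format!("duplicate email {}", new_email));
        }
        self.email_idx.remove(&old); // delete old index entry
        self.email_idx.insert(new_email.to_string(), pk); // insert new entry
        self.rows.insert(pk, new_email.to_string()); // write new row data
        Ok(())
    }
}

fn main() {
    let mut t = Table::default();
    t.insert(1, "a@example.com").unwrap();
    t.update(1, "b@example.com").unwrap();
    println!("rows={:?} index={:?}", t.rows, t.email_idx);
}
```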
Remaining Constraints
fsync granularity
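Pager::write_page_to_disk() does not call sync_all() for each page; only flush_meta() calls sync_all(). This is safe in normal operation because the WAL sync() has already made the committed data durable: if a crash loses page writes to the data file, the next open replays them from the WAL.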
allocate_page counter
Pager::allocate_page() increments the in-memory page_count, which is not persisted until flush_meta() runs after the WAL commit.
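The page_count rule can be modeled as two counters, one in memory and one "on disk". This is a hypothetical sketch: the Pager fields, the reopen helper, and the assumption that a new page takes the next id (freelist reuse is not modeled) are illustrative; it shows that a crash before flush_meta() leaves the old on-disk count, and that a committed MetaUpdate restores the committed value during recovery.

```rust
// Hypothetical model: allocate_page() bumps only the in-memory counter;
// flush_meta() persists it. Fields and reopen() are illustrative ASSUMPTIONS.

struct Pager {
    page_count: u64,      // in-memory value
    disk_page_count: u64, // value persisted in the data file's metadata
}

impl Pager {
    // Assumed: a new page takes the next id; freelist reuse is not modeled.
    fn allocate_page(&mut self) -> u64 {
        let id = self.page_count;
        self.page_count += 1;
        id
    }

    fn flush_meta(&mut self) {
        self.disk_page_count = self.page_count;
    }

    // Simulated reopen after a crash: in-memory state is rebuilt from disk,
    // then the page_count from a committed MetaUpdate (if any) is replayed.
    fn reopen(&self, committed_meta_page_count: Option<u64>) -> Pager {
        let count = committed_meta_page_count.unwrap_or(self.disk_page_count);
        Pager { page_count: count, disk_page_count: count }
    }
}

fn main() {
    let mut pager = Pager { page_count: 4, disk_page_count: 4 };
    let id = pager.allocate_page();
    println!("allocated page {} (in memory: {}, on disk: {})", id, pager.page_count, pager.disk_page_count);
}
```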
WAL file size
After a successful commit or an explicit ROLLBACK, the Session auto-checkpoints the WAL according to the configured policy. Checkpointing is best-effort: a failed checkpoint does not affect commit success.
The default policy checkpoints after every transaction (MURODB_CHECKPOINT_TX_THRESHOLD=1) and can be tuned with:
- MURODB_CHECKPOINT_TX_THRESHOLD
- MURODB_CHECKPOINT_WAL_BYTES_THRESHOLD
- MURODB_CHECKPOINT_INTERVAL_MS
The same knobs are available as session-scoped SQL runtime options:
- SET checkpoint_tx_threshold = <u64>
- SET checkpoint_wal_bytes_threshold = <u64>
- SET checkpoint_interval_ms = <u64>
When WAL truncation during a checkpoint fails, MuroDB emits a warning with wal_path and wal_size_bytes so operators can detect and triage WAL growth.
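A checkpoint decision driven by these three knobs might look like the sketch below. The knob names come from the text above; reading them from the process environment, checkpointing when any enabled threshold is reached, treating 0 as "disabled", and defaulting the byte and interval thresholds to 0 are all assumptions, as is the CheckpointPolicy type itself.

```rust
// Hypothetical checkpoint policy. Knob names match the documented settings;
// the OR semantics, "0 = disabled", and the non-tx defaults are ASSUMPTIONS.
use std::time::Duration;

#[derive(Clone, Copy)]
struct CheckpointPolicy {
    tx_threshold: u64,
    wal_bytes_threshold: u64,
    interval_ms: u64,
}

impl Default for CheckpointPolicy {
    // Documented default: checkpoint after every transaction.
    fn default() -> Self {
        CheckpointPolicy { tx_threshold: 1, wal_bytes_threshold: 0, interval_ms: 0 }
    }
}

impl CheckpointPolicy {
    /// Reads MURODB_CHECKPOINT_* overrides; missing or unparsable values keep the default.
    fn from_env() -> Self {
        let read = |name: &str, default: u64| {
            std::env::var(name).ok().and_then(|v| v.parse::<u64>().ok()).unwrap_or(default)
        };
        let d = Self::default();
        CheckpointPolicy {
            tx_threshold: read("MURODB_CHECKPOINT_TX_THRESHOLD", d.tx_threshold),
            wal_bytes_threshold: read("MURODB_CHECKPOINT_WAL_BYTES_THRESHOLD", d.wal_bytes_threshold),
            interval_ms: read("MURODB_CHECKPOINT_INTERVAL_MS", d.interval_ms),
        }
    }

    /// True when any enabled (non-zero) threshold has been reached.
    fn should_checkpoint(&self, pending_tx: u64, wal_bytes: u64, since_last: Duration) -> bool {
        (self.tx_threshold > 0 && pending_tx >= self.tx_threshold)
            || (self.wal_bytes_threshold > 0 && wal_bytes >= self.wal_bytes_threshold)
            || (self.interval_ms > 0 && since_last >= Duration::from_millis(self.interval_ms))
    }
}

fn main() {
    let policy = CheckpointPolicy::from_env();
    let due = policy.should_checkpoint(1, 0, Duration::ZERO);
    println!("checkpoint after one transaction: {}", due);
}
```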
TLA+ Correspondence
See Formal Verification for the TLA+ model and its mapping to implementation.
| TLA+ Intent | Implementation | Regression Test |
|---|---|---|
| Only valid state transitions are recovered | State transition validation in recovery.rs | test_recovery_rejects_pageput_before_begin |
| Commit/Abort is terminal | Reject duplicate terminal / post-terminal records | test_recovery_rejects_duplicate_terminal_record_for_tx |
| Commit has consistent terminal info | Validate Commit.lsn == actual LSN | test_recovery_rejects_commit_lsn_mismatch |
| Commit requires metadata | Reject Commit without MetaUpdate | test_recovery_rejects_commit_without_meta_update |
| PagePut matches target page | Validate PagePut.page_id vs page header | test_recovery_rejects_pageput_page_id_mismatch |
| Tail corruption tolerated, mid-log rejected | Reader tolerates tail only | test_tail_truncation_tolerated, test_mid_log_corruption_is_error |
| Oversized frames handled safely | Frame length limit in Reader/Writer | test_oversized_tail_frame_tolerated |
| Freelist recovered from committed MetaUpdate | freelist_page_id in WAL MetaUpdate | test_freelist_wal_recovery |
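The "tail corruption tolerated, mid-log rejected" row can be illustrated with a frame scanner. This is a hypothetical sketch, not the reader in src/wal: it assumes little-endian u32 lengths, a CRC-32 (IEEE) trailer on each unencrypted payload, a PAGE_SIZE of 4096, and that a damaged frame counts as "tail" only when it runs past end-of-file or nothing follows it.

```rust
// Hypothetical frame scanner: torn or damaged final frames end the scan
// cleanly; damage followed by more data is an error. Layout is ASSUMED.

const MAX_FRAME_LEN: usize = 4096 + 1024; // PAGE_SIZE assumed to be 4096

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Builds [len: u32][record || crc32(record)] with no encryption.
fn frame(record: &[u8]) -> Vec<u8> {
    let mut out = ((record.len() + 4) as u32).to_le_bytes().to_vec();
    out.extend_from_slice(record);
    out.extend_from_slice(&crc32(record).to_le_bytes());
    out
}

#[derive(Debug, PartialEq)]
enum ScanError {
    MidLogCorruption { offset: usize },
}

/// Returns the record bytes of every intact frame in order.
fn scan_frames(log: &[u8]) -> Result<Vec<Vec<u8>>, ScanError> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos < log.len() {
        if log.len() - pos < 4 {
            break; // partial length prefix: torn tail write
        }
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&log[pos..pos + 4]);
        let len = u32::from_le_bytes(len_bytes) as usize;
        // A frame (including an oversized one) that runs past EOF is a torn tail.
        let end = match (pos + 4).checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break,
        };
        let payload = &log[pos + 4..end];
        let valid = len <= MAX_FRAME_LEN && len >= 4 && {
            let (rec, crc) = payload.split_at(len - 4);
            let mut c = [0u8; 4];
            c.copy_from_slice(crc);
            crc32(rec) == u32::from_le_bytes(c)
        };
        if !valid {
            if end == log.len() {
                break; // damaged final frame: tolerated
            }
            return Err(ScanError::MidLogCorruption { offset: pos });
        }
        records.push(payload[..len - 4].to_vec());
        pos = end;
    }
    Ok(records)
}

fn main() {
    let mut log = frame(b"one");
    log.extend(frame(b"two"));
    log.truncate(log.len() - 1); // simulate a torn final write
    println!("{:?}", scan_frames(&log));
}
```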