Roadmap
Implemented
- Basic CRUD (INSERT, SELECT, UPDATE, DELETE)
- CREATE TABLE (PRIMARY KEY, UNIQUE, NOT NULL)
- CREATE INDEX / CREATE UNIQUE INDEX (single column)
- CREATE FULLTEXT INDEX (bigram, BM25, NATURAL/BOOLEAN mode, snippet)
- MySQL-compatible integer types (TINYINT, SMALLINT, INT, BIGINT)
- VARCHAR(n), VARBINARY(n), TEXT with size validation
- UUID type (16-byte native, UUID_V4/UUID_V7 generation)
- Hex literal (X'...') for VARBINARY data
- WHERE with comparison operators (=, !=, <, >, <=, >=)
- AND, OR logical operators
- ORDER BY (ASC/DESC, multi-column), LIMIT
- JOIN (INNER, LEFT, CROSS) with table aliases
- BEGIN / COMMIT / ROLLBACK
- SHOW TABLES
- Multi-row INSERT
- Hidden _rowid auto-generation for tables without explicit PK
- AES-256-GCM-SIV encryption, Argon2 KDF
- WAL-based crash recovery
- CLI with REPL
- DROP TABLE / DROP TABLE IF EXISTS
- DROP INDEX
- IF NOT EXISTS for CREATE TABLE / CREATE INDEX
- SHOW CREATE TABLE
- DESCRIBE / DESC table
- LIKE / NOT LIKE (% and _ wildcards)
- IN (value list)
- BETWEEN … AND …
- IS NULL / IS NOT NULL
- NOT operator (general)
- OFFSET (SELECT … LIMIT n OFFSET m)
- DEFAULT column values
- AUTO_INCREMENT
- Arithmetic operators in expressions (+, -, *, /, %)
- BOOLEAN type (alias for TINYINT)
- CHECK constraint
Phase 2 — Built-in Functions ✓
MySQL-compatible scalar functions.
- String: LENGTH, CHAR_LENGTH, CONCAT, SUBSTRING/SUBSTR, UPPER, LOWER
- String: TRIM, LTRIM, RTRIM, REPLACE, REVERSE, REPEAT
- String: LEFT, RIGHT, LPAD, RPAD, INSTR/LOCATE
- String: REGEXP / REGEXP_LIKE
- Numeric: ABS, CEIL/CEILING, FLOOR, ROUND, MOD, POWER/POW
- NULL handling: COALESCE, IFNULL, NULLIF, IF
- Type conversion: CAST(expr AS type)
- CASE WHEN … THEN … ELSE … END
Phase 3 — Aggregation & Grouping ✓
- COUNT, SUM, AVG, MIN, MAX
- COUNT(DISTINCT …)
- GROUP BY (single and multiple columns)
- HAVING
- SELECT DISTINCT
Phase 4 — Schema Evolution ✓
- ALTER TABLE ADD COLUMN
- ALTER TABLE DROP COLUMN
- ALTER TABLE MODIFY COLUMN / CHANGE COLUMN
- RENAME TABLE
- Composite PRIMARY KEY
- Composite UNIQUE / composite INDEX
Phase 5 — Advanced Query ✓
- Subqueries (WHERE col IN (SELECT …), scalar subquery)
- UNION / UNION ALL
- EXISTS / NOT EXISTS
- INSERT … ON DUPLICATE KEY UPDATE
- REPLACE INTO
- EXPLAIN (query plan display)
- RIGHT JOIN
- Shared-lock read path (Database::query) with CLI auto routing
Phase 6 — Types & Storage
- FLOAT / DOUBLE
- DATE, DATETIME, TIMESTAMP
- Scope: fully align parser/executor/CAST/default/literal behavior and edge-case validation.
- Done when:
- Temporal literals and string casts behave consistently across INSERT/UPDATE/WHERE.
- Arithmetic and comparison semantics are defined/documented for mixed temporal expressions.
- Timezone handling policy is explicit (especially TIMESTAMP input/output normalization).
- Invalid dates/times reject with deterministic errors.
- Date/time functions: NOW, CURRENT_TIMESTAMP, DATE_FORMAT, etc.
- UUID type with UUID_V4() and UUID_V7() generation functions
- DECIMAL(p,s) / NUMERIC(p,s) fixed-point exact numeric type
- 96-bit mantissa via rust_decimal, precision 1-28, 16-byte storage
- Full arithmetic, comparison, CAST, aggregation (SUM/AVG/MIN/MAX), ORDER BY, GROUP BY, INDEX support
- MySQL-compatible: NUMERIC alias, default DECIMAL(10,0), DECIMAL+INT→DECIMAL, DECIMAL+FLOAT→FLOAT
- BLOB (skipped for now)
- Decision (2026-02-22): defer and move focus to Phase 7 performance work.
- Why skipped now:
- Current product priorities are query/index performance and planner improvements, not large-object type expansion.
- BLOB adds non-trivial storage/operational surface area (limits, indexing semantics, comparison behavior) with low near-term user impact.
- Existing VARBINARY(n)/TEXT coverage is sufficient for current workloads.
- Revisit when:
- There is a concrete workload requiring large binary payloads that cannot be handled acceptably by current types.
- The performance roadmap items in Phase 7 are complete or no longer the bottleneck.
- Overflow pages (posting list > 4096B)
- Scope: support values/postings that exceed single-page capacity.
- Progress:
- Implemented FTS segment overflow chains (__segovf__) with typed page format (OFG1).
- Read/write/delete + vacuum path now reclaims overflow pages without orphaning.
- Covered by unit/integration tests (cargo test green as of 2026-02-22).
- Added WAL recovery integration tests for overflow chains (torn WAL tail and post-sync partial-write replay paths).
- Benchmarked on 2026-02-22 (murodb_bench, commit 829ad18145c2) with no severe small-record regression signal.
- Implemented B-tree value overflow pages (2026-02-23): large row values (>~4073 bytes) now spill to overflow page chains transparently. Format version bumped to 5 (backward-compatible with v4).
- Done when:
- Overflow chain format is versioned and crash-safe.
- WAL/recovery covers partial-write and torn-tail scenarios for overflow chains.
- Vacuum/reclaim path correctly frees overflow pages without orphaning.
- Benchmarks show no severe regressions for small records.
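The overflow-chain idea above can be sketched in a few lines. This is a minimal in-memory illustration, not MuroDB's on-disk format: the 6-byte header, the `NO_NEXT` sentinel, and the names `OverflowPage`, `write_chain`, and `read_chain` are all assumptions made for the example. Only the 4096-byte page size comes from the roadmap text.

```rust
// Illustrative only: NOT MuroDB's actual page layout (OFG1 / format v5).
const PAGE_SIZE: usize = 4096;
// Hypothetical header: 4-byte next-page id + 2-byte payload length.
const HEADER_SIZE: usize = 6;
const PAYLOAD_CAP: usize = PAGE_SIZE - HEADER_SIZE;
// Hypothetical sentinel marking the end of a chain.
const NO_NEXT: u32 = u32::MAX;

struct OverflowPage {
    next: u32,        // id of the next page in the chain, or NO_NEXT
    payload: Vec<u8>, // at most PAYLOAD_CAP bytes
}

/// Splits `value` into page-sized chunks, appends them to `pages`,
/// and returns the id of the first page (None for an empty value).
fn write_chain(pages: &mut Vec<OverflowPage>, value: &[u8]) -> Option<u32> {
    if value.is_empty() {
        return None;
    }
    let first = pages.len() as u32;
    let chunks: Vec<&[u8]> = value.chunks(PAYLOAD_CAP).collect();
    let n = chunks.len();
    for (i, chunk) in chunks.into_iter().enumerate() {
        // Every page except the last points at the page that follows it.
        let next = if i + 1 < n { first + i as u32 + 1 } else { NO_NEXT };
        pages.push(OverflowPage { next, payload: chunk.to_vec() });
    }
    Some(first)
}

/// Follows `next` pointers from `first` and reassembles the value.
fn read_chain(pages: &[OverflowPage], first: u32) -> Vec<u8> {
    let mut out = Vec::new();
    let mut cur = first;
    while cur != NO_NEXT {
        let page = &pages[cur as usize];
        out.extend_from_slice(&page.payload);
        cur = page.next;
    }
    out
}
```

Under these assumptions a 10,000-byte value occupies three pages, and freeing a value means walking the same chain and returning each page to the free list, which is the step a vacuum path has to get right to avoid orphans.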
Phase 7 — Performance & Internals
- Auto-checkpoint (threshold-based WAL)
- Composite index range scan
- Progress:
- Added planner/executor support for composite-index range seek on the last key part (e.g. (a,b) with a = ? and b range).
- EXPLAIN now reports type=range for this access path.
- EXPLAIN now reports estimated cardinality via rows.
- Done when:
- Multi-column prefix ranges ((a,b) with predicates on a, optional range on b) use index scan.
- EXPLAIN shows index-range choice and estimated cardinality.
- Fallback path remains correct for unsupported predicate shapes.
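The access path described for composite indexes (equality on the leading column, range on the last key part) can be illustrated with an ordered map keyed by a tuple. This is only a sketch: `CompositeIndex` and `range_seek` are hypothetical names, the key columns are assumed to be integers, and a real index would store encoded keys in B-tree pages rather than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical composite index on (a, b), mapping each key to a row id.
type CompositeIndex = BTreeMap<(i64, i64), u64>;

/// Returns row ids for `a = a_eq AND b BETWEEN b_lo AND b_hi`, in key order.
/// Tuples sort lexicographically, so all keys sharing `a_eq` are contiguous
/// and the predicate becomes a single range scan.
fn range_seek(index: &CompositeIndex, a_eq: i64, b_lo: i64, b_hi: i64) -> Vec<u64> {
    // BTreeMap::range panics when start > end, so reject empty ranges first.
    if b_lo > b_hi {
        return Vec::new();
    }
    index
        .range((a_eq, b_lo)..=(a_eq, b_hi))
        .map(|(_, row_id)| *row_id)
        .collect()
}
```

A range predicate on `a` with a further range on `b` cannot be served this way, because matching keys are no longer contiguous; shapes like that are where a fallback path is needed.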
- Query optimizer improvements (cost-based)
- Progress:
- Added deterministic heuristic cost hints for PkSeek/IndexSeek/IndexRangeSeek/FullScan.
- Planner now compares index candidates by cost instead of choosing the first matching index.
- EXPLAIN now reports a cost column for the chosen plan.
- Added persisted stats via ANALYZE TABLE (table_rows, index_distinct_keys) in catalog metadata.
- EXPLAIN row estimation now prefers persisted table_rows when available.
- Planner cost model now incorporates persisted table_rows/index_distinct_keys when available, with conservative fallback selectivity when stats are missing.
- EXPLAIN rows/cost now uses the same planner estimation logic (with table-row fallback), so estimates reflect planner tradeoffs.
- JOIN loop-order choice for INNER/CROSS now uses planner-side estimated row counts (stats-aware with runtime fallback) and keeps row shape (left + right) stable.
- ANALYZE TABLE now persists numeric min/max bounds and equal-width histogram bins for single-column numeric B-tree indexes; range row estimation uses these stats when available.
- EXPLAIN for JOIN now reports nested-loop outer-side choice with estimated left/right row counts in Extra.
- Done when:
- Planner compares at least full-scan vs single-index vs join-order alternatives.
- Basic column stats/histograms are persisted and refreshable.
- Plan choice is deterministic under identical stats.
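To make the histogram-based range estimation concrete, here is a minimal sketch of an equal-width histogram and a row estimate for a closed range. The `Histogram` type, its `f64` representation, and the assumption that values are spread uniformly inside each bin are choices made for this example; the roadmap does not specify how MuroDB stores or interpolates its bins.

```rust
/// Hypothetical equal-width histogram over a numeric column.
/// Assumes all input values are finite.
struct Histogram {
    min: f64,
    max: f64,
    bins: Vec<u64>, // row count per equal-width bin
}

impl Histogram {
    /// Builds a histogram with `bin_count` bins spanning [min, max] of `values`.
    fn build(values: &[f64], bin_count: usize) -> Option<Histogram> {
        if values.is_empty() || bin_count == 0 {
            return None;
        }
        let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
        let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let width = (max - min) / bin_count as f64;
        let mut bins = vec![0u64; bin_count];
        for &v in values {
            // The maximum value would land one past the end, so clamp it.
            let idx = if width > 0.0 {
                (((v - min) / width) as usize).min(bin_count - 1)
            } else {
                0
            };
            bins[idx] += 1;
        }
        Some(Histogram { min, max, bins })
    }

    /// Estimates how many rows satisfy `lo <= value <= hi`, assuming values
    /// are uniformly distributed within each bin.
    fn estimate_range(&self, lo: f64, hi: f64) -> f64 {
        let lo = lo.max(self.min);
        let hi = hi.min(self.max);
        if lo > hi {
            return 0.0;
        }
        let total: u64 = self.bins.iter().sum();
        let width = (self.max - self.min) / self.bins.len() as f64;
        if width <= 0.0 {
            return total as f64; // constant column: every row matches
        }
        let mut estimate = 0.0;
        for (i, &count) in self.bins.iter().enumerate() {
            let bin_lo = self.min + width * i as f64;
            let bin_hi = bin_lo + width;
            // Fraction of this bin covered by the query range.
            let overlap = (hi.min(bin_hi) - lo.max(bin_lo)).max(0.0);
            estimate += count as f64 * overlap / width;
        }
        estimate
    }
}
```

Because the estimate depends only on the stored bounds and bin counts, identical stats always yield identical estimates, which is the property the determinism criterion above asks for.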
- FTS stop-ngram filtering
- Progress:
- Added FULLTEXT options stop_filter and stop_df_ratio_ppm (ppm threshold).
- NATURAL LANGUAGE MODE now supports skipping high-DF ngrams when enabled.
- Default remains OFF for exact-behavior compatibility.
- Recall/precision tradeoff example documented in Full-Text Search guide.
- Done when:
- Frequent low-information ngrams are skipped using configurable thresholds.
- Recall/precision tradeoff is documented with benchmark examples.
- Toggle exists for exact behavior compatibility.
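A document-frequency threshold expressed in parts per million might be applied roughly as follows. The option names `stop_filter` and `stop_df_ratio_ppm` come from the roadmap; the function names, the strict "greater than" comparison, and the shape of the input (each ngram paired with its document frequency) are assumptions for illustration.

```rust
/// Returns true when the ngram appears in a larger share of documents than
/// the threshold allows. The strict `>` comparison is an assumption.
fn is_stop_ngram(doc_freq: u64, total_docs: u64, stop_df_ratio_ppm: u64) -> bool {
    if total_docs == 0 {
        return false;
    }
    // Compare df / total > ppm / 1_000_000 without floats; u128 avoids overflow.
    (doc_freq as u128) * 1_000_000 > (stop_df_ratio_ppm as u128) * (total_docs as u128)
}

/// Drops high-DF ngrams from a query when the filter is enabled.
/// With `stop_filter == false` every ngram is kept (exact behavior).
fn filter_query_ngrams<'a>(
    ngrams: &[(&'a str, u64)], // (ngram, document frequency)
    total_docs: u64,
    stop_filter: bool,
    stop_df_ratio_ppm: u64,
) -> Vec<&'a str> {
    ngrams
        .iter()
        .filter(|(_, df)| !(stop_filter && is_stop_ngram(*df, total_docs, stop_df_ratio_ppm)))
        .map(|(gram, _)| *gram)
        .collect()
}
```

For example, with 1,000 documents and a threshold of 200,000 ppm (20%), an ngram found in 500 documents is skipped while one found in exactly 200 is kept.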
- fts_snippet acceleration (pos-to-offset map)
- Progress:
- Replaced snippet byte/char conversion loops with a UTF-8 position-to-offset map plus binary search.
- Snippet assembly now slices by byte ranges instead of repeatedly collecting char vectors.
- Added dedicated benchmark runner (murodb_snippet_bench) with legacy-vs-new comparison and offset-map memory estimate.
- On 2026-02-22 (local, release build), long-text tail-hit case showed small p50 improvement (legacy 1245.52us -> new 1228.43us).
- Done when:
- Snippet generation avoids repeated UTF-8 rescans for long docs.
- Latency improvement is measured and documented on representative datasets.
- Memory overhead remains bounded and observable.
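The position-to-offset map mentioned for fts_snippet can be sketched as a vector of byte offsets, one per character, built in a single pass. `OffsetMap` and `snippet` are hypothetical names, and the context-window policy is an assumption; the sketch only shows how the map replaces repeated UTF-8 rescans with an O(1) lookup in one direction and a binary search in the other.

```rust
/// Hypothetical map from character positions to byte offsets for one document.
struct OffsetMap {
    // offsets[i] is the byte offset of the i-th char; the final entry is text.len().
    offsets: Vec<usize>,
}

impl OffsetMap {
    /// Scans the text once; memory cost is one usize per character.
    fn new(text: &str) -> Self {
        let mut offsets: Vec<usize> = text.char_indices().map(|(b, _)| b).collect();
        offsets.push(text.len());
        OffsetMap { offsets }
    }

    fn char_count(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Char position -> byte offset in O(1). Positions past the end clamp to text.len().
    fn byte_offset(&self, char_pos: usize) -> usize {
        self.offsets[char_pos.min(self.char_count())]
    }

    /// Byte offset -> position of the char containing it, via binary search.
    fn char_pos(&self, byte_offset: usize) -> usize {
        match self.offsets.binary_search(&byte_offset) {
            Ok(i) => i,
            // Inside a multi-byte char (or past the end): round down.
            // Err(0) cannot occur because offsets[0] is always 0.
            Err(i) => i - 1,
        }
    }
}

/// Slices `context` chars on each side of the hit by byte range,
/// without collecting chars into a temporary vector.
fn snippet<'a>(text: &'a str, map: &OffsetMap, hit_char_pos: usize, context: usize) -> &'a str {
    let start = hit_char_pos.saturating_sub(context);
    let end = hit_char_pos.saturating_add(context).saturating_add(1);
    &text[map.byte_offset(start)..map.byte_offset(end)]
}
```

Every stored offset is a character boundary, so the byte-range slice cannot panic on a split code point, and the map's size (`offsets.len() * size_of::<usize>()`) gives a direct way to observe its memory overhead.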
Phase 8 — Security (Future)
- Key rotation (epoch-based re-encryption)
- Implemented API-based rekey (Database::rekey_with_password) for full page re-encryption.
- New random salt generated on each rotation; epoch incremented.
- Crash-safe via .rekey marker file with automatic recovery on next open.
- Rejects inside transactions and on plaintext databases.
Phase 9 — Practical Embedded DB (Next)
Real-world deployment features to make MuroDB easier to embed and operate.
- Encryption OFF mode
- Motivation: some embedded deployments prefer CPU savings and rely on disk/host-level protection.
- Done when:
- DB format can be created/opened in explicit plaintext mode.
- File header clearly records mode to avoid accidental mis-open.
- CLI/API require explicit opt-in (no silent downgrade from encrypted DB).
- Pluggable encryption suite
- Motivation: allow policy-driven algorithm choice without forking storage engine.
- Done when:
- Algorithm + KDF are selected by explicit config at DB creation.
- Supported suites are versioned, discoverable, and recorded in metadata.
- Wrong-suite open errors are deterministic and actionable.
- Rekey / algorithm migration
- Rekey implemented via API (Database::rekey_with_password) and dedicated CLI (murodb-rekey).
- Crash-recoverable via .rekey marker file.
- Algorithm migration (cipher suite change) deferred to future work.
- Backup API + consistent snapshot
- Decision (2026-02-22):
- Prioritize early in Phase 9 so embedded apps can take consistent backups without full writer quiesce windows.
- Why now:
- File-copy backup while writes are active is error-prone operationally.
- A first-class API can provide deterministic snapshot semantics and simpler restore contracts.
- Done when:
- Online consistent backup without long writer stalls.
- Restore path validated by integration tests.
- Snapshot metadata includes format/security parameters.
- Operational limits and safeguards
- Done when:
- Configurable caps for DB file size, WAL size, statement timeout, and memory budget.
- Error surfaces are clear and machine-parseable for host applications.
- Default limits are documented with recommended profiles (edge device / server / CI).