Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Full-Text Search

MuroDB provides MySQL-compatible full-text search with bigram tokenization.

Creating a fulltext index

CREATE FULLTEXT INDEX t_body_fts ON t(body)
  WITH PARSER ngram
  OPTIONS (n=2, normalize='nfkc', stop_filter=off, stop_df_ratio_ppm=200000);

WITH PARSER / OPTIONS syntax is available so parser variants can be expanded in future releases.

FTS uses an internal doc_id mapping, so it works with non-BIGINT primary keys too. If a table has no explicit primary key, MuroDB’s hidden _rowid is used.

Supported options:

  • n: ngram size (2 only for now)
  • normalize: normalization mode ('nfkc' only for now)
  • stop_filter: on/off (or 1/0, 'true'/'false')
  • stop_df_ratio_ppm: document-frequency threshold in ppm (0..=1000000)

stop_filter applies to NATURAL LANGUAGE MODE only. BOOLEAN MODE behavior is unchanged.

Query semantics

NATURAL LANGUAGE MODE

  • BM25-based relevance ranking.
SELECT id, MATCH(body) AGAINST('東京タワー' IN NATURAL LANGUAGE MODE) AS score
FROM t
WHERE MATCH(body) AGAINST('東京タワー' IN NATURAL LANGUAGE MODE) > 0
ORDER BY score DESC
LIMIT 20;

With stop-ngram filtering enabled:

CREATE FULLTEXT INDEX t_body_fts ON t(body)
  WITH PARSER ngram
  OPTIONS (n=2, normalize='nfkc', stop_filter=on, stop_df_ratio_ppm=500000);

This skips very frequent low-information ngrams during scoring.

BOOLEAN MODE

Supports +term (required), -term (excluded), and "phrase" (exact phrase).

SELECT id
FROM t
WHERE MATCH(body) AGAINST('"東京タワー" +夜景 -混雑' IN BOOLEAN MODE) > 0;
OperatorMeaningExample
+termTerm must be present+東京
-termTerm must not be present-混雑
"phrase"Exact phrase match"東京タワー"
termOptional (score contribution)夜景

Snippet helper

Use fts_snippet() for highlighted excerpts.

SELECT id,
  fts_snippet(body, '"東京タワー"', '<mark>', '</mark>', 30) AS snippet
FROM t
WHERE MATCH(body) AGAINST('"東京タワー"' IN BOOLEAN MODE) > 0
LIMIT 10;

Recall/precision tradeoff example

Dataset:

  • doc1: 東京タワー
  • doc2: 東京駅
  • doc3: 東京大学
  • doc4: 東京ドーム

Query:

MATCH(body) AGAINST('東京タワー' IN NATURAL LANGUAGE MODE)

Observed behavior:

  • stop_filter=off: broad recall (multiple 東京* docs can match)
  • stop_filter=on, stop_df_ratio_ppm=500000: higher precision (mostly 東京タワー doc remains)

Internal design

See FTS Internals for implementation details.