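Introduction
Search is something almost every application needs, yet many developers' understanding of it stops at LIKE '%keyword%'. When a user types "how to deploy Docker containers", can your system find an article titled "Docker Container Deployment Guide"? That is the problem full-text search (FTS) sets out to solve.
This article compares three common full-text search options: PostgreSQL's built-in FTS, Elasticsearch, and MeiliSearch. Each has its own positioning and sweet spot, and I hope the comparison helps you pick the right one.
Why isn't LIKE enough?
Start with an example:
-- Traditional LIKE search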
SELECT * FROM articles WHERE title LIKE '%Docker%' OR content LIKE '%Docker%';
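Problems with this query:
- Poor performance: LIKE '%keyword%' starts with a wildcard, so it cannot use an ordinary B-tree index and falls back to a full table scan
- No language processing: searching for "running" will not find "run" (no stemming)
- No relevance ranking: results come back in no meaningful order
- No synonym support: searching for "container" will not find "Docker"
- Especially weak for Chinese: there is no word segmentation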
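To make the performance point concrete, here is a minimal Python sketch contrasting a LIKE-style scan with a lookup in an inverted index, the kind of structure full-text engines build. The corpus, function names, and whitespace tokenization are assumptions made for illustration; this is not how any of the three engines is actually implemented.

```python
from collections import defaultdict

# Toy corpus; the ids and texts are made up for illustration.
DOCS = {
    1: "Docker Container Deployment Guide",
    2: "Kubernetes Orchestration Basics",
    3: "PostgreSQL Performance Tuning",
}

def like_scan(docs, keyword):
    """Mimic LIKE '%keyword%': test every row, case-sensitively."""
    return [doc_id for doc_id, text in docs.items() if keyword in text]

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

INDEX = build_inverted_index(DOCS)

def index_lookup(index, keyword):
    """Answer a one-word query with a single dictionary lookup."""
    return sorted(index.get(keyword.lower(), set()))

print(like_scan(DOCS, "docker"))      # [] because the scan is case-sensitive
print(index_lookup(INDEX, "docker"))  # [1]
```

The scan touches every document on every query, while the index does the tokenizing work once at write time and answers queries with a lookup.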
PostgreSQL Full-Text Search
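PostgreSQL's built-in full-text search is far more capable than most people assume. If you already run PostgreSQL, it is the lowest-cost option.
Basic concepts
PostgreSQL FTS is built around two data types:
- tsvector: a document's "search vector", storing its normalized lexemes and their positions
- tsquery: a structured representation of a search query
-- See what a tsvector looks like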
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Note: the stop words (the, over) are removed; jumps → jump, lazy → lazi (stemming)
-- tsquery
SELECT to_tsquery('english', 'quick & fox');
-- Result: 'quick' & 'fox'
-- Matching
SELECT to_tsvector('english', 'The quick brown fox') @@
to_tsquery('english', 'quick & fox');
-- Result: true
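The two data types can be mimicked in a few lines of Python to show what is being stored and compared. This is only a sketch: the stop-word list and the `crude_stem` rules are invented for the demo and are not the Snowball stemmer that PostgreSQL's english configuration uses.

```python
import re

# Tiny stop-word list and suffix rules chosen only for this demo.
STOP_WORDS = {"the", "over", "a", "an", "of"}

def crude_stem(word):
    """Very rough stand-in for a real stemmer (not the Snowball algorithm)."""
    if word.endswith("y") and len(word) > 3:
        return word[:-1] + "i"
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def toy_tsvector(text):
    """Return {lexeme: [positions]}, counting positions from 1."""
    vector = {}
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        if word in STOP_WORDS:
            continue  # stop words are dropped but still consume a position
        vector.setdefault(crude_stem(word), []).append(position)
    return vector

def matches_all(vector, terms):
    """AND semantics: every stemmed term must be present in the vector."""
    return all(crude_stem(t.lower()) in vector for t in terms)

vec = toy_tsvector("The quick brown fox jumps over the lazy dog")
print(vec)
print(matches_all(vec, ["quick", "fox"]))
```

Because the query terms go through the same normalization as the document, a search for "jumps" and a search for "jump" hit the same lexeme.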
Setting up full-text search
-- Create the articles table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    author TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Stores the precomputed search vector
    search_vector tsvector
);
-- Keep the search vector up to date with a trigger
CREATE OR REPLACE FUNCTION articles_search_trigger() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(NEW.content, '')), 'B');
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER trig_articles_search
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);
-- Insert test data
INSERT INTO articles (title, content, author) VALUES
('Docker Container Deployment Guide',
'This guide covers how to deploy applications using Docker containers. Docker provides a consistent environment for running applications.',
'Alice'),
('Kubernetes Orchestration Basics',
'Learn about Kubernetes and how it orchestrates Docker containers at scale. Kubernetes manages container deployment and scaling.',
'Bob'),
('PostgreSQL Performance Tuning',
'Tips for optimizing PostgreSQL database performance including indexing strategies and query optimization.',
'Carol');
Searching and ranking
-- Basic search
SELECT id, title,
ts_rank(search_vector, query) AS rank
FROM articles,
to_tsquery('english', 'docker & container') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Phrase search
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'docker containers');
-- Plain-text search: plainto_tsquery joins the words with AND, so all three must match
SELECT title,
ts_rank_cd(search_vector, query) AS rank
FROM articles,
plainto_tsquery('english', 'docker deployment kubernetes') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Highlighting search results
SELECT
ts_headline('english', title, query,
'StartSel=<b>, StopSel=</b>, MaxWords=50') AS highlighted_title,
ts_headline('english', content, query,
'StartSel=<b>, StopSel=</b>, MaxFragments=2, MaxWords=30') AS snippet
FROM articles,
to_tsquery('english', 'docker & deploy') AS query
WHERE search_vector @@ query;
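The queries above hard-code their tsquery strings. When the words come from a user, something has to decide between AND and OR and keep tsquery operator characters out of the input. Below is a minimal sketch of one way to do that; `build_tsquery` is a hypothetical helper written for this article, not part of PostgreSQL or psycopg2.

```python
import re

def build_tsquery(user_input, operator="&"):
    """Turn free text into a to_tsquery-compatible string.

    Hypothetical helper: keeps only alphanumeric words, so characters with
    special meaning in tsquery syntax (&, |, !, :, parentheses) are dropped.
    """
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' (AND) or '|' (OR)")
    words = re.findall(r"[A-Za-z0-9]+", user_input.lower())
    return f" {operator} ".join(words)

print(build_tsquery("Docker deployment"))          # docker & deployment
print(build_tsquery("docker (kubernetes)!", "|"))  # docker | kubernetes
```

The result would still be passed as a bound parameter, for example to to_tsquery('english', %s), never concatenated into the SQL text. An empty result means there is nothing to search for, so the caller should skip the query.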
Chinese support
PostgreSQL cannot segment Chinese text out of the box, so you need to install an extension:
-- Option 1: zhparser (based on SCWS)
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION chinese
ADD MAPPING FOR n,v,a,i,e,l WITH simple;
-- Use the Chinese configuration
SELECT to_tsvector('chinese', '全文搜尋引擎的效能比較');
-- Option 2: pg_jieba (based on the jieba segmenter)
CREATE EXTENSION pg_jieba;
SELECT to_tsvector('jiebacfg', '如何使用 Docker 部署應用程式');
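Why segmentation matters is easy to see in a toy example: Chinese has no spaces, so whitespace tokenization produces a single giant token. The sketch below uses forward maximum matching against a tiny hypothetical dictionary. It is a teaching device only; zhparser and pg_jieba rely on much larger dictionaries and more sophisticated algorithms.

```python
# Tiny hypothetical dictionary; real segmenters ship far larger ones.
DICTIONARY = {"全文", "搜尋", "全文搜尋", "引擎", "效能", "比較", "的"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def segment(text):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary fits.
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens

sentence = "全文搜尋引擎的效能比較"
print(sentence.split())   # ['全文搜尋引擎的效能比較']
print(segment(sentence))  # ['全文搜尋', '引擎', '的', '效能', '比較']
```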
Elasticsearch
Elasticsearch is the industry standard for full-text search, built on the Apache Lucene engine. It is the most powerful of the three, and also the heaviest.
Installation
# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    ports:
      - "9200:9200"
    environment:
      discovery.type: single-node
      xpack.security.enabled: "false"
      ES_JAVA_OPTS: "-Xms512m -Xmx512m"
    volumes:
      - es_data:/usr/share/elasticsearch/data
  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.3
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
volumes:
  es_data:
Creating an index and indexing documents
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
# Create the index (define the mapping)
es.indices.create(index="articles", body={
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "snowball"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "custom_analyzer"
                # The mapping-level "boost" parameter was removed in Elasticsearch 8;
                # weight the title at query time instead (e.g. "title^2")
            },
            "content": {
                "type": "text",
                "analyzer": "custom_analyzer"
            },
            "author": {
                "type": "keyword"  # exact match
            },
            "tags": {
                "type": "keyword"
            },
            "created_at": {
                "type": "date"
            }
        }
    }
})
# Index the documents
docs = [
{
"title": "Docker Container Deployment Guide",
"content": "This guide covers how to deploy applications using Docker containers.",
"author": "Alice",
"tags": ["docker", "devops"],
"created_at": "2024-01-15"
},
{
"title": "Kubernetes Orchestration Basics",
"content": "Learn about Kubernetes and how it orchestrates Docker containers at scale.",
"author": "Bob",
"tags": ["kubernetes", "docker"],
"created_at": "2024-01-20"
},
{
"title": "PostgreSQL Performance Tuning",
"content": "Tips for optimizing PostgreSQL database performance.",
"author": "Carol",
"tags": ["postgresql", "database"],
"created_at": "2024-02-01"
}
]
for i, doc in enumerate(docs):
    es.index(index="articles", id=i+1, body=doc)
es.indices.refresh(index="articles")
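Indexing one document per request is fine for three articles, but larger imports are usually sent in batches. The sketch below only builds the action dictionaries that a bulk request needs; `to_bulk_actions` is a hypothetical helper name and the sample documents are shortened versions of the ones above.

```python
def to_bulk_actions(index_name, docs, first_id=1):
    """Yield one action dict per document, with an explicit index and id."""
    for offset, doc in enumerate(docs):
        yield {
            "_index": index_name,
            "_id": first_id + offset,
            "_source": doc,
        }

sample_docs = [
    {"title": "Docker Container Deployment Guide", "author": "Alice"},
    {"title": "Kubernetes Orchestration Basics", "author": "Bob"},
]
actions = list(to_bulk_actions("articles", sample_docs))
print(actions[0]["_id"], actions[0]["_source"]["title"])
```

With the official Python client, actions in this shape can be handed to `elasticsearch.helpers.bulk(es, actions)`, which groups them into bulk requests.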
Search queries
# Full-text search
result = es.search(index="articles", body={
    "query": {
        "multi_match": {
            "query": "docker deployment",
            "fields": ["title^2", "content"],  # title weighted x2
            "type": "best_fields",
            "fuzziness": "AUTO"  # fuzzy matching
        }
    },
    "highlight": {
        "fields": {
            "title": {},
            "content": {
                "fragment_size": 150,
                "number_of_fragments": 2
            }
        }
    }
})
for hit in result["hits"]["hits"]:
    print(f"[{hit['_score']:.2f}] {hit['_source']['title']}")
    if "highlight" in hit:
        for fragment in hit["highlight"].get("content", []):
            print(f" ...{fragment}...")
# Compound query: full-text search plus filter conditions
result = es.search(index="articles", body={
"query": {
"bool": {
"must": {
"multi_match": {
"query": "docker",
"fields": ["title", "content"]
}
},
"filter": [
{"term": {"author": "Alice"}},
{"range": {"created_at": {"gte": "2024-01-01"}}}
]
}
},
"sort": [
{"_score": "desc"},
{"created_at": "desc"}
]
})
# Aggregation: count articles per tag
result = es.search(index="articles", body={
"size": 0,
"aggs": {
"tag_counts": {
"terms": {
"field": "tags",
"size": 20
}
}
}
})
for bucket in result["aggregations"]["tag_counts"]["buckets"]:
    print(f"{bucket['key']}: {bucket['doc_count']} articles")
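In an application, the compound query is usually assembled from optional inputs rather than written out by hand. Here is a minimal sketch of such a builder. The function name and parameters are assumptions for illustration; the structure mirrors the bool query shown above, with the scored text clause under must and the non-scoring conditions under filter.

```python
def build_search_body(text, author=None, created_from=None):
    """Assemble a bool query: scored full-text clause plus non-scoring filters."""
    filters = []
    if author is not None:
        filters.append({"term": {"author": author}})
    if created_from is not None:
        filters.append({"range": {"created_at": {"gte": created_from}}})
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {"query": text, "fields": ["title^2", "content"]}
                },
                "filter": filters,
            }
        }
    }

body = build_search_body("docker", author="Alice", created_from="2024-01-01")
print(body["query"]["bool"]["filter"])
```

Keeping exact-match and range conditions in filter means they narrow the result set without changing the relevance score.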
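Chinese search
# Chinese search in Elasticsearch requires the IK analyzer plugin.
# IK is a third-party plugin, so in a Dockerfile it is installed from a release
# archive that matches your Elasticsearch version:
# RUN elasticsearch-plugin install <URL of the matching analysis-ik release>
# Create a Chinese index
es.indices.create(index="articles_zh", body={
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",     # fine-grained segmentation at index time
                "search_analyzer": "ik_smart"  # coarser, smarter segmentation at search time
            },
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
    }
})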
MeiliSearch
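MeiliSearch is a lightweight search engine that has risen quickly in recent years, focused on "instant search" and working "out of the box". It is written in Rust, performs well, and is simple to configure.
Installation
# docker-compose.yml
version: '3.8'
services:
  meilisearch:
    image: getmeili/meilisearch:v1.5
    ports:
      - "7700:7700"
    environment:
      MEILI_MASTER_KEY: "masterKey123"
    volumes:
      - meili_data:/meili_data
volumes:
  meili_data: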
Using the Python SDK
import meilisearch
client = meilisearch.Client("http://localhost:7700", "masterKey123")
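# Get a handle to the index (MeiliSearch creates it on first write if it does not exist)
index = client.index("articles")
# Configure search settings
index.update_settings({
    "searchableAttributes": ["title", "content", "tags"],
    "filterableAttributes": ["author", "tags", "created_at"],
    "sortableAttributes": ["created_at"],
    "rankingRules": [
        "words",      # number of query words matched
        "typo",       # typo tolerance (fewer typos rank higher)
        "proximity",  # how close the matched words are to each other
        "attribute",  # attribute weight (order of searchable attributes)
        "sort",       # sort order requested by the query
        "exactness"   # how exactly the words match
    ]
})
# Add documents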
documents = [
{
"id": 1,
"title": "Docker Container Deployment Guide",
"content": "This guide covers how to deploy applications using Docker.",
"author": "Alice",
"tags": ["docker", "devops"],
"created_at": "2024-01-15"
},
{
"id": 2,
"title": "Kubernetes Orchestration Basics",
"content": "Learn about Kubernetes and container orchestration.",
"author": "Bob",
"tags": ["kubernetes", "docker"],
"created_at": "2024-01-20"
},
{
"id": 3,
"title": "PostgreSQL Performance Tuning",
"content": "Tips for optimizing PostgreSQL database performance.",
"author": "Carol",
"tags": ["postgresql", "database"],
"created_at": "2024-02-01"
}
]
index.add_documents(documents)
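One detail worth planning for before indexing: the documents above store created_at as an ISO date string, while MeiliSearch's guidance for chronological filtering and sorting is to use numeric Unix timestamps. A minimal sketch of adding such a field follows; `with_timestamp` and the `created_at_ts` field name are hypothetical, and the new field would also have to be listed in filterableAttributes and sortableAttributes.

```python
from datetime import datetime, timezone

def with_timestamp(doc, date_field="created_at", ts_field="created_at_ts"):
    """Return a copy of doc with the ISO date also stored as a UTC Unix timestamp."""
    day = datetime.strptime(doc[date_field], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {**doc, ts_field: int(day.timestamp())}

doc = {"id": 1, "title": "Docker Container Deployment Guide", "created_at": "2024-01-15"}
print(with_timestamp(doc)["created_at_ts"])  # 1705276800
```

Applying it to every document before the add_documents call keeps the human-readable date and adds a number that range filters can compare.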
Searching
# Basic search (fuzzy matching and typo tolerance are built in)
results = index.search("dokcer deplment")  # deliberate typos; MeiliSearch tolerates them
print(f"Hits: {results['estimatedTotalHits']}")
for hit in results["hits"]:
    print(f" {hit['title']}")
# Search with filtering and sorting
results = index.search("docker", {
"filter": "author = 'Alice'",
"sort": ["created_at:desc"],
"limit": 10,
"attributesToHighlight": ["title", "content"],
"highlightPreTag": "<em>",
"highlightPostTag": "</em>"
})
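for hit in results["hits"]:
    formatted = hit.get("_formatted", {})
    print(f" {formatted.get('title', hit['title'])}")
# Multiple filter conditions
# Caveat: range operators such as > are documented for numeric values, so a date
# condition like this one generally needs created_at stored as a Unix timestamp
results = index.search("", {
    "filter": "tags = 'docker' AND created_at > '2024-01-15'"
})
# Faceted search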
results = index.search("docker", {
"facets": ["author", "tags"]
})
print(results["facetDistribution"])
# {'author': {'Alice': 1, 'Bob': 1}, 'tags': {'docker': 2, 'devops': 1, ...}}
MeiliSearch's killer feature is typo tolerance: users still find the right results when they mistype, and it needs almost no configuration.
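The idea behind typo tolerance can be sketched with an edit distance and a length-based budget. The code below is an illustration, not MeiliSearch's implementation: it assumes that swapping two adjacent letters counts as one typo, and that words need at least 5 characters to allow one typo and 9 to allow two, which is my understanding of the default settings.

```python
def osa_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]

def typo_budget(word, one_typo_len=5, two_typos_len=9):
    """Allowed typos by query-word length (assumed defaults, see the text above)."""
    if len(word) >= two_typos_len:
        return 2
    return 1 if len(word) >= one_typo_len else 0

def tolerant_match(query_word, indexed_word):
    """True when the query word is within its typo budget of the indexed word."""
    return osa_distance(query_word, indexed_word) <= typo_budget(query_word)

print(tolerant_match("dokcer", "docker"))         # True: one transposition
print(tolerant_match("kubernets", "kubernetes"))  # True: one missing letter
print(tolerant_match("dckr", "docker"))           # False: short words must match exactly
```

Tying the budget to word length keeps short queries precise while letting long, easy-to-mistype words match loosely.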
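Comparison
| Aspect | PostgreSQL FTS | Elasticsearch | MeiliSearch |
|------|------|------|------|
| Architecture | Built into the database | Standalone search engine | Standalone search engine |
| Underlying engine | PostgreSQL | Lucene | Custom (Rust) |
| Setup difficulty | Low | High | Very low |
| Resource usage | Low | High (JVM) | Low |
| Search quality | Medium | High | High |
| Typo tolerance | Weak | Requires configuration | Out of the box |
| Chinese support | Requires an extension | Requires a plugin (IK) | Built in (acceptable) |
| Aggregations and analytics | Via SQL | Very powerful | Basic facets |
| Instant search | Slower | Capable | Best |
| Clustering | Not supported | Full | MeiliSearch Cloud |
| Suitable data size (documents) | < 1 million | Any | < 10 million |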
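Recommendations
- Already on PostgreSQL, modest data volume, and no appetite for another service → PostgreSQL FTS
- Large data volumes, complex aggregations, enterprise requirements → Elasticsearch
- Instant search in the front end, a focus on user experience, a fast launch → MeiliSearch
- Chinese search quality is the priority → Elasticsearch + IK analyzer
In practice: adding search to a blog
Here is a complete example that adds search to a blog with PostgreSQL FTS: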
from flask import Flask, request, jsonify
import psycopg2
from psycopg2.extras import RealDictCursor
app = Flask(__name__)
def get_db():
    return psycopg2.connect(
        "dbname=blog user=blogadmin password=secret",
        cursor_factory=RealDictCursor
    )

@app.route("/api/search")
def search():
    query = request.args.get("q", "").strip()
    page = int(request.args.get("page", 1))
    per_page = int(request.args.get("per_page", 10))
    if not query:
        return jsonify({"results": [], "total": 0})
    conn = get_db()
    cur = conn.cursor()
    # Convert the user's input into a tsquery;
    # plainto_tsquery joins multiple words with AND
    cur.execute("""
        WITH search AS (
            SELECT
                id, title, content, created_at,
                ts_rank(search_vector, query) AS rank,
                ts_headline('english', content, query,
                    'StartSel=<mark>, StopSel=</mark>, MaxFragments=2, MaxWords=30') AS snippet
            FROM articles, plainto_tsquery('english', %s) AS query
            WHERE search_vector @@ query
            ORDER BY rank DESC
            LIMIT %s OFFSET %s
        )
        SELECT *, (SELECT COUNT(*) FROM articles, plainto_tsquery('english', %s) AS q
                   WHERE search_vector @@ q) AS total
        FROM search
    """, (query, per_page, (page - 1) * per_page, query))
    results = cur.fetchall()
    # Release the cursor and connection once the rows are in memory
    cur.close()
    conn.close()
    total = results[0]["total"] if results else 0
    return jsonify({
        "query": query,
        "results": [{
            "id": r["id"],
            "title": r["title"],
            "snippet": r["snippet"],
            "rank": float(r["rank"]),
            "created_at": r["created_at"].isoformat()
        } for r in results],
        "total": total,
        "page": page,
        "per_page": per_page
    })

if __name__ == "__main__":
    app.run(debug=True)
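The endpoint above turns page and per_page into LIMIT and OFFSET with a bare int(), which raises on input like ?page=abc and accepts arbitrarily large page sizes. Below is a minimal sketch of a more defensive parser; `parse_pagination` and the cap of 50 results per page are assumptions for illustration, not part of Flask.

```python
def parse_pagination(args, default_per_page=10, max_per_page=50):
    """Return (limit, offset) from a dict-like of query-string values.

    Bad or missing values fall back to defaults instead of raising.
    """
    def to_int(value, fallback):
        try:
            return int(value)
        except (TypeError, ValueError):
            return fallback

    page = max(to_int(args.get("page"), 1), 1)
    per_page = to_int(args.get("per_page"), default_per_page)
    per_page = min(max(per_page, 1), max_per_page)
    return per_page, (page - 1) * per_page

print(parse_pagination({"page": "3", "per_page": "20"}))    # (20, 40)
print(parse_pagination({"page": "abc", "per_page": "-5"}))  # (1, 0)
```

Since request.args is dict-like, it could be passed in directly, with the returned pair used as the LIMIT and OFFSET parameters.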
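Summary
Full-text search looks simple but is full of details. A good search experience has to account for tokenization, relevance ranking, typo tolerance, synonyms, Chinese text handling, and more.
My practical advice: build search with PostgreSQL FTS first. If search quality or performance falls short, then consider bringing in MeiliSearch (lightweight) or Elasticsearch (heavyweight). Don't reach for the heaviest option on day one: in many cases, PostgreSQL FTS with good indexes and weight settings is already good enough.
Further reading
- PostgreSQL Full Text Search official documentation
- Elasticsearch official documentation
- MeiliSearch official documentation
- Typesense: another lightweight search engine worth watching
- OpenSearch: the open-source fork of Elasticsearch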