Introduction
"Why is this API so slow?" It's the backend engineer's daily nightmare. Seven times out of ten, the problem lies in a database query. And to find the root cause of a slow query, EXPLAIN ANALYZE is your strongest weapon.
When I was starting out, EXPLAIN output left me completely lost. Seq Scan, Hash Join, Bitmap Heap Scan: I recognized every word, but together they made no sense. It took me quite a while to work out what each operator means and how they relate to one another.
This article walks you through reading EXPLAIN ANALYZE output from scratch, with plenty of hands-on examples. The examples use PostgreSQL, but the concepts carry over to MySQL.
EXPLAIN Basic Syntax
PostgreSQL
-- Basic usage: shows only the query plan, does not actually execute
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;
-- ANALYZE: actually runs the query and shows real timings and row counts
-- Warning: this really executes the query! Be careful with UPDATE/DELETE
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- Full version: includes all the details
EXPLAIN (ANALYZE, BUFFERS, TIMING, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;
-- JSON format (convenient for programmatic parsing)
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 123;
-- Running EXPLAIN ANALYZE safely on UPDATE/DELETE
BEGIN;
EXPLAIN ANALYZE DELETE FROM temp_logs WHERE created_at < '2024-01-01';
ROLLBACK; -- roll back, so no data is actually deleted
MySQL
-- EXPLAIN ANALYZE in MySQL 8.0
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- Traditional EXPLAIN (shows the plan without executing)
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;
-- JSON format
EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE customer_id = 123;
Reading the Execution Plan
A Complete Example
Suppose we have the following table structure and query:
-- Table structure
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
status VARCHAR(20),
amount NUMERIC(10, 2),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
-- Assume customers has 100k rows and orders has 5 million
-- Query: top 10 customers by total order amount in 2024
EXPLAIN (ANALYZE, BUFFERS)
SELECT
c.name,
c.email,
COUNT(o.id) AS order_count,
SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= '2024-01-01'
AND o.status = 'completed'
GROUP BY c.id, c.name, c.email
ORDER BY total_amount DESC
LIMIT 10;
The output might look like this:
Limit  (cost=45230.12..45230.14 rows=10 width=82) (actual time=1523.456..1523.461 rows=10 loops=1)
  Buffers: shared hit=12340 read=5678
  ->  Sort  (cost=45230.12..45350.45 rows=48133 width=82) (actual time=1523.453..1523.456 rows=10 loops=1)
        Sort Key: (sum(o.amount)) DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  HashAggregate  (cost=43210.56..43691.89 rows=48133 width=82) (actual time=1456.789..1510.123 rows=48133 loops=1)
              Group Key: c.id
              Batches: 1  Memory Usage: 8209kB
              ->  Hash Join  (cost=3245.00..41678.90 rows=204888 width=50) (actual time=45.678..1234.567 rows=204888 loops=1)
                    Hash Cond: (o.customer_id = c.id)
                    ->  Bitmap Heap Scan on orders o  (cost=789.12..38901.23 rows=204888 width=22) (actual time=12.345..1100.234 rows=204888 loops=1)
                          Recheck Cond: (status = 'completed')
                          Filter: (created_at >= '2024-01-01')
                          Rows Removed by Filter: 95112
                          Heap Blocks: exact=15678
                          ->  Bitmap Index Scan on idx_orders_status  (cost=0.00..737.90 rows=300000 width=0) (actual time=8.901..8.901 rows=300000 loops=1)
                                Index Cond: (status = 'completed')
                    ->  Hash  (cost=1456.00..1456.00 rows=100000 width=36) (actual time=32.456..32.456 rows=100000 loops=1)
                          Buckets: 131072  Batches: 1  Memory Usage: 7890kB
                          ->  Seq Scan on customers c  (cost=0.00..1456.00 rows=100000 width=36) (actual time=0.012..15.678 rows=100000 loops=1)
Planning Time: 0.456 ms
Execution Time: 1523.789 ms
Layer-by-Layer Breakdown
Let me break the execution plan above apart.
The format of each line:
Node Type (cost=startup..total rows=estimated width=bytes) (actual time=startup..total rows=actual loops=N)
cost: estimated cost (unitless, relative value)
- startup cost: cost incurred before the first row can be emitted
- total cost: total cost of emitting all rows
actual time: real execution time (in milliseconds)
- startup time: how long before the first row was emitted
- total time: how long until all rows were emitted
rows:
- rows on the cost side = estimated row count
- rows on the actual side = real row count
- a large gap between the two usually means the statistics are stale
loops: how many times this node was executed (the inner side of a Nested Loop is often > 1)
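The estimated-vs-actual comparison above is mechanical enough to automate against EXPLAIN (ANALYZE, FORMAT JSON) output. Below is a minimal sketch; the plan dict is a hand-made stand-in for real planner output, but the keys ("Node Type", "Plan Rows", "Actual Rows", "Plans") are the ones PostgreSQL emits in its JSON format.

```python
# Walk a plan tree from EXPLAIN (ANALYZE, FORMAT JSON) and flag nodes whose
# actual row count diverges from the planner's estimate by more than 10x.
# PostgreSQL nests child nodes under "Plans"; pass in the top-level "Plan" object.

def find_bad_estimates(node, threshold=10.0, path=()):
    """Yield (path, estimated, actual) for nodes off by more than threshold x."""
    est = max(node.get("Plan Rows", 1), 1)   # clamp to avoid division by zero
    act = max(node.get("Actual Rows", 1), 1)
    ratio = max(est / act, act / est)        # off in either direction
    here = path + (node.get("Node Type", "?"),)
    if ratio > threshold:
        yield here, est, act
    for child in node.get("Plans", []):      # recurse into child nodes
        yield from find_bad_estimates(child, threshold, here)

# Hand-made example mimicking the JSON structure (not real planner output):
plan = {
    "Node Type": "Hash Join", "Plan Rows": 200, "Actual Rows": 204888,
    "Plans": [
        {"Node Type": "Seq Scan", "Plan Rows": 100000, "Actual Rows": 100000},
    ],
}
for path, est, act in find_bad_estimates(plan):
    print(" -> ".join(path), f"estimated={est} actual={act}")
# prints: Hash Join estimated=200 actual=204888
```

A join node that is off by three orders of magnitude, as in this toy example, is exactly the case where refreshing statistics with ANALYZE is worth trying first.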
Seq Scan vs Index Scan
Sequential Scan (full table scan)
-- With no usable index, or a predicate that matches too many rows, you get a Seq Scan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE amount > 10;
-- When most rows match the condition, a Seq Scan is actually faster than an Index Scan
-- Output:
-- Seq Scan on orders  (cost=0.00..125000.00 rows=4500000 width=48)
--   Filter: (amount > 10)
--   Rows Removed by Filter: 500000
Index Scan
-- A selective lookup uses an Index Scan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- Output:
-- Index Scan using idx_orders_customer_id on orders
--   (cost=0.43..24.56 rows=50 width=48)
--   (actual time=0.023..0.089 rows=47 loops=1)
--   Index Cond: (customer_id = 123)
Bitmap Index Scan + Bitmap Heap Scan
-- When the matching rows are neither few nor many, PostgreSQL may choose a Bitmap Scan:
-- it first uses the index to collect the locations of all matching data pages,
-- then reads those pages in physical order
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';
-- Output:
-- Bitmap Heap Scan on orders  (cost=567.89..23456.78 rows=50000 width=48)
--   Recheck Cond: (status = 'pending')
--   Heap Blocks: exact=12345
--   ->  Bitmap Index Scan on idx_orders_status
--         Index Cond: (status = 'pending')
Rules of thumb for scan selection:
┌─────────────────────┬──────────────────────────┐
│ % of matching rows  │ Typical scan method      │
├─────────────────────┼──────────────────────────┤
│ < 5%                │ Index Scan               │
│ 5% ~ 20%            │ Bitmap Index/Heap Scan   │
│ > 20%               │ Sequential Scan          │
└─────────────────────┴──────────────────────────┘
(These are only rough numbers; the actual choice depends on many factors.)
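As a purely illustrative sketch, the rule of thumb above can be written as a tiny function. This is not the planner's real decision process, which compares cost estimates and weighs factors like random_page_cost and index correlation:

```python
# Illustrative only: map a selectivity (fraction of rows matching the
# predicate) to the scan method the rule-of-thumb table suggests.

def likely_scan_method(selectivity: float) -> str:
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    if selectivity < 0.05:
        return "Index Scan"
    if selectivity <= 0.20:
        return "Bitmap Index/Heap Scan"
    return "Sequential Scan"

# e.g. 50 matching rows out of 5,000,000 orders is very selective
print(likely_scan_method(50 / 5_000_000))  # Index Scan
```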
JOIN Strategies
PostgreSQL has three JOIN strategies:
Nested Loop Join
-- Good when: the outer table is small and the inner table has an index
-- Concept: for each row in outer_table: scan inner_table
EXPLAIN ANALYZE
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 123;
-- Output:
-- Nested Loop  (actual time=0.034..0.156 rows=47 loops=1)
--   ->  Index Scan using customers_pkey on customers c (rows=1)
--         Index Cond: (id = 123)
--   ->  Index Scan using idx_orders_customer_id on orders o (rows=47)
--         Index Cond: (customer_id = 123)
Hash Join
-- Good when: both tables are fairly large and no particularly good index exists
-- Concept: build a hash table from the smaller table, then scan the larger table and probe it
EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
-- Output:
-- HashAggregate
--   ->  Hash Join  (actual time=45.678..2345.678 rows=5000000)
--         Hash Cond: (o.customer_id = c.id)
--         ->  Seq Scan on orders o (rows=5000000)
--         ->  Hash  (actual time=32.456..32.456 rows=100000)
--               Buckets: 131072  Memory Usage: 7890kB
--               ->  Seq Scan on customers c (rows=100000)
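To make the build-then-probe idea concrete, here is a simplified in-memory model of a hash join in Python. Real hash joins also split the input into batches when the build side does not fit in work_mem; this sketch ignores that:

```python
# Simplified in-memory hash join: build a hash table on the smaller side
# (customers), then probe it in one pass over the larger side (orders).

def hash_join(customers, orders):
    # Build phase: hash the smaller relation by its join key (id).
    built = {}
    for c in customers:
        built.setdefault(c["id"], []).append(c)
    # Probe phase: a single scan of the larger relation.
    out = []
    for o in orders:
        for c in built.get(o["customer_id"], []):  # hash lookup, O(1) average
            out.append((c["name"], o["amount"]))
    return out

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "amount": 100}, {"customer_id": 1, "amount": 50},
          {"customer_id": 3, "amount": 75}]  # no matching customer -> dropped
print(hash_join(customers, orders))  # [('Ann', 100), ('Ann', 50)]
```

This is why the plan above hashes customers (100k rows) rather than orders (5M rows): the build side must be held in memory, so the planner picks the smaller input.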
Merge Join
-- Good when: both inputs are already sorted on the JOIN key (e.g. both have matching indexes)
-- Concept: scan both sorted inputs in lockstep, like the merge step of merge sort
EXPLAIN ANALYZE
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
ORDER BY c.id;
-- Possible output:
-- Merge Join  (actual time=0.123..3456.789 rows=5000000)
--   Merge Cond: (c.id = o.customer_id)
--   ->  Index Scan using customers_pkey on customers c
--   ->  Index Scan using idx_orders_customer_id on orders o
Common Performance Problems and Fixes
Problem 1: Badly Skewed Row Estimates
-- When EXPLAIN's estimated rows and actual rows differ by more than 10x, pay attention:
-- it usually means the statistics are stale
-- Fix: refresh the statistics
ANALYZE orders; -- refresh a single table
ANALYZE; -- refresh every table in the current database
-- Inspect the current statistics
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
Problem 2: Unnecessary Seq Scans
-- An index exists but the planner still does a Seq Scan? Common causes:
-- 1. Type mismatch
-- Risky: customer_id is INT but the value is passed as a string
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = '123'; -- may skip the index in some databases (e.g. MySQL's implicit casts) or with mistyped bind parameters
-- Safe:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- 2. Wrapping the column in a function defeats the index
-- Wrong:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE LOWER(status) = 'completed';
-- Fix: create an expression index
CREATE INDEX idx_orders_status_lower ON orders (LOWER(status));
-- 3. A LIKE with a fixed prefix can use an index, but a leading wildcard cannot
EXPLAIN ANALYZE
SELECT * FROM customers WHERE email LIKE 'john%'; -- can use an index (in PostgreSQL this needs the C collation or a text_pattern_ops index)
SELECT * FROM customers WHERE email LIKE '%@gmail%'; -- cannot use a btree index
-- Fix: use the pg_trgm extension
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_customers_email_trgm ON customers USING GIN (email gin_trgm_ops);
Problem 3: Sorting Bottlenecks
-- If a Sort node shows "Sort Method: external merge Disk: XXkB",
-- memory ran out and the sort spilled to disk
-- Check the current setting
SHOW work_mem; -- the default is often just 4MB
-- Raise it temporarily (affects only the current session)
SET work_mem = '256MB';
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY created_at DESC LIMIT 100;
-- Better fix: create an index matching the sort order, so the sort can be skipped entirely
CREATE INDEX idx_orders_created_desc ON orders (created_at DESC);
Problem 4: N+1 Queries
# This is not a SQL-level problem but an application-level one.
# It is, however, the most common performance killer.

# Bad (N+1):
customers = db.query("SELECT * FROM customers LIMIT 100")
for customer in customers:
    # One query per customer: 101 queries in total!
    orders = db.query(
        "SELECT * FROM orders WHERE customer_id = %s", customer.id
    )

# Good (a single query):
results = db.query("""
    SELECT c.*, o.id AS order_id, o.amount, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE c.id IN (SELECT id FROM customers LIMIT 100)
""")

# Or use two queries (batch loading):
customers = db.query("SELECT * FROM customers LIMIT 100")
customer_ids = [c.id for c in customers]
orders = db.query(
    "SELECT * FROM orders WHERE customer_id = ANY(%s)", [customer_ids]
)
Problem 5: Lock Waits
-- Inspect current lock waits
SELECT
blocked.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking_activity.query AS blocking_query,
blocked_activity.wait_event_type,
now() - blocked_activity.query_start AS blocked_duration
FROM pg_catalog.pg_locks blocked
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked.pid = blocked_activity.pid
JOIN pg_catalog.pg_locks blocking ON blocking.locktype = blocked.locktype
AND blocking.database IS NOT DISTINCT FROM blocked.database
AND blocking.relation IS NOT DISTINCT FROM blocked.relation
AND blocking.page IS NOT DISTINCT FROM blocked.page
AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
AND blocking.pid != blocked.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking.pid = blocking_activity.pid
WHERE NOT blocked.granted;
Useful Performance-Monitoring Queries
-- Find the slowest queries (requires the pg_stat_statements extension)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT
substring(query, 1, 100) AS short_query,
calls,
round(total_exec_time::numeric, 2) AS total_time_ms,
round(mean_exec_time::numeric, 2) AS avg_time_ms,
round(max_exec_time::numeric, 2) AS max_time_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Find unused indexes
SELECT
schemaname, tablename, indexname,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (SELECT conindid FROM pg_constraint) -- exclude PK/UNIQUE constraints
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check the cache hit ratio
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
round(sum(heap_blks_hit)::numeric / (sum(heap_blks_hit) + sum(heap_blks_read)) * 100, 2) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- Ideal value: > 99%
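The ratio this query computes is simply heap_blks_hit / (heap_blks_hit + heap_blks_read). A small helper showing the arithmetic, reusing the Buffers numbers from the example plan earlier (shared hit=12340 read=5678), with a guard for freshly reset statistics:

```python
def cache_hit_ratio(heap_read: int, heap_hit: int) -> float:
    """Percentage of heap block requests served from shared_buffers."""
    total = heap_read + heap_hit
    if total == 0:                 # stats just reset: no I/O recorded yet
        return 0.0
    return round(heap_hit / total * 100, 2)

# Buffers from the example plan: shared hit=12340 read=5678
print(cache_hit_ratio(heap_read=5678, heap_hit=12340))  # 68.49
```

A ratio this far below 99% on a hot table usually means the working set no longer fits in shared_buffers.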
Summary
EXPLAIN ANALYZE is an essential skill for backend engineers. The key takeaways:
- Look at actual time and actual rows first: find the most expensive node
- Compare estimated rows with actual rows: a large gap means it's time to run ANALYZE and refresh the statistics
- Watch the scan types: a Seq Scan where you don't expect one usually points to an index problem
- Pay attention to Buffers: shared hit means a cache hit, read means a disk read; too many reads is something to work on
- Analyze from the outside in: rows flow from the innermost nodes outward, but when reading a plan it's easier to start from the outer nodes to grasp the overall structure
Make it a habit: run EXPLAIN ANALYZE on every query you write. Don't wait until production to discover the problem.
Suggested further reading:
- The official PostgreSQL EXPLAIN documentation
- depesz's EXPLAIN visualizer: paste in EXPLAIN output and get a clean visualization
- Use The Index, Luke: the classic text on SQL performance tuning
- pgMustard: AI-assisted analysis of EXPLAIN output