AI Code Review Automation: Integrating AI Review into CI/CD

Introduction

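Code review is one of the most important, and most time-consuming, parts of software development. Every time you open a PR, you wait for a senior engineer to find time to review it: a few hours at best, a few days at worst. Human review quality is also inconsistent; when reviewers are busy, they may skim the changes and approve.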

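Over the past year I introduced AI-assisted code review on my team. A GitHub Actions workflow calls an LLM API every time a PR is opened or updated. The AI does a first pass and flags obvious problems, potential bugs, and style inconsistencies, which lets human reviewers spend their energy on architecture and business logic.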

This article shares my full setup: the GitHub Actions configuration, the API calls, prompt design, and the implementation that automatically posts comments on the PR.

Overall Architecture

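PR opened/updated
    ↓
GitHub Actions triggered
    ↓
Fetch the PR diff (via the GitHub API)
    ↓
Split the diff into manageable chunks
    ↓
Call an LLM API (Claude/GPT) to review each chunk
    ↓
Parse the AI response
    ↓
Automatically post a review comment on the PR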

GitHub Actions Setup

Basic Workflow File

# .github/workflows/ai-review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]  # run when a PR is opened or receives new commits

permissions:
  contents: read
  pull-requests: write  # write access is required to post comments

jobs:
  ai-review:
    runs-on: ubuntu-latest
    # Skip Dependabot PRs (usually version bumps, not worth reviewing)
    if: github.actor != 'dependabot[bot]'
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history; only needed if you compute diffs locally with git (the script below uses the GitHub API)

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic pygithub

      - name: Run AI Review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO_NAME: ${{ github.repository }}
        run: python .github/scripts/ai_review.py

Setting Up Secrets

Add the API key to your GitHub repository's secrets:

# Set the secret with the GitHub CLI
gh secret set ANTHROPIC_API_KEY --body "sk-ant-api03-xxxx"

GITHUB_TOKEN is provided automatically by GitHub Actions, so it needs no extra setup.
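Because the script reads all of its settings from environment variables, a missing secret otherwise surfaces as a bare KeyError partway through the run. One case where this matters is PRs from forks, where GitHub does not pass repository secrets such as ANTHROPIC_API_KEY to the workflow. Below is a minimal sketch of a fail-fast check. `load_config` and `REQUIRED_VARS` are hypothetical names; the variable names themselves match the workflow's env block.

```python
import os
from typing import Mapping

# Hypothetical helper: these names mirror the env block of the workflow.
REQUIRED_VARS = ("GITHUB_TOKEN", "ANTHROPIC_API_KEY", "PR_NUMBER", "REPO_NAME")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise one error listing every missing variable."""
    # Empty strings count as missing, which also catches a blank secret.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (check the repository secrets and the workflow's env block)"
        )
    config = {name: env[name] for name in REQUIRED_VARS}
    config["PR_NUMBER"] = int(config["PR_NUMBER"])  # env values are always strings
    return config
```

Calling `load_config()` at the top of the script would then replace the four separate `os.environ[...]` lookups.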

The Main AI Review Script

This is the core Python script, which ties all of the logic together:

#!/usr/bin/env python3
"""
AI Code Review Script
Fetches the PR diff, asks the Claude API to review it, and posts the result as a PR comment.
"""

import os
import json
import re
from github import Github
from anthropic import Anthropic

# === Environment variables ===
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
PR_NUMBER = int(os.environ["PR_NUMBER"])
REPO_NAME = os.environ["REPO_NAME"]

# === Initialize clients ===
gh = Github(GITHUB_TOKEN)
repo = gh.get_repo(REPO_NAME)
pr = repo.get_pull(PR_NUMBER)
anthropic = Anthropic(api_key=ANTHROPIC_API_KEY)

def get_pr_diff():
    """Fetch the PR's per-file diffs, filtering out files that don't need review."""
    files = pr.get_files()

    # Filtering rules: skip these kinds of files
    skip_patterns = [
        r"\.lock$",           # lock files
        r"\.min\.(js|css)$",  # minified files
        r"^vendor/",          # vendored code
        r"^node_modules/",
        r"\.(png|jpg|gif|svg|ico)$",  # images
        r"\.generated\.",     # generated files
    ]

    diffs = []
    for file in files:
        # Skip files that match any filtering rule
        if any(re.search(p, file.filename) for p in skip_patterns):
            continue

        # Don't send very large files (more than 500 changed lines usually needs a human)
        if file.changes > 500:
            diffs.append({
                "filename": file.filename,
                "patch": "[More than 500 lines changed; focused human review recommended]",
                "status": file.status,
            })
            continue

        # file.patch is None for binary files, so only include files that have a patch
        if file.patch:
            diffs.append({
                "filename": file.filename,
                "patch": file.patch,
                "status": file.status,  # added, modified, removed, ...
            })

    return diffs

def chunk_diffs(diffs, max_chars=12000):
    """
    Split the diffs into chunks so that each request stays within the API's token limit.
    Each chunk keeps whole file diffs together. Note that max_chars counts characters,
    not tokens, and a single file larger than max_chars still becomes its own chunk.
    """
    chunks = []
    current_chunk = []
    current_size = 0

    for diff in diffs:
        diff_text = f"=== {diff['filename']} ({diff['status']}) ===\n{diff['patch']}"
        diff_size = len(diff_text)

        # Start a new chunk if adding this file would exceed the limit
        if current_size + diff_size > max_chars and current_chunk:
            chunks.append("\n\n".join(current_chunk))
            current_chunk = []
            current_size = 0

        current_chunk.append(diff_text)
        current_size += diff_size

    if current_chunk:
        chunks.append("\n\n".join(current_chunk))

    return chunks

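def review_chunk(chunk, pr_title, pr_description):
    """Ask the Claude API to review one chunk of the diff."""
    system_prompt = """You are a senior software engineer doing a code review.
Review the following code changes carefully and give specific, constructive feedback.

Focus areas:
- Bugs and logic errors: problems that can cause runtime errors or incorrect behavior
- Security: SQL injection, XSS, leaked sensitive information, unsafe deserialization
- Performance: N+1 queries, unnecessary loops, memory leaks
- Readability: unclear names, overly complex logic, missing necessary comments
- Best practices: insufficient error handling, missing input validation, hard-coded values

Response format (a JSON array):
[
  {
    "file": "file path",
    "line": "the relevant line of code (verbatim)",
    "severity": "critical|warning|suggestion",
    "comment": "a concrete description of the problem and how to fix it"
  }
]

Rules:
- Only report issues that genuinely matter; don't nitpick
- Severity: critical = a bug or security problem that must be fixed; warning = a problem that should be fixed; suggestion = an optional improvement
- If you find no problems, reply with an empty array []
- Reply with JSON only, with no other text"""

    user_message = f"""PR title: {pr_title}
PR description: {pr_description or '(no description)'}

Code changes:
{chunk}"""

    response = anthropic.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}],
        temperature=0.2,  # a low temperature keeps responses more consistent
    )

    # Parse the JSON response
    try:
        content = response.content[0].text.strip()
        # Strip a Markdown code fence (three backticks, optionally followed by "json")
        # in case the model wrapped its answer in one
        content = re.sub(r"^`{3}(?:json)?\s*", "", content)
        content = re.sub(r"\s*`{3}$", "", content)
        issues = json.loads(content)
        return issues if isinstance(issues, list) else []
    except (json.JSONDecodeError, IndexError) as e:
        print(f"Failed to parse AI response: {e}")
        print(f"Raw response: {response.content}")
        return []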


def post_review_comments(all_issues):
    """Post the review results as a PR comment."""
    # Hidden marker recording which commit was reviewed (used by incremental review)
    marker = f"<!-- reviewed up to: {pr.head.sha} -->"

    if not all_issues:
        # No issues found: leave a short comment
        pr.create_issue_comment(
            "**AI Code Review** :robot:\n\n"
            "Automated review finished; no obvious issues found.\n"
            "(This is an automated review; a human review is still recommended.)\n"
            + marker
        )
        return

    # Sort by severity
    severity_order = {"critical": 0, "warning": 1, "suggestion": 2}
    all_issues.sort(key=lambda x: severity_order.get(x.get("severity", "suggestion"), 3))

    # Build the comment
    severity_emoji = {
        "critical": ":red_circle:",
        "warning": ":yellow_circle:",
        "suggestion": ":blue_circle:",
    }
    lines = ["**AI Code Review** :robot:\n"]

    # Summary counts
    counts = {}
    for issue in all_issues:
        s = issue.get("severity", "suggestion")
        counts[s] = counts.get(s, 0) + 1

    summary_parts = []
    for s in ["critical", "warning", "suggestion"]:
        if s in counts:
            summary_parts.append(f"{severity_emoji[s]} {s}: {counts[s]}")
    lines.append(" | ".join(summary_parts) + "\n")

    # Detailed list of issues
    fence = "`" * 3  # Markdown code fence for the quoted line
    for issue in all_issues:
        severity = issue.get("severity", "suggestion")
        emoji = severity_emoji.get(severity, ":white_circle:")
        file = issue.get("file", "unknown")
        comment = issue.get("comment", "")
        code_line = issue.get("line", "")

        lines.append(f"### {emoji} {file}")
        if code_line:
            lines.append(f"{fence}\n{code_line}\n{fence}")
        lines.append(f"{comment}\n")

    lines.append("---\n_This is an automated AI review for reference only; human review takes precedence._")
    lines.append(marker)
    pr.create_issue_comment("\n".join(lines))
def main():
    print(f"Reviewing PR #{PR_NUMBER}: {pr.title}")

    # 1. Fetch the diff
    diffs = get_pr_diff()
    if not diffs:
        print("No file changes need review")
        return
    print(f"{len(diffs)} file(s) to review")

    # 2. Split into chunks
    chunks = chunk_diffs(diffs)
    print(f"Split into {len(chunks)} chunk(s)")

    # 3. Review each chunk in turn
    all_issues = []
    for i, chunk in enumerate(chunks):
        print(f"Reviewing chunk {i+1}/{len(chunks)}...")
        issues = review_chunk(chunk, pr.title, pr.body)
        all_issues.extend(issues)
        print(f"  Found {len(issues)} issue(s)")
    print(f"Found {len(all_issues)} issue(s) in total")

    # 4. Post the review comment
    post_review_comments(all_issues)
    print("Review results posted to the PR")


if __name__ == "__main__":
    main()
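The script assumes that the model returns a well-formed JSON array. Below is a minimal sketch of a more defensive post-processing step. It also includes a helper that maps a quoted line back to its line number in the new version of the file, which you would need in order to post inline review comments instead of a single summary comment. Both function names are hypothetical. The line lookup assumes that the `line` field is copied verbatim from an added or context line, as the prompt requests, and it returns the first match.

```python
import json
import re
from typing import Optional

VALID_SEVERITIES = {"critical", "warning", "suggestion"}
# Unified-diff hunk header, e.g. "@@ -10,3 +10,4 @@"; group 1 is the new-file start line
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_review_json(text: str) -> list:
    """Extract the JSON array from a model reply and keep only well-formed issues."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    issues = []
    for item in data:
        if not isinstance(item, dict) or not item.get("comment"):
            continue  # drop entries without an actual comment
        if item.get("severity") not in VALID_SEVERITIES:
            item = {**item, "severity": "suggestion"}  # normalize unknown severities
        issues.append(item)
    return issues


def find_new_line_number(patch: str, code_line: str) -> Optional[int]:
    """Return the new-file line number of the first added/context line equal to code_line."""
    target = code_line.strip()
    if not target:
        return None
    new_line = None
    for raw in patch.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            new_line = int(header.group(1))  # first new-file line of this hunk
            continue
        if new_line is None or raw.startswith("-") or raw.startswith("\\"):
            continue  # before any hunk, a removed line, or "\ No newline at end of file"
        if raw[1:].strip() == target:
            return new_line
        new_line += 1  # added and context lines both exist in the new file
    return None
```

In `review_chunk`, `parse_review_json(response.content[0].text)` could replace the regex-based parsing. For inline comments, the line number and file path would go into a pull request review; check the GitHub REST API documentation for creating a pull request review, and your PyGithub version's `create_review`, for the exact comment fields.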

Prompt Design Lessons

The quality of the prompt directly determines how useful AI review is in practice. Here is what I learned after iterating through many versions:

Avoiding False Positives

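# Add counterexamples to the system prompt to reduce false positives
ANTI_PATTERNS = """
Do not report the following kinds of issues (they are usually false positives):
- Import order (leave it to the linter)
- Purely stylistic naming preferences (unless a name is genuinely hard to read)
- Missing type hints (unless it's a public API)
- Magic numbers in test files
- TODO/FIXME comments (these are left intentionally)
"""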

Adjusting the Prompt by File Type

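def get_context_prompt(filename):
    """Return extra review focus areas based on the file type."""
    if filename.endswith((".sql", ".migration")):
        return """
Also check:
- Is there any SQL injection risk?
- Is the migration reversible (does it have a matching rollback)?
- Will operations on large tables lock the table?
- Are any indexes missing?
"""
    elif filename.endswith(("Dockerfile", "docker-compose.yml")):
        return """
Also check:
- Is the latest tag used (versions should be pinned)?
- Does the container run as root?
- Are sensitive environment variables exposed?
- Is the multi-stage build optimized?
"""
    elif filename.endswith((".py",)):
        return """
Also check:
- Are there unhandled exceptions?
- Are async functions awaited correctly?
- Are any resources left unclosed (file handles, DB connections)?
"""
    return ""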
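The anti-patterns and the per-file-type focus areas still have to be merged into the system prompt for each chunk. Because a chunk may contain several file types, one approach is to collect the extra instructions for every file in the chunk and include each distinct block once. The sketch below does this. `build_system_prompt`, `extra_checks_for`, and the trimmed-down `EXTRA_CHECKS` table are hypothetical stand-ins for `get_context_prompt` and are not part of the original script.

```python
import os

# Hypothetical, trimmed-down version of get_context_prompt's rules,
# keyed by exact file name or by file extension.
EXTRA_CHECKS = {
    ".sql": "Also check: SQL injection risk, reversible migrations, table locks, missing indexes.",
    ".py": "Also check: unhandled exceptions, missing await, unclosed resources.",
    "Dockerfile": "Also check: pinned base image tags, non-root user, exposed secrets.",
}


def extra_checks_for(filename: str) -> str:
    """Return the extra instructions for one file, or "" if none apply."""
    base = os.path.basename(filename)
    if base in EXTRA_CHECKS:  # exact names such as "Dockerfile"
        return EXTRA_CHECKS[base]
    return EXTRA_CHECKS.get(os.path.splitext(base)[1], "")


def build_system_prompt(base_prompt: str, anti_patterns: str, filenames: list) -> str:
    """Combine the base prompt, the anti-patterns, and each distinct file-type block once."""
    parts = [base_prompt.strip(), anti_patterns.strip()]
    for filename in filenames:
        extra = extra_checks_for(filename)
        if extra and extra not in parts:  # keep first-seen order, skip duplicates
            parts.append(extra)
    return "\n\n".join(p for p in parts if p)
```

`chunk_diffs` joins each chunk into a single string, so calling this per chunk would also require keeping each chunk's file names, for example by returning them alongside the chunk text.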

Controlling Response Quality

# Add few-shot examples to calibrate the format and quality of responses
FEW_SHOT = """
Here is an example of good review feedback.

Input diff:
+    user = db.query(User).filter(User.id == user_id).first()
+    orders = db.query(Order).filter(Order.user_id == user.id).all()

Good feedback:
{
  "file": "services/order.py",
  "line": "orders = db.query(Order).filter(Order.user_id == user.id).all()",
  "severity": "critical",
  "comment": "user may be None (when user_id does not exist), so accessing user.id will raise AttributeError. Add a None check, e.g. if not user: raise NotFoundError()"
}

Bad feedback (too trivial):
{
  "file": "services/order.py",
  "line": "user = db.query(...)",
  "severity": "suggestion",
  "comment": "The variable name could be more descriptive, e.g. target_user"
}
"""
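Instead of placing examples in the system prompt, you can pass them as earlier `user`/`assistant` turns, which the Messages API also accepts. The model then sees a prior exchange where the assistant already answered in the expected JSON format. The sketch below builds such a `messages` list. `build_messages`, `EXAMPLE_DIFF`, and `EXAMPLE_REPLY` are hypothetical. Whether this beats a system-prompt example is something to measure on your own PRs.

```python
import json

# Hypothetical few-shot pair: one example diff and the ideal JSON reply for it.
EXAMPLE_DIFF = (
    "=== services/order.py (modified) ===\n"
    "+    user = db.query(User).filter(User.id == user_id).first()\n"
    "+    orders = db.query(Order).filter(Order.user_id == user.id).all()"
)
EXAMPLE_REPLY = [
    {
        "file": "services/order.py",
        "line": "orders = db.query(Order).filter(Order.user_id == user.id).all()",
        "severity": "critical",
        "comment": "user may be None when user_id does not exist; check it before accessing user.id.",
    }
]


def build_messages(user_message: str) -> list:
    """Prepend one example exchange so the model sees the expected reply format."""
    return [
        {"role": "user", "content": f"Code changes:\n{EXAMPLE_DIFF}"},
        {"role": "assistant", "content": json.dumps(EXAMPLE_REPLY, ensure_ascii=False)},
        {"role": "user", "content": user_message},
    ]
```

`messages=build_messages(user_message)` would then replace the single-message list passed to `anthropic.messages.create`. The example pair is sent with every request, so it adds to the input tokens of each call.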

Advanced Features

Incremental Review (Reviewing Only New Commits)

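from github import GithubException

def get_incremental_diff():
    """Fetch only the changes pushed since the last AI review, to avoid re-reviewing."""
    # Find the commit SHA recorded in the most recent AI review comment.
    # This requires each AI review comment to contain "reviewed up to: <sha>".
    comments = pr.get_issue_comments()
    last_review_sha = None

    for comment in comments:
        if "AI Code Review" in comment.body:
            # Parse the SHA from the comment; later comments overwrite earlier ones
            match = re.search(r"reviewed up to: ([a-f0-9]+)", comment.body)
            if match:
                last_review_sha = match.group(1)

    if last_review_sha:
        try:
            # Only fetch the diff between the last reviewed commit and the current head
            comparison = repo.compare(last_review_sha, pr.head.sha)
            return comparison.files
        except GithubException:
            # e.g. the old commit no longer exists after a force-push: fall back to the full diff
            pass

    return pr.get_files()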
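Incremental review only works if each AI review comment records the commit it covered, in the `reviewed up to: <sha>` form that the regex looks for. Below is a minimal sketch of both halves, writing the marker and reading the latest one back. The names `review_marker` and `last_reviewed_sha` are hypothetical. Wrapping the marker in an HTML comment hides it in the rendered PR comment, but the raw comment body still contains it.

```python
import re
from typing import List, Optional

MARKER_RE = re.compile(r"reviewed up to: ([0-9a-f]{7,40})")


def review_marker(head_sha: str) -> str:
    """Hidden marker to append to each AI review comment (not shown when rendered)."""
    return f"<!-- reviewed up to: {head_sha} -->"


def last_reviewed_sha(comment_bodies: List[str]) -> Optional[str]:
    """Return the SHA from the most recent AI review comment that carries a marker."""
    last = None
    # Assumes oldest-first order, which is how the GitHub API lists issue comments
    for body in comment_bodies:
        if "AI Code Review" not in body:
            continue
        match = MARKER_RE.search(body)
        if match:
            last = match.group(1)  # later comments overwrite earlier ones
    return last
```

In the main script, `last_reviewed_sha([c.body for c in pr.get_issue_comments()])` could replace the loop above, and `review_marker(pr.head.sha)` would be appended to the comment that `post_review_comments` posts.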

Cost Control

# Track API usage and enforce a monthly budget cap
import sqlite3
from datetime import datetime

MONTHLY_BUDGET = 50.0  # USD

def track_usage(input_tokens, output_tokens, model="claude-sonnet"):
    """Record API usage and raise if the monthly budget is exceeded."""
    # Note: GitHub-hosted runners are ephemeral, so this database file must be persisted
    # between runs (e.g. with actions/cache or external storage); otherwise the monthly
    # total starts from zero on every run.
    conn = sqlite3.connect("ai_review_usage.db")
    try:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS usage (
                date TEXT, model TEXT,
                input_tokens INTEGER, output_tokens INTEGER,
                cost_usd REAL
            )
        """)

        # Claude Sonnet pricing (2024): $3 per 1M input tokens, $15 per 1M output tokens
        cost = (input_tokens * 3 / 1_000_000) + (output_tokens * 15 / 1_000_000)

        conn.execute(
            "INSERT INTO usage VALUES (?, ?, ?, ?, ?)",
            (datetime.now().isoformat(), model, input_tokens, output_tokens, cost),
        )
        conn.commit()

        # Check the month-to-date total (ISO date strings compare correctly as text)
        month_start = datetime.now().strftime("%Y-%m-01")
        row = conn.execute(
            "SELECT SUM(cost_usd) FROM usage WHERE date >= ?", (month_start,)
        ).fetchone()
        monthly_cost = row[0] or 0
    finally:
        conn.close()

    if monthly_cost > MONTHLY_BUDGET:
        raise RuntimeError(
            f"Monthly AI review budget exceeded: ${monthly_cost:.2f} / ${MONTHLY_BUDGET}"
        )
    return cost
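The pricing formula is easy to sanity-check by hand. For example, a request with 10,000 input tokens and 1,000 output tokens costs 10,000 × $3/1M + 1,000 × $15/1M = $0.03 + $0.015 = $0.045. The Anthropic Python SDK reports actual usage on each response as `response.usage.input_tokens` and `response.usage.output_tokens`, and those are the values you would pass to `track_usage`. The sketch below uses the same assumed Sonnet rates. It also shows a pre-flight check, since `track_usage` only detects overspending after the request has been recorded. `estimate_cost_usd` and `can_afford` are hypothetical helpers.

```python
# Assumed Claude Sonnet rates in USD per 1M tokens, matching track_usage above.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the assumed per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def can_afford(month_to_date_usd: float, next_request_usd: float, budget_usd: float = 50.0) -> bool:
    """Check the budget before calling the API instead of after recording the cost."""
    return month_to_date_usd + next_request_usd <= budget_usd
```

Checking before each call means a run stops at the cap instead of overshooting it by one request.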

Parallel Review with a GitHub Actions Matrix

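# For large PRs, a matrix strategy can review several groups of files in parallel
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      chunks: ${{ steps.split.outputs.chunks }}
    steps:
      - uses: actions/checkout@v4
      - id: split
        run: |
          # Group the files and emit them as a JSON array for the matrix;
          # split_files.py must print a line of the form chunks=[...]
          python .github/scripts/split_files.py >> "$GITHUB_OUTPUT"

  review:
    needs: prepare
    runs-on: ubuntu-latest
    strategy:
      matrix:
        chunk: ${{ fromJson(needs.prepare.outputs.chunks) }}
    steps:
      - uses: actions/checkout@v4
      # This job also needs the Python setup, dependency install, and env block from the
      # basic workflow, and ai_review.py must be extended to accept a --chunk argument
      - run: python .github/scripts/ai_review.py --chunk "${{ matrix.chunk }}"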
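The workflow calls a `split_files.py` script that the article does not show. Below is a minimal sketch of its core logic, assuming the list of changed file names is already available. The grouping strategy and the function names `group_files` and `matrix_output_line` are hypothetical. The script sorts the names, splits them into at most `max_groups` contiguous groups of similar size, and formats the result as one `chunks=<json>` line, which is the `name=value` form that `$GITHUB_OUTPUT` expects.

```python
import json
import math


def group_files(filenames: list, max_groups: int = 4) -> list:
    """Split sorted file names into at most max_groups contiguous, near-equal groups."""
    files = sorted(filenames)  # sorting keeps files from the same directory together
    if not files:
        return []
    size = math.ceil(len(files) / max(1, max_groups))
    return [files[i : i + size] for i in range(0, len(files), size)]


def matrix_output_line(groups: list) -> str:
    """Format the groups as a single name=value line for $GITHUB_OUTPUT."""
    # Each matrix entry is one comma-separated string, later passed to --chunk
    chunks = [",".join(group) for group in groups]
    return "chunks=" + json.dumps(chunks, separators=(",", ":"))
```

A `__main__` entry point would read the file list (for example, one name per line from standard input, piped in by the workflow step) and print `matrix_output_line(group_files(names))`. If there are no files, the output is `chunks=[]`, and GitHub Actions may reject an empty matrix, so an `if:` condition on the review job may be needed.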

Summary

Results after six months of using AI code review:

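Review wait time: down from an average of 6 hours to 15 minutes (the AI reviews first, then humans)
Bug detection: the AI caught about 15% of the issues that human reviewers missed
Monthly cost: roughly $30-50 USD (team of 8, about 5-10 PRs per day)
False-positive rate: about 30% at first, under 10% after prompt tuning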

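A few important reminders:

AI review is an aid, not a replacement: never let the AI be the only reviewer
Prompts need continuous iteration: keep adjusting them based on team feedback
Control costs: set a budget cap, and skip large files or only summarize them
Transparency: let the team know which comments are AI-generated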

Suggested further reading:

GitHub Actions official documentation
Anthropic API documentation
Danger.js: another PR automation tool that can be used alongside AI review
