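Introduction

Have you ever wondered how Facebook knows which "People You May Know" to suggest? How the knowledge panel on the right side of Google search results is generated? How LinkedIn works out how many degrees of connection separate you from someone?

These questions share one thing: the relationships are the most valuable part of the data. Traditional relational databases struggle with multi-hop relationship queries, because every extra hop adds more JOINs and performance degrades quickly. Graph databases were built to solve exactly this problem.

In this article I'll introduce Neo4j from scratch. Neo4j is one of the most mature graph databases available today, and we'll use its Cypher query language to model a social network and a knowledge graph.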


Why Do We Need Graph Databases?

The Pain Points of Relational Databases

Consider a "friends of friends" query on a social network. In MySQL:

-- Find Alice's friends of friends (2nd-degree connections)
SELECT DISTINCT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
WHERE u1.name = 'Alice'
  AND u3.id != u1.id;

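And that is only two hops. What about six degrees of separation? Every extra hop adds two more JOINs. Each hop also multiplies the intermediate result by roughly the average number of friends per person, so the query gets expensive very quickly.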
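To make that growth concrete, here is a small pure-Python simulation of what chaining self-joins on a friendships table produces. It is my own illustration, not a MySQL benchmark. It assumes a hypothetical network in which every user has exactly `fanout` friends. The join result contains one row per path, so it grows as `fanout ** hops`, while the number of distinct people at the end grows far more slowly. The joins against `users` only attach names and do not change the row count, so the sketch leaves them out. The function names are hypothetical.

```python
from __future__ import annotations


def build_friendships(n: int, fanout: int) -> list[tuple[int, int]]:
    """Rows of a hypothetical friendships(user_id, friend_id) table.

    User i is friends with users i+1 .. i+fanout (wrapping around at n),
    so every user has exactly `fanout` friends.
    """
    return [(u, (u + k) % n) for u in range(n) for k in range(1, fanout + 1)]


def join_k_hops(rows: list[tuple[int, int]], start: int, hops: int) -> list[tuple[int, ...]]:
    """Emulate chaining `hops` self-joins of the friendships table from `start`.

    Each element is one row of the join result, i.e. one path of user ids,
    before any DISTINCT is applied.
    """
    friends_of: dict[int, list[int]] = {}
    for user_id, friend_id in rows:
        friends_of.setdefault(user_id, []).append(friend_id)

    partial: list[tuple[int, ...]] = [(start,)]
    for _ in range(hops):
        partial = [path + (friend,) for path in partial for friend in friends_of.get(path[-1], [])]
    return partial


if __name__ == "__main__":
    table = build_friendships(n=10_000, fanout=10)
    for hops in range(1, 5):
        joined = join_k_hops(table, start=0, hops=hops)
        distinct_people = {path[-1] for path in joined}
        print(f"{hops} hop(s): {len(joined):>6} joined rows, {len(distinct_people):>3} distinct people")
```

With `fanout=10`, three hops already produce 1,000 joined rows that collapse to only 28 distinct people. The DISTINCT in the SQL discards the duplicates, but in a straightforward execution plan the join has to generate them first.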

The Graph Database Advantage

In a graph database, the same query becomes:

MATCH (alice:Person {name: 'Alice'})-[:FRIEND*2]->(fof:Person)
WHERE fof <> alice
RETURN DISTINCT fof.name

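A single MATCH pattern does the job, and performance does not collapse as the number of hops grows. This is the core advantage of a graph database. Neo4j stores each relationship as a direct link between its two nodes (often called index-free adjacency), so the cost of following one relationship does not depend on the total size of the database. A traversal's cost is proportional to the part of the graph it actually visits, not to the size of the whole dataset.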
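The idea behind index-free adjacency can be sketched with a toy Python model. Each node keeps direct references to its neighbours, so expanding a hop just walks those references. This only illustrates the concept under that assumption; it is not how Neo4j actually lays out its store files. The names `Node`, `expand`, and `build_graph` are hypothetical.

```python
class Node:
    """A node holding direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.friends: list["Node"] = []


def expand(start: Node, hops: int) -> tuple[set[str], int]:
    """Follow `hops` hops from `start`.

    Returns the names reached after exactly `hops` hops and the number of
    relationships followed. Only nodes on the traversed paths are touched.
    """
    frontier = [start]
    followed = 0
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            followed += len(node.friends)
            next_frontier.extend(node.friends)
        frontier = next_frontier
    return {node.name for node in frontier}, followed


def build_graph(extra_nodes: int) -> tuple[Node, list[Node]]:
    """Alice -> Bob/Carol -> Dave, plus a chain of unrelated users Alice cannot reach."""
    alice, bob, carol, dave = (Node(n) for n in ("Alice", "Bob", "Carol", "Dave"))
    alice.friends += [bob, carol]
    bob.friends.append(dave)
    carol.friends.append(dave)
    others = [Node(f"user{i}") for i in range(extra_nodes)]
    for a, b in zip(others, others[1:]):
        a.friends.append(b)
    return alice, [alice, bob, carol, dave, *others]


if __name__ == "__main__":
    for extra in (0, 100_000):
        alice, everyone = build_graph(extra)
        reached, followed = expand(alice, 2)
        print(f"{len(everyone):>7} nodes: reached {sorted(reached)}, followed {followed} relationships")
```

Adding 100,000 unrelated nodes still leaves the two-hop expansion from Alice at exactly four relationships followed. What does raise the cost is the fan-out along the traversed paths. That is why deep traversals through very densely connected nodes can still be slow.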


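Neo4j Basics

The Data Model

A graph database's data model is built from three core elements:

  • Node: an entity, such as a person, an article, or a city
  • Relationship: a connection between two nodes that has a direction and a type, such as FRIEND, WROTE, or LIVES_IN
  • Property: a key-value pair stored on a node or a relationship, such as name: "Alice" or since: 2020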
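To make the three elements concrete, here is a minimal Python sketch of a property graph. The class and field names are my own and are not the types exposed by the official Neo4j driver. Labels such as `Person` are included because the Cypher examples below rely on them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    labels: set[str]                                    # kinds of entity, e.g. {"Person"}
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str                                       # e.g. "FRIEND", "WROTE", "LIVES_IN"
    start: Node                                         # every relationship has a direction:
    end: Node                                           # it points from start to end
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 30})
bob = Node({"Person"}, {"name": "Bob", "age": 28})
taipei = Node({"City"}, {"name": "Taipei"})

friendship = Relationship("FRIEND", alice, bob, {"since": 2020})
lives_in = Relationship("LIVES_IN", alice, taipei)

# prints: Alice -[:FRIEND]-> Bob {'since': 2020}
print(friendship.start.properties["name"], f"-[:{friendship.rel_type}]->",
      friendship.end.properties["name"], friendship.properties)
```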

Installing Neo4j

# docker-compose.yml
version: '3.8'
services:
  neo4j:
    image: neo4j:5.15-community
    ports:
      - "7474:7474"   # Browser UI
      - "7687:7687"   # Bolt protocol
    environment:
      NEO4J_AUTH: neo4j/secret123456
      NEO4J_PLUGINS: '["apoc"]'
    volumes:
      - neo4j_data:/data

volumes:
  neo4j_data:

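docker compose up -d
# Then open http://localhost:7474 in your browser
# and log in as neo4j / secret123456

Neo4j Browser is one of Neo4j's highlights. It draws nodes and relationships as a visual graph, which helps a lot when debugging or trying to understand how your data is structured.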


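Cypher Query Language

Cypher is the query language created for Neo4j (it has since been opened up as openCypher). Its syntax is strikingly intuitive because it describes graph patterns with ASCII art.

Basic Syntax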

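// Nodes are written in parentheses ()
// Relationships are written in square brackets [] with an arrow --> or <--
// Labels are written with a colon :Label

// Create nodes
CREATE (alice:Person {name: 'Alice', age: 30, city: 'Taipei'})
CREATE (bob:Person {name: 'Bob', age: 28, city: 'Tokyo'})
CREATE (carol:Person {name: 'Carol', age: 32, city: 'Taipei'})

// Create relationships (run together with the CREATE clauses above as one
// statement, so that alice, bob and carol are still bound)
CREATE (alice)-[:FRIEND {since: 2020}]->(bob)
CREATE (bob)-[:FRIEND {since: 2021}]->(carol)
CREATE (alice)-[:FRIEND {since: 2019}]->(carol)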

Querying

// Find all of Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend:Person)
RETURN friend.name, friend.city

// Find everyone who lives in Taipei
MATCH (p:Person {city: 'Taipei'})
RETURN p.name, p.age

// Match in both directions (ignore the relationship's direction)
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]-(friend:Person)
RETURN friend.name
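The difference between -[:FRIEND]-> and -[:FRIEND]- is easiest to see on the three relationships created earlier. This plain-Python sketch mimics what each pattern returns for a given person. The helper names are hypothetical.

```python
# The three FRIEND relationships created above, stored as (start, end) pairs.
FRIEND = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]


def directed_friends(name: str) -> list[str]:
    """Like (p {name: $name})-[:FRIEND]->(friend): only follow start -> end."""
    return sorted(end for start, end in FRIEND if start == name)


def undirected_friends(name: str) -> list[str]:
    """Like (p {name: $name})-[:FRIEND]-(friend): accept either direction."""
    outgoing = [end for start, end in FRIEND if start == name]
    incoming = [start for start, end in FRIEND if end == name]
    return sorted(outgoing + incoming)


for person in ("Alice", "Bob", "Carol"):
    print(person, directed_friends(person), undirected_friends(person))
```

For Bob, the directed pattern finds only Carol, because the relationship from Alice points *to* Bob. The undirected pattern finds both Alice and Carol.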

Advanced Queries

// Friends of friends (excluding Alice herself and her direct friends)
MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE fof <> me
  AND NOT (me)-[:FRIEND]->(fof)
RETURN DISTINCT fof.name AS recommendation

// Shortest path, ignoring direction
// (Dave is created in the social network example below)
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(dave:Person {name: 'Dave'})
)
RETURN path, length(path) AS distance

// Variable-length paths (1 to 5 hops), keeping the fewest hops per person
MATCH path = (alice:Person {name: 'Alice'})-[:FRIEND*1..5]->(connected:Person)
WHERE connected <> alice
RETURN connected.name AS name, min(length(path)) AS distance
ORDER BY distance
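You can reproduce what these three queries compute in a few lines of Python, which helps when checking results by hand. The sketch assumes the five FRIEND relationships from the social network example later in this article: Alice->Bob, Alice->Carol, Bob->Dave, Carol->Eve, Dave->Frank. It uses a plain breadth-first search and says nothing about how Neo4j's planner actually executes shortestPath or variable-length patterns. The function names are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Assumed data: the FRIEND relationships from the social network example, as (start, end).
FRIEND = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"),
          ("Carol", "Eve"), ("Dave", "Frank")]


def neighbours(name: str, directed: bool = True) -> set[str]:
    """People one FRIEND hop away; directed=False also follows relationships backwards."""
    found = {end for start, end in FRIEND if start == name}
    if not directed:
        found |= {start for start, end in FRIEND if end == name}
    return found


def friends_of_friends(me: str) -> set[str]:
    """(me)-[:FRIEND]->(friend)-[:FRIEND]->(fof), minus me and my direct friends."""
    direct = neighbours(me)
    fof = {person for friend in direct for person in neighbours(friend)}
    return fof - direct - {me}


def shortest_path(a: str, b: str) -> list[str] | None:
    """Breadth-first search ignoring direction, like shortestPath((a)-[:FRIEND*]-(b))."""
    previous: dict[str, str | None] = {a: None}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            path: list[str] = []
            step: str | None = current
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in neighbours(current, directed=False):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


def min_hops(start: str, max_hops: int = 5) -> dict[str, int]:
    """Fewest outgoing FRIEND hops (1..max_hops) to each reachable person other than start."""
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        frontier = [n for cur in frontier for n in neighbours(cur) if n not in distance]
        for n in frontier:
            distance.setdefault(n, hop)
    del distance[start]
    return distance


if __name__ == "__main__":
    print(friends_of_friends("Alice"))
    path = shortest_path("Alice", "Dave")
    print(path, len(path) - 1 if path else None)
    print(min_hops("Alice"))
```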


Hands-on 1: Modeling a Social Network

Let's build a slightly more complex social network model:

Creating the Data

// Run on an empty database (for example after MATCH (n) DETACH DELETE n),
// otherwise the people from the earlier example will be duplicated.

// Create people
CREATE (alice:Person {name: 'Alice', age: 30, interests: ['coding', 'hiking']})
CREATE (bob:Person {name: 'Bob', age: 28, interests: ['coding', 'gaming']})
CREATE (carol:Person {name: 'Carol', age: 32, interests: ['design', 'hiking']})
CREATE (dave:Person {name: 'Dave', age: 25, interests: ['coding', 'music']})
CREATE (eve:Person {name: 'Eve', age: 29, interests: ['design', 'cooking']})
CREATE (frank:Person {name: 'Frank', age: 35, interests: ['gaming', 'music']})

// Create companies
CREATE (techcorp:Company {name: 'TechCorp', industry: 'Technology'})
CREATE (designlab:Company {name: 'DesignLab', industry: 'Design'})

// Create skills
CREATE (python:Skill {name: 'Python'})
CREATE (javascript:Skill {name: 'JavaScript'})
CREATE (figma:Skill {name: 'Figma'})

// Create relationships
CREATE (alice)-[:FRIEND {since: 2019}]->(bob)
CREATE (alice)-[:FRIEND {since: 2020}]->(carol)
CREATE (bob)-[:FRIEND {since: 2021}]->(dave)
CREATE (carol)-[:FRIEND {since: 2020}]->(eve)
CREATE (dave)-[:FRIEND {since: 2022}]->(frank)

CREATE (alice)-[:WORKS_AT {role: 'Backend Engineer', since: 2020}]->(techcorp)
CREATE (bob)-[:WORKS_AT {role: 'Frontend Engineer', since: 2021}]->(techcorp)
CREATE (carol)-[:WORKS_AT {role: 'UX Designer', since: 2019}]->(designlab)
CREATE (eve)-[:WORKS_AT {role: 'UI Designer', since: 2022}]->(designlab)

CREATE (alice)-[:HAS_SKILL {level: 'expert'}]->(python)
CREATE (bob)-[:HAS_SKILL {level: 'intermediate'}]->(javascript)
CREATE (carol)-[:HAS_SKILL {level: 'expert'}]->(figma)
CREATE (dave)-[:HAS_SKILL {level: 'beginner'}]->(python)

Social Recommendation Queries

// Recommend people Alice may know (friends of friends)
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(f)-[:FRIEND]->(recommendation)
WHERE recommendation <> alice
  AND NOT (alice)-[:FRIEND]->(recommendation)
RETURN recommendation.name,
       COUNT(f) AS mutual_friends,
       COLLECT(f.name) AS through
ORDER BY mutual_friends DESC

// Find Alice's colleagues at the same company
MATCH (alice:Person {name: 'Alice'})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(colleague)
WHERE colleague <> alice
RETURN colleague.name, company.name

// Find people who share a skill with Alice
MATCH (alice:Person {name: 'Alice'})-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(other)
WHERE other <> alice
RETURN other.name, skill.name, other.age

// Influence analysis: who has the most friends?
MATCH (p:Person)-[:FRIEND]-(friend)
RETURN p.name, COUNT(friend) AS friend_count
ORDER BY friend_count DESC
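To cross-check the expected output, here is a pure-Python sketch of the recommendation, colleague, and friend-count queries on the data created above. The function names are hypothetical. The alphabetical tie-break in `recommend` is my addition, because Cypher's ORDER BY mutual_friends DESC leaves the order of ties unspecified.

```python
from collections import Counter, defaultdict

# Assumed data: the FRIEND and WORKS_AT relationships created above.
FRIEND = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"),
          ("Carol", "Eve"), ("Dave", "Frank")]
WORKS_AT = {"Alice": "TechCorp", "Bob": "TechCorp",
            "Carol": "DesignLab", "Eve": "DesignLab"}


def recommend(me: str) -> list[tuple[str, int, list[str]]]:
    """Friends of friends ranked by mutual friends, like the first query above."""
    direct = [end for start, end in FRIEND if start == me]
    through: dict[str, list[str]] = defaultdict(list)
    for friend in direct:
        for candidate in (end for start, end in FRIEND if start == friend):
            if candidate != me and candidate not in direct:
                through[candidate].append(friend)
    rows = [(name, len(via), via) for name, via in through.items()]
    # Most mutual friends first; ties broken alphabetically (not specified by the Cypher query).
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def colleagues(me: str) -> list[str]:
    """People who WORKS_AT the same company as `me`."""
    company = WORKS_AT.get(me)
    if company is None:
        return []
    return sorted(p for p, c in WORKS_AT.items() if c == company and p != me)


def friend_counts() -> Counter:
    """Undirected degree: each FRIEND relationship counts once for both endpoints."""
    counts: Counter = Counter()
    for start, end in FRIEND:
        counts[start] += 1
        counts[end] += 1
    return counts


if __name__ == "__main__":
    print(recommend("Alice"))
    print(colleagues("Alice"))
    print(friend_counts().most_common())
```

On this data, Dave (through Bob) and Eve (through Carol) are both recommended with one mutual friend. Alice, Bob, Carol, and Dave tie for the most friends with two each.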


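Hands-on 2: Modeling a Knowledge Graph

Knowledge graphs are another powerful use case for graph databases. Take a technology knowledge graph as an example:

Building the Knowledge Graph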

// Create programming-language nodes
CREATE (python:Language {name: 'Python', paradigm: 'multi', year: 1991})
CREATE (js:Language {name: 'JavaScript', paradigm: 'multi', year: 1995})
CREATE (rust:Language {name: 'Rust', paradigm: 'multi', year: 2010})
CREATE (go:Language {name: 'Go', paradigm: 'concurrent', year: 2009})

// Create framework nodes
CREATE (django:Framework {name: 'Django', type: 'web'})
CREATE (flask:Framework {name: 'Flask', type: 'web'})
CREATE (react:Framework {name: 'React', type: 'frontend'})
CREATE (fastapi:Framework {name: 'FastAPI', type: 'api'})

// Create concept nodes
CREATE (orm:Concept {name: 'ORM'})
CREATE (restapi:Concept {name: 'REST API'})
CREATE (async_prog:Concept {name: 'Async Programming'})

// Create relationships
CREATE (django)-[:WRITTEN_IN]->(python)
CREATE (flask)-[:WRITTEN_IN]->(python)
CREATE (fastapi)-[:WRITTEN_IN]->(python)
CREATE (react)-[:WRITTEN_IN]->(js)

CREATE (django)-[:IMPLEMENTS]->(orm)
CREATE (django)-[:IMPLEMENTS]->(restapi)
CREATE (fastapi)-[:IMPLEMENTS]->(restapi)
CREATE (fastapi)-[:IMPLEMENTS]->(async_prog)

// Rust (2010) appeared after Python (1991) and Go (2009)
CREATE (rust)-[:RELEASED_AFTER]->(python)
CREATE (rust)-[:RELEASED_AFTER]->(go)

CREATE (flask)-[:SIMILAR_TO {reason: 'lightweight web'}]->(fastapi)

Knowledge Queries

// Find every framework in the Python ecosystem
MATCH (f:Framework)-[:WRITTEN_IN]->(python:Language {name: 'Python'})
RETURN f.name, f.type

// All frameworks that implement REST API, together with their language
MATCH (f:Framework)-[:IMPLEMENTS]->(c:Concept {name: 'REST API'})
MATCH (f)-[:WRITTEN_IN]->(lang:Language)
RETURN f.name, lang.name

// Start from a concept and find all related technologies
MATCH (c:Concept {name: 'Async Programming'})<-[:IMPLEMENTS]-(f:Framework)
MATCH (f)-[:WRITTEN_IN]->(lang:Language)
RETURN c.name AS concept, f.name AS framework, lang.name AS language

// Suggested learning path: if you know Django, you might also want to learn...
MATCH (django:Framework {name: 'Django'})-[:WRITTEN_IN]->(lang)
MATCH (related:Framework)-[:WRITTEN_IN]->(lang)
WHERE related <> django
OPTIONAL MATCH (related)-[:IMPLEMENTS]->(concept)
RETURN related.name, COLLECT(DISTINCT concept.name) AS concepts
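The learning-path query is the most involved of the four, so here is a plain-Python sketch of its logic using the WRITTEN_IN and IMPLEMENTS relationships created above. The function name is hypothetical. Note that OPTIONAL MATCH keeps frameworks that implement no concept, and COLLECT skips the resulting nulls, so those frameworks end up with an empty list.

```python
# Assumed data: the WRITTEN_IN and IMPLEMENTS relationships from the knowledge graph above.
WRITTEN_IN = {"Django": "Python", "Flask": "Python",
              "FastAPI": "Python", "React": "JavaScript"}
IMPLEMENTS = [("Django", "ORM"), ("Django", "REST API"),
              ("FastAPI", "REST API"), ("FastAPI", "Async Programming")]


def learning_path(framework: str) -> dict[str, list[str]]:
    """Other frameworks in the same language, each with the concepts it implements."""
    language = WRITTEN_IN[framework]
    result: dict[str, list[str]] = {}
    for other, other_language in WRITTEN_IN.items():
        if other == framework or other_language != language:
            continue
        concepts: list[str] = []
        for implementer, concept in IMPLEMENTS:
            # Mirrors COLLECT(DISTINCT concept.name): keep each concept once
            if implementer == other and concept not in concepts:
                concepts.append(concept)
        # A framework with no IMPLEMENTS relationship still appears, with [],
        # just as OPTIONAL MATCH keeps the row and COLLECT ignores the null.
        result[other] = concepts
    return result


# prints: {'Flask': [], 'FastAPI': ['REST API', 'Async Programming']}
print(learning_path("Django"))
```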


Working with Neo4j from Python

This uses the official neo4j Python driver, installed with pip install neo4j.

from neo4j import GraphDatabase


class SocialGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_person(self, name, age, city):
        # MERGE on name so that re-running the script does not duplicate people
        with self.driver.session() as session:
            session.run(
                "MERGE (p:Person {name: $name}) SET p.age = $age, p.city = $city",
                name=name, age=age, city=city,
            )

    def add_friendship(self, name1, name2):
        with self.driver.session() as session:
            session.run("""
                MATCH (a:Person {name: $name1})
                MATCH (b:Person {name: $name2})
                MERGE (a)-[:FRIEND]->(b)
            """, name1=name1, name2=name2)

    def find_friends_of_friends(self, name):
        with self.driver.session() as session:
            result = session.run("""
                MATCH (me:Person {name: $name})-[:FRIEND]->(f)-[:FRIEND]->(fof)
                WHERE fof <> me AND NOT (me)-[:FRIEND]->(fof)
                RETURN fof.name AS name, COUNT(f) AS mutual
                ORDER BY mutual DESC
            """, name=name)
            # Consume the result while the session is still open
            return [(record["name"], record["mutual"]) for record in result]

    def shortest_path(self, name1, name2):
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                    (a:Person {name: $name1})-[:FRIEND*]-(b:Person {name: $name2})
                )
                RETURN [node IN nodes(path) | node.name] AS names,
                       length(path) AS distance
            """, name1=name1, name2=name2)
            record = result.single()
            if record:
                return record["names"], record["distance"]
            return None, None


# Usage
graph = SocialGraph("bolt://localhost:7687", "neo4j", "secret123456")
graph.add_person("Alice", 30, "Taipei")
graph.add_person("Bob", 28, "Tokyo")
graph.add_friendship("Alice", "Bob")

recommendations = graph.find_friends_of_friends("Alice")
for name, mutual in recommendations:
    print(f"{name} ({mutual} mutual friends)")

# Frank only exists if the social network data above was loaded
names, distance = graph.shortest_path("Alice", "Frank")
if names is None:
    print("No path found")
else:
    print(f"Path: {' -> '.join(names)}, Distance: {distance}")

graph.close()


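When Should You Use a Graph Database?

Good Fits

  • Social networks: friend recommendations, influence analysis, community detection
  • Knowledge graphs: semantic search, entity linking, question-answering systems
  • Recommendation systems: relationship-based collaborative filtering
  • Fraud detection: spotting unusual patterns in transaction networks
  • Network and IT operations: service dependencies, impact analysis

Poor Fits

  • Simple CRUD operations
  • Heavy aggregation workloads (the strength of OLAP databases)
  • Data without meaningful relationships between records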

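Summary

Graph databases represent a different way of thinking. Instead of forcing data into rows and columns, you describe the world naturally with nodes and relationships. Neo4j's Cypher makes this way of thinking very intuitive; writing a query feels almost like drawing a diagram.

My advice: start by exploring your data visually in Neo4j Browser (http://localhost:7474). Once you find yourself drawing circles and arrows on a whiteboard, it's time to consider a graph database.

Further Reading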