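Introduction
Have you ever wondered how Facebook knows the "People You May Know"? How the knowledge panel to the right of Google's search results gets generated? How LinkedIn works out how many degrees of connection separate you from someone?
All of these questions share one thing: the relationships are the most valuable part of the data. Traditional relational databases struggle with multi-hop relationship queries, because every additional hop adds more JOINs and performance drops off sharply. Graph databases were built to solve exactly this problem.
In this article I will walk you through Neo4j from scratch. It is one of the most mature graph databases available, and we will use its Cypher query language to model a social network and a knowledge graph.
Why do we need a graph database?
The pain point of relational databases
Consider a "friends of friends" query in a social network. In MySQL: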
-- Find Alice's friends of friends (2 hops)
SELECT DISTINCT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
WHERE u1.name = 'Alice'
AND u3.id != u1.id;
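That is only two hops, and it already takes four JOINs. What about six degrees of separation? Every extra hop adds two more JOINs, the intermediate result grows with each one, and performance degrades quickly.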
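To make the growth concrete, here is a minimal Python sketch that generates the same style of SQL for any depth. The helper name `friends_at_depth_sql` is my own invention for illustration, and it assumes the same `users` and `friendships` tables as the query above.

```python
def friends_at_depth_sql(depth: int) -> str:
    """Build SQL for 'people exactly `depth` hops from Alice' (illustrative helper)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    last = depth + 1  # alias number of the final users table
    lines = [f"SELECT DISTINCT u{last}.name", "FROM users u1"]
    for hop in range(1, depth + 1):
        # Each hop costs two JOINs: one into the edge table, one back into users.
        lines.append(f"JOIN friendships f{hop} ON u{hop}.id = f{hop}.user_id")
        lines.append(f"JOIN users u{hop + 1} ON f{hop}.friend_id = u{hop + 1}.id")
    lines.append("WHERE u1.name = 'Alice'")
    lines.append(f"AND u{last}.id != u1.id;")
    return "\n".join(lines)


for depth in (2, 6):
    sql = friends_at_depth_sql(depth)
    print(f"{depth} hops -> {sql.count('JOIN')} JOINs")
```

With a depth of 2 the helper reproduces the query above. A depth of 6 already needs 12 JOINs, and that is before the database has done any work.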
The graph database advantage
In a graph database, the same query looks like this:
MATCH (alice:Person {name: 'Alice'})-[:FRIEND*2]->(fof:Person)
WHERE fof <> alice
RETURN DISTINCT fof.name
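Three short lines do the job, and performance does not fall off a cliff as you add hops. This is the core advantage of a graph database: each node holds direct references to its relationships (Neo4j calls this index-free adjacency), so following a single relationship costs roughly the same no matter how large the whole graph is. A deeper query still visits more nodes, but its cost depends on the part of the graph it actually touches, not on the size of the tables being joined.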
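To get a feel for why hopping across relationships stays cheap, here is a minimal in-memory sketch in Python. It is not how Neo4j is implemented internally. The `FRIENDS` dictionary and its contents are made-up sample data, and `reachable_in_exactly` is a hypothetical helper that mimics a fixed-length pattern such as `-[:FRIEND*2]->`.

```python
from typing import Dict, List, Set

# Hypothetical sample graph: each person points directly at their outgoing FRIEND targets.
FRIENDS: Dict[str, List[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Carol", "Dave"],
    "Carol": ["Eve"],
    "Dave": [],
    "Eve": [],
}


def reachable_in_exactly(start: str, hops: int) -> Set[str]:
    """Follow outgoing FRIEND links exactly `hops` times, excluding the start node."""
    frontier = {start}
    for _ in range(hops):
        # Only the neighbours of the current frontier are read;
        # the rest of the graph is never scanned.
        frontier = {friend for person in frontier for friend in FRIENDS[person]}
    return frontier - {start}


print(sorted(reachable_in_exactly("Alice", 2)))
```

Each hop only touches the neighbour lists of the nodes reached so far, which is the intuition behind the traversal cost staying local.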
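Neo4j fundamentals
Data model
The graph data model is built from three core elements:
- Node: an entity, such as a person, an article, or a city
- Relationship: a connection between two nodes, with a direction and a type, such as FRIEND, WROTE, or LIVES_IN
- Property: a key-value pair on a node or a relationship, such as name: 'Alice' or since: 2020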
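If it helps to see the three elements as plain data structures, here is a small Python sketch. The `Node` and `Relationship` classes and the `describe` helper are hypothetical names of mine, not part of any Neo4j API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass
class Node:
    labels: FrozenSet[str]                                     # e.g. {"Person"}
    properties: Dict[str, Any] = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: Node                                                # direction: start -> end
    type: str                                                  # e.g. "FRIEND"
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def describe(rel: Relationship) -> str:
    """Render a relationship in a Cypher-like arrow notation."""
    return f"({rel.start.properties['name']})-[:{rel.type}]->({rel.end.properties['name']})"


alice = Node(frozenset({"Person"}), {"name": "Alice"})
bob = Node(frozenset({"Person"}), {"name": "Bob"})
friendship = Relationship(alice, "FRIEND", bob, {"since": 2020})

print(describe(friendship))
```

Note that properties live on both nodes and relationships, which is what lets a relationship carry facts like `since` without an extra join table.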
Installing Neo4j
# docker-compose.yml
version: '3.8'
services:
  neo4j:
    image: neo4j:5.15-community
    ports:
      - "7474:7474" # Browser UI
      - "7687:7687" # Bolt protocol
    environment:
      NEO4J_AUTH: neo4j/secret123456
      NEO4J_PLUGINS: '["apoc"]'
    volumes:
      - neo4j_data:/data
volumes:
  neo4j_data:
docker compose up -d
# Open http://localhost:7474 in your browser
# Log in with neo4j / secret123456
Neo4j's Browser UI is one of its highlights: it renders nodes and relationships as a visual graph, which helps a lot when debugging and when trying to understand the structure of your data.
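The Cypher query language
Cypher is the query language created by Neo4j (it is also available in other databases through the openCypher project), and its syntax is strikingly intuitive. It describes graph patterns in an ASCII-art style.
Basic syntax
// Nodes go in parentheses ()
// Relationships go in square brackets [] with arrows --> or <--
// Labels start with a colon :Label
// Create nodes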
CREATE (alice:Person {name: 'Alice', age: 30, city: 'Taipei'})
CREATE (bob:Person {name: 'Bob', age: 28, city: 'Tokyo'})
CREATE (carol:Person {name: 'Carol', age: 32, city: 'Taipei'})
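// Create relationships (run together with the CREATE clauses above as one statement so the variables stay in scope)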
CREATE (alice)-[:FRIEND {since: 2020}]->(bob)
CREATE (bob)-[:FRIEND {since: 2021}]->(carol)
CREATE (alice)-[:FRIEND {since: 2019}]->(carol)
Queries
// Find all of Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend:Person)
RETURN friend.name, friend.city
// Find people who live in Taipei
MATCH (p:Person {city: 'Taipei'})
RETURN p.name, p.age
// Undirected match (ignores relationship direction)
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]-(friend:Person)
RETURN friend.name
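The difference between a directed and an undirected match is easy to miss, so here is a small Python sketch of it over the three FRIEND relationships created earlier. The function names `outgoing` and `either_direction` are hypothetical. They only imitate what `-[:FRIEND]->` and `-[:FRIEND]-` select.

```python
from typing import List, Set, Tuple

# The three FRIEND relationships from the basic example, as (start, end) pairs.
FRIEND_EDGES: List[Tuple[str, str]] = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Carol"),
]


def outgoing(person: str) -> Set[str]:
    """Like (p)-[:FRIEND]->(friend): only relationships that start at `person`."""
    return {end for start, end in FRIEND_EDGES if start == person}


def either_direction(person: str) -> Set[str]:
    """Like (p)-[:FRIEND]-(friend): relationships that start or end at `person`."""
    incoming = {start for start, end in FRIEND_EDGES if end == person}
    return outgoing(person) | incoming


print("Bob, directed:  ", sorted(outgoing("Bob")))
print("Bob, undirected:", sorted(either_direction("Bob")))
```

For Bob the directed form finds only Carol, while the undirected form also finds Alice, because the Alice-to-Bob relationship points at him.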
Advanced queries
// Friends of friends (excluding Alice herself and her direct friends)
MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE fof <> me
AND NOT (me)-[:FRIEND]->(fof)
RETURN DISTINCT fof.name AS recommendation
// Shortest path (assumes a Person named 'Dave' exists; the sample data above does not create one)
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(dave:Person {name: 'Dave'})
)
RETURN path, length(path) AS distance
// Variable-length paths (1 to 5 hops), with the fewest hops needed to reach each person
MATCH path = (alice:Person {name: 'Alice'})-[:FRIEND*1..5]->(connected:Person)
WHERE connected <> alice
RETURN connected.name AS name, min(length(path)) AS distance
ORDER BY distance
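The idea behind the shortest-path and variable-length queries is a breadth-first search with a hop limit. Below is a minimal Python sketch of that idea, not Neo4j's actual algorithm. The `GRAPH` data is hypothetical (I added a Carol-to-Dave edge so there is a two-hop result), and `hop_distances` is a name I made up.

```python
from collections import deque
from typing import Dict, List

# Hypothetical directed FRIEND graph; the Carol -> Dave edge is an assumption for the example.
GRAPH: Dict[str, List[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
    "Dave": [],
}


def hop_distances(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: fewest hops from `start` to every node within `max_hops`."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, []):
            if friend not in distances:  # first visit is always the shortest
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]  # the start node is not its own connection
    return distances


print(hop_distances(GRAPH, "Alice", max_hops=5))
```

Because breadth-first search visits nodes in order of distance, the first time it reaches a person is also the shortest route to them.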
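Hands-on 1: modeling a social network
Let's build a slightly more complex social network model:
Creating the data
// Create people (run this whole block as a single statement on an empty database)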
CREATE (alice:Person {name: 'Alice', age: 30, interests: ['coding', 'hiking']})
CREATE (bob:Person {name: 'Bob', age: 28, interests: ['coding', 'gaming']})
CREATE (carol:Person {name: 'Carol', age: 32, interests: ['design', 'hiking']})
CREATE (dave:Person {name: 'Dave', age: 25, interests: ['coding', 'music']})
CREATE (eve:Person {name: 'Eve', age: 29, interests: ['design', 'cooking']})
CREATE (frank:Person {name: 'Frank', age: 35, interests: ['gaming', 'music']})
// Create companies
CREATE (techcorp:Company {name: 'TechCorp', industry: 'Technology'})
CREATE (designlab:Company {name: 'DesignLab', industry: 'Design'})
// Create skills
CREATE (python:Skill {name: 'Python'})
CREATE (javascript:Skill {name: 'JavaScript'})
CREATE (figma:Skill {name: 'Figma'})
// Create relationships
CREATE (alice)-[:FRIEND {since: 2019}]->(bob)
CREATE (alice)-[:FRIEND {since: 2020}]->(carol)
CREATE (bob)-[:FRIEND {since: 2021}]->(dave)
CREATE (carol)-[:FRIEND {since: 2020}]->(eve)
CREATE (dave)-[:FRIEND {since: 2022}]->(frank)
CREATE (alice)-[:WORKS_AT {role: 'Backend Engineer', since: 2020}]->(techcorp)
CREATE (bob)-[:WORKS_AT {role: 'Frontend Engineer', since: 2021}]->(techcorp)
CREATE (carol)-[:WORKS_AT {role: 'UX Designer', since: 2019}]->(designlab)
CREATE (eve)-[:WORKS_AT {role: 'UI Designer', since: 2022}]->(designlab)
CREATE (alice)-[:HAS_SKILL {level: 'expert'}]->(python)
CREATE (bob)-[:HAS_SKILL {level: 'intermediate'}]->(javascript)
CREATE (carol)-[:HAS_SKILL {level: 'expert'}]->(figma)
CREATE (dave)-[:HAS_SKILL {level: 'beginner'}]->(python)
Social recommendation queries
// Recommend people Alice may know (friends of friends)
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(f)-[:FRIEND]->(recommendation)
WHERE recommendation <> alice
AND NOT (alice)-[:FRIEND]->(recommendation)
RETURN recommendation.name,
COUNT(f) AS mutual_friends,
COLLECT(f.name) AS through
ORDER BY mutual_friends DESC
// Find people who work at the same company as Alice
MATCH (alice:Person {name: 'Alice'})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(colleague)
WHERE colleague <> alice
RETURN colleague.name, company.name
// Find people who share a skill with Alice
MATCH (alice:Person {name: 'Alice'})-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(other)
WHERE other <> alice
RETURN other.name, skill.name, other.age
// Influence analysis: who has the most friends? (counts FRIEND relationships in either direction)
MATCH (p:Person)-[:FRIEND]-(friend)
RETURN p.name, COUNT(friend) AS friend_count
ORDER BY friend_count DESC
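To sanity-check what the recommendation query should return, here is the same logic as a plain Python sketch over the FRIEND relationships created above. The `recommend` function is a hypothetical helper. It follows outgoing relationships only, like the Cypher pattern does.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# FRIEND relationships from the sample data, as (start, end) pairs.
FRIEND_EDGES: List[Tuple[str, str]] = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Dave"),
    ("Carol", "Eve"),
    ("Dave", "Frank"),
]


def recommend(person: str) -> List[Tuple[str, List[str]]]:
    """Friends of friends who are not already friends, with the friends they come through."""
    outgoing: Dict[str, Set[str]] = defaultdict(set)
    for start, end in FRIEND_EDGES:
        outgoing[start].add(end)

    direct = set(outgoing[person])
    through: Dict[str, Set[str]] = defaultdict(set)
    for friend in direct:
        for candidate in outgoing[friend]:
            if candidate != person and candidate not in direct:
                through[candidate].add(friend)

    # Most mutual friends first, then alphabetical for a stable order.
    ranked = [(name, sorted(friends)) for name, friends in through.items()]
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))


for name, via in recommend("Alice"):
    print(f"{name}: {len(via)} mutual friend(s) via {via}")
```

For Alice this yields Dave (through Bob) and Eve (through Carol), each with one mutual friend, which is what the Cypher query should report on the same data.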
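Hands-on 2: modeling a knowledge graph
Knowledge graphs are another powerful use case for graph databases. Take a technology knowledge graph as an example:
Creating the knowledge graph
// Create programming language nodes (run this whole block as a single statement)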
CREATE (python:Language {name: 'Python', paradigm: 'multi', year: 1991})
CREATE (js:Language {name: 'JavaScript', paradigm: 'multi', year: 1995})
CREATE (rust:Language {name: 'Rust', paradigm: 'multi', year: 2010})
CREATE (go:Language {name: 'Go', paradigm: 'concurrent', year: 2009})
// Create framework nodes
CREATE (django:Framework {name: 'Django', type: 'web'})
CREATE (flask:Framework {name: 'Flask', type: 'web'})
CREATE (react:Framework {name: 'React', type: 'frontend'})
CREATE (fastapi:Framework {name: 'FastAPI', type: 'api'})
// Create concept nodes
CREATE (orm:Concept {name: 'ORM'})
CREATE (restapi:Concept {name: 'REST API'})
CREATE (async_prog:Concept {name: 'Async Programming'})
// Create relationships
CREATE (django)-[:WRITTEN_IN]->(python)
CREATE (flask)-[:WRITTEN_IN]->(python)
CREATE (fastapi)-[:WRITTEN_IN]->(python)
CREATE (react)-[:WRITTEN_IN]->(js)
CREATE (django)-[:IMPLEMENTS]->(orm)
CREATE (django)-[:IMPLEMENTS]->(restapi)
CREATE (fastapi)-[:IMPLEMENTS]->(restapi)
CREATE (fastapi)-[:IMPLEMENTS]->(async_prog)
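// Sample influence edges for illustration only; Python (1991) predates Rust (2010), so it cannot have been influenced by it
CREATE (go)-[:INFLUENCED_BY]->(python)
CREATE (js)-[:INFLUENCED_BY]->(python)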
CREATE (flask)-[:SIMILAR_TO {reason: 'lightweight web'}]->(fastapi)
Knowledge queries
// Find every framework in the Python ecosystem
MATCH (f:Framework)-[:WRITTEN_IN]->(python:Language {name: 'Python'})
RETURN f.name, f.type
// Every framework that implements REST API, along with its language
MATCH (f:Framework)-[:IMPLEMENTS]->(c:Concept {name: 'REST API'})
MATCH (f)-[:WRITTEN_IN]->(lang:Language)
RETURN f.name, lang.name
// Start from a concept and find every related technology
MATCH (c:Concept {name: 'Async Programming'})<-[:IMPLEMENTS]-(f:Framework)
MATCH (f)-[:WRITTEN_IN]->(lang:Language)
RETURN c.name AS concept, f.name AS framework, lang.name AS language
// Suggested learning path: if you know Django, you may also want to learn...
MATCH (django:Framework {name: 'Django'})-[:WRITTEN_IN]->(lang)
MATCH (related:Framework)-[:WRITTEN_IN]->(lang)
WHERE related <> django
OPTIONAL MATCH (related)-[:IMPLEMENTS]->(concept)
RETURN related.name, COLLECT(DISTINCT concept.name) AS concepts
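A knowledge graph boils down to (subject, relationship, object) triples, and the queries above are pattern matches over them. Here is a minimal Python sketch of that idea using the WRITTEN_IN and IMPLEMENTS edges from the data above. The helper names `objects_of`, `subjects_of`, and `frameworks_for_concept` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# WRITTEN_IN and IMPLEMENTS relationships from the sample knowledge graph.
TRIPLES: List[Triple] = [
    ("Django", "WRITTEN_IN", "Python"),
    ("Flask", "WRITTEN_IN", "Python"),
    ("FastAPI", "WRITTEN_IN", "Python"),
    ("React", "WRITTEN_IN", "JavaScript"),
    ("Django", "IMPLEMENTS", "ORM"),
    ("Django", "IMPLEMENTS", "REST API"),
    ("FastAPI", "IMPLEMENTS", "REST API"),
    ("FastAPI", "IMPLEMENTS", "Async Programming"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """Targets of (subject)-[:predicate]->(?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects_of(predicate: str, obj: str) -> List[str]:
    """Sources of (?)-[:predicate]->(obj)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def frameworks_for_concept(concept: str) -> List[Tuple[str, str]]:
    """Frameworks implementing `concept`, paired with the language they are written in."""
    return [
        (framework, language)
        for framework in subjects_of("IMPLEMENTS", concept)
        for language in objects_of(framework, "WRITTEN_IN")
    ]


print(frameworks_for_concept("REST API"))
```

The list scan here is linear in the number of triples. A graph database gets the same answer by starting at the concept node and following its relationships directly.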
Working with Neo4j from Python
from neo4j import GraphDatabase


class SocialGraph:
    """Thin wrapper around the Neo4j driver for the social network queries."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_person(self, name, age, city):
        # MERGE on name so that rerunning the script does not create duplicate people
        with self.driver.session() as session:
            session.run(
                "MERGE (p:Person {name: $name}) SET p.age = $age, p.city = $city",
                name=name, age=age, city=city
            )

    def add_friendship(self, name1, name2):
        # MERGE creates the FRIEND relationship only if it does not exist yet
        with self.driver.session() as session:
            session.run("""
                MATCH (a:Person {name: $name1})
                MATCH (b:Person {name: $name2})
                MERGE (a)-[:FRIEND]->(b)
            """, name1=name1, name2=name2)

    def find_friends_of_friends(self, name):
        with self.driver.session() as session:
            result = session.run("""
                MATCH (me:Person {name: $name})-[:FRIEND]->(f)-[:FRIEND]->(fof)
                WHERE fof <> me AND NOT (me)-[:FRIEND]->(fof)
                RETURN fof.name AS name, COUNT(DISTINCT f) AS mutual
                ORDER BY mutual DESC
            """, name=name)
            # Read the records while the session is still open
            return [(record["name"], record["mutual"])
                    for record in result]

    def shortest_path(self, name1, name2):
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                  (a:Person {name: $name1})-[:FRIEND*]-(b:Person {name: $name2})
                )
                RETURN [node IN nodes(path) | node.name] AS names,
                       length(path) AS distance
            """, name1=name1, name2=name2)
            record = result.single()
            if record:
                return record["names"], record["distance"]
            return None, None


# Usage
graph = SocialGraph("bolt://localhost:7687", "neo4j", "secret123456")
graph.add_person("Alice", 30, "Taipei")
graph.add_person("Bob", 28, "Tokyo")
graph.add_friendship("Alice", "Bob")

recommendations = graph.find_friends_of_friends("Alice")
for name, mutual in recommendations:
    print(f"{name} ({mutual} mutual friends)")

# shortest_path returns (None, None) when there is no path, e.g. if Frank was never loaded
names, distance = graph.shortest_path("Alice", "Frank")
if names:
    print(f"Path: {' -> '.join(names)}, Distance: {distance}")
else:
    print("No path found")

graph.close()
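When should you use a graph database?
Good fits
- Social networks: friend recommendations, influence analysis, community detection
- Knowledge graphs: semantic search, entity linking, question answering
- Recommendation systems: relationship-based collaborative filtering
- Fraud detection: spotting anomalous patterns in transaction networks
- Network and IT management: service dependencies, impact analysis
Poor fits
- Simple CRUD operations
- Heavy aggregation workloads (this is where OLAP databases shine)
- Data with no obvious relationships between records
Summary
Graph databases represent a different way of thinking: instead of forcing data into rows and columns, you describe the world naturally with nodes and relationships. Neo4j's Cypher query language makes that mindset very intuitive, so writing a query feels almost like drawing a picture.
My advice: start by exploring your data visually in Neo4j Browser (http://localhost:7474). And once you find yourself drawing circles and arrows on a whiteboard, it is time to consider a graph database.
Further reading
- Neo4j official documentation
- Cypher cheat sheet
- Graph Data Science Library
- Amazon Neptune: a managed graph database on AWS
- ArangoDB: a multi-model database (graph, document, key-value)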