Graph DB — when it earns its keep

Status: decision input
Date: 2026-05-30
Scope: v2 migration, DB tier decision

Where graph genuinely earns its keep

Continuity checking across a book. "Every scene where Mira appears with Tomás after he learns the secret but before she does." That query traverses character → appears_in → scene, then scene → state_at_time → knowledge_fact. In SQL it becomes a stack of joins and, once path depth varies, recursive CTEs. In a graph DB it is a few lines of Cypher.
Timeline / causality reasoning. "What events did Mira witness before Chapter 12 that contradict her stated belief in Chapter 15?" Walking event → happens_before → event chains is exactly what graph DBs were built for.
Story-bible knowledge state. "Who knows Mira's real name as of Chapter 14?" Each character's knowledge graph evolves scene by scene. Querying it as of a point in time is a graph traversal.
Refactor impact. "If I change Mira's age in Chapter 1, where else does the book reference her age?" A one-hop lookup is fine in SQL; graph pays off when the reference is indirect (mentioned via her son's age, via a date in her diary, etc.).
Series / multi-book continuity. "What did this character know at the end of Book 1 that's still load-bearing in Book 3?" Graph wins decisively at this scale.
Editorial agent reasoning. The Continuity Editor agent walks fact graphs to find contradictions, which is the core workload graph DBs are built for. LLM-generated Cypher is also a well-studied problem.
Lineage in BlockData. Which derived asset came from which sources via which transforms. Classic graph problem, useful for data products.
Hybrid retrieval for chat-over-book. When asked "what does Marlowe think of his father?", walk character → relation → scenes AND vector-search the text. LightRAG-style hybrid retrieval — graph adds real lift here.
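
To make the continuity and knowledge-state items above concrete, here is a minimal in-memory sketch in Python. It is not a proposed schema: the scene ids, the `LEARNS` edge table, and both function names are hypothetical, and it assumes each scene carries a single timeline position. A graph DB would express the same walks as pattern matches instead of loops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    order: int        # position on the book's timeline (assumed total order)
    chapter: int
    cast: frozenset   # characters who appear in the scene


# Hypothetical sample data: scene id -> scene
SCENES = {
    "s1": Scene(1, 3, frozenset({"Mira", "Tomás"})),
    "s2": Scene(2, 5, frozenset({"Tomás", "Ines"})),    # Tomás and Ines learn the secret
    "s3": Scene(3, 8, frozenset({"Mira", "Tomás"})),
    "s4": Scene(4, 11, frozenset({"Mira", "Tomás", "Ines"})),
    "s5": Scene(5, 13, frozenset({"Mira", "Ines"})),    # Mira learns the secret
    "s6": Scene(6, 15, frozenset({"Mira", "Tomás"})),
}

# Hypothetical character -> fact "learns" edges, keyed to the scene where it happens
LEARNS = {
    ("Tomás", "the_secret"): "s2",
    ("Ines", "the_secret"): "s2",
    ("Mira", "the_secret"): "s5",
}


def learned_at(character, fact):
    """Timeline position at which `character` learns `fact`, or None if never."""
    scene_id = LEARNS.get((character, fact))
    return None if scene_id is None else SCENES[scene_id].order


def scenes_in_knowledge_gap(knower, unaware, fact):
    """Scenes with both characters, strictly after `knower` learns `fact`
    and strictly before `unaware` does."""
    start = learned_at(knower, fact)
    end = learned_at(unaware, fact)
    if start is None:
        return []
    hits = [
        scene_id
        for scene_id, scene in SCENES.items()
        if {knower, unaware} <= scene.cast
        and scene.order > start
        and (end is None or scene.order < end)
    ]
    return sorted(hits, key=lambda scene_id: SCENES[scene_id].order)


def who_knows(fact, as_of_chapter):
    """Characters who learned `fact` in a scene at or before `as_of_chapter`."""
    return sorted(
        character
        for (character, known_fact), scene_id in LEARNS.items()
        if known_fact == fact and SCENES[scene_id].chapter <= as_of_chapter
    )


print(scenes_in_knowledge_gap("Tomás", "Mira", "the_secret"))  # ['s3', 's4']
print(who_knows("the_secret", as_of_chapter=8))                # ['Ines', 'Tomás']
```

With two characters and one fact this is trivial in any store. The cost shows up when the walk has to chain through many facts, characters, and books at once.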

Where graph doesn't pay off

Pure text editing, line edits, grammar — vector + SQL is enough.
Per-block operations — SQL wins.
Short docs / academic papers (one POV, linear) — overkill.
Most "find X by attribute Y" queries — SQL is the right answer.
LobeHub-shaped agent runtimes, BlockData schemas, AGChain benchmarks — all flat enough for SQL.

The test

Does the product frequently need to answer questions that require walking 3+ relationship hops where each hop is semantically meaningful (not just a join)?

For the book editor with serious continuity reasoning across 250k–1M words: yes, graph pays off. That is the entire pitch of the "Book Intelligence Layer" in the locked plan, which needs those traversals to deliver the magical demo. For example, "make Mira's voice more restrained across Chapters 4–9" requires walking voice profile → character → dialogue blocks → scene context.
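
As a rough illustration of the voice-profile walk named above, the sketch below follows voice profile → character → dialogue blocks → scene context over hand-written sample data. All ids, the `mood` field, and the function name are hypothetical; the real Book Intelligence Layer data model is not specified in this document.

```python
# Hypothetical edges: voice profile -> character
VOICE_PROFILE_OF = {"vp_mira": "Mira"}

# Hypothetical dialogue blocks: (block id, speaker, scene id)
DIALOGUE = [
    ("b1", "Mira", "sc_03"),
    ("b2", "Mira", "sc_04"),
    ("b3", "Tomás", "sc_04"),
    ("b4", "Mira", "sc_09"),
    ("b5", "Mira", "sc_10"),
]

# Hypothetical scene context: scene id -> attributes an editor agent would need
SCENE_CONTEXT = {
    "sc_03": {"chapter": 3, "mood": "calm"},
    "sc_04": {"chapter": 4, "mood": "tense"},
    "sc_09": {"chapter": 9, "mood": "grieving"},
    "sc_10": {"chapter": 10, "mood": "resolved"},
}


def blocks_to_revise(voice_profile, first_chapter, last_chapter):
    """Walk profile -> character -> dialogue blocks -> scene context and
    return (block id, scene mood) pairs inside the chapter range."""
    character = VOICE_PROFILE_OF[voice_profile]            # hop 1
    selected = []
    for block_id, speaker, scene_id in DIALOGUE:           # hop 2
        if speaker != character:
            continue
        context = SCENE_CONTEXT[scene_id]                  # hop 3
        if first_chapter <= context["chapter"] <= last_chapter:
            selected.append((block_id, context["mood"]))
    return selected


print(blocks_to_revise("vp_mira", 4, 9))  # [('b2', 'tense'), ('b4', 'grieving')]
```

Each hop here carries meaning the rewrite depends on (whose voice, which lines, in what scene mood), which is the property the test asks about.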

For everything else (BlockData workbench surfaces, AGChain authoring, jsagent runtime): SQL + vector handles it.
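
One way to apply the hop-count test to a real question is to measure how far the answer sits from the starting fact. The sketch below does that for the "change Mira's age" example with a breadth-first walk. The dependency edges, node names, and the threshold default of 3 are illustrative assumptions, not measurements from the product.

```python
from collections import deque

# Hypothetical edges: fact or passage -> things that depend on it
DEPENDENTS = {
    "mira.age": ["ch1.p4", "mira.birth_year"],
    "mira.birth_year": ["diary.entry_1987", "son.age"],
    "son.age": ["ch9.p2"],
    "diary.entry_1987": ["ch6.p11"],
}


def impact_by_hops(changed):
    """Breadth-first walk: map every affected node to its hop distance from `changed`."""
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in distance:
                distance[dependent] = distance[node] + 1
                queue.append(dependent)
    del distance[changed]  # the changed fact itself is not an impact
    return distance


def passes_hop_test(changed, threshold=3):
    """True when at least one affected node is `threshold` or more hops away."""
    return any(hops >= threshold for hops in impact_by_hops(changed).values())


print(impact_by_hops("mira.age"))
print(passes_hop_test("mira.age"))  # True: ch6.p11 and ch9.p2 are 3 hops out
print(passes_hop_test("son.age"))   # False: the only impact is 1 hop away
```

The one-hop hit (`ch1.p4`) is what a plain SQL lookup finds. The three-hop hits are the indirect references that motivate a graph.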

V2 decision

Defer graph DB to a Phase 2 add-on. Ship V1 with plain Postgres + Vectorize.

When the graph DB does land, the two real options for Cloudflare Containers hosting are:

Apache AGE on self-hosted Postgres: Cypher graph queries run inside the same Postgres instance that holds the relational data, so there is one DB to operate. Smaller community.
ArangoDB on its own container: a separate service with its own query language (AQL), multi-model (graph + document + KV). The locked direction memory already names it as the escape hatch from AGE.

Both require Cloudflare Containers; neither runs on Workers/D1. Neither is supported by Neon or Supabase-as-DB. That's the constraint that pins the host choice when graph lands.
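
For a sense of what "graph queries inside the same Postgres" looks like with AGE, the sketch below builds the SQL statement that wraps a Cypher query in AGE's `cypher()` function, with result columns typed as `agtype`. The graph name `book_graph`, the labels `Character` and `Scene`, the `APPEARS_IN` edge type, and the helper name are all hypothetical. The sketch only builds a string and does not connect to a database.

```python
def age_cypher_sql(graph_name, cypher, columns):
    """Wrap a Cypher query in an Apache AGE cypher() call.

    AGE runs Cypher through a SQL function and needs every returned
    column declared with the agtype type. The session must already have
    the extension loaded (LOAD 'age') and ag_catalog on its search_path.
    """
    if not graph_name.isidentifier():
        raise ValueError("graph name must be a plain identifier")
    if "$$" in cypher:
        raise ValueError("Cypher text must not contain the $$ delimiter")
    column_defs = ", ".join(f"{name} agtype" for name in columns)
    return f"SELECT * FROM cypher('{graph_name}', $$ {cypher} $$) AS ({column_defs});"


# Hypothetical query: scenes where Mira and Tomás both appear
SHARED_SCENES = (
    "MATCH (:Character {name: 'Mira'})-[:APPEARS_IN]->(s:Scene)"
    "<-[:APPEARS_IN]-(:Character {name: 'Tomás'}) "
    "RETURN s.title"
)

sql = age_cypher_sql("book_graph", SHARED_SCENES, ["title"])
print(sql)
```

Because the result is ordinary SQL, it can sit in the same transaction as relational queries, which is the practical meaning of "one DB" for the AGE option.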

Comparison to similar products

Product	DB	Graph DB?
Docmost	Postgres + Redis	No
BlockNote	bring-your-own	—
Notesnook	MongoDB + MinIO + identity / sync / SSE / monograph	No
LobeHub	Postgres via Drizzle	No

Three of the four ship without a graph DB; the fourth, BlockNote, leaves the datastore to the integrator. None of these products attempts book-scale continuity reasoning. That is the differentiator that would justify reaching for a graph DB in our case but not in theirs.
