Graph DB — when it earns its keep
Scenario-grounded analysis of when a graph database (AGE / Neo4j / ArangoDB) pays off vs when plain Postgres + vector is enough, for the book editor + BlockData + agent products.
Status: decision input
Date: 2026-05-30
Scope: v2 migration, DB tier decision
Where graph genuinely earns its keep
- Continuity checking across a book. "Every scene where Mira appears with Tomás after he learns the secret but before she does." That's traversing character → appears_in → scene + scene → state_at_time → knowledge_fact. SQL = nightmare of recursive CTEs. Graph = a few lines of Cypher.
- Timeline / causality reasoning. "What events did Mira witness before Chapter 12 that contradict her stated belief in Chapter 15?" Walking event → happens_before → event chains is exactly what graph DBs were built for.
- Story-bible knowledge state. "Who knows Mira's real name as of Chapter 14?" Each character's knowledge graph evolves scene by scene. Querying it as of a point in time is a graph traversal.
- Refactor impact. "If I change Mira's age in Chapter 1, where else does the book reference her age?" One hop is fine in SQL; the value compounds when the reference is indirect (mentioned via her son's age, via a date in her diary, etc.).
- Series / multi-book continuity. "What did this character know at the end of Book 1 that's still load-bearing in Book 3?" Graph wins decisively at this scale.
- Editorial agent reasoning. The Continuity Editor agent walks fact graphs to find contradictions; that's literally what graph DBs do. Cypher generation by LLMs is well studied.
- Lineage in BlockData. Which derived asset came from which sources via which transforms. Classic graph problem, useful for data products.
- Hybrid retrieval for chat-over-book. When asked "what does Marlowe think of his father?", walk character → relation → scenes AND vector-search the text. LightRAG-style hybrid retrieval; graph adds real lift here.
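To make "SQL = nightmare of recursive CTEs" concrete, here is a minimal sketch of the timeline walk written as a recursive CTE. It uses Python's built-in sqlite3 so it runs anywhere; Postgres accepts the same WITH RECURSIVE shape. The happens_before table, its columns, and the event ids are hypothetical. A single edge type stays readable; the cost described above shows up when each further relationship (appears_in, state_at_time, knowledge_fact) becomes another join inside the recursion.

```python
import sqlite3


def events_before(conn: sqlite3.Connection, target: str) -> set[str]:
    """Return every event that transitively happens before `target`."""
    rows = conn.execute(
        """
        WITH RECURSIVE earlier(id) AS (
            -- Seed: direct predecessors of the target event.
            SELECT before_id FROM happens_before WHERE after_id = ?
            UNION  -- UNION (not UNION ALL) drops rows already seen, so cycles terminate.
            -- Step: predecessors of anything found so far.
            SELECT hb.before_id
            FROM happens_before AS hb
            JOIN earlier ON hb.after_id = earlier.id
        )
        SELECT id FROM earlier
        """,
        (target,),
    ).fetchall()
    return {row[0] for row in rows}


# Hypothetical toy timeline: each row means "before_id happens before after_id".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE happens_before (before_id TEXT, after_id TEXT)")
conn.executemany(
    "INSERT INTO happens_before VALUES (?, ?)",
    [
        ("ch03_fire", "ch07_letter"),
        ("ch07_letter", "ch12_confession"),
        ("ch12_confession", "ch15_belief"),
        ("ch02_market", "ch09_duel"),
    ],
)

print(sorted(events_before(conn, "ch15_belief")))
# → ['ch03_fire', 'ch07_letter', 'ch12_confession']
```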
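The hybrid-retrieval item can be sketched as a graph walk unioned with a vector top-k. This is a minimal illustration of that union, not LightRAG's actual algorithm: the typed-edge map, the relation names ("relation", "depicted_in"), the toy 2-d embeddings, and the function names walk and hybrid_retrieve are all hypothetical, and a real deployment would query a vector index rather than scan a Python dict.

```python
import math

# Hypothetical typed-edge store: (node, relation) -> neighbouring nodes.
Graph = dict[tuple[str, str], list[str]]


def walk(graph: Graph, start: str, path: list[str]) -> set[str]:
    """Follow one named relation per hop and return the nodes reached at the end."""
    frontier = {start}
    for relation in path:
        frontier = {nxt for node in frontier for nxt in graph.get((node, relation), [])}
    return frontier


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(graph: Graph, start: str, path: list[str],
                    embeddings: dict[str, list[float]], query: list[float],
                    k: int = 2) -> list[str]:
    """Graph hits first (structurally linked scenes), then top-k vector hits not already found."""
    graph_hits = walk(graph, start, path)
    ranked = sorted(embeddings, key=lambda s: cosine(embeddings[s], query), reverse=True)
    vector_hits = [s for s in ranked[:k] if s not in graph_hits]
    return sorted(graph_hits) + vector_hits


# Toy book: the Marlowe/father relationship is explicitly linked to scenes s04 and s11.
graph: Graph = {
    ("marlowe", "relation"): ["rel:marlowe-father"],
    ("rel:marlowe-father", "depicted_in"): ["s04", "s11"],
}
# Toy scene embeddings; s20 and s30 have no relation edge but sit close to the query.
embeddings = {"s04": [1.0, 0.0], "s11": [0.9, 0.1], "s20": [0.8, 0.6], "s30": [0.0, 1.0]}
query = [0.6, 0.8]  # stand-in for the embedded question

print(hybrid_retrieve(graph, "marlowe", ["relation", "depicted_in"], embeddings, query))
# → ['s04', 's11', 's20', 's30']
```

The graph side contributes scenes the text alone might not surface for that question; the vector side catches scenes nobody linked, which is where the "real lift" of combining them comes from.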
Where graph doesn't pay off
- Pure text editing, line edits, grammar — vector + SQL is enough.
- Per-block operations — SQL wins.
- Short docs / academic papers (one POV, linear) — overkill.
- Most "find X by attribute Y" queries — SQL is the right answer.
- LobeHub-shaped agent runtimes, BlockData schemas, AGChain benchmarks — all flat enough for SQL.
The test
Does the product frequently need to answer questions that require walking 3+ relationship hops where each hop is semantically meaningful (not just a join)?
For the book editor with serious continuity reasoning across 250k–1M words: yes, graph pays off. That's the entire pitch of the "Book Intelligence Layer" in the locked plan — it requires those traversals to deliver the magical demo ("make Mira's voice more restrained across Chapters 4–9" requires walking voice profile → character → dialogue blocks → scene context).
For everything else (BlockData workbench surfaces, AGChain authoring, jsagent runtime): SQL + vector handles it.
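The traversal behind the demo (voice profile → character → dialogue blocks → scene context) can be sketched as three explicit hops. Everything here is hypothetical: the DialogueBlock shape, the id formats, the toy scene labels, and the function name voice_edit_targets. Dict and list lookups stand in for what would be edge traversals once the graph tier lands; the point is that each hop is a named, meaningful relationship, which is what the 3+ hop test above asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueBlock:
    block_id: str
    speaker: str   # character id
    chapter: int
    scene_id: str


def voice_edit_targets(
    voice_profile: str,
    profile_to_character: dict[str, str],
    blocks: list[DialogueBlock],
    scene_context: dict[str, str],
    first_chapter: int,
    last_chapter: int,
) -> list[tuple[str, str]]:
    """Return (dialogue block id, scene context) pairs an editing agent would rewrite."""
    # Hop 1: voice profile -> character.
    character = profile_to_character[voice_profile]
    # Hop 2: character -> their dialogue blocks inside the chapter range (inclusive).
    spoken = [
        b for b in blocks
        if b.speaker == character and first_chapter <= b.chapter <= last_chapter
    ]
    # Hop 3: dialogue block -> the scene it sits in, so the rewrite keeps context.
    return [(b.block_id, scene_context[b.scene_id]) for b in spoken]


blocks = [
    DialogueBlock("b1", "mira", 3, "s1"),
    DialogueBlock("b2", "mira", 4, "s2"),
    DialogueBlock("b3", "tomas", 5, "s3"),
    DialogueBlock("b4", "mira", 9, "s4"),
    DialogueBlock("b5", "mira", 10, "s5"),
]
scene_context = {"s1": "kitchen", "s2": "harbour", "s3": "market", "s4": "chapel", "s5": "train"}

targets = voice_edit_targets("vp_mira", {"vp_mira": "mira"}, blocks, scene_context, 4, 9)
print(targets)
# → [('b2', 'harbour'), ('b4', 'chapel')]
```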
V2 decision
Defer graph DB to a Phase 2 add-on. Ship V1 with plain Postgres + Vectorize.
When the graph DB does land, the two real options for Cloudflare Containers hosting are:
- AGE on self-hosted Postgres: graph queries run inside the same Postgres that holds the relational data, so there is one DB. Smaller community.
- ArangoDB on its own container: a separate service with AQL, multi-model (graph + document + KV). The locked-direction memory already names it as the escape hatch from AGE.
Both require Cloudflare Containers; neither runs on Workers/D1. Neither is supported by Neon or Supabase-as-DB. That's the constraint that pins the host choice when graph lands.
Comparison to similar products
| Product | DB | Graph DB? |
|---|---|---|
| Docmost | Postgres + Redis | No |
| BlockNote | bring-your-own | — |
| Notesnook | MongoDB + MinIO + identity / sync / SSE / monograph | No |
| LobeHub | Postgres via Drizzle | No |
3 of 4 ship without a graph DB. None of these products attempt book-scale continuity reasoning — that's the differentiator that would justify reaching for one in our case, not in theirs.