
Graph DB — when it earns its keep

Scenario-grounded analysis of when a graph database (AGE / Neo4j / ArangoDB) pays off vs when plain Postgres + vector is enough, for the book editor + BlockData + agent products.

Status: decision input
Date: 2026-05-30
Scope: v2 migration, DB tier decision


Where graph genuinely earns its keep

  1. Continuity checking across a book. "Every scene where Mira appears with Tomás after he learns the secret but before she does." That means traversing character → appears_in → scene, then scene → state_at_time → knowledge_fact. In SQL this turns into a tangle of recursive CTEs; in a graph DB it is a few lines of Cypher.

  2. Timeline / causality reasoning. "What events did Mira witness before Chapter 12 that contradict her stated belief in Chapter 15?" Walking event → happens_before → event chains is exactly what graph DBs were built for.

  3. Story-bible knowledge state. "Who knows Mira's real name as of Chapter 14?" Each character's knowledge graph evolves scene by scene. Querying it as of a point in time is a graph traversal.

  4. Refactor impact. "If I change Mira's age in chapter 1, where else does the book reference her age?" One-hop is fine in SQL; the value compounds when the reference is indirect (mentioned via her son's age, via a date in her diary, etc.).

  5. Series / multi-book continuity. "What did this character know at the end of Book 1 that's still load-bearing in Book 3?" Graph wins decisively at this scale.

  6. Editorial agent reasoning. The Continuity Editor agent walks fact graphs to find contradictions, which is the core workload graph DBs are built for. Having an LLM generate the Cypher is a well-studied pattern.

  7. Lineage in BlockData. Which derived asset came from which sources via which transforms. Classic graph problem, useful for data products.

  8. Hybrid retrieval for chat-over-book. When asked "what does Marlowe think of his father?", walk character → relation → scenes AND vector-search the text. LightRAG-style hybrid retrieval — graph adds real lift here.
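
Scenarios 1 and 3 above can be sketched with a toy in-memory edge list. This is only an illustration of the shape of the traversal, not a graph database and not the product's schema: the `Learns` edge, the `SCENES` table, the sample data, and both function names are assumptions made up for the example, and time is tracked at chapter granularity only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Learns:
    """Hypothetical edge: `character` learns `fact` in `chapter`."""
    character: str
    fact: str
    chapter: int


# Illustrative data only; not taken from any real manuscript.
EDGES = [
    Learns("Mira", "mira_real_name", 1),
    Learns("Tomás", "mira_real_name", 9),
    Learns("Ana", "mira_real_name", 16),
    Learns("Tomás", "the_secret", 5),
    Learns("Mira", "the_secret", 11),
]

# Hypothetical appears_in data: scene id -> (chapter, characters present).
SCENES = {
    "s1": (4, {"Mira", "Tomás"}),
    "s2": (6, {"Mira", "Tomás"}),
    "s3": (8, {"Tomás"}),
    "s4": (10, {"Mira", "Tomás", "Ana"}),
    "s5": (12, {"Mira", "Tomás"}),
}


def who_knows(edges, fact, as_of_chapter):
    """Scenario 3: characters who learned `fact` at or before the chapter."""
    return sorted({e.character for e in edges
                   if e.fact == fact and e.chapter <= as_of_chapter})


def scenes_in_gap(scenes, edges, fact, knower, other):
    """Scenario 1: scenes with both characters, after `knower` learns
    `fact` and before `other` does."""
    learned = {e.character: e.chapter for e in edges if e.fact == fact}
    if knower not in learned:
        return []
    start = learned[knower]
    # If `other` never learns the fact, the gap stays open to the end.
    end = learned.get(other, float("inf"))
    return sorted(sid for sid, (chapter, cast) in scenes.items()
                  if {knower, other} <= cast and start <= chapter < end)


print(who_knows(EDGES, "mira_real_name", 14))
print(scenes_in_gap(SCENES, EDGES, "the_secret", "Tomás", "Mira"))
```

Each question here combines two edge types (who learned what and when, who appears where). A real book adds more edge types and indirect references, which is where a dedicated traversal engine starts to matter.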


Where graph doesn't pay off

  • Pure text editing, line edits, grammar — vector + SQL is enough.
  • Per-block operations — SQL wins.
  • Short docs / academic papers (one POV, linear) — overkill.
  • Most "find X by attribute Y" queries — SQL is the right answer.
  • LobeHub-shaped agent runtimes, BlockData schemas, AGChain benchmarks — all flat enough for SQL.

The test

Does the product frequently need to answer questions that require walking 3+ relationship hops where each hop is semantically meaningful (not just a join)?

For the book editor with serious continuity reasoning across 250k–1M words: yes, graph pays off. That's the entire pitch of the "Book Intelligence Layer" in the locked plan, which depends on those traversals to deliver the magical demo: "make Mira's voice more restrained across Chapters 4–9" means walking voice profile → character → dialogue blocks → scene context.

For everything else (BlockData workbench surfaces, AGChain authoring, jsagent runtime): SQL + vector handles it.
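
The hop-count half of the test can be sketched as a shortest-path check over the entity schema. The `SCHEMA` map below is a hypothetical simplification built from the relationship names mentioned in this document, and `threshold=3` mirrors the "3+ hops" wording. The code cannot judge whether a hop is semantically meaningful or how often the question comes up; those remain human calls.

```python
from collections import deque

# Hypothetical schema: entity type -> entity types reachable in one hop.
SCHEMA = {
    "voice_profile": ["character"],
    "character": ["dialogue_block", "scene"],
    "dialogue_block": ["scene_context"],
    "scene": ["knowledge_fact"],
    "block": [],
}


def hops(schema, start, goal):
    """Breadth-first search: fewest hops from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in schema.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def graph_candidate(schema, start, goal, threshold=3):
    """True when the question needs at least `threshold` hops."""
    n = hops(schema, start, goal)
    return n is not None and n >= threshold


print(hops(SCHEMA, "voice_profile", "scene_context"))
print(graph_candidate(SCHEMA, "character", "scene"))
```

Under these assumptions the voice-restraint demo sits at three hops, while a per-block or single-join lookup stays below the threshold.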


V2 decision

Defer graph DB to a Phase 2 add-on. Ship V1 with plain Postgres + Vectorize.
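
Until the graph DB lands, an occasional multi-hop question can still be answered from a relational edge table with a recursive CTE. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres (both support `WITH RECURSIVE`, though placeholder syntax differs by driver). The `derived_from` table, its columns, and the sample lineage rows are hypothetical, and the query assumes the lineage is acyclic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE derived_from (child TEXT, parent TEXT, transform TEXT);
INSERT INTO derived_from VALUES
  ('report', 'clean_table', 'aggregate'),
  ('clean_table', 'raw_csv', 'dedupe'),
  ('clean_table', 'lookup', 'join'),
  ('lookup', 'vendor_api', 'fetch');
""")

# Walk parent edges upward from one asset, tracking distance.
# Assumes no cycles: a cycle would keep producing rows with larger depth.
UPSTREAM = """
WITH RECURSIVE upstream(asset, depth) AS (
    SELECT parent, 1 FROM derived_from WHERE child = ?
    UNION
    SELECT d.parent, u.depth + 1
    FROM derived_from AS d
    JOIN upstream AS u ON d.child = u.asset
)
SELECT asset, MIN(depth) AS depth
FROM upstream
GROUP BY asset
ORDER BY depth, asset;
"""


def upstream_sources(connection, asset):
    """Every ancestor of `asset` with its shortest distance in hops."""
    return connection.execute(UPSTREAM, (asset,)).fetchall()


for name, depth in upstream_sources(conn, "report"):
    print(depth, name)
```

This covers a single edge type cleanly. The pain described earlier shows up when one question mixes several edge types and time constraints, which is the point at which the Phase 2 add-on becomes worth its operational cost.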

When the graph DB does land, the two real options for Cloudflare Containers hosting are:

  • AGE on self-hosted Postgres — graph queries inside the same Postgres holding relational data; one DB. Smaller community.
  • ArangoDB on its own container — separate service, AQL, multi-model (graph + document + KV). Locked direction memory already names it as the AGE escape hatch.

Both require Cloudflare Containers; neither runs on Workers/D1. Neither is supported by Neon or Supabase-as-DB. That's the constraint that pins the host choice when graph lands.


Comparison to similar products

Product   | DB                                                  | Graph DB?
Docmost   | Postgres + Redis                                    | No
BlockNote | bring-your-own                                      | n/a
Notesnook | MongoDB + MinIO + identity / sync / SSE / monograph | No
LobeHub   | Postgres via Drizzle                                | No

Three of the four ship without a graph DB; the fourth, BlockNote, is bring-your-own storage. None of these products attempts book-scale continuity reasoning. That is the differentiator that would justify reaching for a graph DB in our case but not in theirs.
