From RAG to RAGE
RAGE (Recursive, Agentic Graph Embeddings) is an attempt to turn “memory” from a similarity pipeline into a substrate: structured, traversable meaning that can keep provenance and scale intact.
If you’ve worked with modern retrieval stacks, you know the feeling: models get fluent fast, but the moment you try to do sustained work — building context over weeks, tracing why something is believed, navigating contradictions — they collapse back into flat chunk recall. RAGE is the architectural response: retrieval as navigation, not matching.
Similarity as gravity well
Most retrieval stacks optimize for one thing: similarity. It works — until it doesn’t. When “relevance” becomes the only objective, systems drift into context collapse: they reinforce priors, smooth out contradiction, and reduce inquiry to a narrow band of “things like what you already asked.”
That dynamic isn’t just a training-data problem. It’s an inference-time failure mode — a convergent system design. I unpack this more directly in Divergence Engines: why similarity-first retrieval becomes a gravity well, and what it takes to engineer useful difference.
RAGE is my earlier architectural response: build retrieval as navigation through multi-scale structure, so the system can move with you — across abstraction levels, across sessions, and across contradictory frames — instead of repeatedly snapping back to flat chunk recall.
Retrieval should feel like movement, not just matching.
The diagnosis: flat memory
Even when connected to documents or knowledge bases, most systems treat information as disconnected fragments. RAG does chunk → embed → top‑k similarity → answer. GraphRAG adds an entity graph, but many implementations still flatten meaning into names + co-occurrence links.
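To make that failure mode concrete, the whole flat pipeline fits in a few lines. This is a toy sketch, not any particular library's API: a bag-of-words counter stands in for a dense embedding model, and the function names are illustrative.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a dense embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flat_rag(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # chunk -> embed -> top-k similarity -> answer context. One shot, no traversal.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Vector search ranks chunks by embedding similarity.",
    "Graphs encode relationships between entities.",
    "Similarity search returns the nearest chunks to a query.",
]
print(flat_rag("how does similarity search rank chunks?", chunks))
```

Note what is absent: no notion of where a chunk sits in a larger structure, and no way to keep moving once the top-k set comes back.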
What’s missing is conceptual topology: the way ideas nest, support each other, contradict each other, and change character when you zoom out or dive in.
| Missing capability | Why it matters |
|---|---|
| Recursive depth | Concepts aren’t flat; they nest and refract across abstraction levels. |
| Adaptive traversal | Retrieval shouldn’t stop at top‑k; it should evolve as the path reveals structure. |
| Mode sensitivity | The same question asked in different stances should produce a different traversal. |
The RAGE approach
RAGE proposes a graph substrate where retrieval can:

- move between scales (overview ↔ detail) without losing the path that got you there;
- traverse by relationship (up, down, sideways), including agent-guided traversal strategies, instead of only by embedding proximity;
- keep provenance as a first-class constraint (why is something believed, and where did it come from?);
- support divergence when needed: surfacing contradictions, showing adjacent frames, and keeping variance alive instead of collapsing inquiry into the nearest attractor basin.
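One way to picture that substrate: nodes carry typed links and provenance, so traversal becomes a graph operation rather than a nearest-neighbor lookup. A minimal sketch, with the caveat that field names like `broader`, `narrower`, and `contradicts` are my illustration, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    text: str
    level: str                 # "document", "section", "paragraph", ...
    provenance: str            # why/where this is believed: first-class, not metadata
    broader: list = field(default_factory=list)      # up
    narrower: list = field(default_factory=list)     # down
    related: list = field(default_factory=list)      # sideways
    contradicts: list = field(default_factory=list)  # divergence

def traverse(graph: dict, node_id: str, direction: str) -> list:
    # Move by relationship type instead of embedding proximity.
    return [graph[i] for i in getattr(graph[node_id], direction)]

graph = {
    "doc":  Node("doc", "Survey of retrieval", "document", "source: corpus import",
                 narrower=["sec1"]),
    "sec1": Node("sec1", "Similarity search", "section", "source: corpus import",
                 broader=["doc"], contradicts=["sec2"]),
    "sec2": Node("sec2", "Limits of similarity", "section", "source: blog post",
                 broader=["doc"]),
}
print([n.id for n in traverse(graph, "sec1", "contradicts")])
```

Asking for `"contradicts"` instead of `"narrower"` is the whole point: the same node supports different moves depending on the stance of the query.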
If you want the deeper cognitive model behind “recursion, attractors, and closure,” see How We May Think We Think. This page stays focused on the architecture.
The core loop (recursive structure)
The core move is simple: apply the same processing steps at multiple levels of hierarchy, not just at “document” or “chunk.”
For each level (document → section → subsection → paragraph), RAGE embeds the unit, summarizes and re‑embeds it (a second representation at a different scale), connects it into broader ↔ narrower relationships, and cross‑links across levels and across documents — so the same idea can be found as a sentence, a section, or a whole document.
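The loop above can be sketched as one recursive function applied identically at every level. Everything here is a toy under stated assumptions: `summarize` takes the first sentence where a real system would call a model, and the embedding is a bag-of-words stand-in.

```python
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def summarize(text: str) -> str:
    # Toy: first sentence stands in for an LLM-generated summary.
    return text.split(".")[0] + "."

def index_unit(unit: dict, level: int, graph: dict, parent=None) -> str:
    node_id = f"n{len(graph)}"
    graph[node_id] = {
        "level": level,
        "text": unit["text"],
        "embedding": embed(unit["text"]),                     # representation at this scale
        "summary_embedding": embed(summarize(unit["text"])),  # second representation, coarser scale
        "broader": parent,                                    # up-link
        "narrower": [],                                       # down-links
    }
    if parent is not None:
        graph[parent]["narrower"].append(node_id)
    # Same steps, one level down: document -> section -> subsection -> paragraph.
    for child in unit.get("children", []):
        index_unit(child, level + 1, graph, node_id)
    return node_id

doc = {"text": "Retrieval as navigation. Long intro.",
       "children": [{"text": "Similarity is not enough. Details follow."}]}
graph = {}
root = index_unit(doc, 0, graph)
print(len(graph), graph[root]["narrower"])
```

Because every node keeps both an embedding and a summary embedding plus up/down links, the same idea is reachable as a sentence, a section, or a whole document.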
This is less about “remembering more” and more about enabling navigation: letting you land on the right scale, then change direction without losing context.
Agentic retrieval (the feedback loop)
RAGE treats a query as a signal — a glimpse into stance, scale, and intent — not a directive to fetch the nearest matches.
Instead of “answer and stop,” retrieval becomes a feedback loop (what most people would call agentic retrieval today):

- retrieve a first set of candidates;
- inspect what associations and paths got activated;
- decide whether to go deeper, broader, or sideways;
- when the system detects premature closure, introduce productive friction (contradiction, counterexamples, adjacent frames) rather than smoothing it away.
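As pseudocode, that loop might look like the sketch below. The closure heuristic (jump to a contradicting node when the walk stops moving) is one possible policy among many, and every name here is hypothetical rather than an implemented API.

```python
def agentic_retrieve(graph, start, score, max_hops=3, closure_threshold=0.95):
    """Walk the graph instead of answering in one shot."""
    current, path = start, [start]
    for _ in range(max_hops):
        # 1. Retrieve candidates: this node's down- and side-links.
        candidates = graph[current]["narrower"] + graph[current]["related"]
        if not candidates:
            break
        # 2. Inspect what got activated; pick the strongest continuation.
        best = max(candidates, key=score)
        # 3. Premature-closure check: if the best candidate barely differs in
        #    score from where we already are, inject friction by jumping to a
        #    contradicting node instead of smoothing the walk to a stop.
        if abs(score(best) - score(current)) < (1 - closure_threshold) \
                and graph[current]["contradicts"]:
            best = graph[current]["contradicts"][0]
        path.append(best)
        current = best
    return path

graph = {
    "a": {"narrower": ["b"], "related": [], "contradicts": []},
    "b": {"narrower": [], "related": ["c"], "contradicts": ["d"]},
    "c": {"narrower": [], "related": [], "contradicts": []},
    "d": {"narrower": [], "related": [], "contradicts": []},
}
scores = {"a": 0.90, "b": 0.91, "c": 0.905, "d": 0.40}
print(agentic_retrieve(graph, "a", scores.get))  # the walk detours through "d"
```

From "b" the nearest continuation "c" is almost indistinguishable in score, so the loop treats that as premature closure and surfaces the contradicting node "d" instead.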
This is where RAGE lines up with the argument in Divergence Engines: you don’t fix collapse by polishing similarity search. You fix it by giving retrieval another force besides “closest match.”
A query opens a path. The system should walk with you, not stand still.
Comparison: RAG vs GraphRAG vs RAGE
| Capability | Traditional RAG | GraphRAG (typical) | RAGE (approach) |
|---|---|---|---|
| Core structure | Flat chunks + vector search | Entity graph + summaries | Multi‑scale graph with conceptual layering |
| Traversal | One‑shot top‑k | Static/path-based | Adaptive traversal (up/down/sideways) |
| Schema | Implicit (chunking) | Often predefined or shallow | Emergent + iteratively refined |
| Context | Session-bound | Partially persistent | Session-spanning, path-aware, provenance-aware |
| Failure mode | Premature closure via relevance | Entity flattening | Designed to keep inquiry alive longer |
What this enables (in practice)
| Capability | What it enables |
|---|---|
| Semantic zooming | Move between summary and detail without losing the route — like changing altitude without losing your trail. |
| Mode-aware retrieval | Different stances call for different paths: explanation, synthesis, critique, exploration, comparison. Treat “mode” as a retrieval primitive, not UX polish. |
| Long-horizon context | Not “memory as storage,” but memory as relevance over time: what stays connected, what becomes peripheral, what needs revisiting because it contradicts the current frame. |
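Semantic zooming in particular is easy to sketch once the graph carries up/down links: keep the trail explicit, so changing altitude never discards the route. A toy version, where the `Zoomer` class and its field names are illustrative only:

```python
class Zoomer:
    """Move between summary and detail without losing the route."""

    def __init__(self, graph: dict, start: str):
        self.graph = graph
        self.trail = [start]  # the path is first-class, never discarded

    def zoom_in(self) -> str:
        children = self.graph[self.trail[-1]]["narrower"]
        if children:
            self.trail.append(children[0])  # descend; the route grows
        return self.trail[-1]

    def zoom_out(self) -> str:
        if len(self.trail) > 1:
            self.trail.pop()  # ascend along the same route
        return self.trail[-1]

graph = {
    "doc":  {"narrower": ["sec"]},
    "sec":  {"narrower": ["para"]},
    "para": {"narrower": []},
}
z = Zoomer(graph, "doc")
z.zoom_in()
z.zoom_in()
print(z.trail)      # full route from document down to paragraph
z.zoom_out()
print(z.trail[-1])  # back up one level, trail intact
```

The design choice worth noticing: `zoom_out` pops the trail rather than searching for a parent, so the system returns by the path you actually took.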
Open questions
RAGE is still a proposal shaped by building and debugging real systems. The hard part isn’t stating the idea — it’s making it robust. How do you detect premature closure without turning every query into infinite exploration? What are the right primitives (and metrics) for relevance vs reach in a living graph? How do you keep emergent structure legible to humans — not just traversable by models?
Those questions are exactly where Divergence Engines: A Technical Framework for Surfacing Useful Differences goes deeper.
The goal isn’t to end the conversation faster. It’s to keep the right questions alive longer.