From RAG to RAGE
RAGE (Recursive, Agentic Graph Embeddings) is an attempt to turn “memory” from a similarity pipeline into a substrate: structured, traversable meaning that can keep provenance and scale intact.
If you’ve worked with modern retrieval stacks, you know the feeling: models get fluent fast, but the moment you try to do sustained work — building context over weeks, tracing why something is believed, navigating contradictions — they collapse back into flat chunk recall. RAGE is the architectural response: retrieval as navigation, not matching.
Similarity as gravity well
Most retrieval stacks optimize for one thing: similarity. It works — until it doesn’t. When “relevance” becomes the only objective, systems drift into context collapse: they reinforce priors, smooth out contradiction, and reduce inquiry to a narrow band of “things like what you already asked.”
That dynamic isn’t just a training-data problem. It’s an inference-time failure mode — a convergent system design. I unpack this more directly in Divergence Engines: why similarity-first retrieval becomes a gravity well, and what it takes to engineer useful difference.
RAGE is my earlier architectural response: build retrieval as navigation through multi-scale structure, so the system can move with you — across abstraction levels, across sessions, and across contradictory frames — instead of repeatedly snapping back to flat chunk recall.
Retrieval should feel like movement, not just matching.
The diagnosis: flat memory
Even when connected to documents or knowledge bases, most systems treat information as disconnected fragments. RAG does chunk → embed → top‑k similarity → answer. GraphRAG adds an entity graph, but many implementations still flatten meaning into names + co-occurrence links.
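To make that failure mode concrete, the whole flat pipeline fits in a few lines. This is a toy sketch, not any particular library's API: a bag-of-words counter stands in for a dense embedding model, and the function names are illustrative.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a dense embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flat_rag(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # chunk -> embed -> top-k similarity -> answer context. One shot, no traversal.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Vector search ranks chunks by embedding similarity.",
    "Graphs encode relationships between entities.",
    "Similarity search returns the nearest chunks to a query.",
]
print(flat_rag("how does similarity search rank chunks?", chunks))
```

Note what is absent: no notion of where a chunk sits in a larger structure, and no way to keep moving once the top-k set comes back.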
What’s missing is conceptual topology: the way ideas nest, support each other, contradict each other, and change character when you zoom out or dive in.
| Missing capability | Why it matters |
|---|---|
| Recursive depth | Concepts aren’t flat; they nest and refract across abstraction levels. |
| Adaptive traversal | Retrieval shouldn’t stop at top‑k; it should evolve as the path reveals structure. |
| Mode sensitivity | The same question asked in different stances should produce a different traversal. |
The RAGE approach
RAGE proposes a graph substrate where retrieval can:

- move between scales (overview ↔ detail) without losing the path that got you there;
- traverse by relationship (up, down, sideways), including agent-guided traversal strategies, instead of only by embedding proximity;
- keep provenance as a first-class constraint (why is something believed, and where did it come from?);
- support divergence when needed: surfacing contradictions, showing adjacent frames, and keeping variance alive instead of collapsing inquiry into the nearest attractor basin.
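One way to picture that substrate: nodes carry typed links and provenance, so traversal becomes a graph operation rather than a nearest-neighbor lookup. A minimal sketch, with the caveat that field names like `broader`, `narrower`, and `contradicts` are my illustration, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    text: str
    level: str                 # "document", "section", "paragraph", ...
    provenance: str            # why/where this is believed: first-class, not metadata
    broader: list = field(default_factory=list)      # up
    narrower: list = field(default_factory=list)     # down
    related: list = field(default_factory=list)      # sideways
    contradicts: list = field(default_factory=list)  # divergence

def traverse(graph: dict, node_id: str, direction: str) -> list:
    # Move by relationship type instead of embedding proximity.
    return [graph[i] for i in getattr(graph[node_id], direction)]

graph = {
    "doc":  Node("doc", "Survey of retrieval", "document", "source: corpus import",
                 narrower=["sec1"]),
    "sec1": Node("sec1", "Similarity search", "section", "source: corpus import",
                 broader=["doc"], contradicts=["sec2"]),
    "sec2": Node("sec2", "Limits of similarity", "section", "source: blog post",
                 broader=["doc"]),
}
print([n.id for n in traverse(graph, "sec1", "contradicts")])
```

Asking for `"contradicts"` instead of `"narrower"` is the whole point: the same node supports different moves depending on the stance of the query.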
If you want the deeper cognitive model behind “recursion, attractors, and closure,” see How We May Think We Think. This page stays focused on the architecture.
The core loop (recursive structure)
The core move is simple: apply the same processing steps at multiple levels of hierarchy, not just at “document” or “chunk.”
For each level (document → section → subsection → paragraph), RAGE embeds the unit, summarizes and re‑embeds it (a second representation at a different scale), connects it into broader ↔ narrower relationships, and cross‑links across levels and across documents — so the same idea can be found as a sentence, a section, or a whole document.
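The loop above can be sketched as one recursive function applied identically at every level. Everything here is a toy under stated assumptions: `summarize` takes the first sentence where a real system would call a model, and the embedding is a bag-of-words stand-in.

```python
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def summarize(text: str) -> str:
    # Toy: first sentence stands in for an LLM-generated summary.
    return text.split(".")[0] + "."

def index_unit(unit: dict, level: int, graph: dict, parent=None) -> str:
    node_id = f"n{len(graph)}"
    graph[node_id] = {
        "level": level,
        "text": unit["text"],
        "embedding": embed(unit["text"]),                     # representation at this scale
        "summary_embedding": embed(summarize(unit["text"])),  # second representation, coarser scale
        "broader": parent,                                    # up-link
        "narrower": [],                                       # down-links
    }
    if parent is not None:
        graph[parent]["narrower"].append(node_id)
    # Same steps, one level down: document -> section -> subsection -> paragraph.
    for child in unit.get("children", []):
        index_unit(child, level + 1, graph, node_id)
    return node_id

doc = {"text": "Retrieval as navigation. Long intro.",
       "children": [{"text": "Similarity is not enough. Details follow."}]}
graph = {}
root = index_unit(doc, 0, graph)
print(len(graph), graph[root]["narrower"])
```

Because every node keeps both an embedding and a summary embedding plus up/down links, the same idea is reachable as a sentence, a section, or a whole document.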
This is less about “remembering more” and more about enabling navigation: letting you land on the right scale, then change direction without losing context.
Agentic retrieval (the feedback loop)
RAGE treats a query as a signal — a glimpse into stance, scale, and intent — not a directive to fetch the nearest matches.
Instead of “answer and stop,” retrieval becomes a feedback loop (what most people would call agentic retrieval today):

- retrieve a first set of candidates;
- inspect what associations and paths got activated;
- decide whether to go deeper, broader, or sideways;
- when the system detects premature closure, introduce productive friction (contradiction, counterexamples, adjacent frames) rather than smoothing it away.
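As pseudocode, that loop might look like the sketch below. The closure heuristic (jump to a contradicting node when the walk stops moving) is one possible policy among many, and every name here is hypothetical rather than an implemented API.

```python
def agentic_retrieve(graph, start, score, max_hops=3, closure_threshold=0.95):
    """Walk the graph instead of answering in one shot."""
    current, path = start, [start]
    for _ in range(max_hops):
        # 1. Retrieve candidates: this node's down- and side-links.
        candidates = graph[current]["narrower"] + graph[current]["related"]
        if not candidates:
            break
        # 2. Inspect what got activated; pick the strongest continuation.
        best = max(candidates, key=score)
        # 3. Premature-closure check: if the best candidate barely differs in
        #    score from where we already are, inject friction by jumping to a
        #    contradicting node instead of smoothing the walk to a stop.
        if abs(score(best) - score(current)) < (1 - closure_threshold) \
                and graph[current]["contradicts"]:
            best = graph[current]["contradicts"][0]
        path.append(best)
        current = best
    return path

graph = {
    "a": {"narrower": ["b"], "related": [], "contradicts": []},
    "b": {"narrower": [], "related": ["c"], "contradicts": ["d"]},
    "c": {"narrower": [], "related": [], "contradicts": []},
    "d": {"narrower": [], "related": [], "contradicts": []},
}
scores = {"a": 0.90, "b": 0.91, "c": 0.905, "d": 0.40}
print(agentic_retrieve(graph, "a", scores.get))  # the walk detours through "d"
```

From "b" the nearest continuation "c" is almost indistinguishable in score, so the loop treats that as premature closure and surfaces the contradicting node "d" instead.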
This is where RAGE lines up with the argument in Divergence Engines: you don’t fix collapse by polishing similarity search. You fix it by giving retrieval another force besides “closest match.”
A query opens a path. The system should walk with you, not stand still.
Comparison: RAG vs GraphRAG vs RAGE
| Capability | Traditional RAG | GraphRAG (typical) | RAGE (approach) |
|---|---|---|---|
| Core structure | Flat chunks + vector search | Entity graph + summaries | Multi‑scale graph with conceptual layering |
| Traversal | One‑shot top‑k | Static/path-based | Adaptive traversal (up/down/sideways) |
| Schema | Implicit (chunking) | Often predefined or shallow | Emergent + iteratively refined |
| Context | Session-bound | Partially persistent | Session-spanning, path-aware, provenance-aware |
| Failure mode | Premature closure via relevance | Entity flattening | Designed to keep inquiry alive longer |
What this enables (in practice)
| Capability | What it enables |
|---|---|
| Semantic zooming | Move between summary and detail without losing the route — like changing altitude without losing your trail. |
| Mode-aware retrieval | Different stances call for different paths: explanation, synthesis, critique, exploration, comparison. Treat “mode” as a retrieval primitive, not UX polish. |
| Long-horizon context | Not “memory as storage,” but memory as relevance over time: what stays connected, what becomes peripheral, what needs revisiting because it contradicts the current frame. |
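Semantic zooming in particular is easy to sketch once the graph carries up/down links: keep the trail explicit, so changing altitude never discards the route. A toy version, where the `Zoomer` class and its field names are illustrative only:

```python
class Zoomer:
    """Move between summary and detail without losing the route."""

    def __init__(self, graph: dict, start: str):
        self.graph = graph
        self.trail = [start]  # the path is first-class, never discarded

    def zoom_in(self) -> str:
        children = self.graph[self.trail[-1]]["narrower"]
        if children:
            self.trail.append(children[0])  # descend; the route grows
        return self.trail[-1]

    def zoom_out(self) -> str:
        if len(self.trail) > 1:
            self.trail.pop()  # ascend along the same route
        return self.trail[-1]

graph = {
    "doc":  {"narrower": ["sec"]},
    "sec":  {"narrower": ["para"]},
    "para": {"narrower": []},
}
z = Zoomer(graph, "doc")
z.zoom_in()
z.zoom_in()
print(z.trail)      # full route from document down to paragraph
z.zoom_out()
print(z.trail[-1])  # back up one level, trail intact
```

The design choice worth noticing: `zoom_out` pops the trail rather than searching for a parent, so the system returns by the path you actually took.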
Open questions
RAGE is still a proposal shaped by building and debugging real systems. The hard part isn’t stating the idea — it’s making it robust. How do you detect premature closure without turning every query into infinite exploration? What are the right primitives (and metrics) for relevance vs reach in a living graph? How do you keep emergent structure legible to humans — not just traversable by models?
Those questions are exactly where Divergence Engines: A Technical Framework for Surfacing Useful Differences goes deeper.
The goal isn’t to end the conversation faster. It’s to keep the right questions alive longer.