The recall pipeline
When you call recall() (via SDK, MCP, or REST), Memoria runs three retrieval signals in parallel, fuses them, and re-ranks the top candidates with a cross-encoder. The whole pipeline targets sub-second p50 latency.
The three signals
1. Dense (vector similarity)
Each edge's factText is embedded once at write time using a state-of-the-art embedding model. At recall time, Memoria embeds your query with the same model and runs an approximate nearest-neighbour search over the brain's edge embeddings.
Strength: semantic matching. "Where does Stefan live?" finds "Stefan is based in Stockholm" even without keyword overlap.
Weakness: struggles with proper nouns, identifiers, and rare terms that don't ground well in the embedding space.
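To make the dense signal concrete, here is a minimal sketch that ranks edges by cosine similarity using an exhaustive scan. An approximate nearest-neighbour index approximates this same ranking without comparing the query against every edge. The `EmbeddedEdge` type and `denseSearch` function are hypothetical, and the similarity metric Memoria actually uses isn't stated here; cosine is an assumption.

```typescript
// Hypothetical edge shape: factText plus its write-time embedding.
type EmbeddedEdge = { id: string; factText: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact top-k by cosine similarity: a brute-force stand-in for an ANN index.
function denseSearch(query: number[], edges: EmbeddedEdge[], k: number): string[] {
  return edges
    .map((e) => ({ id: e.id, score: cosine(query, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.id);
}
```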
2. Sparse (BM25 keyword)
Edges are also indexed in a managed full-text search index on factText and relationType. BM25 ranks by traditional term-frequency / inverse-document-frequency scoring.
Strength: exact-match retrieval. Looking up an order ID, a project name, or a specific API endpoint — sparse search nails it.
Weakness: doesn't understand synonyms or paraphrases.
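A minimal BM25 sketch, assuming a naive lowercase tokenizer, the common defaults k1 = 1.2 and b = 0.75, and a Lucene-style IDF. Memoria's managed index, analyzer, and parameters aren't documented here, and the real index covers factText and relationType as fields; this sketch scores a single text string per edge. `Doc` and `bm25Search` are hypothetical names.

```typescript
type Doc = { id: string; text: string };

// Naive analyzer: lowercase, split on anything that isn't a letter or digit.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

function bm25Search(query: string, docs: Doc[], k: number, k1 = 1.2, b = 0.75): string[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const N = docs.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(N, 1);

  // Document frequency: how many documents contain each term at least once.
  const df = new Map<string, number>();
  for (const terms of tokenized) {
    new Set(terms).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));
  }

  const qTerms = tokenize(query);
  const scored = tokenized.map((terms, i) => {
    let score = 0;
    for (const q of qTerms) {
      const tf = terms.filter((t) => t === q).length;
      if (tf === 0) continue;
      const n = df.get(q) ?? 0;
      // Lucene-style IDF: rarer terms weigh more, and the value stays positive.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Term-frequency saturation (k1) with document-length normalisation (b).
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * (1 - b + (b * terms.length) / avgLen));
    }
    return { id: docs[i].id, score };
  });

  return scored
    .filter((s) => s.score > 0) // drop documents sharing no query term
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.id);
}
```

A lookup such as `bm25Search("ORD-4821", docs, 5)` ranks an edge containing both tokens `ord` and `4821` above edges that only share `ord`, which is the exact-match behaviour described above.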
3. Graph (Personalized PageRank)
Memoria extracts entities from your query, resolves them to seeds in the per-brain knowledge graph, and runs Personalized PageRank from those seeds. Top-ranked entities' edges are returned.
Strength: multi-hop reasoning. "What does my Tuesday project depend on?" can find facts about a project even when the query doesn't name it directly.
Weakness: needs entity grounding to work — pure free-text queries with no extractable entities skip this signal.
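Personalized PageRank can be sketched as power iteration in which a random walker restarts at the seed entities with probability `restart`. The restart value (0.15), the iteration count, and the handling of dangling entities (their mass returns to the seeds) are assumptions, not Memoria's documented settings; `Graph` and `personalizedPageRank` are hypothetical names.

```typescript
// Adjacency list of entity -> neighbouring entities. Every entity that appears
// as a neighbour must also appear as a key, or its rank mass is lost.
type Graph = Map<string, string[]>;

function personalizedPageRank(
  graph: Graph,
  seeds: string[],
  restart = 0.15, // probability of jumping back to a seed at each step
  iterations = 50,
): Map<string, number> {
  // Keep only seeds that resolve to entities in the graph, deduplicated.
  const seedList = Array.from(new Set(seeds.filter((s) => graph.has(s))));
  const nodes = Array.from(graph.keys());
  // Start with all probability mass spread evenly over the seeds.
  let rank = new Map<string, number>(
    seedList.map((s): [string, number] => [s, 1 / seedList.length]),
  );
  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>();
    let dangling = 0; // mass sitting on entities with no outgoing edges
    for (const u of nodes) {
      const r = rank.get(u) ?? 0;
      const out = graph.get(u) ?? [];
      if (out.length === 0) {
        dangling += r;
        continue;
      }
      // Follow an outgoing edge uniformly at random with probability 1 - restart.
      for (const v of out) {
        next.set(v, (next.get(v) ?? 0) + ((1 - restart) * r) / out.length);
      }
    }
    // Restart mass, plus whatever was stuck on dangling entities, returns to the seeds.
    const toSeeds = restart + (1 - restart) * dangling;
    for (const s of seedList) {
      next.set(s, (next.get(s) ?? 0) + toSeeds / seedList.length);
    }
    rank = next;
  }
  return rank;
}
```

Seeding from a single entity pushes probability mass two hops out (for example, from a person to a project to its dependency), which is how the graph signal reaches facts the query never names. With an empty seed list every score stays at zero, matching the skip behaviour above.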
Fusion
Each signal returns a ranked list. Memoria fuses them via Reciprocal Rank Fusion (RRF):
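score(edge) = Σ over the signals that returned the edge: 1 / (k + rank_in_signal)
Here rank_in_signal is the edge's 1-based position in that signal's list, and k = 60 (the standard RRF constant). Because RRF looks only at ranks, never at raw scores, the fused list is naturally robust to scale mismatches between signals: there is no need to normalise scores per source.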
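The fusion formula translates directly into code. This sketch assumes each signal hands back edge IDs in rank order and that ranks start at 1; `reciprocalRankFusion` is a hypothetical name.

```typescript
// Reciprocal Rank Fusion over any number of ranked lists of edge IDs.
function reciprocalRankFusion(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      // An edge missing from a list simply gets no contribution from it.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

With dense = [a, b, c], sparse = [b, d], and graph = [b, a], edge b scores 1/62 + 1/61 + 1/61 ≈ 0.0489 and comes first, ahead of a at 1/61 + 1/62 ≈ 0.0325, then d (1/62) and c (1/63).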
Re-ranking
The top 30 fused candidates go to a cross-encoder re-ranker, which scores each (query, edge) pair directly. The top 10 after re-ranking are the final result.
Why re-rank? RRF is great at combining signals but can't distinguish two edges that all three signals returned at similar ranks. A cross-encoder reads the query and edge together and produces a much sharper relevance score.
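The re-ranking step reduces to: take the top 30 fused candidates, score each (query, edge) pair, and keep the best 10. In this sketch the cross-encoder is abstracted as a hypothetical synchronous `PairScorer` function; a real deployment would send the 30 pairs to the model as a single batch. `Candidate` and `rerank` are also hypothetical names.

```typescript
// Scores one (query, edge text) pair; stands in for a cross-encoder model call.
type PairScorer = (query: string, text: string) => number;

type Candidate = { id: string; factText: string };

function rerank(
  query: string,
  fused: Candidate[],
  scorer: PairScorer,
  candidatePool = 30,
  limit = 10,
): Candidate[] {
  return fused
    .slice(0, candidatePool) // only the top fused candidates are re-scored
    .map((c) => ({ c, score: scorer(query, c.factText) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.c);
}
```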
Context construction
The top edges and their source episodes are packed into a context string ready to drop into your agent's prompt. The format is stable and includes:
- The fact text
- The relation type
- Event-time validity (tValid to tInvalid)
- A citation reference back to the source episode
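Memoria's exact context layout isn't reproduced here, so the following only illustrates packing the four listed fields into one line per edge. The `ContextEdge` type, `buildContext`, and the line format itself are all hypothetical.

```typescript
// Hypothetical per-edge record carrying the four fields listed above.
type ContextEdge = {
  factText: string;
  relationType: string;
  tValid: string | null; // ISO date the fact became true (null if unknown)
  tInvalid: string | null; // ISO date it stopped being true; null = still valid
  episodeId: string; // source episode used for the citation
};

// One numbered line per edge, so the agent can cite "[n]" in its answer.
function buildContext(edges: ContextEdge[]): string {
  return edges
    .map((e, i) => {
      const validity = `${e.tValid ?? "unknown"} to ${e.tInvalid ?? "present"}`;
      return `[${i + 1}] ${e.factText} (${e.relationType}; valid ${validity}; source: episode ${e.episodeId})`;
    })
    .join("\n");
}
```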
Bi-temporal filtering
If you pass asOf: '2024-06-01', every signal applies a bi-temporal filter before scoring. An edge is kept only if both of these hold:
- Event time: edge.tValid ≤ asOf AND (edge.tInvalid IS NULL OR edge.tInvalid > asOf)
- Ingestion time: edge.tIngested ≤ asOf
This gives you "the world as the agent knew it" on a specific date. Without asOf, the filter uses "now."
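The filter is a pair of predicates, one on event time and one on ingestion time. A minimal sketch, assuming edges carry `Date` values for tValid, tInvalid (null while still valid), and tIngested; `TemporalEdge` and `visibleAsOf` are hypothetical names. Defaulting `asOf` to the current time mirrors the "now" behaviour described above.

```typescript
type TemporalEdge = {
  tValid: Date; // when the fact became true in the world
  tInvalid: Date | null; // when it stopped being true; null = still true
  tIngested: Date; // when Memoria learned about it
};

// True when the edge was both true in the world and known to the system at asOf.
function visibleAsOf(edge: TemporalEdge, asOf: Date = new Date()): boolean {
  const t = asOf.getTime();
  const validThen =
    edge.tValid.getTime() <= t && (edge.tInvalid === null || edge.tInvalid.getTime() > t);
  const knownThen = edge.tIngested.getTime() <= t;
  return validThen && knownThen;
}
```

An edge valid from 2024-01-01 to 2024-09-01 but ingested on 2024-03-01 passes for asOf 2024-06-01, yet fails for asOf 2024-02-01 because the agent hadn't learned it yet.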
Tuning
A few recall() parameters you may want to tweak:
| Parameter | Default | What it does |
|---|---|---|
| limit | 10 | Number of edges to return after re-ranking. |
| asOf | now | Point-in-time perspective for time-travel queries; filters on both event time and ingestion time. |
| entityHints | [] | Force-include these entities as seeds for the graph signal. Useful when the agent already knows what the query is about. |
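The defaults in the table can be captured in a small options resolver. `RecallOptions`, `resolveRecallOptions`, and `graphSeeds` below are hypothetical and do not describe the Memoria SDK's real types; they only show how the documented defaults and entityHints seeding fit together.

```typescript
// Hypothetical options shape mirroring the table; not the SDK's actual type.
interface RecallOptions {
  limit?: number;
  asOf?: string | Date;
  entityHints?: string[];
}

interface ResolvedRecallOptions {
  limit: number;
  asOf: Date;
  entityHints: string[];
}

// Fill in the documented defaults: limit 10, asOf now, no entity hints.
function resolveRecallOptions(opts: RecallOptions = {}): ResolvedRecallOptions {
  return {
    limit: opts.limit ?? 10,
    asOf: opts.asOf === undefined ? new Date() : new Date(opts.asOf),
    entityHints: opts.entityHints ?? [],
  };
}

// Graph seeds = entities extracted from the query plus forced hints, deduplicated.
function graphSeeds(extracted: string[], hints: string[]): string[] {
  return Array.from(new Set([...extracted, ...hints]));
}
```

Because `graphSeeds` unions hints with extracted entities, passing entityHints in this sketch keeps the graph signal active even for a query with no extractable entities.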
Cost shape
A typical recall() call:
- 1 query embedding
- 1 dense nearest-neighbour lookup
- 1 sparse keyword search
- 1 Personalized PageRank (PPR) computation over the cached adjacency list (graph signal)
- 1 cross-encoder re-rank batch (top 30 candidates)

Total: ~150–400 ms p50, with independent calls running in parallel.
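The dependency structure above can be sketched with `Promise.all`. The assumption here is that only dense search waits on the query embedding, while sparse search and the graph signal start immediately, and that fusion and re-ranking run last because they need all three lists. The `Stages` interface and `recallPipeline` are hypothetical; fusion is injected as a stage so any RRF implementation can be plugged in.

```typescript
// Hypothetical stage interface: each member stands in for one call in the
// cost breakdown (embedding model, vector index, keyword index, graph, re-ranker).
interface Stages {
  embed(query: string): Promise<number[]>;
  dense(vector: number[]): Promise<string[]>;
  sparse(query: string): Promise<string[]>;
  graph(query: string): Promise<string[]>;
  fuse(lists: string[][]): string[];
  rerank(query: string, candidates: string[]): Promise<string[]>;
}

async function recallPipeline(
  query: string,
  stages: Stages,
  limit = 10,
  rerankPool = 30,
): Promise<string[]> {
  // Independent branches run concurrently, so wall-clock time is roughly the
  // slowest branch (embed + dense, sparse, or graph), not their sum.
  const [denseIds, sparseIds, graphIds] = await Promise.all([
    stages.embed(query).then((vector) => stages.dense(vector)),
    stages.sparse(query),
    stages.graph(query),
  ]);
  // Fusion and re-ranking depend on all three lists, so they run afterwards.
  const fused = stages.fuse([denseIds, sparseIds, graphIds]);
  const reranked = await stages.rerank(query, fused.slice(0, rerankPool));
  return reranked.slice(0, limit);
}
```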
See Write pipeline for the ingestion side.