The recall pipeline

When you call recall() (via SDK, MCP, or REST), Memoria runs three retrieval signals in parallel, fuses their ranked results, and re-ranks the top candidates with a cross-encoder. The whole pipeline targets sub-second latency, typically ~150–400 ms at p50 (see Cost shape).

The three signals

1. Dense (vector similarity)

Each edge's factText is embedded once at write time using a state-of-the-art embedding model. At recall time, Memoria embeds your query with the same model and runs an approximate nearest-neighbour search over the brain's edge embeddings.

Strength: semantic matching. "Where does Stefan live?" finds "Stefan is based in Stockholm" even without keyword overlap.

Weakness: struggles with proper nouns, identifiers, and rare terms that don't ground well in the embedding space.
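
To make the dense signal concrete, here is a minimal sketch of the ranking it approximates. It assumes cosine similarity as the distance measure and uses an exact brute-force scan in place of an approximate nearest-neighbour index. The EmbeddedEdge shape and the function names are hypothetical, not part of the Memoria SDK.

```typescript
// Hypothetical shape: an edge together with the embedding of its factText.
interface EmbeddedEdge {
  id: string;
  factText: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search over all edges. An ANN index trades a little accuracy
// for speed, but aims to return (nearly) the same ranking.
function denseSearch(
  queryEmbedding: number[],
  edges: EmbeddedEdge[],
  k: number,
): EmbeddedEdge[] {
  return edges
    .map((edge) => ({ edge, score: cosineSimilarity(queryEmbedding, edge.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.edge);
}
```

The query embedding must come from the same model that embedded the edges at write time; vectors from different models are not comparable.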

2. Sparse (BM25 keyword)

Edges are also indexed in a managed full-text search index on factText and relationType. BM25 ranks by traditional term-frequency / inverse-document-frequency scoring.

Strength: exact-match retrieval. Looking up an order ID, a project name, or a specific API endpoint — sparse search nails it.

Weakness: doesn't understand synonyms or paraphrases.
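
For intuition, the sketch below scores a handful of fact texts with the textbook BM25 formula. The tokenizer, the parameter values k1 = 1.2 and b = 0.75, and the smoothed IDF variant are common defaults assumed for illustration; the managed index may be configured differently.

```typescript
// Simple tokenizer (an assumption): lowercase, split on non-alphanumerics.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const docCount = tokenized.length;
  if (docCount === 0) return [];
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / docCount;

  // Document frequency: in how many documents each term occurs.
  const docFreq = new Map<string, number>();
  for (const doc of tokenized) {
    for (const term of Array.from(new Set(doc))) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  return tokenized.map((doc) => {
    // Term frequency within this document.
    const termFreq = new Map<string, number>();
    for (const term of doc) termFreq.set(term, (termFreq.get(term) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = termFreq.get(term) ?? 0;
      if (f === 0) continue; // term absent: contributes nothing
      const n = docFreq.get(term) ?? 0;
      const idf = Math.log(1 + (docCount - n + 0.5) / (n + 0.5)); // rare terms weigh more
      const lengthNorm = 1 - b + b * (doc.length / avgLen); // penalise long documents
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return score;
  });
}
```

A query for a made-up identifier such as "ORD-4417" scores only the facts that contain those exact tokens, which is why this signal complements the dense one.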

3. Graph (Personalized PageRank)

Memoria extracts entities from your query, resolves them to seeds in the per-brain knowledge graph, and runs Personalized PageRank from those seeds. Top-ranked entities' edges are returned.

Strength: multi-hop reasoning. "What does my Tuesday project depend on?" can find facts about a project even when the query doesn't name it directly.

Weakness: needs entity grounding to work — pure free-text queries with no extractable entities skip this signal.
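
The graph signal can be sketched as a power iteration over an adjacency list. The restart probability (0.15), the fixed iteration count, and the handling of nodes with no outgoing links are assumptions chosen for illustration, as are the type and function names.

```typescript
// Hypothetical graph shape: entity id -> ids of neighbouring entities.
type EntityGraph = Map<string, string[]>;

function personalizedPageRank(
  graph: EntityGraph,
  seeds: string[],
  restart = 0.15,
  iterations = 50,
): Map<string, number> {
  const nodes = Array.from(graph.keys());
  const validSeeds = Array.from(new Set(seeds)).filter((s) => graph.has(s));
  // No grounded entities: the signal is skipped.
  if (validSeeds.length === 0) return new Map<string, number>();

  // Teleport distribution: uniform over the seeds, zero elsewhere.
  const teleport = new Map<string, number>();
  for (const node of nodes) teleport.set(node, 0);
  for (const seed of validSeeds) teleport.set(seed, 1 / validSeeds.length);

  let rank = new Map<string, number>(teleport);
  for (let step = 0; step < iterations; step++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, 0);

    let danglingMass = 0; // mass sitting on nodes with no outgoing links
    for (const node of nodes) {
      const mass = rank.get(node) ?? 0;
      const neighbours = (graph.get(node) ?? []).filter((n) => graph.has(n));
      if (neighbours.length === 0) {
        danglingMass += mass;
        continue;
      }
      const share = mass / neighbours.length;
      for (const n of neighbours) next.set(n, (next.get(n) ?? 0) + share);
    }

    // Follow links with probability (1 - restart); otherwise jump back to a
    // seed. Dangling mass is also returned to the seeds.
    for (const node of nodes) {
      const walked = (1 - restart) * (next.get(node) ?? 0);
      const jumped = (restart + (1 - restart) * danglingMass) * (teleport.get(node) ?? 0);
      next.set(node, walked + jumped);
    }
    rank = next;
  }
  return rank;
}
```

Entities that cannot be reached from any seed end up with a score of zero, so their edges never enter the graph signal's ranked list.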

Fusion

Each signal returns a ranked list. Memoria fuses them via Reciprocal Rank Fusion (RRF):

score(edge) = Σ over signals: 1 / (k + rank_in_signal)

k = 60 (the standard RRF constant). Because RRF uses only rank positions, never raw scores, the fused list is robust to scale mismatches between signals, so there is no need to normalise scores per source.
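
The formula above can be sketched in a few lines. This version assumes 1-based ranks, that each edge appears at most once per list, and that an edge missing from a signal simply contributes nothing for that signal; the function name and input shape are illustrative.

```typescript
// Fuse several ranked lists of edge ids with Reciprocal Rank Fusion.
// Each inner array is one signal's result, best match first.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score); // highest fused score first
}
```

For example, an edge ranked 2nd by dense and 1st by both sparse and graph scores 1/62 + 1/61 + 1/61 ≈ 0.0489, ahead of an edge ranked 1st by dense and 2nd by sparse only (1/61 + 1/62 ≈ 0.0325).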

Re-ranking

The top 30 fused candidates go to a cross-encoder re-ranker, which scores each (query, edge) pair directly. The top limit edges after re-ranking (10 by default) are the final result.

Why re-rank? RRF is great at combining signals but can't distinguish two edges that all three signals returned at similar ranks. A cross-encoder reads the query and edge together and produces a much sharper relevance score.
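
The re-ranking step has a simple shape: cut the fused list to a candidate pool, score each pair, and keep the best. In the sketch below the pair scorer is pluggable, and the word-overlap scorer is only a placeholder standing in for a real cross-encoder model. All names are hypothetical.

```typescript
interface Candidate {
  id: string;
  factText: string;
}

// Scores one (query, fact) pair; higher means more relevant.
type PairScorer = (query: string, factText: string) => number;

// Re-rank the first `poolSize` fused candidates and keep the top `limit`.
function rerank(
  query: string,
  fused: Candidate[],
  scorePair: PairScorer,
  poolSize = 30,
  limit = 10,
): Candidate[] {
  return fused
    .slice(0, poolSize)
    .map((candidate) => ({ candidate, score: scorePair(query, candidate.factText) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.candidate);
}

// Placeholder scorer: counts query words that also occur in the fact text.
// A real cross-encoder would run a model over the pair instead.
const wordOverlapScorer: PairScorer = (query, factText) => {
  const factWords = new Set(factText.toLowerCase().split(/\W+/));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && factWords.has(w)).length;
};
```

Limiting the pool keeps the cost bounded: the scorer runs once per candidate, so 30 candidates means one batch of 30 pair evaluations regardless of how many edges the signals returned.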

Context construction

The top edges and their source episodes are packed into a context string ready to drop into your agent's prompt. The format is stable and includes:

  • The fact text
  • The relation type
  • Event-time validity (tValid to tInvalid)
  • A citation reference back to the source episode
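
As an illustration of how those four pieces might be packed together, here is a minimal sketch. The line layout, the field names, and the "present" placeholder for open-ended validity are assumptions made for this example; they are not Memoria's actual context format.

```typescript
// Hypothetical shape of a recalled edge with its source episode reference.
interface RecalledEdge {
  factText: string;
  relationType: string;
  tValid: string; // start of event-time validity, e.g. "2023-01-01"
  tInvalid: string | null; // null means the fact is still valid
  episodeId: string; // citation key for the source episode
}

// Render one numbered line per edge, ready to paste into a prompt.
function buildContext(edges: RecalledEdge[]): string {
  return edges
    .map((edge, index) => {
      const validity = `${edge.tValid} to ${edge.tInvalid ?? "present"}`;
      return (
        `[${index + 1}] ${edge.factText} ` +
        `(relation: ${edge.relationType}; valid: ${validity}; source: ${edge.episodeId})`
      );
    })
    .join("\n");
}
```

Keeping one fact per line with an explicit source reference lets the agent cite the episode behind each claim.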

Bi-temporal filtering

If you pass asOf: '2024-06-01', every signal applies a bi-temporal filter before scoring:

edge.tValid ≤ asOf AND (edge.tInvalid IS NULL OR edge.tInvalid > asOf)
AND edge.tIngested ≤ asOf

This gives you "the world as the agent knew it" on a specific date. Without asOf, the filter uses "now."
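
Written as a predicate, the filter looks like the sketch below. It assumes the three timestamps are ISO 8601 strings and that both conditions must hold together; the TemporalEdge type and the function name are hypothetical.

```typescript
// Hypothetical shape: the three timestamps the filter reads.
interface TemporalEdge {
  tValid: string; // when the fact became true (event time)
  tInvalid: string | null; // when it stopped being true; null = still true
  tIngested: string; // when Memoria learned the fact (ingestion time)
}

// True if the edge was both valid and already known at `asOf`.
// Without an explicit `asOf`, the current time is used.
function visibleAsOf(
  edge: TemporalEdge,
  asOf: string = new Date().toISOString(),
): boolean {
  const at = Date.parse(asOf);
  const validAtTime =
    Date.parse(edge.tValid) <= at &&
    (edge.tInvalid === null || Date.parse(edge.tInvalid) > at);
  const knownAtTime = Date.parse(edge.tIngested) <= at;
  return validAtTime && knownAtTime;
}
```

The ingestion check is what separates this from a plain validity filter: a fact that was true on 2024-06-01 but only ingested in July stays hidden from an asOf: '2024-06-01' query.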

Tuning

A few recall() parameters you may want to tweak:

Parameter     Default   What it does
limit         10        Number of edges to return after re-ranking.
asOf          now       Point in time for time-travel queries. Drives the bi-temporal filter on both event time and ingestion time.
entityHints   []        Force-include these entities as seeds for the graph signal. Useful when the agent already knows what the query is about.
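
The sketch below shows one way these parameters and their defaults could be modelled. The type names, the string type for asOf, and the seed-merging helper are assumptions for illustration, not the SDK's actual definitions.

```typescript
// Hypothetical option types mirroring the parameters listed above.
interface RecallOptions {
  limit?: number;
  asOf?: string; // e.g. "2024-06-01"
  entityHints?: string[];
}

interface ResolvedRecallOptions {
  limit: number;
  asOf: string;
  entityHints: string[];
}

// Fill in the documented defaults: limit 10, asOf "now", no entity hints.
function resolveRecallOptions(
  options: RecallOptions = {},
  now: Date = new Date(),
): ResolvedRecallOptions {
  return {
    limit: options.limit ?? 10,
    asOf: options.asOf ?? now.toISOString(),
    entityHints: options.entityHints ?? [],
  };
}

// Hints are always included as graph seeds, alongside whatever entities
// were extracted from the query text. Duplicates are dropped.
function graphSeeds(extracted: string[], entityHints: string[]): string[] {
  return Array.from(new Set([...entityHints, ...extracted]));
}
```

With entityHints set, the graph signal still has seeds even when the query text itself contains no extractable entities.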

Cost shape

A typical recall() call:

  • 1 query embedding
  • 1 dense nearest-neighbour lookup
  • 1 sparse keyword search
  • 1 PPR computation over the cached adjacency list (graph)
  • 1 cross-encoder rerank batch (top 30 candidates)
  • Total latency: ~150–400 ms p50, with independent steps running in parallel

See Write pipeline for the ingestion side.