---
title: Recall pipeline
description: How Memoria retrieves memories — dense vectors, BM25 keyword search, and graph traversal, fused and re-ranked.
order: 6
---

# The recall pipeline

When you call `recall()` (via SDK, MCP, or REST), Memoria runs three retrieval signals in parallel, fuses them, and re-ranks the top candidates with a cross-encoder. The whole pipeline targets sub-second p50 latency.

## The three signals

### 1. Dense (vector similarity)

Each edge's `factText` is embedded once at write time using a state-of-the-art embedding model. At recall time, Memoria embeds your query with the same model and runs an approximate nearest-neighbour search over the brain's edge embeddings.

**Strength:** semantic matching. "Where does Stefan live?" finds "Stefan is based in Stockholm" even without keyword overlap.

**Weakness:** struggles with proper nouns, identifiers, and rare terms that don't ground well in the embedding space.
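
To make the dense signal concrete, here is a minimal sketch of similarity search over precomputed embeddings. It uses an exact brute-force scan with cosine similarity as a stand-in for the approximate nearest-neighbour index. The `EmbeddedEdge` shape, the toy three-dimensional vectors, and the function names are all assumptions for illustration, not Memoria's schema or API.

```typescript
// Hypothetical shape: an edge with a precomputed embedding (not Memoria's actual schema).
interface EmbeddedEdge {
  id: string;
  factText: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search by scanning every edge. A production system would
// query an approximate index instead.
function denseSearch(query: number[], edges: EmbeddedEdge[], k: number): EmbeddedEdge[] {
  return edges
    .map((edge) => ({ edge, score: cosineSimilarity(query, edge.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.edge);
}

const edges: EmbeddedEdge[] = [
  { id: "e1", factText: "Stefan is based in Stockholm", embedding: [0.9, 0.1, 0.0] },
  { id: "e2", factText: "Order 4411 shipped on Monday", embedding: [0.0, 0.2, 0.9] },
  { id: "e3", factText: "Stefan works on the billing service", embedding: [0.6, 0.7, 0.1] },
];

// Toy vector standing in for the embedding of "Where does Stefan live?"
const queryEmbedding = [1.0, 0.2, 0.0];
const denseHits = denseSearch(queryEmbedding, edges, 2);
console.log(denseHits.map((e) => e.id)); // ["e1", "e3"]
```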

### 2. Sparse (BM25 keyword)

Edges are also indexed in a managed full-text search index on `factText` and `relationType`. BM25 ranks edges by term frequency and inverse document frequency, with term-frequency saturation and document-length normalisation.

**Strength:** exact-match retrieval. Looking up an order ID, a project name, or a specific API endpoint — sparse search nails it.

**Weakness:** doesn't understand synonyms or paraphrases.
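
The scoring behind the sparse signal can be sketched with the Okapi BM25 formula. This is an illustrative in-memory version: the tokenizer is deliberately naive, and `k1 = 1.2` and `b = 0.75` are commonly used defaults rather than values taken from Memoria's managed index.

```typescript
// Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score every document against the query with Okapi BM25.
// k1 and b are common defaults, not Memoria's settings.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const queryTerms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Document frequency: how many documents contain the term at all.
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      // Saturating term frequency, normalised by document length.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const facts = [
  "Order ORD-4411 shipped on Monday",
  "Stefan is based in Stockholm",
  "Order ORD-9920 was refunded",
];
const sparseScores = bm25Scores("ORD-4411", facts);
console.log(sparseScores);
```

In this toy corpus the first fact scores highest because it matches both `ord` and the rare token `4411`; the third matches only `ord`, and the second scores zero.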

### 3. Graph (Personalized PageRank)

Memoria extracts entities from your query, resolves them to seeds in the per-brain knowledge graph, and runs Personalized PageRank from those seeds. The edges attached to the top-ranked entities are returned.

**Strength:** multi-hop reasoning. "What does my Tuesday project depend on?" can find facts about a project even when the query doesn't name it directly.

**Weakness:** needs entity grounding to work — pure free-text queries with no extractable entities skip this signal.
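
As a rough illustration of the graph signal, the sketch below runs Personalized PageRank by power iteration over a small adjacency list. The graph, the entity names, the teleport probability `alpha = 0.15`, and the iteration count are all assumptions chosen for the example; Memoria's actual parameters and storage layout are not described here.

```typescript
// Adjacency list: entity -> neighbouring entities. Every neighbour must also
// appear as a key. This is a hypothetical toy graph, not Memoria's storage format.
type EntityGraph = Record<string, string[]>;

// Personalized PageRank by power iteration. With probability `alpha` the walk
// teleports back to a seed; otherwise it follows an outgoing link.
function personalizedPageRank(
  graph: EntityGraph,
  seeds: string[],
  alpha = 0.15,
  iterations = 50,
): Record<string, number> {
  const nodes = Object.keys(graph);
  // Keep only seeds that exist in the graph, without duplicates.
  const validSeeds = seeds.filter((s, i) => graph[s] !== undefined && seeds.indexOf(s) === i);
  if (validSeeds.length === 0) return {}; // no grounded entities: skip the signal

  // Teleport distribution: uniform over the seeds, zero elsewhere.
  const teleport: Record<string, number> = {};
  for (const n of nodes) teleport[n] = validSeeds.indexOf(n) !== -1 ? 1 / validSeeds.length : 0;

  let rank: Record<string, number> = { ...teleport };
  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    for (const n of nodes) next[n] = alpha * teleport[n];
    for (const n of nodes) {
      const out = graph[n];
      const share = (1 - alpha) * rank[n];
      if (out.length === 0) {
        // Dangling node: hand its mass back to the seeds.
        for (const s of validSeeds) next[s] += share / validSeeds.length;
      } else {
        for (const m of out) next[m] += share / out.length;
      }
    }
    rank = next;
  }
  return rank;
}

const toyGraph: EntityGraph = {
  tuesdayProject: ["billingService", "stefan"],
  billingService: ["tuesdayProject", "postgres"],
  postgres: ["billingService"],
  stefan: ["tuesdayProject"],
  unrelated: ["other"],
  other: ["unrelated"],
};
const ranks = personalizedPageRank(toyGraph, ["tuesdayProject"]);
console.log(ranks);
```

Here `postgres` receives a non-zero score even though it is two hops from the seed, while the disconnected `unrelated` and `other` entities stay at zero.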

## Fusion

Each signal returns a ranked list. Memoria fuses them via [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf):

```
score(edge) = Σ over signals: 1 / (k + rank_in_signal)
```

Memoria uses `k = 60`, the constant proposed in the original RRF paper. Because RRF looks only at ranks, never at raw scores, the fused list is robust to scale mismatches between signals, and there is no need to normalise scores per source.
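
The formula above translates into a few lines of code. This sketch assumes ranks are 1-based and that an edge missing from a signal's list simply contributes nothing for that signal; both are assumptions about convention, and the function name is hypothetical.

```typescript
// Reciprocal Rank Fusion over ranked lists of edge IDs.
// Assumption: ranks are 1-based, and an edge absent from a list adds no score.
function reciprocalRankFusion(rankedLists: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((edgeId, index) => {
      const rank = index + 1;
      scores.set(edgeId, (scores.get(edgeId) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Array<[string, number]> = [];
  scores.forEach((score, edgeId) => fused.push([edgeId, score]));
  // Highest fused score first.
  return fused.sort((a, b) => b[1] - a[1]);
}

const denseList = ["e1", "e3", "e2"];
const sparseList = ["e3", "e4"];
const graphList = ["e3", "e1"];
const fusedList = reciprocalRankFusion([denseList, sparseList, graphList]);
console.log(fusedList.map(([edgeId]) => edgeId)); // ["e3", "e1", "e4", "e2"]
```

`e3` wins because all three signals returned it near the top (1/62 + 1/61 + 1/61), even though the dense signal alone ranked `e1` first.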

## Re-ranking

The top 30 fused candidates go to a cross-encoder re-ranker, which scores each `(query, edge)` pair directly. The top `limit` edges after re-ranking (10 by default) are the final result.

Why re-rank? RRF is great at combining signals but can't distinguish two edges that all three signals returned at similar ranks. A cross-encoder reads the query and edge together and produces a much sharper relevance score.
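
The re-ranking step has a simple shape regardless of which model does the scoring. In the sketch below, `PairScorer` is a hypothetical type for anything that scores a query and a fact together, and `overlapScorer` is a toy word-overlap function standing in for the cross-encoder, which in practice is a neural model.

```typescript
interface Candidate {
  id: string;
  factText: string;
}

// Anything that scores a (query, fact) pair jointly. A real cross-encoder is a
// neural model; this type only captures its calling shape.
type PairScorer = (query: string, factText: string) => number;

// Re-score the first `poolSize` fused candidates and keep the best `limit`.
function rerank(
  query: string,
  fusedCandidates: Candidate[],
  scorePair: PairScorer,
  poolSize = 30,
  limit = 10,
): Candidate[] {
  return fusedCandidates
    .slice(0, poolSize)
    .map((candidate) => ({ candidate, score: scorePair(query, candidate.factText) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((scored) => scored.candidate);
}

// Toy stand-in scorer: fraction of query words that appear in the fact.
const overlapScorer: PairScorer = (query, factText) => {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  const fact = factText.toLowerCase();
  const hits = words.filter((w) => fact.indexOf(w) !== -1).length;
  return words.length === 0 ? 0 : hits / words.length;
};

const fusedCandidates: Candidate[] = [
  { id: "c1", factText: "Stefan likes coffee" },
  { id: "c2", factText: "Stefan is based in Stockholm" },
  { id: "c3", factText: "The office is in Berlin" },
];
const reranked = rerank("Stefan Stockholm", fusedCandidates, overlapScorer, 30, 2);
console.log(reranked.map((c) => c.id)); // ["c2", "c1"]
```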

## Context construction

The top edges and their source episodes are packed into a `context` string ready to drop into your agent's prompt. The format is stable and includes:

- The fact text
- The relation type
- Event-time validity (`tValid` to `tInvalid`)
- A citation reference back to the source episode
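
To show how these four pieces might come together, here is a sketch that packs edges into a single string. The line layout, the `episodeId` field name, and the sample relation types are invented for illustration; Memoria's actual `context` format is not reproduced here.

```typescript
// Hypothetical edge shape covering the four listed fields.
interface RecalledEdge {
  factText: string;
  relationType: string;
  tValid: string; // ISO date when the fact became true
  tInvalid: string | null; // ISO date when it stopped being true, or null if still valid
  episodeId: string; // assumed name for the source-episode reference
}

// Render one numbered line per edge. The layout is illustrative only.
function buildContext(edges: RecalledEdge[]): string {
  return edges
    .map((edge, i) => {
      const validity = `${edge.tValid} to ${edge.tInvalid ?? "present"}`;
      return `[${i + 1}] ${edge.factText} (${edge.relationType}; valid ${validity}; source: ${edge.episodeId})`;
    })
    .join("\n");
}

const contextText = buildContext([
  {
    factText: "Stefan is based in Stockholm",
    relationType: "LIVES_IN",
    tValid: "2023-01-15",
    tInvalid: null,
    episodeId: "ep_42",
  },
  {
    factText: "Stefan worked at Acme",
    relationType: "WORKED_AT",
    tValid: "2019-03-01",
    tInvalid: "2022-11-30",
    episodeId: "ep_7",
  },
]);
console.log(contextText);
```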

## Bi-temporal filtering

If you pass `asOf: '2024-06-01'`, every signal applies a bi-temporal filter before scoring:

```
edge.tValid ≤ asOf AND (edge.tInvalid IS NULL OR edge.tInvalid > asOf)
AND edge.tIngested ≤ asOf
```

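This gives you "the world as the agent knew it" on a specific date: the first condition checks event time (the fact was true on that date), and the second checks ingestion time (Memoria had already recorded it). Without `asOf`, the filter uses the current time.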
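
The filter can be written as a plain predicate. This sketch assumes the three timestamps are available as `Date` values on each edge; the `TemporalEdge` interface and the sample edges are hypothetical.

```typescript
// Hypothetical edge shape with the three timestamps the filter needs.
interface TemporalEdge {
  id: string;
  tValid: Date;
  tInvalid: Date | null;
  tIngested: Date;
}

// True when the edge was both valid and already ingested at `asOf`.
function visibleAsOf(edge: TemporalEdge, asOf: Date): boolean {
  const t = asOf.getTime();
  const validAtT =
    edge.tValid.getTime() <= t && (edge.tInvalid === null || edge.tInvalid.getTime() > t);
  const knownAtT = edge.tIngested.getTime() <= t;
  return validAtT && knownAtT;
}

const temporalEdges: TemporalEdge[] = [
  // Valid and already ingested: visible.
  { id: "e1", tValid: new Date("2024-01-01"), tInvalid: null, tIngested: new Date("2024-01-02") },
  // Invalidated before asOf: hidden.
  { id: "e2", tValid: new Date("2023-01-01"), tInvalid: new Date("2024-03-01"), tIngested: new Date("2023-01-05") },
  // True at asOf but only ingested later: hidden.
  { id: "e3", tValid: new Date("2024-05-01"), tInvalid: null, tIngested: new Date("2024-07-01") },
  // Not yet valid at asOf: hidden.
  { id: "e4", tValid: new Date("2024-08-01"), tInvalid: null, tIngested: new Date("2024-08-01") },
];

const asOf = new Date("2024-06-01");
const visibleIds = temporalEdges.filter((e) => visibleAsOf(e, asOf)).map((e) => e.id);
console.log(visibleIds); // ["e1"]
```

Edge `e3` shows why both clocks matter: the fact was true on 2024-06-01, but the agent had not learned it yet, so it is excluded.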

## Tuning

A few `recall()` parameters you may want to tweak:

| Parameter | Default | What it does |
|-----------|---------|--------------|
| `limit` | 10 | Number of edges to return after re-ranking. |
| `asOf` | now | Point in time for time-travel queries. Applied to both event time and ingestion time. |
| `entityHints` | `[]` | Force-include these entities as seeds for the graph signal. Useful when the agent already knows what the query is about. |

## Cost shape

A typical `recall()` call:

- 1 query embedding
- 1 dense nearest-neighbour lookup
- 1 sparse keyword search
- 1 PPR computation over the cached adjacency list (graph)
- 1 cross-encoder rerank batch (top 30 candidates)
- Total: roughly 150–400 ms at p50, with independent calls running in parallel

See [Write pipeline](/docs/write-pipeline) for the ingestion side.
