Overview

Enrichment is the AI-powered process that transforms raw session events into structured, verified memory items. It is the core intelligence layer of Contox, responsible for extracting factual claims from developer activity and organizing them into a usable knowledge base.

AI model selection

The default AI model is determined by your subscription plan. All paid plans can override the model per-project in Project Settings > AI Model.

| Plan | Default | Override | Characteristics |
|---|---|---|---|
| Free | Small | Not available | Faster processing, lower token cost |
| Personal | Small | Small or Medium | Choose based on project needs |
| Team | Medium | Small or Medium | Higher quality by default |
| Business | Medium | Small or Medium | Same as Team |
| Enterprise | Medium | Small or Medium | Same as Team |

Higher-tier models produce more accurate memory items with better confidence scores, but consume more credits. Free plan users see the model options but must upgrade to Personal or higher to change them.

Evidence indexing

Before AI processing begins, the enrichment pipeline indexes all evidence from the session events:

  1. Event scanning -- All events in the session are scanned for content
  2. Diff extraction -- Code diffs are extracted (up to 3000 characters per commit)
  3. Summary collection -- Session summaries and change descriptions are gathered
  4. Evidence truncation -- Large evidence blocks are truncated to 2000 characters to stay within model context limits

This indexed evidence is provided to the AI alongside the raw events.
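The indexing and truncation steps above can be sketched as follows. This is an illustrative sketch only; the constant names and event field names (`diff`, `summary`) are assumptions, not Contox's actual implementation.

```python
DIFF_LIMIT = 3000      # max characters of diff kept per commit
EVIDENCE_LIMIT = 2000  # max characters per evidence block

def index_evidence(events):
    evidence = []
    for event in events:
        if diff := event.get("diff"):
            evidence.append(diff[:DIFF_LIMIT])
        if summary := event.get("summary"):
            evidence.append(summary)
    # Truncate each block so the combined evidence fits the model context
    return [block[:EVIDENCE_LIMIT] for block in evidence]

events = [
    {"diff": "x" * 5000, "summary": "Refactored auth module"},
    {"summary": "Fixed login bug"},
]
blocks = index_evidence(events)  # oversized diff is clipped to 2000 chars
```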

Pre-classification

Events are pre-classified before AI processing to optimize extraction:

  • Events are tagged with likely categories (architecture, implementation, bug, etc.)
  • Related events are grouped together
  • Metadata (file paths, timestamps, authors) is extracted and structured

This pre-processing helps the AI model focus on extraction rather than organization.
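A minimal sketch of this pre-classification, assuming keyword-based tagging and grouping by file path (the category keywords and grouping strategy here are hypothetical):

```python
from collections import defaultdict

# Illustrative keyword lists; the real classifier is not documented here
KEYWORDS = {
    "bug": ("fix", "bug", "crash"),
    "architecture": ("refactor", "module", "service"),
    "todo": ("todo", "fixme"),
}

def pre_classify(events):
    groups = defaultdict(list)
    for event in events:
        text = event["message"].lower()
        tags = [cat for cat, words in KEYWORDS.items()
                if any(w in text for w in words)]
        event["tags"] = tags or ["implementation"]
        # Group related events by the file they touch
        groups[event.get("file", "unknown")].append(event)
    return groups

groups = pre_classify([
    {"message": "Fix crash on login", "file": "auth.ts"},
    {"message": "Refactor auth service", "file": "auth.ts"},
])
```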

Event chunking

Events are processed in chunks of 10. Each chunk is sent to the AI model as a single request. Chunking provides several benefits:

  • Keeps context window usage manageable
  • Allows parallel processing of independent chunks
  • Provides natural failure boundaries (a failed chunk does not block others)
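The chunking behavior above can be sketched like this. The chunk size matches the 10-event batches described; `call_model` is a hypothetical stand-in for the per-chunk AI request, and the worker count is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 10

def chunked(events, size=CHUNK_SIZE):
    return [events[i:i + size] for i in range(0, len(events), size)]

def process_chunk(chunk, call_model):
    try:
        return {"ok": True, "items": call_model(chunk)}
    except Exception as exc:
        # A failed chunk is recorded but does not block the others
        return {"ok": False, "error": str(exc)}

def enrich(events, call_model):
    # Independent chunks can run in parallel
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda c: process_chunk(c, call_model),
                             chunked(events)))

# 25 events -> chunks of 10, 10, and 5
results = enrich(list(range(25)), call_model=lambda chunk: len(chunk))
```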

V16 JSON schema

The AI model outputs memory items conforming to the V16 JSON schema. This schema defines:

```json
{
  "items": [
    {
      "title": "string - Short descriptive title",
      "content": "string - Detailed markdown content",
      "type": "string - architecture|convention|implementation|decision|bug|todo",
      "confidence": "number - 0 to 1",
      "files": ["string - Related file paths"],
      "quote": "string - Source quote from evidence",
      "schemaKey": "string - Brain hierarchy path"
    }
  ]
}
```

Each field is validated against the schema. Malformed items are rejected.
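A hedged sketch of what per-item validation against the V16 fields might look like; the real validator may use a JSON Schema library instead, and the checks here cover only the fields shown above.

```python
VALID_TYPES = {"architecture", "convention", "implementation",
               "decision", "bug", "todo"}

def validate_item(item):
    required = ("title", "content", "type", "confidence",
                "files", "quote", "schemaKey")
    if any(key not in item for key in required):
        return False  # missing field -> malformed, rejected
    if item["type"] not in VALID_TYPES:
        return False
    if not (isinstance(item["confidence"], (int, float))
            and 0 <= item["confidence"] <= 1):
        return False  # confidence must be a number in [0, 1]
    return isinstance(item["files"], list)

good = {"title": "t", "content": "c", "type": "bug", "confidence": 0.9,
        "files": ["a.ts"], "quote": "q", "schemaKey": "bugs/login"}
bad = dict(good, confidence=1.5)  # out of range -> rejected
```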

Quote verification

Quote verification is a critical quality control step that rejects hallucinated memory items:

  1. Every extracted item must include a quote field referencing specific evidence
  2. The system programmatically verifies that the quote exists in the source evidence
  3. Items with fabricated or inaccurate quotes are rejected
  4. This prevents the AI from inventing facts not supported by actual session data

This mechanism significantly improves memory accuracy by ensuring every item is grounded in real evidence.
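The programmatic check in step 2 could look roughly like this. The whitespace-and-case normalization is an assumption; the documented behavior is only that the quote must exist in the source evidence.

```python
def normalize(text):
    # Collapse whitespace and lowercase so formatting differences
    # do not cause false rejections (illustrative strategy)
    return " ".join(text.split()).lower()

def verify_quote(item, evidence_blocks):
    quote = normalize(item["quote"])
    return any(quote in normalize(block) for block in evidence_blocks)

evidence = ["Switched   the auth flow to JWT tokens"]
grounded = {"quote": "switched the auth flow to JWT"}
fabricated = {"quote": "migrated the database to Postgres"}
```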

Context-aware processing

Enrichment is context-aware. The AI model receives the existing project brain as additional input:

  • Avoids duplication -- The model knows what already exists and avoids extracting redundant items
  • Maintains consistency -- New items are consistent with established conventions and architecture
  • Enables updates -- The model can identify when new evidence updates or supersedes existing items
  • Preserves structure -- Items are assigned schema keys that fit the existing brain hierarchy

Pipeline stages

After AI extraction, items pass through additional stages:

Embedding

Each extracted item is converted into a vector embedding for semantic search. These embeddings power context packs and the relevant-scope brain assembly.

Deduplication

New items are compared against existing items using both semantic similarity (embeddings) and structural matching (schema keys, titles). Duplicates are merged, with the higher-confidence version taking precedence.
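A sketch of that dedup decision, combining a structural match on schema keys with cosine similarity over embeddings. The 0.9 threshold and the key-only structural check are assumptions for illustration; the real matcher also considers titles.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new, existing, threshold=0.9):
    if new["schemaKey"] == existing["schemaKey"]:
        return True  # structural match
    return cosine(new["embedding"], existing["embedding"]) >= threshold

def merge(new, existing):
    # Higher-confidence version takes precedence
    return new if new["confidence"] >= existing["confidence"] else existing

a = {"schemaKey": "auth/jwt", "embedding": [1.0, 0.0], "confidence": 0.8}
b = {"schemaKey": "auth/jwt", "embedding": [0.9, 0.1], "confidence": 0.95}
winner = merge(a, b)  # b wins on confidence
```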

Drift check

Existing brain items are evaluated against new evidence. If a new session contradicts an existing item (e.g., a technology was replaced, a convention changed), the existing item is flagged for review or automatically deprecated.

Monitoring enrichment in the Dashboard

Track enrichment progress in real time from the Memory page:

  1. Switch to the Sessions tab to see all sessions for the current project
  2. Find the session being enriched — its status updates live (active, enriching, completed, failed)
  3. Click the session to open the detail view:
    • Pipeline Timeline — A visual progress indicator showing the current stage (Enrich → Embed → Dedup → Drift Check) with duration for each completed stage
    • Events tab — Browse all raw events captured during the session (save, scan, git digest, etc.)
    • Jobs tab — Full job history with per-stage timing, item counts, and error details
Example pipeline progress view (2/4 steps complete):

  • Enrich -- 12s, 3,240 tokens
  • Embed -- 4s
  • Dedup
  • Drift Check
  4. Once enrichment completes, switch to the Brain tab to see the newly extracted memory items

Retrying failed enrichment

If a stage fails, click Retry on the failed job. The pipeline restarts from the failed stage, preserving progress from earlier stages. Common failure causes:

  • Temporary LLM API errors or rate limiting
  • Network timeouts (mitigated by automatic retry with direct Mistral fallback)
  • Malformed evidence data

Configuring the AI model

To change the enrichment model for a project:

  1. Open the Projects page
  2. Click the gear icon on the project card
  3. In the General tab, find the AI Model section
  4. Select Small (fast, cost-effective) or Medium (higher quality)
  5. Save changes — the next enrichment run will use the selected model
Info

Free plan users see the model options but must upgrade to Personal or higher to change them. Setting the model to "Default" uses your plan's default (Small for Personal, Medium for Team+).

Deep Enrichment (GitHub)

When a project has a linked GitHub repository, the pipeline adds a deep enrichment stage between Enrich and Embed. Instead of working only from commit messages and session summaries, deep enrichment reads the actual source files to produce implementation-level memory.

How it works

  1. The pipeline collects file paths from newly created memory items, prioritized by importance score
  2. Source files are fetched from GitHub at the session's headCommitSha (max 20 files, 500 KB total)
  3. Pass 1 (Extract) -- Devstral Small analyzes each file and extracts structured facts by category:
    • Routes: HTTP method, path, params, Zod validation schema, response shape, middleware
    • Models/Schemas: field names, types, constraints, relations, indexes
    • Functions: signatures with types, logic steps, error cases, async behavior
    • Auth: flow description, token format, permission checks
    • Config: env vars, external services, feature flags
  4. Pass 2 (Compile) -- Facts are compiled into concise markdown (max 3,500 chars) and written to the item's facts field
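The file-collection step (1-2) can be sketched as below, enforcing the limits listed under Resource limits. `fetch_file` is a hypothetical GitHub fetcher and the skip/stop behavior for oversized files is an assumption.

```python
MAX_FILES = 20
MAX_FILE_BYTES = 100 * 1024
MAX_TOTAL_BYTES = 500 * 1024

def select_files(candidates, fetch_file):
    """candidates: file paths sorted by importance score, highest first."""
    fetched, total = [], 0
    for path in candidates[:MAX_FILES]:
        content = fetch_file(path)
        if content is None or len(content) > MAX_FILE_BYTES:
            continue  # skip unreachable or oversized files
        if total + len(content) > MAX_TOTAL_BYTES:
            break  # total fetch budget exhausted
        fetched.append((path, content))
        total += len(content)
    return fetched

files = select_files(["a.ts", "b.ts"], fetch_file=lambda p: b"x" * 1024)
```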

Setup

  1. Open Project Settings > Repository tab
  2. Connect via GitHub OAuth (recommended) or paste a Personal Access Token with repo scope
  3. Select the repository
  4. Ensure the Deep Enrichment toggle is enabled (on by default)

Resource limits

Deep enrichment enforces strict limits to control costs and latency:

| Limit | Value |
|---|---|
| Files per session | 20 |
| Per-file size | 100 KB |
| Total fetch budget | 500 KB |
| Concurrent requests | 5 |
| File types | Code, config, docs only (.ts, .py, .go, .rs, .json, .yaml, .sql, etc.) |
Tip

Deep enrichment degrades gracefully -- if a file fetch fails or the repo is unreachable, the item keeps its original facts from the standard enrichment stage. No data is lost.

Auto-Enrichment

By default, enrichment is triggered automatically when the VS Code extension captures git commits. This is controlled by the Auto-Learn from commits toggle in Project Settings > General.

When enabled:

  • Every commit captured by VS Code triggers the full enrichment pipeline
  • Memory items are created and enriched without manual intervention
  • Credits are consumed automatically

When disabled:

  • Commits are still captured and stored as raw events
  • You must manually click Generate Memory on a session to trigger enrichment
  • No credits are consumed until you choose to enrich
Tip

If you're concerned about credit usage, disable auto-enrichment and manually enrich only the sessions that matter. You can always re-enable it later.

Staleness Detection

After each enrichment run, Contox automatically checks whether new commits have made existing memory items outdated. This is a zero-cost check (no LLM calls) that works by comparing modified file paths:

  1. Files modified in the current commit are collected
  2. Existing memory items referencing those files are identified
  3. Items with 2+ referenced files modified are flagged for review
  4. A problem document is created linking the item to the commit
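Steps 1-3 amount to a simple set-overlap check, sketched here (the threshold of 2 matches the description above; field names are assumptions):

```python
STALE_THRESHOLD = 2  # referenced files modified before an item is flagged

def flag_stale(items, modified_files):
    modified = set(modified_files)
    flagged = []
    for item in items:
        overlap = modified & set(item["files"])
        if len(overlap) >= STALE_THRESHOLD:
            flagged.append({"item": item["title"], "files": sorted(overlap)})
    return flagged

items = [
    {"title": "Auth flow", "files": ["auth.ts", "jwt.ts", "login.ts"]},
    {"title": "Build config", "files": ["vite.config.ts"]},
]
flags = flag_stale(items, ["auth.ts", "jwt.ts", "readme.md"])
```

Because this is pure set arithmetic over file paths, it runs after every enrichment without any LLM cost.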

Flagged items appear in the Review tab where you can:

  • Accept -- The item is still accurate despite the file changes
  • Edit -- Update the item's content to reflect the changes
  • Archive -- The item is outdated and should be removed from the brain

Next steps