Overview

The V2 pipeline is the core data flow in Contox. It captures raw events from your AI coding tools, stores them, and processes them through an AI-powered enrichment pipeline to extract structured memory items.

Pipeline diagram

```mermaid
sequenceDiagram
    participant Tool as MCP / CLI / VSCode
    participant Ingest as POST /api/v2/ingest
    participant Store as Event Store
    participant Blob as Blob Storage
    participant Session as Session Manager
    participant User as Dashboard User
    participant Enrich as Enrichment Pipeline
    participant Brain as Project Brain

    Tool->>Ingest: Send event (HMAC signed)
    Ingest->>Store: Store raw event
    Ingest->>Blob: Upload blobs (diffs, content)
    Ingest->>Session: Create or reuse session (4h window)
    Ingest-->>Tool: 202 Accepted (eventId, sessionId)

    Note over Session: Events accumulate in session

    User->>Enrich: Click "Generate Memory"
    Enrich->>Store: Fetch session events
    Enrich->>Enrich: Chunk events (10 per chunk)
    Enrich->>Enrich: AI extracts memory items
    Enrich->>Enrich: Quote verification
    Enrich->>Enrich: Dedup against existing items
    Enrich->>Brain: Store approved items
    Enrich-->>User: Enrichment complete

    Note over Brain: Brain document updated

    Tool->>Brain: GET /api/v2/brain
    Brain-->>Tool: Assembled markdown + metadata
```

Stage 1: Event capture

Events flow into Contox from three client-side sources:

| Source | Transport | Events |
| --- | --- | --- |
| MCP server | V2 ingest (HMAC) | Session saves, context updates, memory operations |
| CLI | V2 ingest (HMAC) | Session saves, scan results, git digests |
| VS Code extension | V2 ingest (HMAC) | Session saves, file changes, git activity |

All events are sent to POST /api/v2/ingest using HMAC-SHA256 authentication. See HMAC Signing for details.
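
To make the authentication step concrete, here is a minimal sketch of HMAC-SHA256 request signing in TypeScript. The digest algorithm matches the one named above, but the header name and exact signing scheme are illustrative assumptions; consult the HMAC Signing docs for the format Contox actually expects.

```typescript
import { createHmac } from "node:crypto";

// Compute an HMAC-SHA256 hex digest over the raw request body.
// (Illustrative sketch -- the real scheme may also sign a timestamp or path.)
function signIngestBody(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

const body = JSON.stringify({ event: "session_save", payload: { summary: "..." } });
const signature = signIngestBody("my-secret", body);
// The signature would accompany the POST, e.g. in a header
// such as X-Signature (hypothetical name).
```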

Event structure

```json
{
  "event": "session_save",
  "payload": {
    "summary": "Implemented JWT authentication",
    "changes": [
      {
        "category": "implementation",
        "title": "JWT auth middleware",
        "content": "Added auth middleware at src/middleware/auth.ts"
      }
    ]
  }
}
```

Stage 2: Storage and session management

When an event is ingested:

  1. Raw event stored -- The complete event payload is persisted immediately
  2. Blobs uploaded -- Large data (diffs, file contents) is stored in blob storage
  3. Session association -- The event is linked to an existing session or a new session is created

Session windowing

Sessions use a 4-hour window. If an event arrives within 4 hours of the last event in an active session, it is added to that session. Otherwise, a new session is created. This groups related work together naturally.
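
The windowing rule above can be sketched as a simple predicate. The function and field names here are illustrative, not Contox's actual API:

```typescript
// Reuse a session only if the new event arrives within 4 hours
// of the session's most recent event.
const WINDOW_MS = 4 * 60 * 60 * 1000; // 4 hours in milliseconds

interface Session {
  id: string;
  lastEventAt: number; // epoch ms of the most recent event
}

function shouldReuseSession(session: Session | null, eventAt: number): boolean {
  if (!session) return false; // no active session -> create a new one
  return eventAt - session.lastEventAt <= WINDOW_MS;
}
```

An event 3 hours after the last one joins the existing session; an event 5 hours later starts a fresh one.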

Stage 3: Enrichment

Enrichment is user-triggered. When you click Generate Memory in the dashboard (or call POST /api/v2/sessions/[id]/enrich), the pipeline begins:

3a. Chunking

Events are grouped into chunks of 10 for processing. This keeps each AI call focused and manageable.
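
The chunking step amounts to slicing the event list into fixed-size groups. A minimal sketch (function name is illustrative):

```typescript
// Split an array of events into chunks of `size` (default 10, as described
// above); the final chunk may be shorter.
function chunkEvents<T>(events: T[], size = 10): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    chunks.push(events.slice(i, i + size));
  }
  return chunks;
}
```

For a 25-event session this yields three chunks of 10, 10, and 5 events.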

3b. AI extraction

Each chunk is processed by an AI model that extracts structured memory items. The model receives:

  • The event data (summaries, changes, diffs)
  • The existing project brain (for context)
  • A V16 JSON schema defining the expected output format

The AI model tier depends on your plan:

| Plan | Model |
| --- | --- |
| Free / Personal | Small |
| Team / Business / Enterprise | Medium |

3c. Quote verification

Every extracted item is verified against the source evidence. The system checks that claims made in memory items can be traced back to actual content in the events. Items with hallucinated quotes are rejected. This ensures memory quality.
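
In its simplest form, the check described above keeps an item only if every quoted snippet literally appears in the source events. This is a hedged sketch with illustrative names; the real pipeline's matching rules may be more sophisticated (normalization, fuzzy matching, etc.):

```typescript
interface MemoryItem {
  title: string;
  quotes: string[]; // snippets the item claims came from the events
}

// Accept an item only if every claimed quote is found verbatim
// in the concatenated event text; otherwise treat it as hallucinated.
function verifyQuotes(item: MemoryItem, eventText: string): boolean {
  return item.quotes.every((q) => eventText.includes(q));
}
```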

3d. Deduplication

New items are compared against existing memory items to detect duplicates. Duplicate items are merged, preserving the most confident and most recent information.
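
As an illustration only, a dedup pass could treat items with the same normalized title as duplicates and keep the higher-confidence copy, breaking ties by recency. The actual pipeline may compare semantic similarity rather than exact titles:

```typescript
interface Item {
  title: string;
  confidence: number; // extraction confidence, 0..1
  updatedAt: number;  // epoch ms
}

// Merge existing and incoming items, keeping one item per normalized title:
// the most confident copy wins; on ties, the most recent copy wins.
function dedupItems(existing: Item[], incoming: Item[]): Item[] {
  const byTitle = new Map<string, Item>();
  for (const item of [...existing, ...incoming]) {
    const key = item.title.trim().toLowerCase();
    const prev = byTitle.get(key);
    if (
      !prev ||
      item.confidence > prev.confidence ||
      (item.confidence === prev.confidence && item.updatedAt > prev.updatedAt)
    ) {
      byTitle.set(key, item);
    }
  }
  return [...byTitle.values()];
}
```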

3e. Drift check

Existing brain items are checked for consistency with new evidence. If new events contradict an existing memory item, the item is flagged for review.

Stage 4: Brain assembly

After enrichment, approved items are assembled into the project brain document. The brain is served via GET /api/v2/brain with ETag caching and token budgeting.
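
The ETag caching mentioned above follows standard HTTP conditional requests: a client remembers the last ETag it saw and sends it back via If-None-Match, so an unchanged brain returns 304 with no body. The endpoint path comes from the docs; the helper names and caching variables below are illustrative assumptions:

```typescript
let cachedEtag: string | null = null;
let cachedBrain: string | null = null;

// Build conditional-request headers from a previously seen ETag.
function conditionalHeaders(etag: string | null): Record<string, string> {
  return etag ? { "If-None-Match": etag } : {};
}

async function fetchBrain(baseUrl: string): Promise<string | null> {
  const res = await fetch(`${baseUrl}/api/v2/brain`, {
    headers: conditionalHeaders(cachedEtag),
  });
  if (res.status === 304) return cachedBrain; // brain unchanged since last fetch
  cachedEtag = res.headers.get("ETag");
  cachedBrain = await res.text();
  return cachedBrain;
}
```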

See Brain Assembly for the complete assembly process.

Monitoring the pipeline

Track pipeline progress in the dashboard:

  1. Go to Sessions
  2. Click the session being enriched
  3. View the Jobs tab for stage-by-stage progress

Each stage shows its status (queued, processing, completed, failed) and duration.

Next steps