Overview

The git digest feature reads your repository's commit history and provides structured evidence for the enrichment pipeline. It tracks progress using SHA-based ranges, ensuring each digest captures only new commits since the last session.

How it works

Each run reads commits from your local git repository and packages them as evidence for enrichment in four steps:

  1. Determine range -- Find the starting SHA from the last session's sourceRef field
  2. Read commits -- Walk the commit log from the starting SHA to HEAD
  3. Extract diffs -- Capture code changes for each commit (up to 3000 characters per commit)
  4. Package evidence -- Structure the data for the enrichment pipeline

SHA-based range tracking

Dates can be unreliable because of timezone differences and clock drift, so git digest tracks ranges by SHA instead:

  • The base SHA is read from the last session entry's sourceRef field
  • The head SHA is the current HEAD of the repository
  • Only commits between these two SHAs are included

This ensures:

  • No commits are missed between sessions
  • No commits are processed twice
  • The digest is deterministic and reproducible

If no previous session exists, the digest captures the most recent commits up to the configured limit.
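
The range logic above can be sketched as follows. This is a minimal illustration, not contox's actual implementation. It assumes that sessions are available as a list of dictionaries ordered oldest first, and that the commit log is read with plain `git log`. The helper names `resolve_base_sha` and `build_git_log_args` are hypothetical.

```python
from typing import List, Optional, Sequence

DEFAULT_LIMIT = 20  # default value listed under Options


def resolve_base_sha(sessions: Sequence[dict]) -> Optional[str]:
    """Return the sourceRef of the most recent session, or None if there is none.

    Assumption: sessions are ordered oldest first.
    """
    if not sessions:
        return None
    return sessions[-1].get("sourceRef") or None


def build_git_log_args(
    base_sha: Optional[str],
    limit: int = DEFAULT_LIMIT,
    mode: str = "first-parent",
) -> List[str]:
    """Build `git log` arguments that list commit SHAs for one digest run."""
    if mode not in ("first-parent", "all"):
        raise ValueError(f"unknown mode: {mode}")
    args = ["git", "log", "--format=%H", f"--max-count={limit}"]
    if mode == "first-parent":
        args.append("--first-parent")
    # With a base SHA, list commits reachable from HEAD but not from the base.
    # Without one, fall back to the most recent commits up to the limit.
    args.append(f"{base_sha}..HEAD" if base_sha else "HEAD")
    return args
```

The `base..HEAD` revision range excludes the base commit itself, which is what keeps a commit from being processed twice in this sketch. Applying the limit even when a base SHA exists is an assumption.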

Running git digest

Via CLI

bash
contox git-digest

Via MCP

Use the contox_git_digest tool:

contox_git_digest(directory: "/path/to/repo")

Options

Option     | Default           | Description
directory  | Current directory | Path to the git repository root
limit      | 20                | Maximum number of commits to return
mode       | first-parent      | Commit traversal mode

Traversal modes

first-parent (default)

Follows only the first parent of merge commits. This produces a clean "shipping journal" showing the main branch history without merge noise.

bash
contox git-digest --mode first-parent

all

Walks every commit, including those from merged branches. The history is exhaustive but noisier.

bash
contox git-digest --mode all
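
To make the difference between the two modes concrete, here is a small sketch over a hypothetical five-commit history. It models the traversal idea only and is not how contox or git implement it. The commit names and both helper functions are made up for illustration.

```python
from typing import Dict, List

# Hypothetical history: each commit maps to its parents, first parent first.
# "M" is a merge commit on main that brings in feature commits F1 and F2.
HISTORY: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "F1": ["A"],
    "F2": ["F1"],
    "M": ["B", "F2"],
}


def walk_first_parent(history: Dict[str, List[str]], head: str) -> List[str]:
    """Follow only the first parent of each commit, starting at head."""
    visited = []
    current = head
    while current is not None:
        visited.append(current)
        parents = history[current]
        current = parents[0] if parents else None
    return visited


def walk_all(history: Dict[str, List[str]], head: str) -> List[str]:
    """Visit every commit reachable from head, through all parents."""
    seen = set()
    visited = []
    stack = [head]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        visited.append(sha)
        stack.extend(history[sha])
    return visited
```

Starting from `M`, the first-parent walk yields `M`, `B`, `A`: the merge appears as one entry and the feature commits are skipped. The full walk also reaches `F1` and `F2`, although its visiting order here is not the order `git log` would print.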

Diff capture

For each commit, the digest captures:

  • SHA -- The full commit hash
  • Message -- The commit message
  • Author -- Who made the commit
  • Timestamp -- When the commit was made
  • Files changed -- List of modified files with change type (added, modified, deleted)
  • Diff stats -- Lines added and removed per file
  • Smart patches -- Actual code changes, truncated to 3000 characters per commit

The 3000-character diff limit per commit balances detail against token budget. The most significant changes are prioritized.
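
One way such a budget could be applied is sketched below. The document does not say how "most significant" is decided, so ranking files by number of changed lines is purely an assumption, and `build_patch` is a hypothetical helper.

```python
from typing import List, Tuple

MAX_PATCH_CHARS = 3000  # per-commit limit stated above


def build_patch(
    file_patches: List[Tuple[str, int]],
    budget: int = MAX_PATCH_CHARS,
) -> str:
    """Join per-file patches, largest change first, and cut at the budget.

    Each item is (patch_text, changed_lines). Ranking by changed_lines
    is an assumed stand-in for "most significant".
    """
    ranked = sorted(file_patches, key=lambda item: item[1], reverse=True)
    combined = "\n".join(text for text, _ in ranked)
    return combined[:budget]
```

Because the cut happens after ranking, the largest change always survives truncation, and smaller files are dropped first when a commit exceeds the budget.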

WIP evidence

The git digest also captures work-in-progress evidence:

  • Uncommitted changes -- Modified files that have not been committed
  • Staged changes -- Files staged for the next commit
  • Untracked files -- New files not yet added to git

This WIP evidence provides additional context about what the developer was working on during the session.
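
The three categories map naturally onto the output of `git status --porcelain`, where each line starts with a two-character status (index state, then working-tree state) followed by the path. The sketch below parses that format. Whether contox collects WIP evidence this way is an assumption, and `parse_porcelain` is a hypothetical helper.

```python
from typing import Dict, List


def parse_porcelain(status_output: str) -> Dict[str, List[str]]:
    """Classify `git status --porcelain` (v1) lines into WIP buckets.

    Simplification: renames ("old -> new") and quoted paths are kept verbatim.
    """
    evidence: Dict[str, List[str]] = {"modified": [], "staged": [], "untracked": []}
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if line[:2] == "!!":
            continue  # ignored files, only present with --ignored
        if line[:2] == "??":
            evidence["untracked"].append(path)
            continue
        if index_state != " ":
            evidence["staged"].append(path)
        if worktree_state != " ":
            evidence["modified"].append(path)
    return evidence
```

A file with both staged and unstaged edits (status `MM`) lands in both `staged` and `modified` in this sketch.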

Output structure

json
{
  "commits": [
    {
      "sha": "abc123def456",
      "message": "feat: add JWT authentication middleware",
      "author": "Jane Doe",
      "timestamp": "2025-01-20T14:00:00Z",
      "files": [
        { "path": "src/middleware/auth.ts", "status": "added", "additions": 45, "deletions": 0 },
        { "path": "src/lib/jwt.ts", "status": "added", "additions": 32, "deletions": 0 }
      ],
      "patch": "diff --git a/src/middleware/auth.ts b/src/middleware/auth.ts\n..."
    }
  ],
  "headSha": "abc123def456",
  "baseSha": "789abc012def",
  "wipEvidence": {
    "modified": ["src/routes/api.ts"],
    "staged": [],
    "untracked": ["src/utils/helpers.ts"]
  }
}
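
A consumer of this output might load it into typed records, as in the sketch below. The field names come from the example above. The class names, `parse_commits`, and `total_churn` are hypothetical and not part of any contox API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class FileChange:
    path: str
    status: str  # "added", "modified", or "deleted"
    additions: int
    deletions: int


@dataclass
class Commit:
    sha: str
    message: str
    author: str
    timestamp: str
    files: List[FileChange]
    patch: str


def parse_commits(raw: str) -> List[Commit]:
    """Parse the digest JSON and return its commits as typed records."""
    data = json.loads(raw)
    return [
        Commit(
            sha=c["sha"],
            message=c["message"],
            author=c["author"],
            timestamp=c["timestamp"],
            files=[FileChange(**f) for f in c["files"]],
            patch=c.get("patch", ""),  # treat a missing patch as empty
        )
        for c in data["commits"]
    ]


def total_churn(commits: List[Commit]) -> int:
    """Sum the added and removed lines across all files in all commits."""
    return sum(f.additions + f.deletions for c in commits for f in c.files)
```

For the example output above, `total_churn` returns 77 (45 + 32 added lines, no deletions).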

Next steps