Architecture Deep Dive

How Contox Works

A deep dive into the memory layer that powers AI-native development. From capture to enrichment, from brain to action.

Capture Pipeline

From Code to Context

Every commit and file save is captured in real time by the VS Code extension, signed, and stored as structured session data.

VS Code Extension

Git watcher + file save listener

onDidChange + polling

Git Watch

Commit extraction

HMAC-SHA256 signed

Ingest API

POST /api/v2/ingest

Upsert session

Sessions

Reuse if < 4h old
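
The signing and session-reuse steps above can be sketched as follows. This is a minimal illustration, not the actual extension code: the payload fields, the shared secret, the hex encoding of the signature, and the choice to measure session age from a single timestamp are all assumptions. Only HMAC-SHA256 and the 4-hour window come from the page.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape: the real ingest schema is not shown on this page.
interface IngestPayload {
  projectId: string;
  commitSha: string;
  message: string;
  capturedAt: number; // epoch milliseconds
}

const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

// Sign the serialized body with HMAC-SHA256 and return a hex digest.
function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Recompute the signature server-side and compare in constant time.
function verifySignature(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// "Reuse if < 4h old": assumes age is measured from one session timestamp.
function shouldReuseSession(sessionTimestampMs: number, nowMs: number): boolean {
  return nowMs - sessionTimestampMs < FOUR_HOURS_MS;
}

const secret = "example-shared-secret"; // assumption: a per-project shared secret
const payload: IngestPayload = {
  projectId: "demo-project",
  commitSha: "abc123",
  message: "fix: handle empty session",
  capturedAt: 1700000000000,
};
const body = JSON.stringify(payload);
const signature = signPayload(body, secret);
```

The signature would travel alongside the body in the POST /api/v2/ingest request. How it is transmitted (header name, encoding) is not specified here.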

Enrichment Pipeline

Raw Data to Structured Memory

A multi-stage LLM pipeline transforms raw commits into structured, searchable memory items with embeddings for semantic retrieval.

Job Runner

Consumes pending jobs

enrich

LLM Processing

3-model cascade

deep_enrich → embed

Memory Items

Structured knowledge

post-hooks

Post-Hooks

Zero LLM cost

embed

Embeddings

Cosine similarity search
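
Cosine similarity search over embeddings can be illustrated with a brute-force ranking. The `MemoryItem` shape and the `topK` helper are hypothetical names for this sketch. A production system would more likely delegate the search to a vector index.

```typescript
// Hypothetical shape of a stored memory item with its embedding vector.
interface MemoryItem {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the k best matches.
function topK(query: number[], items: MemoryItem[], k: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const items: MemoryItem[] = [
  { id: "auth-decision", embedding: [1, 0, 0] },
  { id: "queue-bugfix", embedding: [0, 1, 0] },
  { id: "auth-convention", embedding: [0.9, 0.1, 0] },
];
const results = topK([1, 0, 0], items, 2);
```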

Queue Worker

Reliable Job Processing

A Redis-backed priority queue with exponential backoff retries failed enrichment jobs so that they complete even under transient failure.

Bridge

Appwrite → BullMQ

enqueue

Redis Queue

Priority queue

process

BullMQ Worker

Concurrency: 10

sync

Appwrite Sync

Source of truth

7

Priority Levels

5

Retry Attempts

10

Concurrent Jobs

60s

Exp. Backoff
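
To make the retry numbers concrete, here is a standalone sketch of an exponential backoff schedule. It assumes that 60s is the base delay, that the delay doubles after each failure, and that the 5 attempts include the first try. None of these details are confirmed on this page, and the code does not call BullMQ itself.

```typescript
const MAX_ATTEMPTS = 5; // from the page: 5 retry attempts
const BASE_DELAY_MS = 60000; // assumption: 60s is the base delay, not a cap

// Delay before the next try, where attemptsMade is 1 after the first failure.
function backoffDelayMs(attemptsMade: number, baseDelayMs: number = BASE_DELAY_MS): number {
  return baseDelayMs * Math.pow(2, attemptsMade - 1);
}

// With N total attempts there are N - 1 waits between them.
function retrySchedule(maxAttempts: number = MAX_ATTEMPTS, baseDelayMs: number = BASE_DELAY_MS): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(backoffDelayMs(attempt, baseDelayMs));
  }
  return delays;
}

const schedule = retrySchedule(); // [60000, 120000, 240000, 480000]
```

Under these assumptions a job that keeps failing waits 1, 2, 4, and then 8 minutes before its final attempt.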

Brain System

Hierarchical Project Knowledge

All memory is organized in a tree structure, from high-level architecture decisions down to individual bug fixes, each with confidence scores and semantic links.

Project Brain

Root context · Always loaded

6 categories · Confidence scoring · Semantic links · Auto-staleness detection
MCP Integration

One Protocol, Every AI Tool

The Model Context Protocol (MCP) enables any AI agent to read and write project memory through a standardized interface. One integration, universal access.

MCP Server

8 tools · Universal protocol

Cursor
Copilot
Windsurf
Claude Code
Cline
CLI
REST API
8 MCP tools · CLI + REST API · Any AI agent
Genesis Scanner

Deep Codebase Analysis

Genesis scans your entire codebase in 7 analysis layers, extracting architecture patterns, security findings, and conventions in a single pass.

Codebase

Local or GitHub repo

walk

File Walker

Smart file discovery

chunk

Chunk Builder

Analyzable segments

analyze

LLM Analysis

7 parallel layers

7 Analysis Layers
Product: Next.js 14 App Router
Architecture: Appwrite BaaS pattern
Dependencies: Zustand client state
Conventions: Strict TypeScript config
Data Model: Rollup usage tracking
Entry Points: SSE realtime events
Security: HMAC-SHA256 verification
Security Pipeline

12-Phase Security Scanner

Full-stack SAST, SCA, secret detection, taint analysis, license compliance, malware detection, AI/LLM security, and SBOM generation. 10 of 12 phases are zero-LLM cost.

12 Phases
150+ Secret Patterns
1,114 SAST Rules
74+ IaC Rules
10/12 Zero-LLM

Scan & Detect

Phases 1-7

analyze

Deep Analysis

Phases 8-10

filter

Dedup & Feedback

Phase 11

emit

Security Center

Phase 12

12-Phase Execution Pipeline
Detection
Phases 1-7 - Zero LLM cost
01
Fetch Repo

GitHub tarball download, file scoring by security relevance + repo posture check (branch protection, CODEOWNERS)

02
Secret Scan

150+ regex patterns + Shannon entropy (hex > 3.0, base64 > 4.2)

03
Secret Verify

Cross-ref against false positives: UUIDs, hashes, test fixtures

04
Git History

Compare API diffs to find secrets added then removed in commits

05
Dependency SCA

OSV.dev CVEs, EPSS + CISA KEV enrichment, typosquat detection, license compliance (AGPL/GPL/unknown)

06
Config Audit

74+ rules: Docker, K8s, Terraform, GitHub Actions, CI/CD env isolation, CORS, headers

07
SAST + Routes

1,114 SAST rules (11 languages + OWASP LLM Top 10 + malware/obfuscation) + route analysis

Analysis
Phases 8-10 - Deep inspection
08
Attack Surface (LLM)

LLM maps endpoints + auth boundaries + shadow API detection vs OpenAPI/Swagger spec

09
Taint Analysis

3-pass tracking: tainted sources -> propagation -> dangerous sinks

10
Deep OWASP (LLM)

5x focused LLM passes on OWASP Top 10, cross-validated with SAST

Output
Phases 11-12 - Report & score
11
Dedup & Feedback

5-pass filter: slug dedup, semantic dedup, hard exclusion, user feedback, cross-source merge

12
Assemble + SBOM

Weighted score, CycloneDX 1.5 SBOM, critical alert auto-creation

Key Capabilities
Secret Detection
150+ patterns

Regex patterns + Shannon entropy with charset-specific thresholds for unknown secrets.

AWS, GCP, Azure, Stripe, GitHub tokens
Shannon entropy: hex > 3.0, base64 > 4.2
Context-aware: skips hashes, boosts credentials
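
The entropy check can be sketched as below. The thresholds (hex > 3.0, base64 > 4.2) are the ones quoted above. The charset regexes, the 16-character minimum length, and the `looksLikeSecret` name are assumptions made for illustration.

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (const ch of s.split("")) {
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

const HEX_RE = /^[0-9a-fA-F]+$/;
const BASE64_RE = /^[A-Za-z0-9+/=_-]+$/;
const MIN_LENGTH = 16; // assumption: very short tokens are ignored

// Apply the charset-specific threshold: hex strings have a smaller alphabet,
// so they can never reach the entropy that a base64 string can.
function looksLikeSecret(token: string): boolean {
  if (token.length < MIN_LENGTH) return false;
  if (HEX_RE.test(token)) return shannonEntropy(token) > 3.0;
  if (BASE64_RE.test(token)) return shannonEntropy(token) > 4.2;
  return false;
}
```

A separate threshold per charset is needed because a 16-symbol alphabet tops out at 4 bits per character, while base64 can approach 6.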
SAST + Route Analysis
1,114 rules

1,114 deterministic regex rules across 11 languages with zero LLM cost.

JS/TS, Python, Java, Go, C/C++, PHP, Ruby, C#, Swift, Dart, Rust
Injection, XSS, SSRF, crypto, auth, deserialization, XXE
Route checks: auth, rate limit, validation, CSRF, headers
Dependency SCA
EPSS + KEV

OSV.dev CVE lookup enriched with real-world exploit data and license compliance.

EPSS probability: estimated chance of exploitation within the next 30 days
CISA KEV: confirmed active exploitation
License compliance: AGPL, GPL, unknown deps flagged
AI/LLM Security
OWASP LLM Top 10

Dedicated SAST rules targeting AI-specific attack vectors in LLM-powered applications.

Prompt injection: user input in system prompts
PII in RAG pipelines: unsanitized vector DB inserts
Uncontrolled tool calls, insecure LLM output rendering
Malware & Obfuscation
7 signatures

Detects intentional malicious patterns that vulnerability scanners miss.

eval(atob(...)) and hex-decoded payload execution
JavaScript obfuscator output (_0x variable pattern)
Crypto miners, PHP web shells, malicious npm scripts
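
Two of the patterns listed above can be expressed as simple regex signatures. The rule IDs and exact expressions below are illustrative assumptions, not the scanner's actual rules.

```typescript
interface Signature {
  id: string;
  pattern: RegExp;
}

// Illustrative signatures only. No "g" flag, so RegExp.test stays stateless.
const SIGNATURES: Signature[] = [
  // eval(atob(...)): executing a base64-decoded payload
  { id: "eval-atob", pattern: /\beval\s*\(\s*atob\s*\(/ },
  // _0x1a2b3c style identifiers typical of JavaScript obfuscator output
  { id: "obfuscator-0x", pattern: /\b_0x[0-9a-f]{4,}\b/i },
];

// Return the IDs of every signature that matches the given source text.
function scanSource(source: string): string[] {
  return SIGNATURES.filter((sig) => sig.pattern.test(source)).map((sig) => sig.id);
}

const suspicious = scanSource('eval(atob("YWxlcnQoMSk="))');
const clean = scanSource("const total = 1;");
```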
License Compliance
AGPL / GPL / BUSL

Scans npm production deps for restrictive or unknown licenses that create legal risk.

AGPL-3.0: network copyleft can extend source-sharing obligations to software offered as a service
GPL-2/3: distributed derivative works must be released under the GPL
UNLICENSED / unknown: legal grey area by default
IaC Config Audit
74+ rules

Infrastructure-as-code security across Docker, K8s, Terraform, and CI/CD.

Docker: root user, missing healthcheck, ADD with remote URLs
K8s: privileged pods, wildcard RBAC, no probes
GitHub Actions: env dump, workflow_dispatch injection, secret scope leak
Taint Analysis
12 sources, 8 sinks

Lightweight source-to-sink data flow tracking for web applications.

Sources: req.query, req.body, searchParams, cookies
Sinks: exec, eval, query, innerHTML, redirect
Tracks 1-level variable propagation
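
A toy version of source-to-sink tracking with one level of variable propagation might look like this. It is a line-based regex sketch under strong assumptions (one statement per line, no scoping, no sanitizer awareness) and only covers the sources and sinks named above. The real analyzer is presumably more thorough.

```typescript
interface TaintFinding {
  line: number; // 1-based line number of the sink
  sink: string;
  via: string; // tainted variable name, or "direct source"
}

const SOURCE_RE = /\b(?:req\.query|req\.body|searchParams|cookies)\b/;
// Sinks are matched as calls (exec(...), query(...)) or as .innerHTML assignments.
const SINK_RE = /\b(exec|eval|query|redirect)\s*\(|\.(innerHTML)\s*=/;
const ASSIGN_RE = /\b(?:const|let|var)\s+([A-Za-z_]\w*)\s*=\s*(.+)/;

function findTaintFlows(code: string): TaintFinding[] {
  const lines = code.split("\n");

  // Passes 1 and 2: variables assigned directly from a source become tainted.
  const tainted: string[] = [];
  for (const line of lines) {
    const m = ASSIGN_RE.exec(line);
    if (m && SOURCE_RE.test(m[2])) tainted.push(m[1]);
  }

  // Pass 3: report sinks that use a source directly or a tainted variable.
  const findings: TaintFinding[] = [];
  lines.forEach((line, index) => {
    const sink = SINK_RE.exec(line);
    if (!sink) return;
    const sinkName = sink[1] || sink[2];
    if (SOURCE_RE.test(line)) {
      findings.push({ line: index + 1, sink: sinkName, via: "direct source" });
      return;
    }
    const viaVar = tainted.filter((name) => new RegExp("\\b" + name + "\\b").test(line))[0];
    if (viaVar) findings.push({ line: index + 1, sink: sinkName, via: viaVar });
  });
  return findings;
}

const sample = [
  "const id = req.query.id;",
  'const safe = "static";',
  'db.query("SELECT * FROM users WHERE id = " + id);',
  'db.query("SELECT 1 WHERE x = " + safe);',
  "el.innerHTML = req.body.html;",
].join("\n");
const flows = findTaintFlows(sample);
```

On the sample, the sketch reports the `query` call on line 3 (via `id`) and the `innerHTML` assignment on line 5, and ignores the query built from a constant.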
SBOM Generation
CycloneDX 1.5

Full software bill of materials with vulnerability cross-references.

Every dependency with version and license
CVE references linked per component
Downloadable JSON via API endpoint
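
For orientation, a minimal CycloneDX 1.5 document with licenses and per-component vulnerability references could be assembled as sketched below. The `Dependency` input shape and the `buildSbom` helper are hypothetical, the CVE ID is a placeholder, and the package URL logic assumes unscoped npm package names.

```typescript
// Hypothetical input shape for one resolved dependency.
interface Dependency {
  name: string;
  version: string;
  license?: string; // SPDX identifier, if known
  cves: string[];
}

interface SbomVulnerability {
  id: string;
  affects: { ref: string }[];
}

// Assumption: unscoped npm names. Scoped names would need percent-encoding.
function purlOf(dep: Dependency): string {
  return "pkg:npm/" + dep.name + "@" + dep.version;
}

function buildSbom(deps: Dependency[]) {
  const vulnerabilities: SbomVulnerability[] = [];
  for (const dep of deps) {
    for (const id of dep.cves) {
      // Each vulnerability points back at its component via bom-ref.
      vulnerabilities.push({ id, affects: [{ ref: purlOf(dep) }] });
    }
  }
  return {
    bomFormat: "CycloneDX",
    specVersion: "1.5",
    version: 1,
    components: deps.map((dep) => ({
      type: "library",
      "bom-ref": purlOf(dep),
      name: dep.name,
      version: dep.version,
      purl: purlOf(dep),
      licenses: dep.license ? [{ license: { id: dep.license } }] : [],
    })),
    vulnerabilities,
  };
}

const sbom = buildSbom([
  { name: "example-lib", version: "1.2.3", license: "MIT", cves: ["CVE-0000-0000"] }, // placeholder CVE
  { name: "other-lib", version: "0.4.0", cves: [] },
]);
```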
5-Pass Dedup: Slug, semantic, hard exclusion, feedback, cross-source merge
Feedback Loop: Archive/boost signals train per-project suppression profiles
Critical Alerting: Auto-creates problem docs for high/critical findings
Repo Posture: Branch protection, force push policy, CODEOWNERS via GitHub API
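
Of the five dedup passes, slug dedup and cross-source merge are the easiest to illustrate. The sketch below assumes a slug built from rule, file, and line, and a numeric severity where higher is worse. Both are hypothetical choices. The semantic, exclusion, and feedback passes are not covered.

```typescript
// Hypothetical finding shape; "source" names the phase that produced it.
interface Finding {
  source: string;
  rule: string;
  file: string;
  line: number;
  severity: number; // assumption: higher means more severe
}

interface MergedFinding extends Finding {
  sources: string[];
}

// Assumption: two findings are the same issue if rule, file, and line agree.
function slugOf(f: Finding): string {
  return (f.rule + ":" + f.file + ":" + f.line).toLowerCase();
}

// Collapse duplicates, remembering every source and keeping the worst severity.
function dedupeBySlug(findings: Finding[]): MergedFinding[] {
  const bySlug: Record<string, MergedFinding> = {};
  const order: string[] = [];
  for (const f of findings) {
    const slug = slugOf(f);
    const existing = bySlug[slug];
    if (!existing) {
      bySlug[slug] = { ...f, sources: [f.source] };
      order.push(slug);
      continue;
    }
    if (existing.sources.indexOf(f.source) === -1) existing.sources.push(f.source);
    if (f.severity > existing.severity) existing.severity = f.severity;
  }
  return order.map((slug) => bySlug[slug]);
}

const merged = dedupeBySlug([
  { source: "sast", rule: "sql-injection", file: "src/db.ts", line: 42, severity: 3 },
  { source: "llm", rule: "SQL-Injection", file: "src/db.ts", line: 42, severity: 4 },
  { source: "sast", rule: "xss", file: "src/view.ts", line: 7, severity: 2 },
]);
```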
Transparency

What We Don't Cover

Contox is a static analysis platform. We read your code via the GitHub API without deploying it. The items below require running your application or using specialized tooling.

No DAST

We do not deploy or run your application. No HTTP fuzzing, no runtime endpoint testing, no live vulnerability exploitation.

Static route analysis detects missing auth, rate limits, input validation, and CSRF without deployment.
No Container Image Scanning

We audit Dockerfiles for misconfigurations but do not scan built container images for OS-level vulnerabilities.

Config audit catches root user, missing healthchecks, insecure base images, and exposed ports in Dockerfiles.
No IAST

No runtime instrumentation or agent-based monitoring. We cannot observe your application behavior in production.

Taint analysis tracks data flow statically from user input sources to dangerous sinks across your codebase.
No License Compliance

SBOM lists all components with versions but does not score license risks (GPL, AGPL copyleft propagation).

CycloneDX 1.5 SBOM includes license fields. Export and use with dedicated compliance tools like FOSSA or Snyk.
JS/TS Focus

SAST rules and taint analysis target JavaScript and TypeScript only. Other languages get basic config rules.

IaC config audit covers Docker, Kubernetes, Terraform, and GitHub Actions regardless of application language.
No Penetration Testing

Static analysis only. No active exploitation attempts, no manual security testing, no red team simulation.

Deep OWASP analysis uses 5 focused LLM passes cross-validated against deterministic SAST results.
Ask AI

Question Your Codebase

Ask natural language questions about your project. Semantic search finds relevant memory items, then an LLM synthesizes a cited answer.

Question

Natural language query

embed query

Semantic Search

Embeddings-based retrieval

top-k results

LLM Synthesis

Gemini 2 Flash

cited response

Cited Answer

Markdown with [Source N]

Ask AI
You

How does the auth middleware work?

The auth middleware uses Appwrite server SDK to verify session cookies. It checks for a valid session token in the request headers and extracts the user ID for downstream handlers. [Source 1] [Source 2]

[1] src/middleware.ts
[2] src/lib/appwrite-server.ts
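
The citation step in the flow above can be sketched as two small helpers: one numbers the retrieved items in the prompt, the other maps `[Source N]` markers in the answer back to file paths. The prompt wording, the `RetrievedItem` shape, and both function names are assumptions. The embedding and LLM calls are left out.

```typescript
// Hypothetical shape of a memory item returned by semantic search.
interface RetrievedItem {
  path: string;
  content: string;
}

// Number each retrieved item so the model can cite it as [Source N].
function buildPrompt(question: string, items: RetrievedItem[]): string {
  const sources = items
    .map((item, i) => "[Source " + (i + 1) + "] " + item.path + "\n" + item.content)
    .join("\n\n");
  return "Answer using only the sources below and cite them as [Source N].\n\n" +
    sources + "\n\nQuestion: " + question;
}

// Resolve [Source N] markers to paths, skipping duplicates and unknown numbers.
function extractCitations(answer: string, items: RetrievedItem[]): string[] {
  const paths: string[] = [];
  const re = /\[Source (\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(answer)) !== null) {
    const item = items[Number(match[1]) - 1];
    if (item && paths.indexOf(item.path) === -1) paths.push(item.path);
  }
  return paths;
}

const retrieved: RetrievedItem[] = [
  { path: "src/middleware.ts", content: "Verifies the session cookie on each request." },
  { path: "src/lib/appwrite-server.ts", content: "Creates the Appwrite server client." },
];
const prompt = buildPrompt("How does the auth middleware work?", retrieved);
const cited = extractCitations("It verifies session cookies. [Source 1] [Source 2] [Source 9]", retrieved);
```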
Built by a Solo Developer

This Is What One Developer Can Build

Full-stack product from capture to enrichment, from real-time sync to semantic search. Built with Next.js, TypeScript, Appwrite, and AI. Designed for scale from day one.

50+

API Endpoints

8

MCP Tools

7

Genesis Layers

3

LLM Models