Architecture Deep Dive

How Contox Works

A deep dive into the memory layer that powers AI-native development. From capture to enrichment, from brain to action.


Capture Pipeline

From Code to Context

Every commit and file save is captured in real time by the VS Code extension, signed, and stored as structured session data.

VS Code Extension

Git watcher + file save listener

onDidChange + polling

Git Watch

Commit extraction

HMAC-SHA256 signed
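As a rough illustration of the signing step, the sketch below signs a serialized commit event with HMAC-SHA256 and verifies it on the receiving side. The `CommitEvent` shape, the shared-secret handling, and the hex encoding are assumptions made for the example; the actual ingest schema is not shown on this page.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape; the real ingest schema is not documented here.
interface CommitEvent {
  sha: string;
  message: string;
  files: string[];
}

// Sign the serialized body with a shared secret (hex-encoded HMAC-SHA256).
function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Verify on the receiving side with a constant-time comparison.
function verifySignature(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const commitEvent: CommitEvent = {
  sha: "abc123",
  message: "fix: auth check",
  files: ["src/middleware.ts"],
};
const body = JSON.stringify(commitEvent);
const signature = signPayload(body, "demo-secret");
```

The signature has to be computed over the exact bytes that are sent, so the sender serializes once and signs that string rather than re-serializing on each side.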

Ingest API

POST /api/v2/ingest

Upsert session

Sessions

Reuse if < 4h old
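The session reuse rule can be sketched as a small upsert function. This assumes "age" is measured from the session's last activity; the page does not say whether it is measured from creation instead, and the `Session` fields are hypothetical.

```typescript
// Hypothetical session record; field names are assumptions.
interface Session {
  id: string;
  lastActivityAt: number; // epoch milliseconds
}

const SESSION_REUSE_WINDOW_MS = 4 * 60 * 60 * 1000; // "Reuse if < 4h old"

// Reuse the existing session if it is inside the window, otherwise start a new one.
function upsertSession(existing: Session | null, now: number, newId: string): Session {
  if (existing !== null && now - existing.lastActivityAt < SESSION_REUSE_WINDOW_MS) {
    return { id: existing.id, lastActivityAt: now };
  }
  return { id: newId, lastActivityAt: now };
}
```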

Enrichment Pipeline

Raw Data to Structured Memory

A multi-stage LLM pipeline transforms raw commits into structured, searchable memory items with embeddings for semantic retrieval.

Job Runner

Consumes pending jobs

enrich

LLM Processing

3-model cascade

deep_enrich → embed

Memory Items

Structured knowledge

post-hooks

Post-Hooks

Zero LLM cost

embed

Embeddings

Cosine similarity search
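For readers unfamiliar with cosine similarity search, here is a minimal in-memory sketch. The `MemoryItem` shape is hypothetical, and a production system would more likely use a vector index than a linear scan.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), defined as 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical memory item: an id plus its embedding vector.
interface MemoryItem {
  id: string;
  embedding: number[];
}

// Score every item against the query and keep the k best matches.
function topK(query: number[], items: MemoryItem[], k: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```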

Queue Worker

Reliable Job Processing

A Redis-backed priority queue with exponential backoff ensures every enrichment job completes, even under failure.

Bridge

Appwrite → BullMQ

enqueue

Redis Queue

Priority queue

process

BullMQ Worker

Concurrency: 10

sync

Appwrite Sync

Source of truth

Priority Levels

Retry Attempts

Concurrent Jobs

60s

Exp. Backoff
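Exponential backoff doubles the wait before each retry. The sketch below assumes the 60s figure is a ceiling on the delay and uses a hypothetical 1s base; neither the base delay nor the retry count is stated on this page.

```typescript
// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
// baseMs = 1000 is an illustrative assumption; capMs = 60000 assumes "60s" is the maximum delay.
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 60000): number {
  if (attempt < 1) throw new Error("attempt is 1-based");
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}
```

With these assumed values the delays run 1s, 2s, 4s, 8s, 16s, 32s, and then stay at 60s.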

Brain System

Hierarchical Project Knowledge

All memory is organized in a tree structure, from high-level architecture decisions down to individual bug fixes, each with confidence scores and semantic links.

Project Brain

Root context · Always loaded

6 categories · Confidence scoring · Semantic links · Auto-staleness detection
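One way to picture the tree is as nested nodes that each carry a confidence score and a last-updated timestamp. The sketch below walks such a tree and flags nodes that have not been touched recently. The `BrainNode` fields, the 0 to 1 confidence scale, and the age-based staleness rule are all assumptions for illustration, not the product's actual model.

```typescript
// Hypothetical node shape for a hierarchical project brain.
interface BrainNode {
  title: string;
  confidence: number; // assumed 0..1 scale
  updatedAt: number;  // epoch milliseconds
  children: BrainNode[];
}

// Depth-first walk that collects titles of nodes older than maxAgeMs.
function findStale(node: BrainNode, now: number, maxAgeMs: number): string[] {
  const stale: string[] = [];
  if (now - node.updatedAt > maxAgeMs) stale.push(node.title);
  for (const child of node.children) {
    stale.push(...findStale(child, now, maxAgeMs));
  }
  return stale;
}
```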

MCP Integration

One Protocol, Every AI Tool

The Model Context Protocol (MCP) enables any AI agent to read and write project memory through a standardized interface. One integration, universal access.

MCP Server

8 tools · Universal protocol

Cursor

Copilot

Windsurf

Claude Code

Cline

CLI

REST API

8 MCP tools · CLI + REST API · Any AI agent
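MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the `tools/call` method. The sketch below builds such a request. The tool name `search_memory` and its arguments are hypothetical, since the eight tools are not named on this page.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0 request with method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// "search_memory" is a hypothetical tool name used only for illustration.
const toolCall = buildToolCall(1, "search_memory", { query: "auth middleware" });
```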

Genesis Scanner

Deep Codebase Analysis

Genesis scans your entire codebase in 7 analysis layers, extracting architecture patterns, security findings, and conventions in a single pass.

Codebase

Local or GitHub repo

walk

File Walker

Smart file discovery

chunk

Chunk Builder

Analyzable segments

analyze

LLM Analysis

7 parallel layers

7 Analysis Layers

Product: Next.js 14 App Router

Architecture: Appwrite BaaS pattern

Dependencies: Zustand client state

Conventions: Strict TypeScript config

Data Model: Rollup usage tracking

Entry Points: SSE realtime events

Security: HMAC-SHA256 verification
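The chunking step in the Genesis pipeline can be pictured as splitting each discovered file into fixed-size line windows before analysis. This is a minimal sketch under that assumption; the real chunk builder may split on syntax boundaries or token counts instead, and the `Chunk` shape is hypothetical.

```typescript
// Hypothetical chunk record: a slice of one file with 1-based line bounds.
interface Chunk {
  path: string;
  startLine: number;
  endLine: number;
  text: string;
}

// Split a file's source into consecutive chunks of at most maxLines lines.
function buildChunks(path: string, source: string, maxLines: number): Chunk[] {
  if (maxLines < 1) throw new Error("maxLines must be at least 1");
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += maxLines) {
    const slice = lines.slice(start, start + maxLines);
    chunks.push({
      path,
      startLine: start + 1,
      endLine: start + slice.length,
      text: slice.join("\n"),
    });
  }
  return chunks;
}
```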

Security Pipeline

12-Phase Security Scanner

Full-stack SAST, SCA, secret detection, taint analysis, license compliance, malware detection, AI/LLM security, and SBOM generation. 10 of 12 phases are zero-LLM cost.

12 Phases

150+ Secret Patterns

1,114 SAST Rules

74+ IaC Rules

10/12 Zero-LLM

Scan & Detect

Phases 1-7

analyze

Deep Analysis

Phases 8-10

filter

Dedup & Feedback

Phase 11

emit

Security Center

Phase 12

12-Phase Execution Pipeline

Detection

Phases 1-7 - Zero LLM cost

Fetch Repo

GitHub tarball download, file scoring by security relevance + repo posture check (branch protection, CODEOWNERS)

Secret Scan

150+ regex patterns + Shannon entropy (hex > 3.0, base64 > 4.2)

Secret Verify

Cross-ref against false positives: UUIDs, hashes, test fixtures

Git History

Compare API diffs to find secrets added then removed in commits

Dependency SCA

OSV.dev CVEs, EPSS + CISA KEV enrichment, typosquat detection, license compliance (AGPL/GPL/unknown)

Config Audit

74+ rules: Docker, K8s, Terraform, GitHub Actions, CI/CD env isolation, CORS, headers

SAST + Routes

1,114 SAST rules (11 languages + OWASP LLM Top 10 + malware/obfuscation) + route analysis

Analysis

Phases 8-10 - Deep inspection

Attack Surface (LLM)

LLM maps endpoints + auth boundaries + shadow API detection vs OpenAPI/Swagger spec

Taint Analysis

3-pass tracking: tainted sources -> propagation -> dangerous sinks

Deep OWASP (LLM)

5x focused LLM passes on OWASP Top 10, cross-validated with SAST

Output

Phases 11-12 - Report & score

Dedup & Feedback

5-pass filter: slug dedup, semantic dedup, hard exclusion, user feedback, cross-source merge

Assemble + SBOM

Weighted score, CycloneDX 1.5 SBOM, critical alert auto-creation

Key Capabilities

Secret Detection

150+ patterns

Regex patterns + Shannon entropy with charset-specific thresholds for unknown secrets.

AWS, GCP, Azure, Stripe, GitHub tokens

Shannon entropy: hex > 3.0, base64 > 4.2

Context-aware: skips hashes, boosts credentials
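The entropy check can be sketched as follows: compute Shannon entropy in bits per character, then apply the charset-specific thresholds quoted above (hex > 3.0, base64 > 4.2). The charset regexes and the `looksLikeSecret` helper are assumptions for illustration; a real scanner would also apply minimum-length and context rules.

```typescript
// Shannon entropy in bits per character: -sum(p * log2(p)) over character frequencies.
function shannonEntropy(s: string): number {
  const chars = Array.from(s);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Illustrative charset detection; these patterns are assumptions.
const HEX_CHARSET = /^[0-9a-fA-F]+$/;
const BASE64_CHARSET = /^[A-Za-z0-9+/=_-]+$/;

// Apply the threshold that matches the token's character set.
function looksLikeSecret(token: string): boolean {
  const entropy = shannonEntropy(token);
  if (HEX_CHARSET.test(token)) return entropy > 3.0;
  if (BASE64_CHARSET.test(token)) return entropy > 4.2;
  return false;
}
```

The thresholds differ because the maximum possible entropy depends on alphabet size: 4 bits per character for hex, 6 for base64.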

SAST + Route Analysis

1,114 rules

1,114 deterministic regex rules across 11 languages with zero LLM cost.

JS/TS, Python, Java, Go, C/C++, PHP, Ruby, C#, Swift, Dart, Rust

Injection, XSS, SSRF, crypto, auth, deserialization, XXE

Route checks: auth, rate limit, validation, CSRF, headers

Dependency SCA

EPSS + KEV

OSV.dev CVE lookup enriched with real-world exploit data and license compliance.

EPSS probability: chance of exploit in 30 days

CISA KEV: confirmed active exploitation

License compliance: AGPL, GPL, unknown deps flagged

AI/LLM Security

OWASP LLM Top 10

Dedicated SAST rules targeting AI-specific attack vectors in LLM-powered applications.

Prompt injection: user input in system prompts

PII in RAG pipelines: unsanitized vector DB inserts

Uncontrolled tool calls, insecure LLM output rendering

Malware & Obfuscation

7 signatures

Detects intentional malicious patterns that vulnerability scanners miss.

eval(atob(...)) and hex-decoded payload execution

JavaScript obfuscator output (_0x variable pattern)

Crypto miners, PHP web shells, malicious npm scripts
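Signature matching of this kind can be approximated with a small table of regular expressions. The two patterns below are illustrative guesses at the `eval(atob(...))` and `_0x` cases listed above, not the scanner's actual rules.

```typescript
interface Signature {
  id: string;
  pattern: RegExp;
}

// Two illustrative signatures; the real rule set is not published on this page.
const SIGNATURES: Signature[] = [
  { id: "eval-atob", pattern: /eval\s*\(\s*atob\s*\(/ },
  { id: "obfuscator-0x", pattern: /\b_0x[0-9a-f]{4,}\b/ },
];

// Return the ids of every signature that matches the source text.
function matchSignatures(source: string): string[] {
  return SIGNATURES.filter((sig) => sig.pattern.test(source)).map((sig) => sig.id);
}
```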

License Compliance

AGPL / GPL / BUSL

Scans npm production deps for restrictive or unknown licenses that create legal risk.

AGPL-3.0: network copyleft triggers on any SaaS use

GPL-2/3: forces open-sourcing of derivative works

UNLICENSED / unknown: legal grey area by default

IaC Config Audit

74+ rules

Infrastructure-as-code security across Docker, K8s, Terraform, and CI/CD.

Docker: root user, missing healthcheck, ADD with remote URLs

K8s: privileged pods, wildcard RBAC, no probes

GitHub Actions: env dump, workflow_dispatch injection, secret scope leak

Taint Analysis

12 sources, 8 sinks

Lightweight source-to-sink data flow tracking for web applications.

Sources: req.query, req.body, searchParams, cookies

Sinks: exec, eval, query, innerHTML, redirect

Tracks 1-level variable propagation
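To make "1-level variable propagation" concrete, here is a deliberately simple line-based sketch: a variable assigned from a source is tainted, a variable assigned from a tainted variable is tainted one hop further, and any tainted variable passed to a sink is reported. The regexes cover only a few of the sources and sinks listed above and are assumptions; the real analyzer is presumably more thorough.

```typescript
// Illustrative subsets of the sources and sinks named above.
const SOURCE_EXPR = /\b(?:req\.query|req\.body|searchParams|cookies)\b/;
const ASSIGNMENT = /\b(?:const|let|var)\s+(\w+)\s*=\s*(.+)$/;
const SINK_CALL = /\b(exec|eval|query)\s*\(([^)]*)\)/;

interface TaintFinding {
  variable: string;
  sink: string;
  line: number;
}

// True if the expression references the variable as a whole word.
function mentions(expr: string, variable: string): boolean {
  return new RegExp(`\\b${variable}\\b`).test(expr);
}

function findTaint(code: string): TaintFinding[] {
  const direct: string[] = [];     // assigned straight from a source
  const propagated: string[] = []; // assigned from a direct variable (one hop only)
  const findings: TaintFinding[] = [];
  code.split("\n").forEach((text, index) => {
    const assign = ASSIGNMENT.exec(text);
    if (assign) {
      const [, target, expr] = assign;
      if (SOURCE_EXPR.test(expr)) direct.push(target);
      else if (direct.some((v) => mentions(expr, v))) propagated.push(target);
    }
    const call = SINK_CALL.exec(text);
    if (call) {
      const [, sink, args] = call;
      for (const variable of [...direct, ...propagated]) {
        if (mentions(args, variable)) findings.push({ variable, sink, line: index + 1 });
      }
    }
  });
  return findings;
}
```

Because propagation stops after one hop, a value copied through two intermediate variables would escape this sketch.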

SBOM Generation

CycloneDX 1.5

Full software bill of materials with vulnerability cross-references.

Every dependency with version and license

CVE references linked per component

Downloadable JSON via API endpoint
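A minimal CycloneDX 1.5 document with per-component vulnerability links might be assembled like this. The `Dependency` input shape is hypothetical, the sketch handles only unscoped npm package names, and it assumes each license string is a valid SPDX identifier.

```typescript
// Hypothetical input: one resolved dependency with its known CVE ids.
interface Dependency {
  name: string;
  version: string;
  license: string; // assumed to be an SPDX id such as "MIT"
  cves: string[];
}

function toCycloneDx(deps: Dependency[]) {
  // Package URL doubles as the bom-ref; scoped names would need percent-encoding.
  const refOf = (d: Dependency): string => `pkg:npm/${d.name}@${d.version}`;
  const vulnerabilities: { id: string; affects: { ref: string }[] }[] = [];
  for (const d of deps) {
    for (const cve of d.cves) {
      vulnerabilities.push({ id: cve, affects: [{ ref: refOf(d) }] });
    }
  }
  return {
    bomFormat: "CycloneDX",
    specVersion: "1.5",
    version: 1,
    components: deps.map((d) => ({
      type: "library",
      "bom-ref": refOf(d),
      name: d.name,
      version: d.version,
      purl: refOf(d),
      licenses: [{ license: { id: d.license } }],
    })),
    vulnerabilities,
  };
}
```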

5-Pass Dedup: Slug, semantic, hard exclusion, feedback, cross-source merge

Feedback Loop: Archive/boost signals train per-project suppression profiles

Critical Alerting: Auto-creates problem docs for high/critical findings

Repo Posture: Branch protection, force push policy, CODEOWNERS via GitHub API
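The first dedup pass, matching by slug, could look roughly like the sketch below. How the slug is actually derived is not described here, so the rule + file + line key is an assumption, as is the `Finding` shape.

```typescript
// Hypothetical finding shape; `source` names the phase that produced it.
interface Finding {
  rule: string;
  file: string;
  line: number;
  source: string;
}

// Assumed slug: the same rule at the same location counts as the same finding.
function slugOf(f: Finding): string {
  return `${f.rule}:${f.file}:${f.line}`.toLowerCase();
}

// Keep the first finding seen for each slug, preserving input order.
function dedupBySlug(findings: Finding[]): Finding[] {
  const seen = new Map<string, Finding>();
  for (const f of findings) {
    const slug = slugOf(f);
    if (!seen.has(slug)) seen.set(slug, f);
  }
  return Array.from(seen.values());
}
```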

Transparency

What We Don't Cover

Contox is a static analysis platform. We read your code via the GitHub API without deploying it. The items below are capabilities that require running your application or specialized tooling.

No DAST

We do not deploy or run your application. No HTTP fuzzing, no runtime endpoint testing, no live vulnerability exploitation.

Static route analysis detects missing auth, rate limits, input validation, and CSRF without deployment.

No Container Image Scanning

We audit Dockerfiles for misconfigurations but do not scan built container images for OS-level vulnerabilities.

Config audit catches root user, missing healthchecks, insecure base images, and exposed ports in Dockerfiles.

No IAST

No runtime instrumentation or agent-based monitoring. We cannot observe your application behavior in production.

Taint analysis tracks data flow statically from user input sources to dangerous sinks across your codebase.

No License Compliance

SBOM lists all components with versions but does not score license risks (GPL, AGPL copyleft propagation).

CycloneDX 1.5 SBOM includes license fields. Export and use with dedicated compliance tools like FOSSA or Snyk.

JS/TS Focus

SAST rules and taint analysis target JavaScript and TypeScript only. Other languages get basic config rules.

IaC config audit covers Docker, Kubernetes, Terraform, and GitHub Actions regardless of application language.

No Penetration Testing

Static analysis only. No active exploitation attempts, no manual security testing, no red team simulation.

Deep OWASP analysis uses 5 focused LLM passes cross-validated against deterministic SAST results.

Ask AI

Question Your Codebase

Ask natural language questions about your project. Semantic search finds relevant memory items, then an LLM synthesizes a cited answer.

Question

Natural language query

embed query

Semantic Search

Embeddings-based retrieval

top-k results

LLM Synthesis

Gemini 2.0 Flash

cited response

Cited Answer

Markdown with [Source N]

Ask AI

You

How does the auth middleware work?

The auth middleware uses Appwrite server SDK to verify session cookies. It checks for a valid session token in the request headers and extracts the user ID for downstream handlers. [Source 1] [Source 2]

[1] src/middleware.ts

[2] src/lib/appwrite-server.ts
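The synthesis step in the example above can be sketched in two parts: numbering the retrieved sources in the prompt, and reading the `[Source N]` markers back out of the answer so they can be mapped to files. The prompt wording and the `RetrievedSource` shape are assumptions; the model call itself is omitted.

```typescript
// Hypothetical retrieval result: a file path plus the matching excerpt.
interface RetrievedSource {
  path: string;
  excerpt: string;
}

// Number each source so the model can cite it as [Source N].
function buildPrompt(question: string, sources: RetrievedSource[]): string {
  const context = sources
    .map((s, i) => `[Source ${i + 1}] ${s.path}\n${s.excerpt}`)
    .join("\n\n");
  return `Answer using only the sources below and cite them as [Source N].\n\n${context}\n\nQuestion: ${question}`;
}

// Extract distinct cited source numbers, in order of first appearance.
function citedSources(answer: string): number[] {
  const cited: number[] = [];
  const pattern = /\[Source (\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (cited.indexOf(n) === -1) cited.push(n);
  }
  return cited;
}
```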

Built by a Solo Developer

This Is What One Developer Can Build

Full-stack product from capture to enrichment, from real-time sync to semantic search. Built with Next.js, TypeScript, Appwrite, and AI. Designed for scale from day one.

50+

API Endpoints

8

MCP Tools

7

Genesis Layers

LLM Models
