AI Stack

Built for Code Intelligence

Fine-tuned models trained on real engineering workflows. Combined with a model-agnostic architecture that lets you bring your own keys, your own providers, your own rules.

2 proprietary models5 external providers200+ models via OpenRouterONNX edge inference

Proprietary Models

Fine-tuned for engineering

Not generic LLMs. Models trained specifically on engineering workflows, code conventions, and security patterns.

contox-deep

Code Analysis

14B

What changed: Dataset doubled with real GitHub code reviews and PR discussions.

Next generation. 2x the training data with real GitHub code reviews, enriched task diversity, and improved architecture understanding.

Qwen2.5-Coder-14B-Instruct

LoRA r=64, alpha=128

275M trainable / 8.4B total

Capabilities

Convention validation against project brain

Brain memory enrichment from code changes

Architecture pattern detection

Deep code review with context awareness

Model Quality

Train loss

0.186

Eval loss

0.273

Loss ratio

1.5x

Convergence

Stable

Training Dataset

Built from real GitHub code reviews and PR discussions, combined with the original v2 brain workflow data.

Samples

10,226

Format

ChatML (system/user/assistant)

Source

GitHub reviews + brain workflows

New

Real-world code review patterns

Training

QLoRA (4-bit quantized base + LoRA adapters)

GPU

NVIDIA A100 80GB

Epochs

Learning rate

1.2e-4 (cosine decay)

Batch size

16 (effective)

Throughput

1.65 samples/sec

Training time

~3h26

Target modules

q/k/v/o/gate/up/down_proj

Inference

Quantization

4-bit bitsandbytes

GPU

NVIDIA L4 24GB

Max context

4,096 tokens

Serving

vLLM (OpenAI-compat API)

contox-security

Security Analysis

32B

What changed: Balanced dataset (50% vuln / 50% clean) to reduce false positive rate.

Balanced 50/50 dataset (vulnerable + clean code) to reduce false positives. Same base model as v1, better precision on real-world codebases.

Qwen2.5-Coder-32B-Instruct

LoRA r=64, alpha=128

32B parameters

Capabilities

OWASP Top 10 vulnerability detection

CVE pattern matching and analysis

Dependency security audit

Security score computation per module

Model Quality

Eval loss

0.458

Dataset

20K balanced

Balance

50/50 vuln/clean

vs v1

Fewer false positives

Training Dataset

Balanced dataset with equal vulnerable and clean code samples to teach the model when code is safe, not just when it is vulnerable.

Samples

20,000

Balance

50% vulnerable / 50% clean

Sources

CVE DB + OWASP + real repos

Format

Code + label (vuln or safe)

Training

LoRA (bf16 full-precision base + LoRA adapters)

GPU

NVIDIA H200 143GB

Precision

bf16 (full precision)

Target modules

q/k/v/o/gate/up/down_proj

Adapter size

~2.1 GB (safetensors)

Inference

Quantization

4-bit bitsandbytes

GPU

NVIDIA L4 24GB

Max context

4,096 tokens

Serving

vLLM (OpenAI-compat API)

contox-index

Intent Classification

184M

Not every task needs an LLM. Contox Index uses a DeBERTa encoder to classify developer intent and rank context relevance. Zero token cost, no external API calls, runs on CPU.

DeBERTa-v3-base

Full fine-tune (fp32)

184M trainable / 184M total

Capabilities

Intent classification (13 engineering categories)

Cross-encoder relevance scoring for context ranking

Silent keyword fallback if Index is unavailable

Powers the Agent Hub context routing layer

Model Quality

Accuracy

85.6%

F1 Macro

0.838

F1 Weighted

0.857

Eval loss

0.585

Training Dataset

Curated dataset of developer messages classified into 13 engineering categories. Balanced with class-weighted training.

Train samples

31,631

Eval samples

3,515

How our models work

Your code flows through a layered intelligence pipeline. Each model handles what it does best.

Your Code

Repo, PRs, commits

Contox Intelligence

Contox Index

Intent classification + context ranking

contox-deep-v2

Deep code analysis + brain enrichment

contox-security-v1

Vulnerability detection + security scoring

Intelligence Output

Enriched Brain

Security Reports

Health Scores

Convention Validation

Model-Agnostic

Your provider, your rules

Contox works with any major AI provider. Bring your own API keys, configure fallback chains, switch providers without changing a line of code.

Google Gemini

gemini-2.0-flash

1.1M tokens

OpenAI

gpt-4.1-mini

1M tokens

Anthropic Claude

claude-sonnet-4

200K tokens

Mistral AI

mistral-small-latest

131K tokens

OpenRouter

200+ models

Variable

Bring Your Own Keys

Use your own API keys. Your tokens, your billing, your choice of provider.

Automatic Fallback

Configure a fallback chain. If one provider fails, the next takes over seamlessly.

Team Configuration

Each team sets their own provider, model, and fallback order. No lock-in.

Benchmarks

Numbers, not marketing

Real metrics from our training pipeline. contox-deep-v2 training details.

Training Config

Training Samples5,420

Epochs3

Learning Rate1.5e-4

Throughput5.06 samples/sec

Trainable Params275M / 8.4B

Training Time~53 min

Inference Setup

Quantization4-bit (bitsandbytes)

GPUNVIDIA L4 (24 GB)

Max Context4,096 tokens

GPU Memory Usage~90%

LoRA Configuration

Rank (r)

Alpha

128

Model Quality

Train Loss0.0175

Converged, no overfitting signal

Eval Loss0.091

Healthy generalization gap

Loss Ratio5.2x

Eval/Train ratio within norms

QLoRA Fine-tuning

Parameter-efficient training. Only 3.3% of total parameters are updated, keeping the base model capabilities intact while specializing for code analysis.