AI Stack

Built for Code Intelligence

Fine-tuned models trained on real engineering workflows, combined with a model-agnostic architecture that lets you bring your own keys, your own providers, and your own rules.

2 proprietary models · 5 external providers · 200+ models via OpenRouter · ONNX edge inference
Proprietary Models

Fine-tuned for engineering

Not generic LLMs. Models trained specifically on engineering workflows, code conventions, and security patterns.

contox-deep

Code Analysis
14B

What changed: Dataset doubled with real GitHub code reviews and PR discussions.

Next generation. 2x the training data with real GitHub code reviews, enriched task diversity, and improved architecture understanding.

Qwen2.5-Coder-14B-Instruct
LoRA r=64, alpha=128
275M trainable / 8.4B total

Capabilities

Convention validation against project brain
Brain memory enrichment from code changes
Architecture pattern detection
Deep code review with context awareness

Model Quality

Train loss

0.186

Eval loss

0.273

Loss ratio

1.5x

Convergence

Stable
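The loss ratio above is just eval loss over train loss, computed from the two numbers reported for this model:

```python
# Generalization check: eval/train loss ratio, using the metrics on this page.
train_loss = 0.186
eval_loss = 0.273

ratio = eval_loss / train_loss
print(f"loss ratio: {ratio:.1f}x")  # loss ratio: 1.5x
```

A ratio this close to 1 is what the "Stable" convergence label refers to: the model generalizes to held-out data instead of memorizing the training set.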

Training Dataset

Built from real GitHub code reviews and PR discussions, combined with the original v2 brain workflow data.

Samples

10,226

Format

ChatML (system/user/assistant)

Source

GitHub reviews + brain workflows

New

Real-world code review patterns
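The ChatML format named above wraps each sample in Qwen's `<|im_start|>`/`<|im_end|>` delimiters. A minimal sketch of that layout; the prompts and review text here are illustrative, not actual dataset content:

```python
# Sketch of one ChatML-formatted training sample (system/user/assistant).
# The role contents are placeholders, not real dataset entries.
def to_chatml(system: str, user: str, assistant: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )

sample = to_chatml(
    "You are a code reviewer.",
    "Review this diff: ...",
    "This change breaks the project's error-handling convention: ...",
)
```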

Training

QLoRA (4-bit quantized base + LoRA adapters)

GPU

NVIDIA A100 80GB

Epochs

2

Learning rate

1.2e-4 (cosine decay)

Batch size

16 (effective)

Throughput

1.65 samples/sec

Training time

~3h 26m

Target modules

q/k/v/o/gate/up/down_proj

Inference

Quantization

4-bit bitsandbytes

GPU

NVIDIA L4 24GB

Max context

4,096 tokens

Serving

vLLM (OpenAI-compat API)
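Because vLLM exposes an OpenAI-compatible API, any client that can POST a chat-completions payload works against it. A sketch of building such a request; the model name, endpoint URL, and prompt content are placeholders, not the real deployment details:

```python
import json

# Sketch of a request body for a vLLM OpenAI-compatible endpoint.
# Model name and prompts are illustrative placeholders.
payload = {
    "model": "contox-deep",
    "messages": [
        {"role": "system", "content": "You are a code reviewer."},
        {"role": "user", "content": "Review this function: ..."},
    ],
    "max_tokens": 512,   # leaves room inside the 4,096-token context
    "temperature": 0.2,
}
body = json.dumps(payload).encode("utf-8")
# The bytes would be POSTed to <host>/v1/chat/completions with
# Content-Type: application/json.
```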

contox-security

Security Analysis
32B

What changed: Balanced dataset (50% vuln / 50% clean) to reduce false positive rate.

Balanced 50/50 dataset (vulnerable + clean code) to reduce false positives. Same base model as v1, better precision on real-world codebases.

Qwen2.5-Coder-32B-Instruct
LoRA r=64, alpha=128
32B parameters

Capabilities

OWASP Top 10 vulnerability detection
CVE pattern matching and analysis
Dependency security audit
Security score computation per module

Model Quality

Eval loss

0.458

Dataset

20K balanced

Balance

50/50 vuln/clean

vs v1

Fewer false positives

Training Dataset

Balanced dataset with equal vulnerable and clean code samples to teach the model when code is safe, not just when it is vulnerable.

Samples

20,000

Balance

50% vulnerable / 50% clean

Sources

CVE DB + OWASP + real repos

Format

Code + label (vuln or safe)
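A 50/50 balance like the one described above is typically achieved by downsampling the majority class. A minimal sketch under assumed labels (`"vuln"`/`"safe"`) and illustrative corpus sizes:

```python
import random

# Sketch: balance a labeled corpus to 50% vulnerable / 50% clean by
# downsampling the larger class. Labels and counts are illustrative.
def balance(samples: list[dict], seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    vuln = [s for s in samples if s["label"] == "vuln"]
    clean = [s for s in samples if s["label"] == "safe"]
    n = min(len(vuln), len(clean))  # cap both classes at the minority size
    return rng.sample(vuln, n) + rng.sample(clean, n)

corpus = [{"label": "vuln"}] * 14000 + [{"label": "safe"}] * 10000
balanced = balance(corpus)  # 20,000 samples, exactly 10,000 per label
```

Teaching on equal numbers of safe and vulnerable examples is what pushes the false positive rate down: the model must learn to say "safe" as often as "vuln".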

Training

LoRA (bf16 full-precision base + LoRA adapters)

GPU

NVIDIA H200 141GB

Precision

bf16 (full precision)

Target modules

q/k/v/o/gate/up/down_proj

Adapter size

~2.1 GB (safetensors)

Inference

Quantization

4-bit bitsandbytes

GPU

NVIDIA L4 24GB

Max context

4,096 tokens

Serving

vLLM (OpenAI-compat API)

contox-index

Intent Classification
184M

Not every task needs an LLM. Contox Index uses a DeBERTa encoder to classify developer intent and rank context relevance. Zero token cost, no external API calls, runs on CPU.

DeBERTa-v3-base
Full fine-tune (fp32)
184M trainable / 184M total

Capabilities

Intent classification (13 engineering categories)
Cross-encoder relevance scoring for context ranking
Silent keyword fallback if Index is unavailable
Powers the Agent Hub context routing layer
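The "silent keyword fallback" can be pictured as a plain keyword match that fires only when the Index model is unreachable. A sketch under assumed categories and keywords; the real taxonomy has 13 classes:

```python
# Sketch of a keyword fallback classifier used when the Index is unavailable.
# The intents and keyword lists below are illustrative, not the real taxonomy.
FALLBACK_KEYWORDS = {
    "security": ("vulnerability", "cve", "injection", "xss"),
    "code_review": ("review", "refactor", "diff", "pull request"),
    "testing": ("coverage", "unit test", "assert"),
}

def classify_fallback(message: str, default: str = "general") -> str:
    text = message.lower()
    for intent, keywords in FALLBACK_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return default

print(classify_fallback("Please review this diff"))  # code_review
```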

Model Quality

Accuracy

85.6%

F1 Macro

0.838

F1 Weighted

0.857

Eval loss

0.585

Training Dataset

Curated dataset of developer messages classified into 13 engineering categories. Balanced with class-weighted training.

Train samples

31,631

Eval samples

3,515

Categories

13 intent classes

Source

Synthetic + curated workflows

Training

Full fine-tune (fp32, class-weighted CrossEntropy)

GPU

NVIDIA RTX 3080 10GB

Epochs

5

Learning rate

2e-5 (cosine decay)

Batch size

32 (effective)

Training time

~1h 30m

Best category

code_review (F1: 0.96)
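Class-weighted CrossEntropy, as used above, scales each class's loss contribution by the inverse of its frequency so rare intents are not drowned out. A sketch of computing such weights; the class names and counts are illustrative, not the real 13-class distribution:

```python
# Sketch: inverse-frequency class weights for a class-weighted CrossEntropy.
# weight_c = total / (num_classes * count_c), so rare classes weigh more.
def class_weights(counts: dict[str, int]) -> dict[str, float]:
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

weights = class_weights({"code_review": 6000, "security": 3000, "docs": 1000})
# The rarest class ("docs") receives the largest weight.
```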

Inference

Runtime

ONNX (CPU)

Latency

< 2 seconds

Cost

$0 / request

Timeout

2s hard limit
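The 2-second hard limit can be enforced by running inference in a worker and abandoning it at the deadline. A sketch using stdlib futures; `run_index` is a stand-in for the real ONNX call:

```python
import concurrent.futures

# Sketch of a 2-second hard timeout around Index inference.
# `run_index` is a placeholder for the ONNX CPU inference call.
def classify_with_timeout(message: str, timeout: float = 2.0) -> str:
    def run_index(text: str) -> str:  # placeholder for ONNX inference
        return "code_review"

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_index, message)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            return "fallback"  # the silent keyword fallback would run here
```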

Pipeline

How our models work

Your code flows through a layered intelligence pipeline. Each model handles what it does best.

Your Code

Repo, PRs, commits

Contox Intelligence

Contox Index

Intent classification + context ranking

contox-deep-v2

Deep code analysis + brain enrichment

contox-security-v1

Vulnerability detection + security scoring

Intelligence Output

Enriched Brain

Security Reports

Health Scores

Convention Validation
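The layered pipeline above can be sketched as a routing table: the Index classifies intent, and the request is dispatched to the matching fine-tuned model. The route mapping here is an illustrative simplification:

```python
# Sketch of intent-based routing in the intelligence pipeline.
# The mapping is illustrative; the real router covers 13 intent classes.
ROUTES = {
    "security": "contox-security-v1",
    "code_review": "contox-deep-v2",
}

def route(intent: str) -> str:
    # Unrecognized intents fall through to the deep analysis model.
    return ROUTES.get(intent, "contox-deep-v2")

print(route("security"))  # contox-security-v1
```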

Model-Agnostic

Your provider, your rules

Contox works with any major AI provider. Bring your own API keys, configure fallback chains, switch providers without changing a line of code.

Google Gemini

gemini-2.0-flash

1.1M tokens

OpenAI

gpt-4.1-mini

1M tokens

Anthropic Claude

claude-sonnet-4

200K tokens

Mistral AI

mistral-small-latest

131K tokens

OpenRouter

200+ models

Variable

Bring Your Own Keys

Use your own API keys. Your tokens, your billing, your choice of provider.

Automatic Fallback

Configure a fallback chain. If one provider fails, the next takes over seamlessly.

Team Configuration

Each team sets their own provider, model, and fallback order. No lock-in.
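A fallback chain like the one described above amounts to trying each configured provider in order and moving on when one fails. A sketch with stand-in provider callables; real SDK clients would take their place:

```python
# Sketch of an automatic provider fallback chain.
# The provider callables below are stand-ins for real SDK clients.
def complete_with_fallback(prompt: str, providers: list) -> str:
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky(prompt):  # stand-in primary provider that is down
    raise ConnectionError("timeout")

def backup(prompt):  # stand-in fallback provider
    return "ok: " + prompt

print(complete_with_fallback("hi", [("gemini", flaky), ("openai", backup)]))
# ok: hi
```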

Benchmarks

Numbers, not marketing

Real metrics from our training pipeline: contox-deep-v2 training details.

Training Config

Training Samples: 5,420
Epochs: 3
Learning Rate: 1.5e-4
Throughput: 5.06 samples/sec
Trainable Params: 275M / 8.4B
Training Time: ~53 min

Inference Setup

Quantization: 4-bit (bitsandbytes)
GPU: NVIDIA L4 (24 GB)
Max Context: 4,096 tokens
GPU Memory Usage: ~90%

LoRA Configuration

Rank (r)

64

Alpha

128

Model Quality

Train Loss: 0.0175

Converged, no overfitting signal

Eval Loss: 0.091

Healthy generalization gap

Loss Ratio: 5.2x

Eval/Train ratio within norms

QLoRA Fine-tuning

Parameter-efficient training. Only 3.3% of total parameters are updated, keeping the base model capabilities intact while specializing for code analysis.
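The 3.3% figure is the trainable LoRA parameter count over the total parameter count, using the numbers reported on this page:

```python
# Trainable fraction under QLoRA, from the counts reported above.
trainable = 275e6   # LoRA adapter parameters
total = 8.4e9       # total parameters

fraction = trainable / total
print(f"{fraction:.1%}")  # 3.3%
```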