AI coding tools give every engineer the same productivity boost — write more code, faster. This produces diminishing returns because the bottleneck was never typing speed.
The real bottleneck is everything around the code: translating requirements into tasks, enforcing quality across iterations, detecting regressions, deciding when to ship. Today, humans do all of that manually. AI writes a function, a human reviews it. AI fixes a bug, a human verifies it. The agent is fast, but the human is still the loop.
The shift that matters is moving from "AI writes code, humans verify" to "humans specify intent and constraints, the system handles the rest." The engineer who can define better constraints and build better observability gets exponentially more from AI. Everyone else is left competing on typing speed.
Open Forge is infrastructure for that shift: an agent-first harness engineering system for fully autonomous software development — not a developer tool with AI features, but an autonomous system where agents are the primary actors.
You describe what you want built. The system plans, implements, tests, reviews, and ships working code. It self-corrects through quality gates, enforces cost and safety constraints via telemetry, and closes the feedback loop with statistical evaluation of agent outputs — enabling autonomous decisions about model changes, prompt tuning, and regression detection.
The differentiator is not the agents. It's the harness: constraints that prevent waste, observability that makes agent behavior measurable, and evaluation that makes output quality a data-driven decision instead of a human judgment call.
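To make the harness idea concrete, here is a minimal sketch of what a stage telemetry event and constraint check could look like. The real `@open-forge/telemetry` types are still in progress, so every name and field below is an illustrative assumption, not the published API.

```typescript
// Hypothetical sketch — the real @open-forge/telemetry types are not yet
// implemented; names, fields, and limits here are illustrative assumptions.

interface StageEvent {
  stage: string;      // e.g. "implement", "qa", "code-review"
  tokensUsed: number; // tokens consumed by this stage
  durationMs: number; // wall-clock duration of the stage
  retries: number;    // retry attempts so far
}

interface Constraint {
  name: string;
  violated: (e: StageEvent) => boolean;
}

// Example constraints: a token budget and a retry cap (limits are made up).
const constraints: Constraint[] = [
  { name: "token-budget", violated: (e) => e.tokensUsed > 50_000 },
  { name: "retry-limit", violated: (e) => e.retries > 3 },
];

// Evaluate all constraints after a stage; any violation halts the pipeline.
function checkStage(event: StageEvent): string[] {
  return constraints.filter((c) => c.violated(event)).map((c) => c.name);
}

const violations = checkStage({
  stage: "implement",
  tokensUsed: 62_000,
  durationMs: 120_000,
  retries: 1,
});
// violations: ["token-budget"]
```

The point of the shape is that constraints are data, evaluated mechanically after every stage, rather than judgment calls made during human review.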
Early development. The architecture is defined and core packages are scaffolded, but most implementations are in progress. Contributions and feedback are welcome — expect breaking changes.
| Package | Status |
|---|---|
| `@open-forge/pipeline` | Core types, roadmap generation, handoff management, and helper utilities implemented. Orchestrator in progress. |
| `@open-forge/mcp` | Functional — MCP server with spec tools, prompts, and change tracking. |
| `@open-forge/telemetry` | Type definitions exported. Implementations (event emitting, constraint evaluation) not yet started. |
| `@open-forge/evaluations` | Type definitions exported. Implementations planned — blocked on telemetry for §3.6/§3.7 integration. |
| Package | Description |
|---|---|
| `@open-forge/pipeline` | An OpenCode plugin that turns natural language requirements into a fully built application — autonomously, with minimal human intervention |
| `@open-forge/mcp` | MCP server implementation for OpenSpec archives |
| `@open-forge/telemetry` | Telemetry and constraint evaluation for open-forge pipeline stages |
| `@open-forge/evaluations` | Evaluation framework for measuring agent output quality, regression detection, and statistical analysis |
Open Forge is an autonomous software development pipeline. You give it a description of what you want built, and it runs a multi-phase agent workflow to produce working code with quality enforcement at every step.
```
Requirements (text)
        │
        ▼
 ┌───────────────┐
 │ Intent Router │ ── Classifies: new-project / feature / bug-fix / refactor / migration
 └───────┬───────┘
         │
         ▼
 ┌───────────────┐
 │ ROADMAP Gen   │ ── Produces phased, dependency-ordered task plan
 └───────┬───────┘
         │
         ▼
 ┌──────────────────────────────────────────────────┐
 │ Pipeline Orchestrator (per phase)                │
 │                                                  │
 │ 1. Read HANDOFF.md (restore context)             │
 │ 2. Generate & select plan (critic agent)         │
 │ 3. Write tests (test author — firewalled)        │
 │ 4. Implement (implementer — firewalled)          │
 │ 5. Quality gates (verify → QA → security →       │
 │    architect → code review → integration)        │
 │ 6. Cleanup agent (entropy management)            │
 │ 7. Update HANDOFF.md (persist context)           │
 │                                                  │
 │ On failure: retry up to 3x → drift sentinel      │
 │ On success: advance to next phase                │
 ├──────────────────────────────────────────────────┤
 │ ⚡ Telemetry (@open-forge/telemetry)             │
 │                                                  │
 │ • Captures structured events from every stage    │
 │ • Evaluates constraints (token budget, retries,  │
 │   duration limits) after each stage              │
 │ • Halts pipeline on constraint violations        │
 │ • Generates audit trails for human review        │
 └─────────────────────────┬────────────────────────┘
                           │
                           ▼
  ROADMAP_COMPLETE (or PIPELINE-ISSUES.md for blockers)
                           │
                           ▼
 ┌──────────────────────────────────────────────────┐
 │ 📊 Evaluations (@open-forge/evaluations)         │
 │                                                  │
 │ • Runs datasets against agent outputs            │
 │ • Deterministic, LLM-judge, and custom evals     │
 │ • Statistical analysis (confidence intervals,    │
 │   significance testing, regression detection)    │
 │ • Baseline vs candidate comparison reports       │
 │ • Consumes telemetry events + constraint results │
 │ • CLI and programmatic API                       │
 └──────────────────────────────────────────────────┘
```
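The "statistical analysis" step above can be illustrated with a small stand-alone sketch. The `@open-forge/evaluations` implementation does not exist yet, so this is not its API — it is one standard technique (a two-proportion z-test on eval pass rates) for deciding whether a candidate model or prompt has regressed from a baseline.

```typescript
// Two-proportion z-test on eval pass rates — a standard way to compare a
// baseline run against a candidate run. Illustrative only; the actual
// @open-forge/evaluations API is not yet implemented.

interface EvalRun {
  passed: number; // number of eval cases that passed
  total: number;  // total eval cases run
}

// Returns the z statistic; |z| > 1.96 is significant at roughly the 95% level.
function zTest(baseline: EvalRun, candidate: EvalRun): number {
  const p1 = baseline.passed / baseline.total;
  const p2 = candidate.passed / candidate.total;
  // Pooled proportion under the null hypothesis of no difference.
  const p =
    (baseline.passed + candidate.passed) / (baseline.total + candidate.total);
  const se = Math.sqrt(
    p * (1 - p) * (1 / baseline.total + 1 / candidate.total),
  );
  return (p2 - p1) / se;
}

// Baseline passed 180/200 cases; candidate passed 150/200.
const z = zTest({ passed: 180, total: 200 }, { passed: 150, total: 200 });
const regressed = z < -1.96; // true: the candidate is significantly worse
```

A test like this is what lets regression detection be a thresholded, automated decision rather than a reviewer's gut call on two pass rates.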
- Context firewalls — The test-writing agent and the implementing agent are deliberately isolated from each other to prevent hallucination reinforcement
- Self-correction loops — When quality gates fail, the system retries up to 3x before escalating to a human via `PIPELINE-ISSUES.md`
- Drift detection — If an agent gets stuck in a loop, a sentinel file is written and execution halts rather than burning tokens
- Constraint enforcement — Telemetry evaluates token budgets, retry limits, and duration caps after every stage, providing a hard stop before costs spiral
- Lessons feedback — Recurring mistakes from gate failures are captured in `LESSONS.md` and fed to future agents
- Evaluation-driven quality — Post-pipeline evaluation measures agent output quality with statistical rigor, enabling regression detection and data-driven decisions on model/prompt changes
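The self-correction loop can be sketched as a small control-flow helper: run a stage through its quality gate up to three times, then escalate. The real orchestrator lives in `@open-forge/pipeline` and is still in progress, so the function names and the gate-result shape below are assumptions for illustration.

```typescript
// Illustrative retry-then-escalate loop. All names here are hypothetical;
// the actual pipeline orchestrator is not yet implemented.

type GateResult = { passed: boolean; reason?: string };

function runWithRetries(
  runStage: () => GateResult,          // runs the stage and its quality gate
  escalate: (reason: string) => void,  // e.g. write a PIPELINE-ISSUES.md entry
  maxRetries = 3,
): boolean {
  let lastReason = "unknown failure";
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = runStage();
    if (result.passed) return true; // gate passed: advance to the next phase
    lastReason = result.reason ?? lastReason;
  }
  // Retries exhausted: stop burning tokens and hand off to a human.
  escalate(lastReason);
  return false;
}
```

The design choice worth noting is that escalation is a terminal outcome of the loop, not an exception path: the pipeline always ends a phase either advanced or explicitly blocked for human review.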
For the full design — including context firewall rules, gate sequences, and guardrails — see the pipeline REQUIREMENTS.md. For telemetry constraint definitions, see the telemetry requirements. For evaluation framework details, see the evaluations requirements.
```sh
# Install dependencies
npm install

# Build all packages
npx nx run-many -t build

# Run tests
npx nx run-many -t test

# Lint all projects
npx nx run-many -t lint

# Visualize the project graph
npx nx graph
```

```sh
npx nx graph                          # Interactive dependency graph
npx nx show project <name> --web      # View project details
npx nx run-many -t build              # Build all projects
npx nx run-many -t test --parallel=3  # Test in parallel
npx nx affected -t build              # Build only affected projects
npx nx release --dry-run              # Preview release changes
npx nx release                        # Create a new release
```

This workspace is connected to Nx Cloud for remote caching, distributed task execution, and self-healing CI.